Understanding BigQuery WHERE Statement (Part 2)

July 24, 2024
Chapter 4: Understanding BigQuery WHERE Statement (Part 2)

Today, we're diving deeper into the WHERE statement in BigQuery. In this second part of our BigQuery free course series on how to use the WHERE statement, we'll explore advanced filtering techniques including the LIKE statement, handling NULL values, using logical operators, and leveraging the REGEXP_CONTAINS function for more precise queries. Whether you're new to BigQuery or looking to sharpen your skills, these tips will help you refine your data queries for better insights. If you still haven't checked the first part, make sure to read "Understanding the WHERE Statement: Part 1".

Types of Filters for the WHERE Statement

1. The LIKE Statement

Let's start with the LIKE statement. The LIKE statement is a powerful tool for pattern matching in SQL queries. Advantage of using LIKE statement is that, it allows you to filter rows based on partial matches using the percentage sign (%) as a wildcard.

SELECT *
FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_station_info`
WHERE name LIKE '%Fillmore St%'
ORDER BY name

Query result using LIKE with WHERE Statement

This query selects all columns from the bike_share_station_info table. It filters rows where the name contains "Filmore St" anywhere in the string. The result is ordered by the name column.

2. The NOT LIKE Statement

SELECT *
FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_station_info`
WHERE name NOT LIKE '%Fillmore St%'
ORDER BY name

Query result using NOT LIKE Statement

This query selects all columns from the bike_share_station_info table, where the name does not contains "Filmore St" anywhere in the string.

3. Handling NULL and NOT NULL Values

In SQL, NULL values represent missing or undefined data. It's crucial to handle NULLs properly in your queries.

SELECT *
FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_station_info`
WHERE region_id IS NULL
ORDER BY name

This query selects all columns from the 'bike_share_station_info' table and filters rows where the 'region_id' is NULL

Note: To check for NULL values, use 'IS NULL' instead of '='

Example Query for NOT NULL is:

SELECT *
FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_station_info`
WHERE region_id IS NOT NULL
ORDER BY name

This query selets all columns from the 'bike_share_station_info' table, and filters rows where the 'region_id' is not NULL

4. Using Logical Operators: AND and OR

Logical operators allow you to combine multiple conditions in your WHERE clause. The AND operator requires all conditions to be true, while the OR operator requires at least one condition to be true.

Example Query with AND:

SELECT *
FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_station_info`
WHERE region_id IS NULL
  AND name LIKE '%st%'
ORDER BY name

Query result using multiple filter condition logical operator 'AND' and LIKE statement

This query selects all columns from the 'bike_share_station_info' table, and filters rows where the region_ id is not NULL and the name of the station contains ends "St"

Example Query with OR:

SELECT *
FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_station_info`
WHERE region_id IS NULL
  OR name LIKE '%st%'
ORDER BY name

This query selects all columns from the 'bike_share_station_info' table, and filers rows either the region_id is NULL or the station name contains"St".

Query results using OR logical operator and LIKE statement

5. REGEXP_CONTAINS Function

The REGEXP_CONTAINS function in BigQuery allows you to use regular expressions for pattern matching. This can be more powerful and flexible than the LIKE statement.

Example Query with LIKE Statement:

SELECT 
    order_number, 
    order_created_at, 
    customer_id,
    product_sku,
    product_last_name,
    order_rank
 FROM `dwh-udemy-course.3_dwh_ecommerce.order_item`
 WHERE TRUE
    AND product_last_name LIKE '%Hoodie%'
ORDER BY order_created_at

Example Query with REGEX:

SELECT 
    order_number, 
    order_created_at, 
    customer_id,
    product_sku,
    product_last_name,
    order_rank
 FROM `dwh-udemy-course.3_dwh_ecommerce.order_item`
 WHERE REGEXP_CONTAINS(product_last_name,'Hoodie')
ORDER BY order_created_at

Query result using REGEX_CONTAINS in WHERE Statement

This query selects all columns from the 'order_item' table, and filters rows where the 'product_last_name' contains the string "hoodie" using a regular expression. Unlike the LIKE statement, 'REGEX_CONTAINS' does not require percentage sign, making it more straightforward for complex patterns.

For example, if you add percentage sign with REGEX_CONTAINS, the percentage will be considered as one of the string, not as a special character

Example Query

SELECT 
    order_number, 
    order_created_at, 
    customer_id,
    product_sku,
    product_last_name,
    order_rank
 FROM `dwh-udemy-course.3_dwh_ecommerce.order_item`
 WHERE REGEXP_CONTAINS(product_last_name,'%Hoodie')
ORDER BY order_created_at

Query result using percentage sign in REGEX_CONTAINS

Incase, you want to query result rows that does not contains the word 'Hoodie' using REGEX_CONTAINS

SELECT 
    order_number, 
    order_created_at, 
    customer_id,
    product_sku,
    product_last_name,
    order_rank
 FROM `dwh-udemy-course.3_dwh_ecommerce.order_item`
 WHERE REGEXP_CONTAINS(product_last_name,'%Hoodie') = FALSE
ORDER BY order_created_at

6. Combining Multiple Conditions

You can combine multiple conditions to refine your queries further.

Note: Avoid using AND and OR together. Incase it's absolutely needed Using parentheses helps control the order of evaluation when mixing AND and OR operators.

Example Query:

SELECT 
    order_number, 
    order_created_at, 
    customer_id,
    product_sku,
    product_last_name,
    order_rank
 FROM `dwh-udemy-course.3_dwh_ecommerce.order_item`
 WHERE REGEXP_CONTAINS(product_last_name,'%Hoodie') = FALSE
    AND order_rank = 1
    AND (product_sku LIKE 'FA%' OR product_sku LIKE 'FB%')
ORDER BY order_created_at

This query selects all columns from the order_item table. It filters rows where the product_last_name does not contain "hoodie", the order_rank is 1, and the product_sku starts with "FA" or "FB".

The use of parentheses ensures that the OR condition is evaluated correctly in conjunction with the AND conditions.

Conclusion

Understanding and using the WHERE statement effectively is crucial for querying data in BigQuery. The LIKE statement, handling NULL values, logical operators, and the REGEXP_CONTAINS function provide powerful ways to filter and refine your data. By combining these techniques, you can write more precise and efficient queries to gain better insights from your datasets. Stay tuned for more tutorials and tips on our DAT YouTube channel, and don't forget to check out our blog on Medium for in-depth articles on BigQuery and other data topics. Happy querying!

Subscribe to our newsletter
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.