Today, we're diving deeper into the WHERE statement in BigQuery. In this second part of our BigQuery free course series on how to use the WHERE statement, we'll explore advanced filtering techniques including the LIKE statement, handling NULL values, using logical operators, and leveraging the REGEXP_CONTAINS function for more precise queries. Whether you're new to BigQuery or looking to sharpen your skills, these tips will help you refine your data queries for better insights. If you still haven't checked the first part, make sure to read "Understanding the WHERE Statement: Part 1".
Let's start with the LIKE statement. The LIKE statement is a powerful tool for pattern matching in SQL queries. Advantage of using LIKE statement is that, it allows you to filter rows based on partial matches using the percentage sign (%) as a wildcard.
SELECT *
FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_station_info`
WHERE name LIKE '%Fillmore St%'
ORDER BY name
This query selects all columns from the bike_share_station_info
table. It filters rows where the name
contains "Filmore St" anywhere in the string. The result is ordered by the name
column.
SELECT *
FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_station_info`
WHERE name NOT LIKE '%Fillmore St%'
ORDER BY name
This query selects all columns from the bike_share_station_info
table, where the name
does not contains "Filmore St" anywhere in the string.
In SQL, NULL values represent missing or undefined data. It's crucial to handle NULLs properly in your queries.
SELECT *
FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_station_info`
WHERE region_id IS NULL
ORDER BY name
This query selects all columns from the 'bike_share_station_info' table and filters rows where the 'region_id' is NULL
Note: To check for NULL values, use 'IS NULL' instead of '='
Example Query for NOT NULL is:
SELECT *
FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_station_info`
WHERE region_id IS NOT NULL
ORDER BY name
This query selets all columns from the 'bike_share_station_info' table, and filters rows where the 'region_id' is not NULL
Logical operators allow you to combine multiple conditions in your WHERE clause. The AND
operator requires all conditions to be true, while the OR
operator requires at least one condition to be true.
Example Query with AND:
SELECT *
FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_station_info`
WHERE region_id IS NULL
AND name LIKE '%st%'
ORDER BY name
This query selects all columns from the 'bike_share_station_info' table, and filters rows where the region_ id is not NULL and the name of the station contains ends "St"
Example Query with OR:
SELECT *
FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_station_info`
WHERE region_id IS NULL
OR name LIKE '%st%'
ORDER BY name
This query selects all columns from the 'bike_share_station_info'
table, and filers rows either the region_id is NULL or the station name contains"St".
The REGEXP_CONTAINS
function in BigQuery allows you to use regular expressions for pattern matching. This can be more powerful and flexible than the LIKE statement.
Example Query with LIKE Statement:
SELECT
order_number,
order_created_at,
customer_id,
product_sku,
product_last_name,
order_rank
FROM `dwh-udemy-course.3_dwh_ecommerce.order_item`
WHERE TRUE
AND product_last_name LIKE '%Hoodie%'
ORDER BY order_created_at
Example Query with REGEX:
SELECT
order_number,
order_created_at,
customer_id,
product_sku,
product_last_name,
order_rank
FROM `dwh-udemy-course.3_dwh_ecommerce.order_item`
WHERE REGEXP_CONTAINS(product_last_name,'Hoodie')
ORDER BY order_created_at
This query selects all columns from the 'order_item' table, and filters rows where the 'product_last_name' contains the string "hoodie" using a regular expression. Unlike the LIKE statement, 'REGEX_CONTAINS' does not require percentage sign, making it more straightforward for complex patterns.
For example, if you add percentage sign with REGEX_CONTAINS, the percentage will be considered as one of the string, not as a special character
Example Query
SELECT
order_number,
order_created_at,
customer_id,
product_sku,
product_last_name,
order_rank
FROM `dwh-udemy-course.3_dwh_ecommerce.order_item`
WHERE REGEXP_CONTAINS(product_last_name,'%Hoodie')
ORDER BY order_created_at
Incase, you want to query result rows that does not contains the word 'Hoodie' using REGEX_CONTAINS
SELECT
order_number,
order_created_at,
customer_id,
product_sku,
product_last_name,
order_rank
FROM `dwh-udemy-course.3_dwh_ecommerce.order_item`
WHERE REGEXP_CONTAINS(product_last_name,'%Hoodie') = FALSE
ORDER BY order_created_at
You can combine multiple conditions to refine your queries further.
Note: Avoid using AND and OR together. Incase it's absolutely needed Using parentheses helps control the order of evaluation when mixing AND and OR operators.
Example Query:
SELECT
order_number,
order_created_at,
customer_id,
product_sku,
product_last_name,
order_rank
FROM `dwh-udemy-course.3_dwh_ecommerce.order_item`
WHERE REGEXP_CONTAINS(product_last_name,'%Hoodie') = FALSE
AND order_rank = 1
AND (product_sku LIKE 'FA%' OR product_sku LIKE 'FB%')
ORDER BY order_created_at
This query selects all columns from the order_item table. It filters rows where the product_last_name does not contain "hoodie", the order_rank is 1, and the product_sku starts with "FA" or "FB".
The use of parentheses ensures that the OR condition is evaluated correctly in conjunction with the AND conditions.
Understanding and using the WHERE statement effectively is crucial for querying data in BigQuery. The LIKE statement, handling NULL values, logical operators, and the REGEXP_CONTAINS function provide powerful ways to filter and refine your data. By combining these techniques, you can write more precise and efficient queries to gain better insights from your datasets. Stay tuned for more tutorials and tips on our DAT YouTube channel, and don't forget to check out our blog on Medium for in-depth articles on BigQuery and other data topics. Happy querying!