The WHERE statement in SQL, including BigQuery, allows you to filter rows based on specified conditions. It serves as a powerful tool for extracting relevant data from large datasets.
Let's explore some example statements to illustrate its usage:
Please check Chapter One to see details about the dataset used in these examples.
First, let's run an example statement without WHERE statement.
SELECT name, capacity
FROM 'bigquery-public-data.san_franciso_bikeshare_station_info'
The result will display a list of different station names and their corresponding capacities. Additionally, the total number of rows is 472.
Now, let's add the WHERE clause as capacity > 15
SELECT name, capacity
FROM 'bigquery-public-data.san_franciso_bikeshare_station_info'
WHERE capacity > 15
In this example, we filter rows where the capacity is greater than 15. As a result, only stations with a capacity greater than 15 are displayed. The number of rows has decreased, with a total of 390 rows remaining.
Let's filter our table, where station name = 'Otis St at Brady St'
SELECT name, capacity
FROM 'bigquery-public-data.san_franciso_bikeshare_station_info'
WHERE name = 'Otis St at Brady St'
This result will show table with rows based on the station name 'Otis St at Brady St'
Note: When adding a string WHERE condition, ensure to enclose the value with single quotes, and remember that values are case-sensitive.
If you want to filter your table with multiple distinct values, you can use the 'OR' operator to achieve that.
SELECT name, capacity
FROM 'bigquery-public-data.san_franciso_bikeshare_station_info'
WHERE name = 'Otis St at Brady St'
OR name = 'Filbert St at Fillmore St'
OR name = 'Fillmore St at Jefferson St'
OR name = 'Turk Blvd at Stanyan St'
Instead of using the OR operator, you can achieve the same result more efficiently by using the IN operator. This approach results in a shorter and more manageable statement.
SELECT name, capacity
FROM 'bigquery-public-data.san_franciso_bikeshare_station_info'
WHERE name IN ('Otis St at Brady St', 'Filbert St at Fillmore St', 'Fillmore St at Jefferson St', 'Turk Blvd at Stanyan St')
The IN operator provides a more concise way to specify multiple conditions, achieving the same result as the previous example.
Next, you can also filter your query using dates. For example, if you only want rows where the event happened after April 20, 2028, then you can use this query. Additionally, here we are using a different table inside the bikeshare dataset - bikeshare_trips.
SELECT name, capacity
FROM 'bigquery-public-data.san_franciso_bikeshare_trips'
WHERE start_date > '2018-04-20'
Here, we filter rows based on both date range and minimum product quantity ordered.
SELECT name, capacity
FROM 'bigquery-public-data.san_franciso_bikeshare_trips'
WHERE start_date BTWEEN '2018-04-20' AND '2023-04-30'
AND product_quantity_ordered >= 10;
Mastering data filtering in BigQuery opens up a world of possibilities for data analysis and exploration. By harnessing the power of the WHERE statement and applying best practices, you can efficiently extract valuable insights from your datasets. Whether you're filtering by numeric values, string values, dates, or combining multiple conditions, understanding these techniques will empower you to make informed decisions based on your data.
Additionally, we also have a YouTube tutorial for the same topic, where we showcase more examples of the WHERE statement. If you are a visual learner (or not), then you must check out our YouTube video.
Jump to chapter 3.