Understanding BigQuery HAVING Statement

July 24, 2024
Chapter 7: Understanding BigQuery HAVING Statement

Hello and welcome to our blog on mastering the HAVING statement in BigQuery! If you're familiar with SQL, you likely know the WHERE clause, but today, we dive into its powerful counterpart for aggregated data: the HAVING statement. This tutorial will guide you through understanding the HAVING statement, its differences from the WHERE clause, and provide practical examples to solidify your learning. If you’re new to SQL or need a refresher on GROUP BY and aggregation functionalities, be sure to check out our previous tutorials as they lay the foundation for this lesson.

Understanding the HAVING Statement

The HAVING statement is essential for filtering results after data has been aggregated. This is particularly useful when you want to impose conditions on aggregated data. Let’s break down its functionality and illustrate with examples.

1. Difference Between WHERE and HAVING

  1. WHERE Clause: Filters rows before any groupings are made.
  2. HAVING Clause: Filters groups after the GROUP BY operation has been performed.

Let's start with a simple query:

SELECT 
      start_station_name,
      SUM(duration_sec) AS total_duration,
FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips`
GROUP BY 1
ORDER BY 2 DESC

This query aggregates the data from the bikeshare_trips table by start_station_name, summing up the total duration of trips for each station. It groups the results by start_station_name and orders them in descending order based on the total_duration.

In the result:

  • Harry Bridges Plaza appears at the top of the list, indicating it has the highest total trip duration among all stations.
  • Conversely, 5th St at E. San Salvador St appears lower in the list, showing it has the lowest total duration among the top entries.

2. Practical Example: Filtering Start Station Names

To analyze specific sets of data or exclude unusual entries, you need to filter based on aggregated results. For instance, if you want to only show stations with a total duration above 500,000 seconds, you might think of using a WHERE clause like this:

SELECT 
      start_station_name,
      SUM(duration_sec) AS total_duration,
FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips`
WHERE total_duration > 500000
GROUP BY 1
ORDER BY 2 DESC

This query attempts to use WHERE to filter results based on total_duration, which is an aggregated value. However, this will result in an error because the WHERE clause is applied before the aggregation (i.e., before total_duration is computed). The WHERE clause does not recognize total_duration since it does not exist until after the GROUP BY operation.

Instead of using WHERE, you should use the HAVING clause to filter results after aggregation:

SELECT 
      start_station_name,
      SUM(duration_sec) AS total_duration,
FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips`
GROUP BY 1
HAVING total_duration > 500000
ORDER BY 2 DESC

In this query:

  • We first group the data by start_station_name and calculate total_duration using SUM(duration_sec).
  • The HAVING clause is then used to filter the grouped results to include only those stations where total_duration exceeds 500,000 seconds.
  • Finally, the results are ordered by total_duration in descending order.
Query using HAVING Clause


SELECT 
      start_station_name,
      SUM(duration_sec) AS total_duration,
FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips`
GROUP BY 1
HAVING total_duration > 50000000
ORDER BY 2 DESC

SELECT 
      start_station_name,
      SUM(duration_sec) AS total_duration,
FROM `bigquery-public-data.san_francisco_bikeshare.bikeshare_trips`
GROUP BY 1
HAVING total_duration BETWEEN 50000 AND 80000
ORDER BY 2 DESC

Conclusion

The HAVING statement is a powerful tool in SQL for filtering aggregated data, offering functionalities that the WHERE clause cannot. By leveraging HAVING, you can apply complex conditions to your grouped data and achieve more precise results in your queries. Practice using the examples provided to enhance your understanding and visit our [website](#) for more tutorials and resources. Don't forget to like this blog, subscribe to our YouTube channel, and stay tuned for more SQL tips and tricks. Happy querying!

Subscribe to our newsletter
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.