Understanding SQL Starting with SELECT * FROM

July 19, 2024
Chapter 1: Understanding SQL Starting with SELECT * FROM

In this BigQuery tutorial for beginners, you'll dive into the fundamentals of SQL, focusing on the SELECT and FROM statements. Starting with an overview of BigQuery, including project structures and datasets, you'll learn how to effectively query data using SQL. Through practical examples and demonstrations, you'll understand how to retrieve specific data from tables, utilize SELECT DISTINCT for unique values, and navigate potential pitfalls like using SELECT * efficiently. By the end of this tutorial, you'll have a solid foundation in SQL querying within the BigQuery environment.

Introduction to BigQuery and Dataset Organization

Before diving into SQL queries, let's familiarize ourselves with BigQuery's structure. BigQuery organizes data into projects, datasets, and tables. Each project can contain multiple datasets, and each dataset can contain multiple tables.For our tutorials, we'll primarily focus on the dwh_udemy_course dataset and the Big Public Data, which contains publicly available data. Within the dwh_udemy_course dataset, we'll explore the dwh_ecommerce dataset, comprising tables such as order item, product, and customer data. And for the BigQuery public dataset, we will be using the San Francisco Bike Share data.

BigQuery Project Space

1. SELECT FROM

Let's write our first query. The most basic query that you can write is SELECT * FROM. Let's take an example from the Bike Share data. The breakdown of this query will be as follows: This statement will select a column called "start_station_name" from the table bikeshare_trips which is located inside bigquery-public-dataset, inside the San Francisco Bike Share dataset.

Once you write the query, check on the top right corner, if there is a green check mark ✅,  it means the query is correct. Also, another important thing to notice is how many MBs of data will be processed when running the query. This MBs of data determines the cost of running the query. To learn more details about the cost of the query, check out this link.

SELECT start_station_name
FROM bigquery-public-data.san_francisco_bikeshare_trips

BIgQuery Statement basic structure

2. SELECT DISTINCT FROM STATEMENT

When running the query without using the DISTINCT function, all rows are shown in the result, including duplicates. For example, if there are multiple trips starting from the same station, each trip will be displayed as a separate row in the result.

However, if we use the DISTINCT function, the result will only show unique rows. Duplicate rows will be removed from the result set. For instance, if there are multiple trips starting from the same station, only one instance of that station will be displayed in the result, eliminating redundancy and providing a cleaner output. An example illustrating both scenarios has been illustrated in the image below.

SELECT DISTINCT start_sattion_name
FROM 'bigquery-public-data.san_francisco_bikeshare.bikeshare_trips'

Difference of Numbers of rows between using with DISTINCT and without DISTINCT

3. SELECT * FROM STATEMENT

When using the SELECT * FROM statement without any additional functions, all columns and rows from the specified table are retrieved in the query result. This means that every piece of data in the table will be displayed, including duplicates if they exist. "SELECT *"  is equivalent to "SELECT ALL"

SELECT *
FROM 'bigquery-public-data.san_francisco_bikeshare.bikeshare_trips'

Result using SELECT * FROM

4. SELECT AS STATEMENT

To enhance clarity and consistency in our queries, we can rename columns and tables using aliases. Aliases provide descriptive names for columns and tables, making it easier to understand query results, especially when dealing with multiple tables and JOIN operations. This time let's try an exmaple with a different table - bikeshare_station_info.

SELECT name AS station_name
FROM 'bigquery-public-data.san_francisco_bikeshare_station_info'

Result using SELECT column_name FROM

You can also write the statement without AS and it will function the same.

SELECT name station_name
FROM 'bigquery-public-data.san_francisco_bikeshare_station_info'

Similarly, you can also add aliases for tables. For example,

SELECT name station_name
FROM 'bigquery-public-data.san_francisco_bikeshare_station_info' AS station

Conclusion

In conclusion, mastering the SELECT * FROM statement is essential for anyone working with SQL queries in BigQuery. By understanding BigQuery's structure, utilizing keywords like DISTINCT, and optimizing queries for cost efficiency, you can efficiently extract valuable insights from your datasets.

Additionally, for visual learners, we have a comprehensive YouTube video on this topic. In the video, we also showcase examples of statements from the dwh_udemy_course dataset, providing a practical demonstration of how queries are constructed and executed. If you're interested in seeing these queries in action, check out our YouTube video.

Furthermore, in the next chapter, we delve into the WHERE statement in detail, exploring how it can be used to filter data and refine query results.

Jump to chapter 2.

Subscribe to our newsletter
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.