BigQuery is a Data Warehouse tool to store and transforms big amounts of data. One of the biggest differences to an operational database is, that you can have intentionally duplicate information in different tables and no needed relations between the tables.
Google added a full feature set of Key constraints to BigQuery now. You can set keys during the table creation process, view the current constraints in a dataset, and many more.
Everybody who is interested in the new BigQuery feature has to read this post.
Primary Keys and Foreign Keys are usually one of the first things you learn when you start with SQL or databases.
Primary Keys:
Foreign Keys:
To create a table with a Primary Key you have to do the following:
CREATE TABLE key_constraint_tests.planet (
sun_id INT64,
position INT64,
name STRING,
number_of_mons INT64,
PRIMARY KEY (sun_id, position) NOT ENFORCED
);
So due to no enforcement, it is not a problem to add the following rows to the table:
INSERT key_constraint_tests.planet (sun_id, position, name, number_of_mons)
VALUES(1, 1, 'mercury', 0),
(1, 1, 'venus', 0),
(1, 3, 'earth', 1),
(1, 4, 'mars', 2)
In the data is an error and Mercury and Venus get both position 1. So we have the same combination twice inside of a Primary Key.
Creating a table with foreign Keys works kind of the same as for the Primary Key:
CREATE TABLE key_constraint_tests.atmospheres (
sun_id INT64,
position INT64,
element_1 STRING,
element_2 STRING,
element_3 STRING,
FOREIGN KEY(sun_id, position) REFERENCES key_constraint_tests.planet (sun_id, position) NOT ENFORCED
);
Then we can also add the following lines to the table:
INSERT key_constraint_tests.atmospheres (sun_id, position, element_1, element_2, element_3)
VALUES
(1, 1, 'Hydrogen', 'Oxid', 'Helium'),
(1, 2, 'Carbon dioxide', 'Nitrogen', 'Sulfur dioxide'),
(1, 5, 'Hydrogen', 'Helium', 'Methane')
The 3rd insert is working too. There is no 5th planet in the original table, but due to the no enforcement setting, there is no error.
You can also easily add Primary or Foreign keys to tables later.
With the ALTER TABLE statement, you can add a Primary Key:
ALTER TABLE key_constraint_tests.planet_without_pk
ADD PRIMARY KEY(sun_id, position_from_sun) NOT ENFORCED;
Or you add a Foreign Key:
ALTER TABLE key_constraint_tests.atmospheres_without_fk
ADD FOREIGN KEY(sun_id, position_from_sun) REFERENCES key_constraint_tests.planet (sun_id, position_from_sun) NOT ENFORCED
When you have your first tables with the constraints in a dataset, you can get an overview of the current situation with a new metadata view.
The view is the table_constraints view. You can look at the results with the following command:
SELECT *
FROM key_constraint_tests.INFORMATION_SCHEMA.TABLE_CONSTRAINTS
You just have to change the dataset name in the query above and you will get a valid result.
The information you get is quite basic. One row is one constraint.
A full list you can find here.
Google added a functionality to BigQuery, which is already available in operational databases like MySQL, PostgreSQL, … . Primary and Foreign Keys are super important in these systems.
The current status in BigQuery is just a first version. A lot of features are still missing to give a comprehensive set of features to the analysts and engineers. The most important missing features are:
In a summary, it is a good and easy starting point, but to use it in projects, a lot of features are still missing. Currently, it is more for showing the references between tables.
This post is part of the new features in the Google cloud series from datadice and gives you detailed insights about a new feature in BigQuery or Data Studio.
Follow us on LinkedIn for insights into our daily work and important updates on BigQuery, Data Studio, and marketing analytics.
Subscribe to our YouTube channel for discussions on DWH, BigQuery, Looker Studio, and Google Tag Manager.
If you are interested in learning BigQuery from scratch, get access to our free BigQuery Course
Elevate your skills with Google Data Studio and BigQuery by enrolling in our Udemy course.
Need help setting up a modern, cost-efficient data warehouse or analytical dashboard? Email us at hello@datadice.io to schedule a call.