By clicking “Accept”, you agree to the storing of cookies on your device to enhance site navigation, analyze site usage, and assist in our marketing efforts. View our Privacy Policy for more information.
Jun 13, 2024
More

How to set up scheduled queries in BigQuery

Automate and Optimize Your Data Processing with Release and Workflow Configurations. By Fassil S. Yehuala

Understanding the lifecycle of code in Dataform is essential for efficient data transformation. By configuring compilation and execution settings, you can customize Dataform to meet your specific requirements. Whether it's managing development workspaces, scheduling workflow executions, or defining compilation overrides, Dataform provides a comprehensive platform for effectively managing the code lifecycle.

In our previous articles of the "Creating Data Pipeline with Dataform in BigQuery" series, I covered the fundamentals of Dataform and its integration with GCP's BigQuery. In this installment, I will guide you through the process of creating release configurations and workflow configurations in Dataform for scheduled execution, empowering you to automate and optimize your data processing. 

Let's dive in!

Creating a Release Configuration in Dataform

Release Configurations in Dataform offer a powerful mechanism for configuring and executing SQL workflows on a schedule. By creating release configurations, you can define compilation settings and schedule the frequency of generating compilation results. Let's explore the steps involved in creating release configurations.

What are Release Configurations?

Before we proceed, let's gain a clear understanding of release configurations. When you develop SQL workflows in Dataform, the SQL code undergoes compilation, resulting in a compilation output. Release configurations provide the flexibility to customize compilation settings and establish execution environments, such as staging or production.

Creating a Release Configuration

Now, let's transition our focus towards the creation of the release configuration. This stage involves setting up the necessary parameters and specifications for the release process. We will define the steps and requirements to ensure a smooth and successful scheduled execution.

To set up a release configuration in Dataform, you can do the following steps:

  1. Navigate to the Release Configurations section in your repository.
  2. Click on the "New release configuration" button.
  1. In the "Create release configuration" pane, configure the release settings:
  • Enter a unique ID for the release configuration in the "Release ID" field.
  • Specify the Git branch or commit SHA for the release in the "Git commitish" field. If your Dataform repository is not connected to a remote repository, the value is typically "main".
  • Select the frequency of creating compilation results from the "Frequency" drop-down.
  1. If you want to override the default settings defined in dataform.json for all workspaces in your repository, configure the compilation overrides:
  • Enter the Google Cloud project ID where you want to store the compilation result.
  • Specify a suffix to append to the schema configured in dataform.json.
  • Enter a prefix for all table names.
  1. Optionally, you can set compilation variables by clicking on the "Add variable" button and providing the key-value pairs. 
  2. Click the "Create" button to create the release configuration.

Congratulations! You have successfully created a release configuration in Dataform, which allows you to configure and execute SQL workflows on a schedule. By defining compilation settings and scheduling the frequency of creating compilation results, you can automate your workflow execution.

Scheduling SQL Workflow Executions with Workflow Configurations

Dataform provides a powerful feature called workflow configurations, enabling you to schedule and configure the execution of SQL workflows. With workflow configurations, you can automate the execution of Dataform SQL workflows on a schedule of your choice. Let's explore how to create and manage workflow configurations.

What are Workflow Configurations?

Workflow configurations in Dataform allow you to schedule the execution of SQL workflow actions. By creating a workflow configuration, you can select a compilation release configuration, choose SQL workflow actions, and define the execution schedule.

Creating a Workflow Configuration

Now, let's move on to the next step of our journey and create a Workflow Configuration in Dataform. This configuration will play a vital role in organizing and managing the execution of our SQL workflows. By setting up a Workflow configuration, we can automate and optimize the execution of our Dataform SQL workflows according to our desired schedule. 

Let's dive into the process and create a Workflow configuration that suits our needs.

  1. Go to the Workflow Configurations section in your repository.
  2. Click on the "New workflow configuration" button.
  1. In the "Create workflow configuration" pane, provide the necessary details.

  • Configuration ID: Enter a unique ID for the workflow configuration.
  • Release configuration: Select a compilation release configuration from the drop-down menu.
  • Frequency: Specify the frequency of executions in the Unix-cron format. Ensure a minimum 1-hour break between compilation result creation and the scheduled execution time.
  • Timezone: Select the desired time zone for executions (default is UTC).
  • Select SQL workflow actions to be executed based on your requirements:
  • To execute the entire SQL workflow, choose "All actions."
  • To execute specific actions, select "Selection of actions" and choose the desired actions from the list.
  • To execute actions based on tags, select "Selection of tags" and choose the relevant tags.
  • Choose additional options as needed:
  • Include dependencies: Execute selected actions or tags along with their dependencies.
  • Include dependents: Execute selected actions or tags along with their dependents.
  • Run with full refresh: Rebuild all tables from scratch during execution (default is updating incremental tables without rebuilding).
  1. Click the "Create" button to create the workflow configuration.

Differences between Release and Workflow Configurations

While both release configurations and workflow configurations in Dataform serve the purpose of managing the execution of SQL workflows, they have distinct differences in terms of their functionality and use cases.

In short, release configurations are primarily concerned with compilation settings and scheduling the frequency of compilation, while workflow configurations focus on scheduling the execution of SQL workflow actions, managing dependencies, and specifying the execution schedule. By using these configurations together, you can automate your data processing pipelines effectively.

Order of Scheduling

As discussed earlier, the release configuration determines the compilation settings and generates the compilation results for the SQL workflows. By scheduling the release configuration first, you establish a solid foundation for the subsequent workflow configurations to rely on. The workflow configuration specifies the execution schedule and defines which SQL workflow actions to execute. It relies on the compilation results generated by the release configuration to execute the SQL workflows correctly. 

Here's an example to illustrate the scheduling of release and workflow configurations:

In this example, the release configuration with ID "DailyCompilation" is scheduled to run every day at 8:00 AM in the UTC timezone. The compilation results are generated during this process. 

The workflow configuration with ID "DailyWorkflow" is scheduled to run every day at 9:00 AM in the UTC timezone. This process relies on the compilation results produced by the release configuration and executes the specified SQL workflow actions based on the schedule.

By following this sequence, the compilation results are available before the workflow execution starts, ensuring that the workflow uses the most up-to-date compiled code for its actions.

Conclusion

This article explored how to create release and workflow configurations in Dataform for scheduled execution. Release configurations allow you to customize compilation settings and create execution environments, while workflow configurations enable you to organize and automate SQL workflow executions. By leveraging these features, you can streamline your workflow execution and maximize productivity.

Further Links

Follow us on LinkedIn for insights into our daily work and important updates on BigQuery, Data Studio, and marketing analytics.

Subscribe to our YouTube channel for discussions on DWH, BigQuery, Looker Studio, and Google Tag Manager.

If you are interested in learning BigQuery from scratch, get access to our free BigQuery Course

Elevate your skills with Google Data Studio and BigQuery by enrolling in our Udemy course.

Need help setting up a modern, cost-efficient data warehouse or analytical dashboard? Email us at hello@datadice.io to schedule a call.