By clicking “Accept”, you agree to the storing of cookies on your device to enhance site navigation, analyze site usage, and assist in our marketing efforts. View our Privacy Policy for more information.

Introducing BigQuery Workflows

A New Era of Data Orchestration. By Fassil S. Yehuala

Google Cloud recently introduced a new feature to simplify data management: BigQuery Workflows. Now available in preview, this code-free orchestration tool is designed to help data teams automate their processes with less hassle. Whether running regular queries or maintaining a pipeline of tasks, BigQuery Workflows offers a simple, visual way to get things done.

In this post, we’ll break down what BigQuery Workflows is, how to create and use it, the pros and cons, and how it fits in alongside Dataform, another popular orchestration tool.

What are BigQuery Workflows?

BigQuery Workflows is a feature within Google Cloud's BigQuery platform that allows users to automate tasks like running SQL queries or notebooks in sequence. If you’ve ever struggled with manually scheduling queries or finding the right tool to manage a data pipeline, this is where BigQuery Workflows comes in.

The tool provides an easy way to schedule tasks, handle dependencies between them, and monitor progress - all without needing to write code. You can use it to ensure that your SQL queries or notebooks run on time, every time, without worrying about what comes first or whether things will finish before the next task kicks off.

How to Create Workflows in BigQuery

Creating a workflow in BigQuery is straightforward, even for non-technical users. Here’s a basic guide to help you get started:

  1. Go to BigQuery: Go to the BigQuery interface in the Google Cloud Console. In the editor pane, select the “Workflow” option from the drop-down menu.
  1. Configure Your Workflow: Name your workflow, pick a Dataform service account to run it, and choose the region where it should be processed.
  1. Add Tasks: You can add SQL queries or notebooks to the workflow. Each task can depend on others, allowing you to define an execution order. For example, one task might clean up data, and the next might load that data into a machine-learning model.
  1. Deploy and Schedule: Once your workflow is set up, you can deploy it and schedule it to run automatically at whatever intervals you choose - daily, weekly, or any other frequency that works for you.
  1. Monitor and Adjust: After deployment, you can monitor your tasks from the BigQuery console. If something goes wrong, like a permission issue, you’ll see it in the logs, making troubleshooting easy.

Example: Orchestrating a Simple Workflow in BigQuery

Let’s walk through an example of how BigQuery Workflows can be used to orchestrate a sequence of tasks. Suppose you want to process transactional data and then perform machine learning on the aggregated results to predict customer behavior. Here's how you could set up a workflow that orchestrates SQL queries and a Python notebook.

Task 1: Create and Aggregate Transactional Data Using SQL

First, create a table that contains dummy transactional data:

Task 2: aggregate the data by customer

This creates a customer_summary table that contains each customer's total transactions and revenue.

Task 3: Run a Machine Learning Model in a Python Notebook

This Python notebook takes the customer_summary data as input and uses a linear regression model to predict customer revenue based on the number of transactions. The results will be stored in the Cloud Storage bucket tied to the workflow.

Using BigQuery Workflows, you can schedule these steps in sequence:

  1. SQL Query: First, create and aggregate the transaction data.
  2. Python Notebook: After the SQL tasks are complete, run the notebook to perform machine learning predictions.

This orchestration ensures that data is processed and analyzed efficiently without manual intervention. You can run the workflow on a schedule (e.g., daily or weekly) to continuously update your predictions as new transactions are processed.

Observe Workflows in the Orchestration

In the new Orchestration menu in BigQuery, you can see the workflow executions next to the Dataform executions.

There you can click on the corresponding Workflows to look into the details and if they ran successfully the last 5 times.

The Advantages of BigQuery Workflows

  1. Visual Interface: One of the biggest perks of BigQuery Workflows is its visual interface. Instead of digging into code, you can manage your data pipelines by dragging and dropping tasks. It’s especially helpful for seeing the dependencies between tasks at a glance.
  2. No More Guessing Timelines: Previously, you might have had to estimate how long a query would take before scheduling the next one. With BigQuery Workflows, you can set dependencies between tasks, so one runs only after the previous task is done. This ensures everything runs smoothly without delays.
  3. Built-in Scheduling: The scheduling engine, powered by Dataform, handles the timing of tasks automatically. Whether you’re running a single query or orchestrating a multi-step pipeline, everything will execute in the correct order and on time.
  4. Centralized Monitoring: Workflows come with built-in logs for tracking progress. You can easily monitor tasks, troubleshoot errors, and even see the output from your notebooks in Cloud Storage.
  5. Cost-Effective: BigQuery Workflows doesn’t cost extra. You pay only for the usual compute and storage costs of BigQuery, making it a low-cost addition to your data toolset.

The Disadvantages of BigQuery Workflows

As promising as BigQuery Workflows is, there are some limitations:

  • New Assets Only: You can’t import existing queries or notebooks into a workflow. Every task has to be created from scratch within the workflow interface.
  • No Workflow Sharing: Unlike other tools, you can’t easily share a specific workflow with other users. Only those with the Dataform Admin role can access and modify workflows, which might slow down collaboration in larger teams.
  • Fixed Region: When creating a workflow, you have to select a processing region, and this choice is permanent. If your data storage needs change, you’ll have to create a new workflow from scratch.
  • No Code Versioning: There is no repository saving the code of the Workflow, which leads to no centralized place for the implementation and no advanced saving mechanisms like commits and no revert of changes.
  • Preview Mode: Since this is a preview feature, some bugs or limitations in support are expected. It also means that future functionality is likely to expand, but the current tool may feel limited compared to mature alternatives like Composer.

How do BigQuery Workflows Compare to Dataform?

BigQuery Workflows is built on Dataform, a tool already used by many data teams for managing complex data transformations. However, while Dataform requires coding knowledge and is more suited for advanced users, BigQuery Workflows aims to be a simpler, no-code option that anyone can use.

So, why introduce BigQuery Workflows in addition to Dataform? The answer lies in accessibility. Dataform is excellent for complex projects where teams need full control over their data pipelines and advanced customizations. However, not every team needs that level of complexity. For day-to-day operations like running scheduled queries or orchestrating basic tasks, BigQuery Workflows offers a much easier solution. It strikes a balance between usability and power, making it ideal for businesses that need a simple, visual tool but still want to automate processes efficiently.

In the future, there’s even potential for BigQuery Workflows to expand to other types of assets or even integrate with Composer for more complex workflows.

Conclusion

BigQuery Workflows is a valuable addition to the BigQuery platform, especially for users looking for a simple way to manage and automate their data pipelines. With its intuitive interface, built-in scheduling, and seamless integration with BigQuery, it’s an easy-to-use tool that addresses common pain points in data orchestration.

Whether you're a data analyst managing regular reports or an engineer automating a machine learning pipeline, BigQuery Workflows can help you streamline your operations with less effort. As Google continues to develop this tool, we can expect even more features and integrations to make it even more powerful.

If you’re already using BigQuery, now’s a great time to explore this new feature and see how it can simplify your data processes.

More to read

If you enjoyed this blog, you'll love these too! Dive into more captivating content:

Upgrade Your ls Command to eza

Latest Updates on Google Data Analytics

Your AI Companion in the Google Cloud

Awakening the Data Messenger by Integrating BigQuery with Slack

Further Links

Check out our LinkedIn account, to get insights into our daily working life and get important updates about BigQuery, Data Studio, and marketing analytics.

We also started with our own YouTube channel. We talk about important DWH, BigQuery, Data Studio, and many more topics. Check out the channel here.

If you want to learn more about using Google Data Studio and taking it to the next level in combination with BigQuery, check out our Udemy course here.

If you are looking for help setting up a modern and cost-efficient data warehouse or analytical dashboard, email us at hello@datadice.io and we will schedule a call.