Google Cloud recently introduced a new feature to simplify data management: BigQuery Workflows. Now available in preview, this code-free orchestration tool is designed to help data teams automate their processes with less hassle. Whether running regular queries or maintaining a pipeline of tasks, BigQuery Workflows offers a simple, visual way to get things done.
In this post, we’ll break down what BigQuery Workflows is, how to create and use it, the pros and cons, and how it fits in alongside Dataform, another popular orchestration tool.
BigQuery Workflows is a feature within Google Cloud's BigQuery platform that allows users to automate tasks like running SQL queries or notebooks in sequence. If you’ve ever struggled with manually scheduling queries or finding the right tool to manage a data pipeline, this is where BigQuery Workflows comes in.
The tool provides an easy way to schedule tasks, handle dependencies between them, and monitor progress - all without needing to write code. You can use it to ensure that your SQL queries or notebooks run on time, every time, without worrying about what comes first or whether things will finish before the next task kicks off.
Creating a workflow in BigQuery is straightforward, even for non-technical users. Here’s a basic guide to help you get started:
Let’s walk through an example of how BigQuery Workflows can be used to orchestrate a sequence of tasks. Suppose you want to process transactional data and then perform machine learning on the aggregated results to predict customer behavior. Here's how you could set up a workflow that orchestrates SQL queries and a Python notebook.
First, create a table that contains dummy transactional data:
This creates a customer_summary table that contains each customer's total transactions and revenue.
This Python notebook takes the customer_summary data as input and uses a linear regression model to predict customer revenue based on the number of transactions. The results will be stored in the Cloud Storage bucket tied to the workflow.
Using BigQuery Workflows, you can schedule these steps in sequence:
This orchestration ensures that data is processed and analyzed efficiently without manual intervention. You can run the workflow on a schedule (e.g., daily or weekly) to continuously update your predictions as new transactions are processed.
In the new Orchestration menu in BigQuery, you can see the workflow executions next to the Dataform executions.
There you can click on the corresponding Workflows to look into the details and if they ran successfully the last 5 times.
As promising as BigQuery Workflows is, there are some limitations:
BigQuery Workflows is built on Dataform, a tool already used by many data teams for managing complex data transformations. However, while Dataform requires coding knowledge and is more suited for advanced users, BigQuery Workflows aims to be a simpler, no-code option that anyone can use.
So, why introduce BigQuery Workflows in addition to Dataform? The answer lies in accessibility. Dataform is excellent for complex projects where teams need full control over their data pipelines and advanced customizations. However, not every team needs that level of complexity. For day-to-day operations like running scheduled queries or orchestrating basic tasks, BigQuery Workflows offers a much easier solution. It strikes a balance between usability and power, making it ideal for businesses that need a simple, visual tool but still want to automate processes efficiently.
In the future, there’s even potential for BigQuery Workflows to expand to other types of assets or even integrate with Composer for more complex workflows.
BigQuery Workflows is a valuable addition to the BigQuery platform, especially for users looking for a simple way to manage and automate their data pipelines. With its intuitive interface, built-in scheduling, and seamless integration with BigQuery, it’s an easy-to-use tool that addresses common pain points in data orchestration.
Whether you're a data analyst managing regular reports or an engineer automating a machine learning pipeline, BigQuery Workflows can help you streamline your operations with less effort. As Google continues to develop this tool, we can expect even more features and integrations to make it even more powerful.
If you’re already using BigQuery, now’s a great time to explore this new feature and see how it can simplify your data processes.
If you enjoyed this blog, you'll love these too! Dive into more captivating content:
Upgrade Your ls Command to eza
Latest Updates on Google Data Analytics
Your AI Companion in the Google Cloud
Awakening the Data Messenger by Integrating BigQuery with Slack
Check out our LinkedIn account, to get insights into our daily working life and get important updates about BigQuery, Data Studio, and marketing analytics.
We also started with our own YouTube channel. We talk about important DWH, BigQuery, Data Studio, and many more topics. Check out the channel here.
If you want to learn more about using Google Data Studio and taking it to the next level in combination with BigQuery, check out our Udemy course here.
If you are looking for help setting up a modern and cost-efficient data warehouse or analytical dashboard, email us at hello@datadice.io and we will schedule a call.