After Google announced shutting down Universal Analytics (UA) a lot of users are bound to shift from UA to GA4. With this new change, it is essential to look at the data structure and how to extract important metrics from raw GA4. In this blog, I would like to introduce how to count event sessions, in a given calendar week, date, and hour in BigQuery.
We have already published an intro blog on how to query the data with static, dynamic, and hybrid dates. Please have a look at the blog from Vaishnavee to refresh your memory. With that in mind, let’s first look at the “Preview” of the data, before starting the query itself. The data is from a website that has been integrated with GA4 and the raw data is already imported to BQ.
For this blog, let's focus on the following columns: event_date and event_name. Each row is created when a user interacts with your website or app. For example, first_visit is triggered when a new user visits the website or launches the app for the first time. Let's then look at the first visits to our website on different days and times. We are aiming for a table that looks like this:
Assuming that, it is required to fetch data for this current running month-to-date (May 2022). The following query is valid:
SELECT *
FROM `PROJECT_NAME.analytics_PROPERTY_ID.events_*`
WHERE TRUE
AND _TABLE_SUFFIX BETWEEN '20220501' AND FORMAT_DATE('%Y%m%d', DATE_SUB( CURRENT_DATE(), INTERVAL 1 DAY))
Since we want to look at first_visits only, we will filter out data that has event_name = ‘first_visit’ and get the date from the event_date column. With the date column, it is possible to get the calendar week number. Here I have provided you with a special calendar week format that we use in datadice. If you have other approaches to format date to the responding week, let us know. Then we group it by calendar week and date and count the event_name. The following updated query will provide the result we want:
SELECT
CONCAT(FORMAT_DATE('%Y CW', PARSE_DATE('%Y%m%d', event_date)),
LPAD(CAST(CAST(FORMAT_DATE('%U', PARSE_DATE('%Y%m%d', event_date)) AS INT64)+1 AS STRING),2,'0')) AS calender_week,
PARSE_DATE('%Y%m%d', event_date) AS date,
COUNT(event_name) AS first_visits,
FROM `PROJECT_NAME.analytics_PROPERTY_ID.events_*`
WHERE TRUE
AND event_name = 'first_visit'
AND _TABLE_SUFFIX BETWEEN '20220501' AND FORMAT_DATE('%Y%m%d', DATE_SUB( CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY calendar_week, date
ORDER BY calendar_week, date
Now the remaining one is only the hour column, the time is stored in event_timestamp but in micros. In BQ, it is possible to change from micros to an hour-minute-second format by using TIMESTAMP_MICROS(event_timestamp).
Now using a CASE-WHEN statement, we will divide the hours into 4 groups: 00-06, 07-12, 13-18, and 19-23.
SELECT
CONCAT(FORMAT_DATE('%Y CW', PARSE_DATE('%Y%m%d', event_date)),
LPAD(CAST(CAST(FORMAT_DATE('%U', PARSE_DATE('%Y%m%d', event_date)) AS INT64)+1 AS STRING),2,'0')) AS calender_week,
PARSE_DATE('%Y%m%d', event_date) AS date,
CASE
WHEN EXTRACT(HOUR FROM TIMESTAMP_MICROS(event_timestamp)) BETWEEN 0 AND 6
THEN '00-06'
WHEN EXTRACT(HOUR FROM TIMESTAMP_MICROS(event_timestamp)) BETWEEN 7 AND 12
THEN '07-12'
WHEN EXTRACT(HOUR FROM TIMESTAMP_MICROS(event_timestamp)) BETWEEN 13 AND 18
THEN '13-18'
ELSE '19-23'
END AS hour,
COUNT(event_name) AS first_visits,
FROM `PROJECT_NAME.analytics_PROPERTY_ID.events_*`
WHERE TRUE
AND event_name = 'first_visit'
AND _TABLE_SUFFIX BETWEEN '20220501' AND FORMAT_DATE('%Y%m%d', DATE_SUB(
CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY calendar_week, date
ORDER BY calendar_week, date
With this query, you are able to count the number of first visits sub-grouped by calendar week, date, and hours.
This post is a part of the “Raw GA4 data in BigQuery” series to get some quick insights into the user behavior on your website or app. I will end this session here for now, but in the upcoming posts, I will try to provide more queries to analyze data from GA4 and build up more complex queries in the upcoming future.