When you are using GA4 to track your website or app, looking at sessions and engagement could be very insightful. Without any delay, first, let’s look at how sessions and engagement are defined in GA4 and then how we can extract them to Bigquery.
A session is triggered when one of the following conditions is met:
Note: A session usually lasts for 30 minutes. That means when a user goes to your website and comes back after 31 minutes, then a new session starts.
Sessions can be useful to see, how many users are coming or visiting your platform, but it would be more interesting to see, how many of them are having some sort of ‘engagement’. Then let’s look at how is engagement defined by Google.
In GA4, a session is considered to be engaged, if:
Note: If you think, 10 seconds is too short for an engaged session, then you can manually change the time up to 60 seconds in your GA account.
Now the question remains, how to extract the data for sessions and engaged sessions? If you look at the raw GA4 data, there is no direct answer. So, the trick is to count ga_session_id and user_pseudo_id, by executing the following query:
Short explanation, of what is happening behind the scenes: First I created a subquery named prep (as in preparation). In the prep subquery, I have selected parsed event date and user_pseudo_id.
Next, I have UNNEST the event_params array and select only those values, which have the key = ‘ga_session_id’. These values are stored with ‘int_value’ meaning they are INT64 data types. Here, ga_session_id is a timestamp when a user enters your platform. We can’t simply COUNT ga_session_id and call them a number of sessions and the reason is that, if two users enter your platform at the same time, they both will have the same ga_session_id.
Then for session_id, I first UNNEST event_params and extract integer values with key = ‘ga_session_id’ and then concatenate user_pseudo_id and ga_session_id. By doing so, even in the case of multiple users entering the platform at the same time, they would have a different session_id.
Similarly, for engaged_session and engagement_time_msec, I have UNNEST the event_params column and then only select those keys, which are ‘session_engaged’ and ‘engagement_time_msec’ respectively. Also, note that values of session_engaged are strings, whereas for the engagement_time_msec it is an integer and they are measured in microseconds.
In the end, I have grouped by event_date and then COUNT the DISTINCT session_id for the number of sessions, and when there is an engagement session I have again COUNT session_id and taken the averaged engaged time in microseconds.
With such a query, one can get information on how many sessions were triggered and how many were engaged in those sessions, and also what was their average time for the engagement. In the upcoming post, I’ll write down more queries on how to extract data from raw GA4.
Follow us on LinkedIn for insights into our daily work and important updates on BigQuery, Data Studio, and marketing analytics.
Subscribe to our YouTube channel for discussions on DWH, BigQuery, Looker Studio, and Google Tag Manager.
If you are interested in learning BigQuery from scratch, get access to our free BigQuery Course
Elevate your skills with Google Data Studio and BigQuery by enrolling in our Udemy course.
Need help setting up a modern, cost-efficient data warehouse or analytical dashboard? Email us at hello@datadice.io to schedule a call.