By clicking “Accept”, you agree to the storing of cookies on your device to enhance site navigation, analyze site usage, and assist in our marketing efforts. View our Privacy Policy for more information.
Jun 13, 2024
Google Analytics

Extract Sessions & Engagement from GA4 with BigQuery

Use GA4 raw data in BigQuery. By Sanu Maharjan

When you are using GA4 to track your website or app, looking at sessions and engagement could be very insightful. Without any delay, first, let’s look at how sessions and engagement are defined in GA4 and then how we can extract them to Bigquery.

Sessions

A session is triggered when one of the following conditions is met:

  • User opens your website or app
  • User views a page or screen (given that no other sessions are active)

Note: A session usually lasts for 30 minutes. That means when a user goes to your website and comes back after 31 minutes, then a new session starts.

Sessions can be useful to see, how many users are coming or visiting your platform, but it would be more interesting to see, how many of them are having some sort of ‘engagement’. Then let’s look at how is engagement defined by Google.

Engaged Sessions

In GA4, a session is considered to be engaged, if:

  • There are 2 or more page_views OR
  • There are 1 or more conversion events OR
  • The user lasts more than 10 seconds

Note: If you think, 10 seconds is too short for an engaged session, then you can manually change the time up to 60 seconds in your GA account.

Now the question remains, how to extract the data for sessions and engaged sessions? If you look at the raw GA4 data, there is no direct answer. So, the trick is to count ga_session_id and user_pseudo_id, by executing the following query:

  
WITH prep AS(
  SELECT
    PARSE_DATE('%Y%m%d', event_date) AS event_date,
    user_pseudo_id,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id') AS ga_session_id,
    CONCAT(user_pseudo_id, (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')) AS session_id,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'session_engaged') AS engaged_sessions,
    (SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'engagement_time_msec') AS engagement_time_msec,

  FROM `PROJECT_NAME.analytics_PROPERTY_ID.events_*`
  
  WHERE TRUE
    AND _TABLE_SUFFIX BETWEEN '20220601' AND FORMAT_DATE('%Y%m%d', DATE_SUB( CURRENT_DATE(), INTERVAL 1 DAY))
)

SELECT
  event_date,

  COUNT(DISTINCT session_id) AS num_of_sessions,
  COUNT(DISTINCT 
    CASE
      WHEN engaged_sessions = '1' THEN CONCAT(user_pseudo_id, ga_session_id) ELSE NULL
    END) AS num_of_engaged_sessions,
  AVG(engagement_time_msec) AS engagement_time_msec

FROM prep

GROUP BY event_date

ORDER BY event_date DESC

 

Short explanation, of what is happening behind the scenes: First I created a subquery named prep (as in preparation). In the prep subquery, I have selected parsed event date and user_pseudo_id.

Next, I have UNNEST the event_params array and select only those values, which have the key = ‘ga_session_id’. These values are stored with ‘int_value’ meaning they are INT64 data types. Here, ga_session_id is a timestamp when a user enters your platform. We can’t simply COUNT ga_session_id and call them a number of sessions and the reason is that, if two users enter your platform at the same time, they both will have the same ga_session_id.

Then for session_id, I first UNNEST event_params and extract integer values with key = ‘ga_session_id’ and then concatenate user_pseudo_id and ga_session_id. By doing so, even in the case of multiple users entering the platform at the same time, they would have a different session_id.

Similarly, for engaged_session and engagement_time_msec, I have UNNEST the event_params column and then only select those keys, which are ‘session_engaged’ and ‘engagement_time_msec’ respectively. Also, note that values of session_engaged are strings, whereas for the engagement_time_msec it is an integer and they are measured in microseconds.

In the end, I have grouped by event_date and then COUNT the DISTINCT session_id for the number of sessions, and when there is an engagement session I have again COUNT session_id and taken the averaged engaged time in microseconds.

Conclusion

With such a query, one can get information on how many sessions were triggered and how many were engaged in those sessions, and also what was their average time for the engagement. In the upcoming post, I’ll write down more queries on how to extract data from raw GA4.

Further Links

Follow us on LinkedIn for insights into our daily work and important updates on BigQuery, Data Studio, and marketing analytics.

Subscribe to our YouTube channel for discussions on DWH, BigQuery, Looker Studio, and Google Tag Manager.

If you are interested in learning BigQuery from scratch, get access to our free BigQuery Course

Elevate your skills with Google Data Studio and BigQuery by enrolling in our Udemy course.

Need help setting up a modern, cost-efficient data warehouse or analytical dashboard? Email us at hello@datadice.io to schedule a call.