How to Calculate GA4 Sessions in BigQuery: A Simple Step-by-Step Guide

Google Analytics 4 (GA4) stores event-based data in BigQuery, where sessions are derived from event timestamps and session identifiers. Below are the steps to calculate the number of sessions from GA4 BigQuery exports, including an example SQL query.


Step 1: Understanding Sessions in GA4

  • GA4 defines a session based on the session_id and user_pseudo_id.
  • The session starts when a user interacts with the website/app and continues until inactivity exceeds 30 minutes.
  • Session counts are not explicitly stored but can be derived using SQL.

Step 2: Writing the SQL Query

Use the following SQL query to count sessions from your GA4 export:

WITH session_data AS (
    SELECT
        event_bundle_sequence_id,
        user_pseudo_id,
        session_id,
        TIMESTAMP_MICROS(event_bundle_sequence_id) AS event_time
    FROM `your_project.analytics_XXXXXXXX.events_YYYYMMDD`
    WHERE event_name = 'session_start'
)
SELECT
    COUNT(DISTINCT CONCAT(user_pseudo_id, session_id)) AS total_sessions
FROM session_data;

Explanation of the Query:

  • Extracts session_id and user_pseudo_id from GA4 data.
  • Uses event_name = 'session_start' to identify session initiations.
  • Counts distinct user-session combinations to get the total number of sessions.

Step 3: Verifying the Data in BigQuery

  1. Navigate to BigQuery Console.
  2. Open your GA4 dataset (e.g., analytics_XXXXXXXX).
  3. Paste the above SQL query and run it.
  4. Check the results for total session count.


Things to Keep in Mind:

  • GA4 Export Schema: Refer to the official GA4 Export Schema to understand the event structure.
  • Data Partitioning: If using large datasets, partition tables by event date for better performance.
  • Session Timeout Considerations: Default session timeout is 30 minutes but can be configured in GA4 settings.
  • Event Sampling: Ensure that GA4 event data is not sampled when analyzing.
  • Looker Studio Integration: You can use this query in Looker Studio for real-time visualization.
  • Cost Considerations: Query execution in BigQuery is chargeable, so optimize queries to reduce costs.

Notes:

  • If using the BigQuery sandbox, be aware of its limitations, such as lack of long-term storage.
  • Consider creating a scheduled query to refresh session counts automatically.
  • Regularly monitor data quality and check for missing or inconsistent session identifiers.
  • If working with app data, ensure correct handling of Firebase-based session data.

By following these steps and considerations, you can accurately calculate and analyze GA4 sessions in BigQuery.


Discover more from GA4BigQuery

Subscribe to get the latest posts sent to your email.

Posted in

Leave a comment

Discover more from GA4BigQuery

Subscribe now to keep reading and get access to the full archive.

Continue reading