How to Reconstruct Accurate Sessions in BigQuery: Session Stitching in GA4

Session counts in GA4 often don’t match what you see in BigQuery exports. This is because GA4 constructs sessions using rules like inactivity timeouts, traffic source changes, and midnight splits.

For analysts and marketers, this can lead to confusion in attribution, engagement tracking, and conversion reporting. In this post, we’ll break down session stitching, provide simple and advanced SQL examples, and show how to handle common real-world scenarios.


The Problem: Why Sessions Don’t Match

GA4 defines a session as a group of user interactions within a specific time frame. However, sessions can be split or duplicated due to:

  1. Inactivity Timeout: Sessions end after 30 minutes of user inactivity.
  2. Traffic Source Changes: Switching campaigns mid-session triggers a new session.
  3. Midnight Cutoff: Sessions crossing midnight are split into two sessions.
  4. Device Changes: Optionally, a device category change can start a new session.

Impact:

  • Session counts fluctuate unexpectedly
  • GA4 UI reports differ from BigQuery exports
  • Attribution and engagement analysis can be misleading

Solution: Reconstruct Sessions in BigQuery

BigQuery lets you stitch events into sessions based on business rules, giving you accurate session counts and metrics.


1. Simple SQL for Session Counting

If you only need basic session counts, you can use the session_start event:

SELECT
  user_pseudo_id,
  COUNT(event_name) AS sessions
FROM
  `your_project.analytics_XXXX.events_*`
WHERE
  event_name = 'session_start'
GROUP BY user_pseudo_id
ORDER BY sessions DESC;

Explanation:

  • This counts every session_start per user.
  • Simple but doesn’t account for traffic source changes, midnight splits, or inactivity longer than 30 minutes.

2. Deep Dive SQL: Session Stitching With Multiple Scenarios

For accurate, session-level analysis, use this advanced query:

-- Step 1: Identify previous events per user
WITH events_with_lag AS (
  SELECT
    user_pseudo_id,
    user_id, -- Optional for cross-device tracking
    event_timestamp,
    event_name,
    traffic_source.source AS traffic_source,
    traffic_source.medium AS traffic_medium,
    device.category AS device_category,
    LAG(event_timestamp) OVER(PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS prev_ts,
    LAG(traffic_source.source) OVER(PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS prev_source,
    LAG(device.category) OVER(PARTITION BY user_pseudo_id ORDER BY event_timestamp) AS prev_device
  FROM
    `your_project.analytics_XXXX.events_*`
),

-- Step 2: Define session boundaries
session_flags AS (
  SELECT
    *,
    CASE
      WHEN TIMESTAMP_DIFF(TIMESTAMP_MICROS(event_timestamp), TIMESTAMP_MICROS(prev_ts), MINUTE) > 30 THEN 1
      WHEN traffic_source != prev_source THEN 1
      WHEN EXTRACT(DATE FROM TIMESTAMP_MICROS(event_timestamp)) != EXTRACT(DATE FROM TIMESTAMP_MICROS(prev_ts)) THEN 1
      WHEN device_category != prev_device THEN 1
      ELSE 0
    END AS new_session_flag
  FROM events_with_lag
),

-- Step 3: Assign session numbers
sessionized_events AS (
  SELECT
    *,
    SUM(new_session_flag) OVER (PARTITION BY user_pseudo_id ORDER BY event_timestamp ROWS UNBOUNDED PRECEDING) + 1 AS session_number
  FROM session_flags
)

-- Step 4: Aggregate sessions
SELECT
  user_pseudo_id,
  user_id,
  session_number,
  MIN(TIMESTAMP_MICROS(event_timestamp)) AS session_start_time,
  MAX(TIMESTAMP_MICROS(event_timestamp)) AS session_end_time,
  COUNT(*) AS events_in_session,
  ARRAY_AGG(DISTINCT traffic_source) AS traffic_sources_in_session,
  ARRAY_AGG(DISTINCT device_category) AS devices_in_session
FROM sessionized_events
GROUP BY user_pseudo_id, user_id, session_number
ORDER BY user_pseudo_id, session_start_time;

What this query does:

  • Tracks inactivity >30 minutes
  • Detects traffic source changes
  • Handles midnight splits
  • Optionally accounts for device category changes
  • Produces a unique session ID per user-event sequence

Key Takeaways

  • GA4 splits sessions for inactivity, source changes, and midnight cutoff.
  • Simple session_start counts work for rough estimates.
  • Deep session stitching in BigQuery gives accurate session-level metrics for attribution, engagement, and conversion analysis.
  • Adjust session stitching logic based on business requirements, including device changes or cross-device tracking.

TL;DR

  • GA4 session counts often differ from BigQuery due to session-breaking rules.
  • Simple SQL can count session_start events.
  • Deep-dive SQL in BigQuery lets you stitch sessions, accounting for inactivity, campaign changes, midnight splits, and devices.
  • Accurate session metrics improve marketing, engagement, and funnel insights.

Discover more from GA4BigQuery

Subscribe to get the latest posts sent to your email.

Posted in

Leave a comment

Discover more from GA4BigQuery

Subscribe now to keep reading and get access to the full archive.

Continue reading