Why GA4 User Counts Don’t Match Between BigQuery and the GA4 Interface

If you’ve ever compared GA4 user metrics in the interface with BigQuery exports, you might have noticed discrepancies. For example, Active Users, New Users, or Total Users in the GA4 UI often don’t match what you calculate from the raw BigQuery data.

Understanding why this happens is crucial for accurate reporting, marketing attribution, and data-driven decision-making. This guide explains the problem and offers detailed solutions to align your BigQuery metrics with GA4 reporting logic.


The Problem: Discrepancies Between GA4 UI and BigQuery

MetricGA4 InterfaceBigQuery ExportWhy It Differs
Active UsersCounts users with engaged sessionsCounted via user_pseudo_id or user_id in eventsGA4 filters by engagement events, session timeout, and deduplicates using blended identity
New UsersFirst-time user in a propertyCalculated from earliest event per userGA4 uses first_visit/first_open with UI-level session stitching; BigQuery needs custom SQL
Returning UsersUsers with previous sessionsUsers with prior events in BigQueryGA4 UI automatically excludes first sessions; raw data requires joins and filters
Total UsersDeduplicated by GA4 identity rulesCOUNT(DISTINCT user_pseudo_id/user_id)BigQuery counts may overcount if identity stitching not applied

Key Insight: GA4 UI metrics apply deduplication, identity prioritization, filters, and session logic automatically, while BigQuery exports are raw event-level data.


The Solution: Aligning BigQuery Metrics with GA4

To ensure your BigQuery user counts match GA4, follow these detailed steps:

1. Deduplicate Users Correctly
  • Use COALESCE(user_id, user_pseudo_id) to combine logged-in and anonymous users.
  • Always use COUNT(DISTINCT …) when counting users to avoid double-counting across devices.
  • If your property has a login system, prioritize user_id for cross-device analysis.

Example:

COUNT(DISTINCT COALESCE(user_id, user_pseudo_id)) AS active_users

This ensures every unique person is counted only once, even if they visit multiple times or from multiple devices.


2. Filter for Engaged Sessions
  • GA4 counts only users who triggered engagement events as Active Users.
  • Use event_name = 'user_engagement' in your queries.
  • You can further filter by engagement time thresholds if your GA4 property has custom definitions.

Example:

WHERE event_name = 'user_engagement'

This mimics GA4 UI logic for Active Users, avoiding inflated counts from short or non-interactive visits.


3. Match Date Ranges and Timezones
  • GA4 UI applies the property timezone, while BigQuery exports are in UTC.
  • Use SQL functions like DATE(TIMESTAMP_SECONDS(event_timestamp), "America/New_York") to align dates.
  • Ensuring consistent date ranges prevents off-by-one discrepancies in daily, weekly, or monthly reporting.

4. Use GA4 Identity Spaces
  • GA4 tracks users across three identity spaces: Blended, Observed, Device-only.
  • To replicate Blended in BigQuery, combine user_id and user_pseudo_id using COALESCE.
  • For device-only analysis, use user_pseudo_id exclusively.
  • This step is crucial for aligning Total, Active, and Returning Users with GA4 UI metrics.

5. Implement Session Stitching
  • GA4 automatically stitches sessions for users with user_id across devices.
  • In BigQuery, create session-level tables using session_id or ga_session_id.
  • Deduplicate users per session and combine sessions for cross-device reporting.

Example:

SELECT
  COUNT(DISTINCT COALESCE(user_id, user_pseudo_id)) AS active_users
FROM
  `project.dataset.events_*`
WHERE
  event_name = 'user_engagement'
GROUP BY session_id

This ensures Active Users match GA4’s cross-device session logic.


6. Optional: Adjust for Sampling & Data Thresholds

  • GA4 UI applies data thresholds for privacy when reporting on small user counts.
  • BigQuery exports raw data without thresholds, so counts may appear higher.
  • If needed, apply your own threshold rules in SQL to align with GA4 privacy constraints.

Quick SQL Example: Counting Active Users

SELECT
  event_date,
  COUNT(DISTINCT COALESCE(user_id, user_pseudo_id)) AS active_users
FROM
  `project.dataset.events_*`
WHERE
  event_name = 'user_engagement'
  AND event_date BETWEEN '2025-08-01' AND '2025-08-07'
GROUP BY
  event_date
ORDER BY
  event_date;

What’s happening in this SQL:

  1. Deduplicates logged-in and anonymous users with COALESCE.
  2. Filters only user_engagement events to mimic GA4 Active Users.
  3. Counts unique users per day for daily Active Users.

Key Takeaways

  • GA4 UI and BigQuery counts differ due to deduplication, session logic, identity prioritization, and timezone differences.
  • To match GA4 metrics in BigQuery:
    1. Deduplicate users with COALESCE(user_id, user_pseudo_id)
    2. Filter for engagement events (user_engagement)
    3. Align date ranges and timezones
    4. Respect GA4 identity spaces (Blended, Observed, Device)
    5. Implement session stitching when necessary
    6. Optionally apply thresholds for privacy alignment
  • Following these steps ensures accurate reporting, cross-device analysis, and reliable marketing insights.

For a deeper dive into GA4 user metrics and BigQuery SQL, see: Mastering GA4 Users: Metrics, Identity, and BigQuery Analysis.


Discover more from GA4BigQuery

Subscribe to get the latest posts sent to your email.

Posted in

Leave a comment

Discover more from GA4BigQuery

Subscribe now to keep reading and get access to the full archive.

Continue reading