I am currently trying to recreate an advanced segment from UA into bigquery and running into issues getting the numbers to match.
I need to exclude all sessions from users who have ever completed an event with a category of 'booking' and all sessions from outside the US. My current query gets me close to the number in our looker dashboard but its less than it should be.
This is my current code:
WITH Excluded_Users AS ( SELECT visitId AS exclusion_id FROM my_data.ga_sessions_*, UNNEST(hits) AS hits WHERE hits.eventInfo.eventCategory = 'Booking' )
SELECT CONCAT( FORMAT("%04d", EXTRACT(YEAR FROM PARSE_DATE("%Y%m%d", date))), '-', FORMAT("%02d", EXTRACT(MONTH FROM PARSE_DATE("%Y%m%d", date))) ) AS year_month,COUNT(DISTINCT IF(totals.visits=1, CONCAT(fullVisitorId, CAST(visitId AS STRING)), NULL) ) as sessions,
FROM my_data.ga_sessions_* s
LEFT JOIN Excluded_Users e ON s.visitId = e.exclusion_id
WHERE e.exclusion_id IS NULL AND s.totals.visits = 1 AND geoNetwork.Country = 'United States'
GROUP BY 1
A couple of things:
visitIdis just a timestamp of the session (ignoring the midnight split unlikevisitStartTime)fullvisitorid+visitstarttimeto calculate sessionshitsif necessary, save some resources by using subqueries