Date issue - how do I parse date in this SQL

38 Views Asked by At

A date error keeps on occurring when I run my code. I've parsed the date but my suspicion is I need to parse it elsewhere in the code. I've tried doing this but I'm obviously missing something.

When t1 is run separately I don't get any errors so I suspect the issue is in t2 or t3.

The data being queried is the public GA4 dataset so you can try this yourself.

Many thanks in advance.

WITH t1 AS (
    SELECT
        PARSE_DATE('%Y%m%d', event_date) AS event_date,
        user_pseudo_id,
        COUNT(DISTINCT user_pseudo_id) AS users,
        MAX(CASE WHEN params.key = "ga_session_id" THEN params.value.int_value ELSE 0 END) AS sessionId,
        CASE WHEN event_name = "first_visit" THEN 1 ELSE 0 END AS newUsers,
        COUNT(ecommerce.transaction_id) AS conversions,
        SUM(ecommerce.purchase_revenue) AS totalRevenue
    FROM
        `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*` AS ga, 
        UNNEST (event_params) AS params
    WHERE 
        _table_suffix BETWEEN '20210101' AND '20210131'
    GROUP BY  event_date, user_pseudo_id, event_name, ecommerce.transaction_id
),
t2 AS (
    SELECT 
        user_pseudo_id,
        COUNT(ecommerce.transaction_id) AS conv,
        AVG(COUNT(ecommerce.transaction_id)) 
            OVER (
                PARTITION BY user_pseudo_id
                ORDER BY user_pseudo_id
                ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING 
            ) AS avg_conversion,
        CAST(MAX(event_date) AS DATE) AS most_recent_purchase, 
        CAST(MIN(event_date) AS DATE) AS first_purchase 
    FROM
        `bigquery-public-data.ga4_obfuscated_sample_ecommerce.events_*` AS ga, 
        UNNEST (event_params) AS params
    WHERE 
        _table_suffix BETWEEN '20210101' AND '20210131'
    GROUP BY user_pseudo_id, event_date
),
t3 AS (
    SELECT 
        user_pseudo_id,
        DATE_DIFF(
            t2.most_recent_purchase, t2.first_purchase, DAY
        ) / 30.0 AS day_in_between_purchases,
        t2.avg_conversion AS conversions
    FROM t2
    GROUP BY user_pseudo_id, most_recent_purchase, first_purchase, avg_conversion
)
SELECT 
    t1.event_date,
    t4.newUsers,
    t4.conversions,
    t4.totalRevenue
FROM (
    SELECT
        t1.user_pseudo_id,
        t1.event_date,
        SUM(newUsers) AS newUsers,
        SUM(conversions) AS conversions,
        CONCAT('$', IFNULL(SUM(totalRevenue), 0)) AS totalRevenue
    FROM 
        t1
    GROUP BY t1.user_pseudo_id, t1.event_date
) AS t4
LEFT JOIN t3 ON t3.user_pseudo_id = t4.user_pseudo_id
LEFT JOIN t2 ON t2.user_pseudo_id = t4.user_pseudo_id
WHERE t2.conv > 0;

1

There are 1 best solutions below

0
Peter On

As a quick observation … while you parse the event_date in t1, you don't do that in t2.

So … if event_date ain’t no date not, I would not take it granted, your SQL dialect (MS SQL Server?) can just cast it.

Besides that … there should be some helpful error messages (sometimes them even come with positional information).