I am running a query that returns the count distinct of active users per different products grouped by day and country. I also added two columns 'previous_day_active_users' and 'returning_users' which return the number of users active in the previous day and the number of users active both in the previous and the current day. I want to do this in order to calculate the retention rate (returning_users / previous_day_active_users). The problem is the numbers don't match. In addition, 'previous_day_active_users' and 'returning_users' always match, which doesn't make sense. I don't understand what I am doing wrong.
Query:
WITH active_users AS (
SELECT
date_trunc('day'::text, customer_kpis_hourly.time_eet) as time_eet,
customer_kpis_hourly.market_code,
customer_kpis_hourly.user_id,
customer_kpis_hourly.active,
LAG(customer_kpis_hourly.active) OVER (PARTITION BY user_id ORDER BY time_eet) AS prev_active,
customer_kpis_hourly.active_casino,
customer_kpis_hourly.active_virtual_sports,
customer_kpis_hourly.active_poker,
customer_kpis_hourly.active_sportsbook
FROM delivery.customer_kpis_hourly
WHERE customer_kpis_hourly.time_eet >= '2021-01-01'
AND customer_kpis_hourly.time_eet < '2021-01-05'
AND customer_kpis_hourly.active
),
previous_day_active_users AS (
-- Calculate previous day's active users
SELECT
user_id,
time_eet - INTERVAL '1 day' AS previous_day
FROM active_users
WHERE prev_active
)
SELECT
'D'::text AS time_gran,
date_trunc('day'::text, active_users.time_eet) AS time_eet,
active_users.market_code,
count(DISTINCT active_users.user_id) FILTER (WHERE active_users.active) AS active_user,
count(DISTINCT active_users.user_id) FILTER (WHERE active_users.active_casino) AS active_casino,
count(DISTINCT active_users.user_id) FILTER (WHERE active_users.active_virtual_sports) AS active_vs,
count(DISTINCT active_users.user_id) FILTER (WHERE active_users.active_poker) AS active_poker,
count(DISTINCT active_users.user_id) FILTER (WHERE active_users.active_sportsbook) AS active_sb,
count(DISTINCT previous_day_active_users.user_id) as previous_day_active_user,
count(DISTINCT CASE WHEN previous_day_active_users.user_id IS NOT NULL THEN active_users.user_id END) AS returning_users
FROM
active_users
LEFT JOIN
previous_day_active_users
ON active_users.user_id = previous_day_active_users.user_id
-- AND date_trunc('day'::text, active_users.time_eet) = previous_day_active_users.previous_day
GROUP BY
'D'::text, date_trunc('day'::text, active_users.time_eet), active_users.market_code
When you compute
previous_day_active_users, you basically computetime_eet - INTERVAL '1 day' AS previous_dayfor theuser_idgiven whereevaluates as true. But this does not seem to actually check whether the user was really working on the previous day. instead, you will need to apply something like this:
That would create you pairs or
currandprevand you can easily convert this into an aggregate query that counts the total number ofcurr.user_idas the number of active users and the total of not nullprev.user_idas the users active on the previous day. Then you can compute their difference or ratio.