I have a table with a large number of events that have happened, like the following:
| ts | event |
|---|---|
| 2023-05-09 19:20:19 UTC | joined |
| 2023-01-16 09:34:02 UTC | joined |
| 2022-08-19 10:02:44 UTC | left |
| 2022-10-06 10:11:12 UTC | joined |
| 2021-10-06 10:11:13 UTC | left |
Each event involves either a "joiner" or a "leaver", with a TIMESTAMP "ts" associated. I need to create a report that gives joiners, leavers and total users for each month. The total users would be a running total of the number of users, like the following:
| Year | Month | Joiners | Leavers | Total |
|---|---|---|---|---|
| 2023 | 03 | 50 | 0 | 50 |
| 2023 | 04 | 5 | 1 | 54 |
| 2023 | 05 | 30 | 10 | 74 |
| 2023 | 06 | 10 | 5 | 79 |
I need to compute:
- count for "Joiners", namely the number of "joined" events that happened in the month,
- count for "Leavers", which would be the number of "left" events that happened in the month
- "Total", as a running total number of joined users minus left users (if 10 people joined and 2 people left, then the actual total number of people would be 8; if 10 more people joined next month and 0 people left then the Total needs to be 18 in the next month)
Here's what I've tried so far:
SELECT EXTRACT(YEAR FROM ts) year,
EXTRACT(MONTH FROM ts) month,
event,
COUNT(event) AS total,
SUM(COUNT(event)) OVER(ORDER BY event) AS running_total
FROM `data.events`
GROUP BY year, month, event
ORDER BY year ASC, month ASC, event ASC
This is the closest that I've achieved so far with a very basic understanding of SQL, however it's not what I was aiming for. I can't get the COUNT or SUM running total to work.
Any help that could point me in the right direction would be appreciated.
You shall first generate your two aggregated values "Joiners" and "Leavers" with conditional aggregation, then use a running sum on
Joiners-Leavers.