aggregation of aggregation not allowed

171 Views Asked by At

I am using google bigquery.

I want to find the average number of events for each day of the week (Sunday,monday,tuesday,wednesday,thursday,friday,saturday) for a certain time period (1 to 28 May).

For example there were 4 Sundays between May 1 to May 28, I want to find out average number of events for Sunday based on those 4 Sundays.

Here is my code:

SELECT EXTRACT(DAYOFWEEK FROM TIMESTAMP_MICROS(event_timestamp)) AS day,
       AVG(COUNT(*)) AS average_events
FROM `table`
WHERE DATE_TRUNC(EXTRACT(DATE FROM TIMESTAMP_MICROS(event_timestamp)), DAY)
      BETWEEN DATETIME('2023-05-01 00:00:00') AND DATETIME('2023-05-28 00:00:00')
GROUP BY day
ORDER BY day

this gives an error like "Aggregations of aggregations are not allowed"

edit:

I have tried the following -

SELECT day,
       AVG(counts) AS average_events
FROM (
   SELECT EXTRACT(DAYOFWEEK FROM event_timestamp) AS day,
          COUNT(*) AS counts
   FROM `table`
   WHERE DATE_TRUNC(event_timestamp, DAY)
         BETWEEN DATETIME('2023-05-01 00:00:00') AND DATETIME('2023-05-28 00:00:00')
   GROUP BY day
) AS t
GROUP BY day
ORDER BY day

but here, the count is the same as the average.

2

There are 2 best solutions below

3
GMB On BEST ANSWER

You want to aggregate by day first, and get each associated count - then by day of the week.

I would recommend:

select event_dow, avg(cnt) avg_cnt
from (
    select 
        timestamp_trunc(timestamp_micros(event_timestamp), day) event_day
        extract(dayofweek from timestamp_micros(event_timestamp)) event_dow, 
        count(*) cnt
    from mytable
    where event_timestamp >= unix_micros(timestamp '2023-05-01')
      and event_timestamp <  unix_micros(timestamp '2023-05-29')
    group by event_day, event_dow
) t
group by event_dow                                                           -- dow aggregation
order by event_dow

Note that I attempted to optimize the where clause of the query by filtering directly against original values (we can convert the bounds to unix timestamps instead) - and using half-open intervals.

0
NickW On

I don't have access to GBQ so you'll probably need to adjust the date functions to those used by BQ, but the following should show you how to write your own query:

with daily_count as (
    select trunc(event_timestamp, 'day') event_day
    , count(1) count_day
    from agg_data
    group by event_day
)
select DAYNAME(event_day), avg(count_day)
from daily_count
group by DAYNAME(event_day)
;