Using () OVER or HAVING clause to get monthly aggregates of counts

76 Views Asked by At

I have a big dataset on ticket sales throughout a single year. The schema I am working with is:

ID
date_time_sale (Timestamp, yyyy-MM-dd hh-mm-ss)
weekday (varchar, Mon to Sun)
number_tickets (integer)   
ticket_price (float)
total_price (float)

I am trying to get to get the weekday of every month of the year where the highest number of tickets was sold, so, for example, the output would be:

year month weekday total_tickets
2015 01 SAT 5400
2015 02 SUN 4300
2015 03 SUN 6400

I tried using the following, but admittedly SQL is not my strongest skill:

SELECT DISTINCT EXTRACT(YEAR FROM date_time_sale) AS YEAR,
      EXTRACT(MONTH FROM date_time_sale) AS MONTH,
      week_day,
      RANK () OVER (PARTITION BY YEAR, MOMTH ORDER BY count(week_day) ASC) weekday_count
      from ticket_sales
      order by YEAR, MONTH

But I keep running into errors. I tried using a HAVING clause, but I coludn't go anywhere. Any tip on how to effectively use the RANK () OVER (PARTITION BY) clause to get this output, please? Or do I need to use COUNT () OVER?

The analysis exception says:

`cannot resolve '`YEAR`' given input columns: [ticket_sales.YEAR, ticket_sales.MONTH, weekday]; line 1 pos 292;\n'Sort ['YEAR ASC NULLS FIRST, 'MONTH ASC NULLS FIRST], true\n+- Project [YEAR#342, MONTH#358 

but then it is quite a long error.

Update:

So I tried this code:

SELECT DISTINCT year,
          month,
          week_day,
          COUNT (week_day) OVER (PARTITION BY year, month, week_day) AS weekday_count 
          from ticket_sales
           order by year, month, weekday_count DESC

And what that did is give the results of all week days in the for every months, so the output is 12*7 instead of 12 rows. Still ways to learn around this but at least I am somewhere.

1

There are 1 best solutions below

10
nachospiu On

Try this query and let me know if return the desire result:

I'm not sure if field name is number_tickets or total_tickets, I used number_tickets.

First I sum numbers tickets from year, month and week day, then return a row per year and month with the week's day in which more tickets were sold.

WITH total_by_day AS (SELECT EXTRACT(YEAR FROM date_time_sale) AS YEAR,
      EXTRACT(MONTH FROM date_time_sale) AS MONTH,
      week_day,
      SUM(number_tickets) AS number_tickets
FROM ticket_sales
GROUP BY YEAR, MONTH, week_day)

SELECT DISTINCT 
      YEAR,
      MONTH,
      FIRST_VALUE(week_day) OVER (PARTITION BY YEAR, MONTH ORDER BY number_tickets DESC) AS week_day,
      FIRST_VALUE(number_tickets) OVER (PARTITION BY YEAR, MONTH ORDER BY number_tickets DESC) AS total_tickets
FROM total_by_day
ORDER BY YEAR, MONTH;

In Postgresql database I got the desire result.