I have a database table, shown below:
Table "public.counters_by_app_category_app"
Column | Type | Collation | Nullable | Default
----------------------------------------+-----------------------------+-----------+----------+---------
timestamp | timestamp without time zone | | not null |
granularity | interval | | not null |
app_category | text | | not null |
app | text | | not null |
data_volume_up | numeric | | not null | 0
data_volume_down | numeric | | not null | 0
and some sample rows:
timestamp | granularity | app_category | app | data_volume_up | data_volume_down
---------------------+-------------+--------------+-------+----------------+------------------
2023-09-24 04:30:00 | 00:30:00 | 17 | 25301 | 458890 | 479658
2023-09-24 04:30:00 | 00:30:00 | 54 | 29877 | 38131 | 185370
2023-09-24 04:30:00 | 00:30:00 | 81 | 22163 | 27319 | 32569
Suppose I want to find the top 10 (app_category, app) pairs by volume = data_volume_up + data_volume_down for a one-day time range, and then fetch all rows from the table for those top 10 pairs in the same time range.
The easy way is with two queries. First, find the top 10 with a GROUP BY query and the SUM aggregate function:
mydatabase=> select granularity, app_category, app, sum(data_volume_up+data_volume_down) as vol from counters_by_app_category_app
WHERE timestamp >= '2023-09-28 00:00:00' AND timestamp < '2023-09-29 00:00:00' AND granularity = '00:30:00'
group by granularity, app_category, app
order by vol desc limit 10;
granularity | app_category | app | vol
-------------+--------------+-------+--------------
00:30:00 | 26 | 3470 | 264871859641
00:30:00 | 17 | 8229 | 159708538857
00:30:00 | 15 | 6583 | 90496671008
00:30:00 | 24 | 2059 | 89593472005
00:30:00 | 26 | 3212 | 70312526765
00:30:00 | 66 | 8578 | 63943121581
00:30:00 | 3 | 2783 | 58159595285
00:30:00 | 2 | 4568 | 54144586115
00:30:00 | 17 | 25296 | 35847681517
00:30:00 | 4 | 12 | 27602121737
(10 rows)
Then get rows from the table for the same time range and granularity, using the pairs found in the first step in the WHERE clause.
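For reference, that second query can be sketched as below, plugging in the pairs returned by the first query (app_category and app are text columns, hence the quoted values; PostgreSQL supports row constructors on the left side of IN):

```sql
-- Second step of the two-query approach: fetch all rows for the
-- top-10 pairs found above, for the same time range and granularity.
SELECT *
FROM counters_by_app_category_app
WHERE timestamp >= '2023-09-28 00:00:00'
  AND timestamp <  '2023-09-29 00:00:00'
  AND granularity = '00:30:00'
  AND (app_category, app) IN (
        ('26', '3470'), ('17', '8229'), ('15', '6583'), ('24', '2059'),
        ('26', '3212'), ('66', '8578'), ('3', '2783'),  ('2', '4568'),
        ('17', '25296'), ('4', '12')
      )
ORDER BY timestamp;
```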
I tried to use a window function to achieve the same result with a single query. Here is my approach:
Get total volume per (app_category, app) for the whole day with:
mydatabase=> SELECT
timestamp,
granularity,
app_category,
app,
data_volume_up,
data_volume_down,
SUM(data_volume_up+data_volume_down) OVER(PARTITION BY app_category, app) as total_vol_order
FROM counters_by_app_category_app
WHERE timestamp >= '2023-09-28 00:00:00' AND timestamp < '2023-09-29 00:00:00' AND granularity = '00:30:00'
order by timestamp, total_vol_order desc;
timestamp | granularity | app_category | app | data_volume_up | data_volume_down | total_vol_order
---------------------+-------------+--------------+-------+----------------+------------------+-----------------
2023-09-28 00:00:00 | 00:30:00 | 26 | 3470 | 195256369 | 3862177889 | 264871859641
2023-09-28 00:00:00 | 00:30:00 | 17 | 8229 | 58679885 | 4271976092 | 159708538857
2023-09-28 00:00:00 | 00:30:00 | 15 | 6583 | 105577241 | 4569555704 | 90496671008
2023-09-28 00:00:00 | 00:30:00 | 24 | 2059 | 7166 | 74220 | 89593472005
...
But I want to keep only the top 10 pairs for each timestamp, so I rank rows on the window-function result and keep the top 10. Here is how I did it:
mydatabase=> with data as (
SELECT
timestamp,
granularity,
app_category,
app,
data_volume_up+data_volume_down as total_vol,
total_vol_order,
DENSE_RANK() OVER(order by total_vol_order DESC) as drank
FROM (
SELECT
timestamp,
granularity,
app_category,
app,
data_volume_up,
data_volume_down,
SUM(data_volume_up+data_volume_down) OVER(PARTITION BY app_category, app) as total_vol_order
FROM counters_by_app_category_app
WHERE timestamp >= '2023-09-28 00:00:00' AND timestamp < '2023-09-29 00:00:00' AND granularity = '00:30:00'
) as ss
)
select * from data
where drank <=10
order by timestamp, drank;
timestamp | granularity | app_category | app | total_vol | total_vol_order | drank
---------------------+-------------+--------------+-------+------------+-----------------+-------
2023-09-28 00:00:00 | 00:30:00 | 26 | 3470 | 4057434258 | 264871859641 | 1
2023-09-28 00:00:00 | 00:30:00 | 17 | 8229 | 4330655977 | 159708538857 | 2
2023-09-28 00:00:00 | 00:30:00 | 15 | 6583 | 4675132945 | 90496671008 | 3
2023-09-28 00:00:00 | 00:30:00 | 24 | 2059 | 81386 | 89593472005 | 4
2023-09-28 00:00:00 | 00:30:00 | 26 | 3212 | 1640757311 | 70312526765 | 5
2023-09-28 00:00:00 | 00:30:00 | 66 | 8578 | 1047484505 | 63943121581 | 6
2023-09-28 00:00:00 | 00:30:00 | 3 | 2783 | 167416738 | 58159595285 | 7
2023-09-28 00:00:00 | 00:30:00 | 2 | 4568 | 290666849 | 54144586115 | 8
2023-09-28 00:00:00 | 00:30:00 | 17 | 25296 | 1149896605 | 35847681517 | 9
2023-09-28 00:00:00 | 00:30:00 | 4 | 12 | 582563699 | 27602121737 | 10
2023-09-28 00:30:00 | 00:30:00 | 26 | 3470 | 5608569787 | 264871859641 | 1
2023-09-28 00:30:00 | 00:30:00 | 17 | 8229 | 4125175834 | 159708538857 | 2
Is there a more efficient way to do this, eliminating the subquery and the WITH statement?