The following query can be used to calculate cumulative sums:
select
(select sum(c) over (ROWS UNBOUNDED PRECEDING) as a),
sum(c) over (ROWS UNBOUNDED PRECEDING ) as b
from (select unnest(ARRAY[1, 2, 3, 3, 4, 5]) as c) x
Column b gives the result as expected, while a does not.
| a | b |
|---|---|
| 1 | 1 |
| 2 | 3 |
| 3 | 6 |
| 3 | 9 |
| 4 | 13 |
| 5 | 18 |
I've searched on Google and Stackoverflow but failed to find a similar problem.
Here are my questions:
- Why does the simple
selectwrapper lead to a completely different result? - Is it possible to keep the subquery in the
SELECTlist and still make the window function work as expect?
Your output column
ais the result of a correlated subquery, which is run once for every row produced by the underlying derived tablex. Each of these computations sees only the single row it is computed for. So the sum is always the same as the single value.Your output column
b, on the other hand, is based on the complete derived tablex. So you see actual sums of multiple rows.No. Unless you join back to additional rows from the underlying relation
xwithin the correlated subquery. You would have to generatexin a CTE to reuse it (or use an actual table), and that would be an odd, expensive construct, and most likely nonsense. (What would be the use case?)While you don't add an explicit
ORDER BYin the window frame,bis not completely reliable, either. Well, it won't break for the simple query. But it might if you do more (like add joins etc.). See:And
unnest()can stand as "table-function" on its own. No additional wrapper needed. Consider:Note that this sorts by original position in the array. Your example is ambiguous in this respect as the array has strictly ascending values which happen to sort the same when ordered by the value. I assume you care about the position.
Aside: The default framing option for window functions is
RANGE UNBOUNDED PRECEDING, so it matters (logically) to change that toROWS UNBOUNDED PRECEDINGwhile the sort order has peers - values sorting equally. And with noORDER BYin the window frame, all rows are peers.It does not matter (logically) in my query because that sorts by the "ordinality" - with deterministic sort order, no peers.
But it still matters for performance.
ROWSis faster as it does not have to check for peers. Postgres 16 includes substantial performance improvements in this area. Butsum()is unaffected sinceRANGEandROWScan make an actual difference. It still pays to switch toROWSwhen you know that the sort order is unambiguous, like in my query. See: