Why might my SUM query be producing different results each time I run it?

1k Views Asked by At

It's probably pointless posting this as I can't show the actual code causing the problem, and due to not knowing the source of the issue have not been able to recreate it using fake data, but wanted to ask in case anyone has seen something similar.

I have a query that has a SUM windows function that returns different values each time it's run, despite no changes in the code or underlying data. Literally the same statement run two seconds apart will return a seemingly random selection from 4 to 7.

I can't print the actual code but the line that produces the varying results is as follows, where the column names have been changed: -

SUM(UNITS) OVER (PARTITION BY ACCOUNT_KEY, TRANSACTION_DATE 
    ORDER BY ACCOUNT_KEY,TRANSACTION_DATE)  NUMBER_OF_UNITS      

If anyone can please offer any suggestions on how to approach this or if you've had a similar issue, please let me know. I realise that without the full code it's difficult-to-impossible to know for sure, but there's a chance that someone has seen a similar problem and might be offer some guidance.

1

There are 1 best solutions below

3
Gordon Linoff On

Your ORDER BY keys are the same as your PARTITION BY keys. So, there is no ordering -- everything in a window frame has the same sort key values.

You are doing a cumulative sum on an unstable sort. So the rows can be -- and are -- in any arbitrary order.

Presumably, you want a cumulative sum of UNITS for each account, order by the date:

SUM(UNITS) OVER (PARTITION BY ACCOUNT_KEY ORDER BY TRANSACTION_DATE) as NUMBER_OF_UNITS

Assuming that TRANSACTION_DATE is unique, this will be stable. Note that by default, equal transaction dates for an account as all appearing together because the default windowing clause is RANGE:

SUM(UNITS) OVER (PARTITION BY ACCOUNT_KEY
                 ORDER BY TRANSACTION_DATE
                 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                ) as NUMBER_OF_UNITS