order of execution confusion

Question

order of execution confusion

55 Views Asked by Kartik At 12 June 2023 at 23:25

SELECT First_name,last_name,email,ROUND (AVG(Amount),2) AS moneyspent
FROM Customer INNER JOIN Payment
ON Customer.customer_id= Payment.customer_id
WHERE Staff_id=2 
GROUP BY First_name,last_name,email
HAVING SUM(Amount)>100
ORDER BY Moneyspent DESC

first_name	last_name	email	moneyspent
Brittany	Riley	[email protected]	5.83
Arnold	Havens	[email protected]	5.14
Eleanor	Hunt	[email protected]	5.04
Stacey	Montgomery	[email protected]	4.73
Karl	Seal	[email protected]	4.63

I am confused about the Order of execution. I understand we cannot filter based on an aggregated result like AVG(Amount) in WHERE because that hasn't been executed yet. Does AVG(Amount) happen at the GROUP BY or SELECT?

If it happens at GROUP BY, there should be three columns First_name, Last_name, email and AVG(Amount) left. How can HAVING SUM(AMOUNT) still be executing?
If it happens at SELECT, then how come we can still filter using AVG(Amount) at HAVING, as HAVING comes before SELECT?

Original Q&A

There are 2 best solutions below

**tinazmu** · Answer 1 · 2023-06-12T23:53:36.597000

When processing your GROUP BY the query execution engine will also evaluate SUM(Amount) (as well as AVG(Amount)) for each first_name, last_name, email combination; because these are referred to by your SELECT and HAVING, respectively. Some RDBMSs don't allow HAVING referring to an expression not already on the SELECT clause, but I guess PostgreSql is doing us a favour and allowing it. For your purposes you should assume that those aggregations are calculated at the GROUP BY time (HAVING is applied afterwards; but this is really a 'logical ' order, because the engine can apply parts of it earlier if it deduces it would be cheaper and that it wouldn't change results).

**George S** · Answer 2 · 2023-06-13T02:36:14.047000

The GROUP BY and HAVING clauses can be thought of as evaluated before the SELECT clauses for when considering what the result would be, but any aggregate functions used in SELECT column definitions still have access to the underlying disaggregated data.

The order to think about it is:

JOINs and WHERE clause
GROUP BY; note that however, you can also group by columns in the output query, e.g. GROUP BY 1, 2 but this is syntatic sugar
HAVING
SELECT

You can see that the actual calculation of the resulting columns occurs last because if you use a window function as part of the SELECT to access (for example) the previous row you can only access data that you have either included in GROUP BY or that uses an aggregate function, and data removed by the HAVING or WHERE clause is filtered out.

order of execution confusion

There are 2 best solutions below

Related Questions in SQL

Related Questions in POSTGRESQL

Related Questions in ORDER-OF-EXECUTION

Trending Questions

Popular # Hahtags

Popular Questions