INPUT: this is the Order_Tbl table with each row equivalent to a record of a transaction:
| ORDER_DAY | ORDER_ID | PRODUCT_ID | QUANTITY | PRICE |
|---|---|---|---|---|
| 2015-05-01 | ODR1 | PROD1 | 5 | 5 |
| 2015-05-01 | ODR2 | PROD2 | 2 | 10 |
| 2015-05-01 | ODR3 | PROD3 | 10 | 25 |
| 2015-05-01 | ODR4 | PROD1 | 20 | 5 |
| 2015-05-02 | ODR5 | PROD3 | 5 | 25 |
| 2015-05-02 | ODR6 | PROD4 | 6 | 20 |
| 2015-05-02 | ODR7 | PROD1 | 2 | 5 |
| 2015-05-02 | ODR8 | PROD5 | 1 | 50 |
| 2015-05-02 | ODR9 | PROD6 | 2 | 50 |
| 2015-05-02 | ODR10 | PROD2 | 4 | 10 |
EXPECTED OUTPUT: the task is to write a T-SQL query to get products that was ordered on 02-May-2015 but not on any other days before that:
| ORDER_DAY | ORDER_ID | PRODUCT_ID | QUANTITY | PRICE |
|---|---|---|---|---|
| 2015-05-02 | ODR6 | PROD4 | 6 | 20 |
| 2015-05-02 | ODR8 | PROD5 | 1 | 50 |
| 2015-05-02 | ODR9 | PROD6 | 2 | 50 |
I did try 2 solutions with a same approach: using a (completely identical) subquery to get a list of distinct products that were ordered before 02-May-2015, and then somehow putting it after the NOT IN operator inside the WHERE clause of the main query.
Solution 1: the subquery was passed in as an CTE. It throws a syntax error ...
WITH CTE AS
(
SELECT DISTINCT PRODUCT_ID
FROM Order_Tbl
WHERE ORDER_DAY < '2015-05-02'
)
SELECT *
FROM Order_Tbl
WHERE ORDER_DAY = '2015-05-02'
AND PRODUCT_ID NOT IN CTE
Solution 2: the subquery was embedded into the WHERE clause of the main query. This worked!
SELECT *
FROM Order_Tbl
WHERE ORDER_DAY = '2015-05-02'
AND PRODUCT_ID NOT IN (SELECT DISTINCT PRODUCT_ID
FROM Order_Tbl
WHERE ORDER_DAY < '2015-05-02')
What was the nuance that made SQL behave and return different results? I would appreciate it if you guys could give me a clear explanation as well as some useful notes for further SQL implementations.
It has been explained by Thom A in the comments that
NOT INdoes not accept the first syntax you tried, and that, even with the right syntax,NOT EXISTSis in general preferable toNOT IN:The query ensures that there is no row in the table for the same product and an erlier order date. For performance, consider an index on
order_tbl(product_id, order_day).But overall, it is probably simpler and more efficient to use window functions ; the subquery can just be replaced with a window
min():