Filtering after having used a SQL windowing function

71 Views Asked by At

Why can't I do as below, and what is the best alternative?

SELECT    year,
          category,
          name,
          COUNT() OVER (PARTITION BY name) AS count
FROM      nobel_prizes
WHERE     count = 2
ORDER BY  count DESC

I'm getting:

(sqlite3.OperationalError) misuse of aliased window function count

If I filter on something else than the column values for the aggregate function, it works. Can someone provide an example that illustrates why they might have prohibited doing the latter? Would it allow for bad stuff to happen?

This is happening in jupyter notebook using ipython-sql.

Note: I have seen questions on using window functions in WHERE-clauses, but not the other way around.

I'm new to SQL, be nice!

3

There are 3 best solutions below

0
SelVazi On

Standard SQL disallows references to column aliases in a WHERE clause. This restriction is imposed because when the WHERE clause is evaluated, the column value may not yet have been determined.

Instead, you can do it as follows:

SELECT *
FROM (
  SELECT    year,
          category,
          name,
          count() OVER (PARTITION by name) AS count
  FROM      nobel_prizes
)
WHERE     count = 2
ORDER BY  count DESC
0
jnisen On

In SQL, you have to understand which command execute first. WHERE clause is executed first and later SELECT. When it tries to find ‘count’, it does not find it. So, the solution is to use a subquery or CTE, where you solve the window function COUNT and then you get filter by the new column ‘count’

0
d r On

Beside the alias used in where clause (explained in SelVazi's answer), there are some issues in your question and code that are not quite clear

  • you are selecting double winners but ordering by count desc which makes no sense if all of them have a count of 2
  • if there are 2 double winners you'll get 4 rows (if 3 then 6 rows)
    Let's get some sample data first:
CREATE TABLE nobel_prizes ( A_YEAR INTEGER, CATEGORY TEXT, A_NAME TEXT, SOME_COLUMN TEXT );

INSERT INTO nobel_prizes(A_YEAR, CATEGORY, A_NAME, SOME_COLUMN) VALUES (2020, 'A', 'Name 1', 'some value 1');
INSERT INTO nobel_prizes(A_YEAR, CATEGORY, A_NAME, SOME_COLUMN) VALUES (2020, 'B', 'Name 2', 'some value 2');
INSERT INTO nobel_prizes(A_YEAR, CATEGORY, A_NAME, SOME_COLUMN) VALUES (2021, 'D', 'Name 1', 'some value 3'); 
INSERT INTO nobel_prizes(A_YEAR, CATEGORY, A_NAME, SOME_COLUMN) VALUES (2021, 'C', 'Name 3', 'some value 4'); 
INSERT INTO nobel_prizes(A_YEAR, CATEGORY, A_NAME, SOME_COLUMN) VALUES (2023, 'B', 'Name 2', 'some value 5'); 
INSERT INTO nobel_prizes(A_YEAR, CATEGORY, A_NAME, SOME_COLUMN) VALUES (2024, 'F', 'Name 1', 'some value 6'); 

one option is to use a cte for fetching the double winners and join that cte with your table:

WITH    -- C T E    p r i z e s :
  prizes AS
    ( Select  A_NAME, Count(*) as CNT
      From  nobel_prizes
      Group By A_NAME Having Count(*) = 2  -- <-- this excludes name 1 as triple winner
    )
--  M a i n    S Q L :
Select      np.A_YEAR, np.CATEGORY, np.A_NAME, p.CNT
From        nobel_prizes np
Inner Join  prizes p on p.A_NAME = np.A_NAME
Order By    np.A_NAME, np.A_YEAR;
/*    R e s u l t :   (2 rows per double winner)
A_YEAR  CATEGORY  A_NAME       CNT
------  --------  -----------  ---
2020    B         Name 2        2
2023    B         Name 2        2   */
  • maybe you would like to get 1 row per name
  • include all multiple winners
  • make sense of order by count desc
WITH    -- add years and categories using GROUP_CONCAT()
  prizes AS
    ( Select  A_NAME, Count(*) as CNT, GROUP_CONCAT(A_YEAR, ' | ') as YEARS, GROUP_CONCAT(CATEGORY, ' | ') as CATEGORIES
      From  nobel_prizes
      Group By A_NAME Having Count(*) > 1 )
--    M a i n    S Q L :
Select      np.A_NAME, p.CNT, p.YEARS, p.CATEGORIES
From        nobel_prizes np
Inner Join  prizes p on p.A_NAME = np.A_NAME
Group By    np.A_NAME, p.CNT, p.YEARS, p.CATEGORIES
Order By    p.CNT Desc;
/*    R e s u l t :
A_NAME    CNT    YEARS                 CATEGORIES
--------  ---  --------------------  -------------
Name 1      3  2020 | 2021 | 2024    A | D | F
Name 2      2  2020 | 2023           B | B       */

... see it here https://sqlfiddle.com/sqlite/online-compiler?id=5902a365-63f0-4b25-960e-2fa5187892b7

Note: International Committee of the Red Cross (ICRC) is the only 3-time recipient of the Nobel Prize, being conferred with Peace Prize in 1917, 1944, 1963.

/*  R e a l   D a t a   R e s u l t :
A_NAME                CNT   YEARS                   CATEGORIES
-----------------   -----   --------------------    ------------------------
ICRC                    3   1917 | 1944 | 1963      Peace | Peace | Peace
Marie Curie             2   1903 | 1911             Physics | Chemistry
Linus Pauling           2   1954 | 1962             Chemistry | Peace
John Bardeen            2   1956 | 1972             Physics | Physics
Frederick Sanger        2   1958 | 1980             Chemistry | Chemistry     */