How to use "Partition By" in Oracle SQL

62 Views Asked by At

I am new to use "PARTITION BY" in oracle. In the table structure that stored multiple addresses for a person as below, how to use "Partition By" in oracle SQL to

  1. get the count of record per type per person and
  2. get the latest Addr_ID per person per type
PERSON_ID ADDR_ID TYPE EFF_START_DATE EFF_END_DATE
6207445 7390814 HOME 2024-02-26 4712-12-31
6207445 0042548 HOME 2023-05-15 4712-12-31
6207445 6462255 HOME 2022-01-31 4712-12-31
6207445 6462287 MAIL 2022-01-31 4712-12-31
6207445 0042571 MAIL 2023-05-15 4712-12-31
1111111 0042578 MAIL 2023-05-15 4712-12-31

Expected Result:

Person_ID Type Rec_Count
6207445 HOME 3
6207445 MAIL 2
1111111 MAIL 1

However, I run the script as below and it gave me different result

SELECT person_id, type, 
      COUNT(*) OVER (PARTITION BY person_id) AS rec_count
FROM PER_PERSON_ADDR_USAGES_F 
WHERE
    person_id = '6207445'
ORDER BY rec_count desc
PERSON_ID TYPE REC_COUNT
6207445 HOME 5
6207445 MAIL 5
6207445 HOME 5
6207445 MAIL 5
6207445 HOME 5

What's wrong in my SQL?

Besides, What can I do if I want to get the latest Addr_ID per person per type?

SELECT person_id, type, 
    COUNT(*) OVER (PARTITION BY person_id) AS rec_count
FROM PER_PERSON_ADDR_USAGES_F 
WHERE
    to_number(rn) > 1
ORDER BY rec_count desc

I try this but failed as below:

Query Execution started at Mon Feb 26 15:58:14 AEDT 2024
Error while executing the query:
ORA-06550: line 15, column 13:
PL/SQL: ORA-00904: "REC_COUNT": invalid identifier
ORA-06550: line 7, column 17:
PL/SQL: SQL Statement ignored

Any idea what's wrong in my SQL?

3

There are 3 best solutions below

0
MT0 On BEST ANSWER

Analytic functions work over the entire result set and return exactly the same number of rows of output as there are of input; if you specify PARTITION BY then the function will split the result set up into different partitions and calculate the result within that partition but there will still be a one-to-one correspondence between the input and output rows.

If you want to aggregate so that you only produce one row per partition then do not use analytic functions and use GROUP BY:

SELECT person_id,
       type, 
       COUNT(*) AS rec_count
FROM   PER_PERSON_ADDR_USAGES_F 
GROUP BY person_id, type
ORDER BY rec_count DESC

Which, for the sample data:

CREATE TABLE PER_PERSON_ADDR_USAGES_F (PERSON_ID, ADDR_ID, TYPE, EFF_START_DATE, EFF_END_DATE) AS
SELECT 6207445, '7390814', 'HOME', DATE '2024-02-26', DATE '4712-12-31' FROM DUAL UNION ALL
SELECT 6207445, '0042548', 'HOME', DATE '2023-05-15', DATE '4712-12-31' FROM DUAL UNION ALL
SELECT 6207445, '6462255', 'HOME', DATE '2022-01-31', DATE '4712-12-31' FROM DUAL UNION ALL
SELECT 6207445, '6462287', 'MAIL', DATE '2022-01-31', DATE '4712-12-31' FROM DUAL UNION ALL
SELECT 6207445, '0042571', 'MAIL', DATE '2023-05-15', DATE '4712-12-31' FROM DUAL UNION ALL
SELECT 1111111, '0042578', 'MAIL', DATE '2023-05-15', DATE '4712-12-31' FROM DUAL;

Outputs:

PERSON_ID TYPE REC_COUNT
6207445 HOME 3
6207445 MAIL 2
1111111 MAIL 1

If you want to find the latest address then use KEEP (DENSE_RANK LAST ...):

SELECT person_id,
       type, 
       MAX(addr_id) KEEP (DENSE_RANK LAST ORDER BY eff_start_date) AS addr_id,
       COUNT(*) AS rec_count
FROM   PER_PERSON_ADDR_USAGES_F 
GROUP BY person_id, type
ORDER BY rec_count DESC

Which, for the sample data, outputs:

PERSON_ID TYPE ADDR_ID REC_COUNT
6207445 HOME 7390814 3
6207445 MAIL 0042571 2
1111111 MAIL 0042578 1

Alternatively, use analytic functions and then filter to only get the first row of each partition:

SELECT person_id,
       type, 
       addr_id,
       rec_count
FROM   (
  SELECT person_id,
         type, 
         addr_id,
         ROW_NUMBER() OVER (PARTITION BY person_id, type ORDER BY eff_start_date DESC)
           AS rn,
         COUNT(*) OVER (PARTITION BY person_id, type) AS rec_count
  FROM   PER_PERSON_ADDR_USAGES_F 
)
WHERE  rn = 1
ORDER BY rec_count DESC

Which outputs the same as the previous query.

fiddle

0
learning On

Partition by similar to group by, just mention which columns you want to group by

If you want to group by the type also, you need to include the type

COUNT(*) OVER (PARTITION BY person_id) AS rec_count

This means you want to group by person_id, but type is not included.

So, please include the type as below

COUNT(*) OVER (PARTITION BY person_id, type) AS rec_count
0
d r On

Here is one option to get result answering both questions:

WITH    --  S a m p l e    D a t a :
    PER_PERSON_ADDR_USAGES_F (PERSON_ID, ADDR_ID, TYPE, EFF_START_DATE, EFF_END_DATE) AS
        ( Select    6207445, '7390814', 'HOME', DATE '2024-02-26', DATE '4712-12-31' From Dual Union All
          Select    6207445, '0042548', 'HOME', DATE '2023-05-15', DATE '4712-12-31' From Dual Union All 
          Select    6207445, '6462255', 'HOME', DATE '2022-01-31', DATE '4712-12-31' From Dual Union All
          Select    6207445, '6462287', 'MAIL', DATE '2022-01-31', DATE '4712-12-31' From Dual Union All
          Select    6207445, '0042571', 'MAIL', DATE '2023-05-15', DATE '4712-12-31' From Dual Union All
          Select    1111111, '0042578', 'MAIL', DATE '2023-05-15', DATE '4712-12-31' From Dual 
        )
--      M a i n    S Q L :
SELECT    PERSON_ID, TYPE, PERSON_TYPE_CNT "REC_COUNT", PERSON_TYPE_LAST_ADDR "LAST ADDR_ID"
FROM    ( SELECT    PERSON_ID,  
                    TYPE,
                    COUNT(*) OVER (PARTITION BY PERSON_ID, TYPE) AS PERSON_TYPE_CNT,
                    ADDR_ID,
                    Case When Max(EFF_START_DATE) OVER(PARTITION BY PERSON_ID, TYPE) = EFF_START_DATE 
                         Then ADDR_ID
                    End AS PERSON_TYPE_LAST_ADDR, 
                    EFF_START_DATE, 
                    EFF_END_DATE
          FROM      PER_PERSON_ADDR_USAGES_F 
        )
Where    PERSON_TYPE_LAST_ADDR Is Not Null
Order By PERSON_TYPE_CNT
/*    R e s u l t :
 PERSON_ID TYPE  REC_COUNT LAST ADDR_ID
---------- ---- ---------- ------------
   1111111 MAIL          1 0042578
   6207445 MAIL          2 0042571
   6207445 HOME          3 7390814        */

NOTE: Partition By works as Group By but the result is present in every row. Ordinary aggregation with Group By results in as many rows as there are distinct groups.
Case expression is used to determine row with last ADDR_ID so it could be used as a filter in outer query.