How to get the latest change from a table per ID

45 Views Asked by At

I have a table like

ID TYPE START END TIMESTAMP_
1 A NULL 5 '2022-03-03'
1 B NULL 7 '2023-07-19'
1 A 5 NULL '2023-07-20'
1 B NULL 7 '2023-07-21'
4 A NULL 20 '2023-06-30'
4 A 20 17 '2023-07-01'

I am trying to get the latest change per ID per TYPE (the change is shown as START and END like for the first row, the ID went from NULL to 5 for TYPE A and NULL to 7 for TYPE B, THEN 5 back to NULL for TYPE A etc). I have done

SELECT *
FROM MY_TABLE
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, TYPE, ORDER BY TIMESTAMP_ DESC) = 1;

This results in a table like

ID TYPE START END TIMESTAMP_
1 A 5 NULL '2023-07-20'
1 B NULL 7 '2023-07-21'
4 A 20 17 '2023-07-01'

Now I am trying to get the changes to sort of coalesce vertically within id group. I am trying to end up with a table like

ID TYPE START END TIMESTAMP_
1 A_B 5 7 '2023-07-21'
4 A 20 17 '2023-07-01'

where if there are two types within the ID grouping then the value for TYPE can be renamed to include both TYPES that were used. Is there a way to achieve this? Thanks!

2

There are 2 best solutions below

2
Simeon Pilgrim On BEST ANSWER

Another method is to use MAX_BY and MIN_BY and avoid carrying to much state, if you have large counts of type per id:

with fake_table(ID, TYPE, START_, END_, TIMESTAMP_) as (
    select * from values 
        (1, 'A', NULL,  5,  '2022-03-03'),
        (1, 'A', 5, NULL,   '2023-07-20'),
        (1, 'B', NULL,  7,  '2023-07-19'),
        (1, 'B', NULL,  7,  '2023-07-21'),
        (4, 'A', NULL,  20, '2023-06-30'),
        (4, 'A', 20,    17, '2023-07-01')
), latest_rows as (
    SELECT *
    FROM fake_table
    QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, TYPE ORDER BY TIMESTAMP_ DESC) = 1
)
select 
    id
    ,LISTAGG(type, '_') within group (order by  TIMESTAMP_) as type
    ,MIN_BY(START_, TIMESTAMP_) as start_
    ,MAX_BY(END_, TIMESTAMP_) as end_
from latest_rows
group by id
order by id;

gives:

enter image description here

0
ADITYA PAWAR On

Yes we can definetly do that. So to use coleasce verically you have to use LISTAGG which ignores nulls and concatenates the records ordered based on column you provide and then simply get the first record in LISTAGG output.

SELECT ID
    ,LISTAGG(TYPE,'_') within group(order by timestamp_)
    ,SPLIT(LISTAGG(start_,'_') within group(order by timestamp_),'_')[0]::INT start_
    ,SPLIT(LISTAGG(END_,'_') within group(order by timestamp_),'_')[0]::INT END_
FROM 
    (SELECT *
     FROM MY_TABLE
     QUALIFY ROW_NUMBER() OVER (PARTITION BY ID, TYPE ORDER BY TIMESTAMP_ DESC) = 1
    ) r
GROUP BY ID
ORDER BY ID