Create dummy column that indicates when something happened between an interval

Question

Create dummy column that indicates when something happened between an interval

88 Views Asked by geek45 At 26 January 2024 at 12:00

I have two tables which consist of the following columns:

Table1:

ID
N which describes an ID who did something, e.g. for the first time, then a second time, etc.
START_DATE is related to column N. It describes the date when an ID did something.
STOP_DATE 1 day before column GROUPS changes from 0 to 1.
GROUPS is a grouping variable.

Table2:

ID
EVENT_HAPPENED is a date on which an event happened.

db<>fiddle

Table1:

ID	N	START_DATE	STOP_DATE	GROUPS
1	1	2023-01-26	2023-04-13	0
1	2	2023-02-05	2023-04-13	0
1	3	2023-02-27	2023-04-13	0
1	4	2023-04-14	2023-09-13	1
1	5	2023-04-14	2023-09-13	1
1	6	2023-04-14	2023-09-13	1

Table2:

ID	EVENT_HAPPENED
1	2023-03-12
1	2023-03-18
1	2023-04-07
1	2023-05-31

So what I need to do is to check the minimum date of EVENT_HAPPENED. In this case, it is 2023-03-12 and then I reference it to table1 to create a column called MARKER which indicates with 1 that for all subsequent data entries for this ID, the event has already happened. I also need to associate the smallest date on which the EVENT_HAPPENED occurred with the corresponding interval between START_DATE and STOP_DATE as a new column called HAPPENED_FIRST_TIME with the corresponding date as column HAPPENED_ON.

What I expect

ID	N	START_DATE	STOP_DATE	GROUPS	MARKER	HAPPENED_FIRST_TIME	HAPPENED_ON
1	1	2023-01-26	2023-04-13	0	0	0	null
1	2	2023-02-05	2023-04-13	0	0	0	null
1	3	2023-02-27	2023-04-13	0	0	1	2023-03-12
1	4	2023-04-14	2023-09-13	1	1	0	null
1	5	2023-04-14	2023-09-13	1	1	0	null
1	6	2023-04-14	2023-09-13	1	1	0	null

What I tried to do is the following:

WITH EventDates AS (
    SELECT
        ID,
        MIN(EVENT_HAPPENED) AS MIN_EVENT_DATE
    FROM
        Table2
    GROUP BY
        ID
),
IntervalDates AS (
    SELECT
        t1.ID,
        t1.N,
        t1.START_DATE,
        t1.STOP_DATE,
        t1.GROUPS,
        ed.MIN_EVENT_DATE AS HAPPENED_FIRST_TIME
    FROM
        Table1 t1
    LEFT JOIN
        EventDates ed ON t1.ID = ed.ID
)
SELECT
    ID,
    N,
    START_DATE,
    STOP_DATE,
    GROUPS,
    CASE
        WHEN HAPPENED_FIRST_TIME <= START_DATE THEN 1
        ELSE 0
    END AS MARKER,
    CASE
        WHEN HAPPENED_FIRST_TIME BETWEEN START_DATE AND STOP_DATE THEN 1
        ELSE 0
    END AS HAPPENED_FIRST_TIME,
    CASE
        WHEN HAPPENED_FIRST_TIME BETWEEN START_DATE AND STOP_DATE THEN HAPPENED_FIRST_TIME
        ELSE NULL
    END AS HAPPENED_ON
FROM
    IntervalDates;

The problem occurs here because the smallest EVENT_HAPPENED date is actually between START_DATE and STOP_DATE for the first 3 data entries. However, it happened after START_DATE 2023-02-27:

    CASE
        WHEN HAPPENED_FIRST_TIME BETWEEN START_DATE AND STOP_DATE THEN 1
        ELSE 0
    END AS HAPPENED_FIRST_TIME,
    CASE
        WHEN HAPPENED_FIRST_TIME BETWEEN START_DATE AND STOP_DATE THEN HAPPENED_FIRST_TIME
        ELSE NULL
    END AS HAPPENED_ON

which gives the following result:

ID	N	START_DATE	STOP_DATE	GROUPS	MARKER	HAPPENED_FIRST_TIME	HAPPENED_ON
1	1	2023-01-26	2023-04-13	0	0	1	2023-03-12
1	2	2023-02-05	2023-04-13	0	0	1	2023-03-12
1	3	2023-02-27	2023-04-13	0	0	1	2023-03-12
1	4	2023-04-14	2023-09-13	1	1	0	null
1	5	2023-04-14	2023-09-13	1	1	0	null
1	6	2023-04-14	2023-09-13	1	1	0	null

How can I adjust my code?

Thank you

------------------EDIT
When adding IDs who ever belong to GROUPS = 0 the MARKER column works fine, but the HAPPENED_FIRST_TIME indicator does not work nor the corresponding HAPPENED_ON.

Do I need to split IDs into those who belong to GROUPS = 1 and those who never do to run the RN?

Expected output:

ID	N	START_DATE	STOP_DATE	GROUPS	MARKER	HAPPENED_FIRST_TIME	HAPPENED_ON
1	1	2023-01-26	2023-04-13	0	0	0	null
1	2	2023-02-05	2023-04-13	0	0	0	null
1	3	2023-02-27	2023-04-13	0	0	1	2023-03-12
1	4	2023-04-14	2023-09-13	1	1	0	null
1	5	2023-04-14	2023-09-13	1	1	0	null
1	6	2023-04-14	2023-09-13	1	1	0	null
3	1	2023-01-26	2023-04-13	0	0	1	2023-01-31
3	2	2023-02-05	2023-04-13	0	1	0	null
3	3	2023-02-27	2023-04-13	0	1	0	null
3	4	2023-04-14	2023-04-13	0	1	0	null

Original Q&A

There are 1 best solutions below

**SelVazi** · Accepted Answer · 2024-01-26T13:23:36.053000

I have slightly updated your query to retrieve the correct HAPPENED_FIRST_TIME, I used the window function row_number() to pinpoint the nearest START_DATE relative to the event date :

WITH EventDates AS (
    SELECT
        ID,
        MIN(EVENT_HAPPENED) AS MIN_EVENT_DATE
    FROM
        Table2
    GROUP BY
        ID
),
IntervalDates AS (
    SELECT
        t1.ID,
        t1.N,
        t1.START_DATE,
        t1.STOP_DATE,
        t1.GROUPS,
        ed.MIN_EVENT_DATE AS HAPPENED_FIRST_TIME,
        ROW_NUMBER() OVER (PARTITION BY t1.ID ORDER BY t1.N DESC) 
        - ROW_NUMBER() OVER (PARTITION BY t1.ID, t1.GROUPS ORDER BY t1.N DESC) AS grp
    FROM
        Table1 t1
    LEFT JOIN
        EventDates ed ON t1.ID = ed.ID
),
closest_dates AS (
  SELECT ID, N, ROW_NUMBER() OVER (PARTITION BY ID, grp ORDER BY N DESC) AS RN
  FROM IntervalDates
  WHERE HAPPENED_FIRST_TIME BETWEEN START_DATE AND STOP_DATE
)
SELECT i.ID,
       i.N,
       i.START_DATE,
       i.STOP_DATE,
       i.GROUPS,
       CASE
          WHEN i.HAPPENED_FIRST_TIME <= i.START_DATE THEN 1
          ELSE 0
       END AS MARKER,
       CASE
         WHEN RN = 1 AND c.ID IS NOT NULL THEN 1
         ELSE 0
       END AS HAPPENED_FIRST_TIME,
       CASE
          WHEN RN = 1 AND c.ID IS NOT NULL THEN HAPPENED_FIRST_TIME
          ELSE NULL
       END AS HAPPENED_ON
FROM IntervalDates i
LEFT JOIN closest_dates c ON i.ID = c.ID and i.N = c.N

Results :

ID  N   START_DATE  STOP_DATE   GROUPS  MARKER  FIRST_TIME  HAPPENED_ON
1   1   2023-01-26  2023-04-13  0       0       0           null
1   2   2023-02-05  2023-04-13  0       0       0           null
1   3   2023-02-27  2023-04-13  0       0       1           2023-03-12
1   4   2023-04-14  2023-09-13  1       1       0           null
1   5   2023-04-14  2023-09-13  1       1       0           null
1   6   2023-04-14  2023-09-13  1       1       0           null

Demo here

Based on the recent simple data you provided in the comments, it appears that the question has transformed into a "gaps and islands" problem . This can be resolved using the difference between two row_numbers to identify the existing groups :

WITH EventDates AS (
    SELECT ID, MIN(EVENT_HAPPENED) AS MIN_EVENT_DATE
    FROM Table2
    GROUP BY ID
),
IntervalDates AS (
    SELECT t1.ID, t1.N, t1.START_DATE, t1.STOP_DATE, t1.GROUPS,
        ed.MIN_EVENT_DATE AS HAPPENED_FIRST_TIME,
        ROW_NUMBER() OVER (PARTITION BY t1.ID ORDER BY t1.N DESC) 
        - ROW_NUMBER() OVER (PARTITION BY t1.ID, t1.GROUPS ORDER BY t1.N DESC) AS grp
    FROM Table1 t1
    LEFT JOIN EventDates ed ON t1.ID = ed.ID
),
closest_dates AS (
  SELECT ID, N, ROW_NUMBER() OVER (PARTITION BY ID, grp ORDER BY N DESC) AS RN
  FROM IntervalDates
  WHERE HAPPENED_FIRST_TIME BETWEEN START_DATE AND STOP_DATE
)
SELECT i.ID, i.N, i.START_DATE, i.STOP_DATE, i.GROUPS,
       CASE
          WHEN i.HAPPENED_FIRST_TIME <= i.START_DATE THEN 1
          ELSE 0
       END AS MARKER,
       CASE
         WHEN RN = 1 AND c.ID IS NOT NULL THEN 1
         ELSE 0
       END AS HAPPENED_FIRST_TIME,
       CASE
          WHEN RN = 1 AND c.ID IS NOT NULL THEN HAPPENED_FIRST_TIME
          ELSE NULL
       END AS HAPPENED_ON
FROM IntervalDates i
LEFT JOIN closest_dates c ON i.ID = c.ID and i.N = c.N

Demo here

Create dummy column that indicates when something happened between an interval

There are 1 best solutions below

Related Questions in SQL

Related Questions in SQL-SERVER

Related Questions in GAPS-AND-ISLANDS

Trending Questions

Popular # Hahtags

Popular Questions