How to select a column for a where statement based on another column value in SQL?

30 Views Asked by At

I'm currently beginning a project for which i need to do some data management in SQL.

I want to check wether i have enough history for an individual to be included in the analysis I extracted data in a table that contains a column 'index_date' (date datatype) that correspond to the index date and multiple columns 'info_AAAA' that I use to state whether the individual figures on the database the year AAAA

i want to select only patients that figures in the database at least for the year AAAA, AAAA-1 and AAAA-2

But as I'm an absolute beginner in sql i can't figure how to do this

2

There are 2 best solutions below

1
Littlefoot On

I have a column 'index_date' in my dataset that correspond to the index date

That's kind of self-descriptive, but - what is "index date"? What does it represent? Is it relevant for this case?


and multiple columns 'presence_AAAA' stating wether the individual figures on the database the year AAAA

Now that sounds scary! Are you saying that table actually looks like this?

Name    Presence_2021   Presence_2022   Presence_2023
-----   -------------   -------------   -------------
Scott   yes             yes             no

If so, you're doing it wrong. Table should have only one presence_year column, e.g.

Name    Presence_year   Present
-----   -------------   -------
Scott   2021            yes
Scott   2022            yes
Scott   2023            no

because - what will you do next year? Add yet another column and modify ALL queries you wrote so far in order to include newly added column? And then repeat it the next year, and the year after, and ...? That's insane, don't do that.


Anyway: 3 years back can be calculated; question is what you really have in that table.

If it were my table, you'd e.g.

select *
from that_table
where presence_year >= extract(year from sysdate) - 2

If you want to relate it to index_date (presuming that it is a DATE datatype column), you'd

select *
from that_table
where index_date >= add_months(trunc(sysdate, 'yyyy'), -2 * 12)

Other options are also possible; again, depending on what you really have.

0
MT0 On

I want to check whether I have enough history for an individual to be included in the analysis. I have a column index_date in my dataset that correspond to the index date and multiple columns presence_AAAA stating whether the individual figures on the database the year AAAA.

Let us assume you have a table containing the individual data-points:

CREATE TABLE patient_data (
  patient_id NUMBER,
  index_date DATE,
  some_data  VARCHAR2(100),
  other_data VARCHAR2(100)
);

Then you can generate your history using conditional aggregation in a view:

CREATE VIEW patient_year_data (patient_id, presence_2022, presence_2023, presence_2024) AS
SELECT patient_id,
       CASE
       WHEN COUNT(
              CASE
              WHEN index_date >= DATE '2022-01-01' AND index_date < DATE '2023-01-01'
              THEN 1
              END
            ) > 0
       THEN 'Yes'
       ELSE 'No'
       END,
       CASE
       WHEN COUNT(
              CASE
              WHEN index_date >= DATE '2023-01-01' AND index_date < DATE '2024-01-01'
              THEN 1
              END
            ) > 0
       THEN 'Yes'
       ELSE 'No'
       END,
       CASE
       WHEN COUNT(
              CASE
              WHEN index_date >= DATE '2024-01-01' AND index_date < DATE '2025-01-01'
              THEN 1
              END
            ) > 0
       THEN 'Yes'
       ELSE 'No'
       END
FROM   patient_data
WHERE  index_date >= DATE '2022-01-01'
AND    index_date <  DATE '2025-01-01'
GROUP BY patient_id

I want to select only patients that figures in the database at least for the year AAAA, AAAA-1 and AAAA-2

Then you can use:

SELECT patient_id
FROM   patient_year_data
WHERE  presence_2024 = 'Yes'
AND    presence_2023 = 'Yes'
AND    presence_2022 = 'Yes'

However, you don't need the view and could directly use the underlying table:

SELECT patient_id
FROM   patient_data
GROUP BY patient_id
HAVING  COUNT(
          CASE
          WHEN index_date >= DATE '2022-01-01' AND index_date < DATE '2023-01-01'
          THEN 1
          END
        ) > 0
AND     COUNT(
          CASE
          WHEN index_date >= DATE '2023-01-01' AND index_date < DATE '2024-01-01'
          THEN 1
          END
        ) > 0
AND     COUNT(
          CASE
          WHEN index_date >= DATE '2024-01-01' AND index_date < DATE '2025-01-01'
          THEN 1
          END
        ) > 0