How to only include records prior/post a given row date in python, ideally using pandas transformations on a spreadsheet

72 Views Asked by At

I have an extract where I'm required to identify a certain type of surgery X seen in the Surg Type column.

I need to retain medical appointments seen as distinct rows over a window/period of time, 3 appointment's prior (-3, -2, -1) and 3 appointment's post (+1, +2, +3)

I must include this order as an additional column.

On top of this I need to exclude any appointments outside of the window and any other type of Surg Type, in this example any surgery's denoted Z.

In this example 7/9 rows/record's I want retained and an additional column Prior Post

*** UPDATED EXAMPLE ***

Original Df

| Patient ID | Surg ID | Surg Type | Surg Date  | Medical Appt | Medical Appt Date |
|------------|---------|-----------|------------|--------------|-------------------|
| 1          | 1       | X         | 2022-09-03 | Y            | 2022-01-01        |
| 1          | 1       | X         | 2022-09-03 | Y            | 2022-03-04        |
| 1          | 1       | X         | 2022-09-03 | Y            | 2022-05-04        |
| 1          | 1       | X         | 2022-09-03 | N            | NaT               |
| 1          | 1       | X         | 2022-09-03 | Y            | 2022-11-04        |
| 1          | 1       | X         | 2022-09-03 | Y            | 2022-11-29        |
| 1          | 2       | Z         | 2022-12-01 | N            | NaT               |
| 1          | 1       | X         | 2022-09-03 | Y            | 2023-01-02        |
| 1          | 1       | X         | 2022-09-03 | Y            | 2023-01-13        |



Desired Df

| Patient ID | Surg ID | Surg Type | Surg Date  | Medical Appt | Medical Appt Date | Inclusion   |
|------------|---------|-----------|------------|--------------|-------------------|-------------|
| 1          | 1       | X         | 2022-09-03 | Y            | 2022-01-01        | -3          |
| 1          | 1       | X         | 2022-09-03 | Y            | 2022-03-04        | -2          |
| 1          | 1       | X         | 2022-09-03 | Y            | 2022-05-04        | -1          |
| 1          | 1       | X         | 2022-09-03 | N            | NaT               |             |
| 1          | 1       | X         | 2022-09-03 | Y            | 2022-11-04        | +1          |
| 1          | 1       | X         | 2022-09-03 | Y            | 2022-11-29        | +2          |
| 1          | 2       | Z         | 2022-12-01 | N            | NaT               | Exclude Row |
| 1          | 1       | X         | 2022-09-03 | Y            | 2023-01-02        | +3          |
| 1          | 1       | X         | 2022-09-03 | Y            | 2023-01-13        | Exclude row |
1

There are 1 best solutions below

2
e-motta On
# Remove rows that are not Surgery X or appointment
df = df.loc[df["Surg Type"].eq("X") | df["Medical Appt"].eq("Y")].reset_index(drop=True)
# Fill Prior Post column, assuming only one row is of Surg Type X
surgery_idx = df[df["Surg Type"].eq("X")].index[0]
df["Prior Post"] = df.index - surgery_idx
# Remove rows where Prior Post is outside of limits
df = df.loc[(df["Prior Post"] >= -3) & (df["Prior Post"] <= 3)]