Filtering dataset with multiple conditions for monotherapy

55 Views Asked by At

I have this dataset which contains patient_id, diagnosis_date, diagnosis, trt_date, and drug_code.

# Create the dataset
data <- data.frame(
  patient_id = c(1, 1, 1, 1, 5, 5, 7, 7),
  diagnosis_date = as.Date(c("1/9/10", "1/9/10", "1/9/10", 
                             "1/9/10", "1/11/10", "1/11/10", 
                             "1/9/10", "1/9/10"), format = "%m/%d/%y"),
  diagnosis = c("breast cancer", "breast cancer", "breast cancer", 
                "breast cancer", "breast cancer", "breast cancer", 
                "breast cancer", "breast cancer"),
  trt_date = as.Date(c("1/20/10", "1/20/10", "1/21/10", 
                       "1/21/10", "1/29/10", "1/30/10", 
                       "1/25/10", "1/26/10"), format = "%m/%d/%y"),
  drug_code = c("A", "B", "A", "A", "B", "A", "A", "A")
)

Here are the conditions for filtering:

  • Patient id 1, has first treatment date of 1/20/10 and received drug A, but also received drug B on the same date (second line of observation). This is called first line combo therapy with drug A and B, which we will need to filter out . So patient id 1 will be filtered out.

  • Patient id 5, has first treatment date of 1/29/10 and received only drug B, this is called first line monotherapy with drug B. So patient id 5 will be kept.

  • Patient id 7 has first treatment date of 1/25/10 and received only drug A, this is called first line monotherapy with drug A.

As a result, I want observations that only have first line monotherapy with drug A and first line monotherapy with drug B, but not first line combo therapy with drug A and B. So patient id 7 will be kept.

So the desired output would look like:

patient_id    diagnosis_date     diagnosis          trt_date    drug_code
5             2010-01-11         breast cancer      2010-01-29  B
5             2010-01-11         breast cancer      2010-01-30  A
7             2010-01-09         breast cancer      2010-01-25  A
7             2010-01-09         breast cancer      2010-01-26  A

How can I achieve this?

1

There are 1 best solutions below

0
M-- On
library(dplyr)

data %>% 
  filter(lapply(list(drug_code), length) == 1, .by = c(patient_id, trt_date))

#>   patient_id diagnosis_date     diagnosis   trt_date drug_code
#> 1          5     2010-01-11 breast cancer 2010-01-29         B
#> 2          5     2010-01-11 breast cancer 2010-01-30         A
#> 3          7     2010-01-09 breast cancer 2010-01-25         A
#> 4          7     2010-01-09 breast cancer 2010-01-26         A