dplyr Find records that have specifc set of values

238 Views Asked by At

I have a dataset that has some ID and associated timepoints. I want to filter out IDs that have a specific combination of timepoints. If I filter using %in% or |, I get IDs out of the specific combination. How do I do this in R ?

ID Timepoint
1 1
1 6
1 12
2 1
3 1
3 6
3 12
3 18
4 1
4 6
4 12

I want to filter IDs that have timepoints 1,6 and 12 and exclude other IDs.

Result would be IDs 1,3 and 4

3

There are 3 best solutions below

1
Brigadeiro On
library(dplyr)

df <- data.frame(ID = c(1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4),
                 Timepoint = c(1, 6, 12, 1, 1, 6, 12, 18, 1, 6, 12))

df %>%
  filter(Timepoint %in% c(1, 6, 12)) %>%
  mutate(indicator = 1) %>%
  group_by(ID) %>%
  complete(Timepoint = c(1, 6, 12)) %>%
  filter(!ID %in% pull(filter(., is.na(indicator)), ID)) %>%
  select(indicator)

Output:

# A tibble: 9 × 2
# Groups:   ID [3]
     ID indicator
  <dbl>     <dbl>
1     1         1
2     1         1
3     1         1
4     3         1
5     3         1
6     3         1
7     4         1
8     4         1
9     4         1
0
ccommjin On

From your data, ID 2 has time point 1. So if filter by time points 1, 6, 12, the result will be 1, 2, 3, 4 instead of 1, 3, 4.

ids <- c(1, 1, 1, 2, 3, 3, 3, 3, 4, 4, 4)
time_points <- c(1, 6, 12, 1, 1, 6, 12, 18, 1, 6, 12)
dat <- data.frame(ids, time_points)
unique(dat$ids[dat$time_points %in% c(1, 6, 12)])
1
akrun On

We can use

library(dplyr)
df %>% 
    group_by(ID) %>% 
    filter(all(c(1, 6, 12) %in% Timepoint)) %>% 
    ungroup

-output

# A tibble: 10 x 2
      ID Timepoint
   <dbl>     <dbl>
 1     1         1
 2     1         6
 3     1        12
 4     3         1
 5     3         6
 6     3        12
 7     3        18
 8     4         1
 9     4         6
10     4        12