Apply criteria from one table against each row in another table to generate a list of matches

Question

Apply criteria from one table against each row in another table to generate a list of matches

32 Views Asked by pkataras At 26 March 2024 at 20:53

I have 2 tables of data. 1 table has a list of start and end date ranges. The other table a list of students with their class and start dates. For each of the date ranges, I'm trying to list the students who have a start date in that range.

Range Data

Start	End
10/1/2023	12/31/2023
7/1/2023	9/30/2023
4/1/2023	6/30/2023
1/1/2023	3/31/2023

Student

Learner	Course	Start
1	English	11/5/23
1	Math	7/10/23
2	English	7/25/23
2	Math	5/15/23
3	Science	4/25/23
3	Math	7/25/23
3	English	11/15/23
4	Science	11/5/23
4	Math	1/5/23

If I only have a single date range to filter against, I can do this using filter() to filter by start dates, but I have a dozen different ranges and don't want to run in manually for each range. My intended outcome is a list of ranges with the students having a class in that range

Start	End	Learner
10/1/2023	12/31/2023	1
10/1/2023	12/31/2023	3
10/1/2023	12/31/2023	4
7/1/2023	9/30/2023	1
7/1/2023	9/30/2023	2
7/1/2023	9/30/2023	3
4/1/2023	6/30/2023	2
4/1/2023	6/30/2023	3
1/1/2023	3/31/2023	4

Original Q&A

There are 3 best solutions below

**Onyambu** · Answer 1 · 2024-03-26T21:06:53.457000

mutate(df1, across(everything(), mdy)) %>%
   right_join(mutate(df2, Start = mdy(Start)), 
             by = join_by(Start <= Start , End>Start), 
             suffix = c('', '_y'))%>%
   select(-Start_y, -Course)

       Start        End Learner
1 2023-10-01 2023-12-31       1
2 2023-10-01 2023-12-31       3
3 2023-10-01 2023-12-31       4
4 2023-07-01 2023-09-30       1
5 2023-07-01 2023-09-30       2
6 2023-07-01 2023-09-30       3
7 2023-04-01 2023-06-30       2
8 2023-04-01 2023-06-30       3
9 2023-01-01 2023-03-31       4

**Andre Wildberg** · Answer 2 · 2024-03-26T22:12:39.490000

A base R approach. The tricky part here is not finding the matches but getting the desired format of repeated dates in separate rows.

data.frame(do.call(rbind, 
  apply(ranges, 1, \(x) 
    t(sapply(which(
      as.Date(student$Start, "%m/%d/%y") >= as.Date(x["Start"], "%m/%d/%Y") & 
      as.Date(student$Start, "%m/%d/%y") <= as.Date(x["End"], "%m/%d/%Y")), \(y)
        c(x, Learner=student$Learner[y]))))))
      Start        End Learner
1 10/1/2023 12/31/2023       1
2 10/1/2023 12/31/2023       3
3 10/1/2023 12/31/2023       4
4  7/1/2023  9/30/2023       1
5  7/1/2023  9/30/2023       2
6  7/1/2023  9/30/2023       3
7  4/1/2023  6/30/2023       2
8  4/1/2023  6/30/2023       3
9  1/1/2023  3/31/2023       4

Note: Be careful with years in dates. 2-digits years need %y, 4-digit years %Y.

Data

ranges <- structure(list(Start = c("10/1/2023", "7/1/2023", "4/1/2023", 
"1/1/2023"), End = c("12/31/2023", "9/30/2023", "6/30/2023", 
"3/31/2023")), class = "data.frame", row.names = c(NA, -4L))

student <- structure(list(Learner = c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 4L, 4L
), Course = c("English", "Math", "English", "Math", "Science", 
"Math", "English", "Science", "Math"), Start = c("11/5/23", "7/10/23", 
"7/25/23", "5/15/23", "4/25/23", "7/25/23", "11/15/23", "11/5/23", 
"1/5/23")), class = "data.frame", row.names = c(NA, -9L))

**jay.sf** · Answer 3 · 2024-03-27T06:47:55.123000

Write a function f that compares a range with a student's start date and loop over all combinations. In such double-for-loop situations we may use outer which gives a boolean matrix where we want max.col. Finally merge.

First however, care for your strange date formats and rename student's Start to Start_stud to have unique column names.

> ranges[] <- lapply(ranges, as.Date, '%m/%d/%Y')
> student <- transform(student, Start_stud=as.Date(student$Start, '%m/%d/%y'), Start=NULL)
> f <- \(x, y) student$Start[x] >= ranges[y, 1] & student$Start[x] <= ranges[y, 2]
> student$id <- max.col(outer(seq_len(nrow(student)), seq_len(nrow(ranges)), f))
> merge(ranges, student, by.x=0, by.y='id') |> {\(.) .[order(.[, 2]), -1]}()
       Start        End Learner  Course Start_stud
9 2023-01-01 2023-03-31       4    Math 2023-01-05
7 2023-04-01 2023-06-30       3 Science 2023-04-25
8 2023-04-01 2023-06-30       2    Math 2023-05-15
4 2023-07-01 2023-09-30       1    Math 2023-07-10
5 2023-07-01 2023-09-30       3    Math 2023-07-25
6 2023-07-01 2023-09-30       2 English 2023-07-25
1 2023-10-01 2023-12-31       1 English 2023-11-05
2 2023-10-01 2023-12-31       3 English 2023-11-15
3 2023-10-01 2023-12-31       4 Science 2023-11-05

If you really want just the first three columns, you can pipe in another |> subset(select=1:3).

Data:

> dput(ranges)
structure(list(Start = c("10/1/2023", "7/1/2023", "4/1/2023", 
"1/1/2023"), End = c("12/31/2023", "9/30/2023", "6/30/2023", 
"3/31/2023")), class = "data.frame", row.names = c(NA, -4L))
> dput(student)
structure(list(Learner = c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 4L, 4L
), Course = c("English", "Math", "English", "Math", "Science", 
"Math", "English", "Science", "Math"), Start = c("11/5/23", "7/10/23", 
"7/25/23", "5/15/23", "4/25/23", "7/25/23", "11/15/23", "11/5/23", 
"1/5/23")), class = "data.frame", row.names = c(NA, -9L))

Apply criteria from one table against each row in another table to generate a list of matches

There are 3 best solutions below

Data

Related Questions in R

Related Questions in FUNCTION

Related Questions in LOOPS

Related Questions in VECTOR

Trending Questions

Popular # Hahtags

Popular Questions