Count rows for each ID that fall between two dates, when using two tables

290 Views Asked by JH2021 At 14 February 2023 at 12:32

For each ID in one table (Table 1), how can I count the number of rows in another table (Table 2) which fall between a start date and end date (two columns in the first table) in R?

Table 1:

contact_number	start_date	end_date
123	01/01/2023	01/01/2024
1234	01/01/2022	01/01/2023

Table 2:

contact_number	visit_type	date
123	1	02/01/2023
123	2	05/01/2023
1234	2	05/01/2022

Output wanted:

contact_number	start_date	end_date	visits
123	01/01/2023	01/01/2024	2
1234	01/01/2022	01/01/2023	1

I am wanting to replicate a Power BI DAX query:

visits_per_membership = 
CALCULATE(
    COUNTROWS('Table 2'),
    FILTER(
        'Table 2',
        'Table 2'[contact_number] = Table 1[contact_number] && 'Table 2'[EntryDate].[Date] >= Table 1[start_date].[Date] && 'Table 2'[EntryDate].[Date] < Table 1[renewal_date].[Date])
)

Original Q&A

There are 2 best solutions below

zephryl On 14 February 2023 at 13:09

Base R solution, mapping over rows of table1 and comparing to table2 by id and dates:

table1$visits <- mapply(
  \(contact, start, end) sum(
    table2$contact_number == contact & 
    table2$date >= start &
    table2$date <= end
  ),
  table1$contact_number,
  table1$start_date,
  table1$end_date
  )

table1

  contact_number start_date   end_date visits
1            123 2023-01-01 2024-01-01      2
2           1234 2022-01-01 2023-01-01      1

Or using dplyr::left_join():

library(dplyr) # >= v1.1.0:

table1 %>%
  left_join(table2, multiple = "all") %>%
  summarize(
    visits = sum(between(date, start_date, end_date)),
    .by = c(contact_number, start_date, end_date)
  )

Wimpel On 14 February 2023 at 13:22

data.table approach

dt1[, visits := dt2[dt1, on = .(date >= start_date, date <= end_date)][, .N, by = contact_number]$N]

   contact_number start_date   end_date visits
1:            123 2023-01-01 2024-01-01      2
2:           1234 2022-01-01 2023-01-01      1

sample data used

dt1 <- fread("contact_number    start_date  end_date
123     01/01/2023  01/01/2024
1234    01/01/2022  01/01/2023")

dt2 <- fread("contact_number    visit_type  date
123     1   02/01/2023
123     2   05/01/2023
1234    2   05/01/2022")

dt1[, c("start_date", "end_date") := lapply(.SD, as.Date, format = "%d/%m/%Y"), 
    .SDcols = c("start_date", "end_date")]
dt2[, date := as.Date(date, format = "%d/%m/%Y")]

Count rows for each ID that fall between two dates, when using two tables

There are 2 best solutions below

Related Questions in R

Related Questions in COUNT

Related Questions in POWERBI

Related Questions in ROWCOUNT

Trending Questions

Popular # Hahtags

Popular Questions