I was wondering whether someone knows if the dplyr extension packages (dbplyr and dtplyr) allow non-equi joins within the usual dplyr workflow? I rarely need data.table, but fast non-equi joins are the only moments where I always need to setDT, then perform the join and then switch back with as_tibble(). I scrolled through the issues in the packages on github but didn't find whether this is something that's planned or already implemented.
Non-equi join in tidyverse
2k Views Asked by b_surial At
3
There are 3 best solutions below
2
On
For dbplyr: While SQL supports non-equi joins, I have not found a dplyr approach that is equivalent. My usual work around is very similar to the r-bloggers link posted by @Waldi to join on the equality conditions and then filter on the inequality conditions.
For example:
output = join(df1, df2, by = c("df1_id" = "df2_id")) %>%
filter(df1_date <= df2_date)
This translates to SQL similar to:
SELECT *
FROM df1
JOIN df2
ON df1_id = df2_id
WHERE df1_date <= df2_date
Which is not that different from:
SELECT *
FROM df1
JOIN df2
ON df1_id = df2_id
AND df1_date <= df2_date
0
On
There has been a new option for this in dbplyr since the version 1.4.0: sql_on. Citing Kirill Müller:
There is #2240 for dplyr, but it's going to take a while. For databases we already have a workaround [ie. generic SQL joins].
library(dplyr)
library(dbplyr)
tbl1 <- memdb_frame(a = 1:3, b = 4:2)
tbl2 <- memdb_frame(c = 1:3, b = 2:0)
left_join(tbl1, tbl2, sql_on = "LHS.b < RHS.c")
Non-equi joins are available since
1.1.0via the functionjoin_by. To create non-equi joins, you can use<,>,>=,<=, or helpersbetween,within,overlapsandnearest.