Is there a way to add a LIKE in a dbplyr inner_join SQL translation?

48 Views Asked by At

This is a follow-up question to this one: How to join dataframes using a prefix as a match?

I have those two tables on a SQL server side:

data <- memdb_frame(
  x = c("ANOTHER", "COMMON", "ZEBRA")
)

selection <- memdb_frame(
  x_prefix = c("ANO", "B", "CO"),
  type = c("One type", "Other type", "Other type")
)

I want to join them and keep lines where the x_prefix is a prefix of x.

I am trying to build the appropriate SQL query with dbplyr, starting from here:

inner_join(data, selection, by = c(x = "x_prefix")) |>
  show_query()

which produces:

<SQL>
SELECT `x`, `type`
FROM `dbplyr_001` AS `LHS`
INNER JOIN `dbplyr_002` AS `RHS`
ON (`LHS`.`x` = `RHS`.`x_prefix`)

The query I need is:

<SQL>
SELECT `x`, `type`
FROM `dbplyr_001` AS `LHS`
INNER JOIN `dbplyr_002` AS `RHS`
ON (`LHS`.`x` LIKE `RHS`.`x_prefix` || '%')

I have read that Function translation vignette but I still need help.

2

There are 2 best solutions below

0
Simon.S.A. On BEST ANSWER

Another way to attempt this is using a cross-join and filter. I use this approach with dbplyr, because there is less flexibility for specifying joins in R than there is in SQL.

# add placeholder column of ones to enable cross-join
data = data %>%
  mutate(ones = 1)
selection = selection %>%
  mutate(ones = 1)
# not necessary if there is another column you already intend to join by

cross_join = inner_join(data, selection, by = "ones") %>%
  select(-ones)

result = cross_join %>%
  filter(x_prefix == LEFT(x, nchar(x_prefix)))

I have used LEFT in capitals for it to be passed untranslated. There is a translation for substr but it would only accept numbers as inputs (and errored if I tried to use nchar(x_prefix) as the number of characters).

Another option for the filter would be to provide the condition as text within sql() to pass SQL code directly:

result = cross_join %>%
  filter(sql("x LIKE x_prefix || '%'"))
0
Nir Graham On

If you’re already aware of the specific SQL query you need, it might be more efficient to execute it directly before returning to your dbplyr operations.

library(dbplyr)
library(dplyr)
library(DBI)
# Create a connection
con <- dbConnect(RSQLite::SQLite(), ":memory:")


copy_to(con,tibble(
  x = c("ANOTHER", "COMMON", "ZEBRA")
),name="d1")

copy_to(con,tibble(
  x_prefix = c("ANO", "B", "CO"),
  type = c("One type", "Other type", "Other type")
),name="selection")
 
dbExecute(conn=con,
           statement = "create table result as
           select x, type from d1 as LHS 
              inner join selection as RHS
               on (LHS.x LIKE RHS.x_prefix || '%')")
 
 
my_res_dtplyr_table <- tbl(con,"result") 

#dtplyering with the result
my_res_dtplyr_table |> select(type) 
my_res_dtplyr_table |> filter(x != "COMMON")