allow.cartesian in merge_plus from fedmatch does not work as expected


I want to allow many-to-many matches when merging two data sets. I set allow.cartesian = TRUE, but it does not work as I expected. Here is a reproducible example:

df1 = data.frame(
  keys = c('Walmart', 'Costco'),
  x1 = c(1, 2),
  unique_key1 = paste0('df1_', c(1:2))
)

df2 = data.frame(
  keys = c('Walmar', 'Walmart 2', 'Costco1', 'Costco2'),
  x2 = c(1:4),
  unique_key2 = paste0('df2_', c(1:4))
)

When I use fedmatch::merge_plus,

fedmatch::merge_plus(
  df1, df2,
  by = c('keys'),
  match_type = 'fuzzy',
  unique_key_1 = "unique_key1",
  unique_key_2 = "unique_key2",
  fuzzy_settings = fedmatch::build_fuzzy_settings(maxDist = .5),
  allow.cartesian = TRUE
)$matches

I expected that the result would look like this:

   keys_1    keys_2 x1 x2 unique_key1 unique_key2
1 Walmart    Walmar  1  1       df1_1       df2_1
2 Walmart Walmart 2  1  2       df1_1       df2_2
3  Costco   Costco1  2  3       df1_2       df2_3
4  Costco   Costco2  2  4       df1_2       df2_4

However, the actual result is:

   unique_key2 unique_key1 x1  keys_1  keys_2 x2 tier
1:       df2_1       df1_1  1 Walmart  Walmar  1  all
2:       df2_3       df1_2  2  Costco Costco1  3  all

I also tried different values of maxDist, which did not change the result. Is it possible to get many-to-many matches in the result? Solutions using other packages are very welcome, too.
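For reference, the many-to-many result described above can be produced by hand in base R. This is only a minimal sketch of the desired output, not of how merge_plus works internally: it assumes a length-normalized Levenshtein distance from utils::adist and a hypothetical cutoff max_dist = 0.5, which is not necessarily the metric or scale that fedmatch uses for maxDist.

```r
# Same example data as in the question.
df1 <- data.frame(
  keys = c('Walmart', 'Costco'),
  x1 = c(1, 2),
  unique_key1 = paste0('df1_', 1:2)
)
df2 <- data.frame(
  keys = c('Walmar', 'Walmart 2', 'Costco1', 'Costco2'),
  x2 = 1:4,
  unique_key2 = paste0('df2_', 1:4)
)

# Assumption: cutoff on a normalized edit distance (illustrative value only).
max_dist <- 0.5

# Pairwise Levenshtein distances: rows index df1, columns index df2.
d <- utils::adist(df1$keys, df2$keys)

# Normalize each distance by the length of the longer string in the pair.
d_norm <- d / outer(nchar(df1$keys), nchar(df2$keys), pmax)

# Keep every pair under the cutoff, so one df1 row can match several df2 rows.
idx <- which(d_norm <= max_dist, arr.ind = TRUE)
i <- idx[, "row"]
j <- idx[, "col"]

matches <- data.frame(
  keys_1 = df1$keys[i],
  keys_2 = df2$keys[j],
  x1 = df1$x1[i],
  x2 = df2$x2[j],
  unique_key1 = df1$unique_key1[i],
  unique_key2 = df2$unique_key2[j]
)
print(matches)
```

On the example data this keeps both candidates for Walmart and both for Costco. It compares every row of df1 with every row of df2, so it is only practical for small inputs.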
