I want to allow many-to-many matches when fuzzy-merging two data sets. I set allow.cartesian = TRUE, but the result is not what I expected. Here is a minimal example:
df1 <- data.frame(
  keys = c('Walmart', 'Costco'),
  x1 = c(1, 2),
  unique_key1 = paste0('df1_', 1:2)
)
df2 <- data.frame(
  keys = c('Walmar', 'Walmart 2', 'Costco1', 'Costco2'),
  x2 = 1:4,
  unique_key2 = paste0('df2_', 1:4)
)
When I call fedmatch::merge_plus like this:
fedmatch::merge_plus(
  df1, df2,
  by = c('keys'),
  match_type = 'fuzzy',
  unique_key_1 = "unique_key1",
  unique_key_2 = "unique_key2",
  # namespaced so the call works without library(fedmatch)
  fuzzy_settings = fedmatch::build_fuzzy_settings(maxDist = .5),
  allow.cartesian = TRUE
)$matches
I expected that the result would look like this:
   keys_1    keys_2 x1 x2 unique_key1 unique_key2
1 Walmart    Walmar  1  1       df1_1       df2_1
2 Walmart Walmart 2  1  2       df1_1       df2_2
3  Costco   Costco1  2  3       df1_2       df2_3
4  Costco   Costco2  2  4       df1_2       df2_4
However, the actual result is:
   unique_key2 unique_key1 x1  keys_1  keys_2 x2 tier
1:       df2_1       df1_1  1 Walmart  Walmar  1  all
2:       df2_3       df1_2  2  Costco Costco1  3  all
I also tried different values of maxDist, but the result did not change: each row of df1 is still matched to only one row of df2. Is it possible to get the many-to-many matches in the result? Solutions using other packages are very welcome, too.
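To make the target concrete, here is a rough sketch of how the many-to-many result could be built by hand: compute all pairwise string distances and keep every pair under a cutoff, rather than only the closest match per key. This assumes the stringdist package, and the choice of Jaro-Winkler with p = 0.1 and a cutoff of 0.5 is my own assumption for illustration, not necessarily what merge_plus does internally. The names d, max_dist, idx and manual are made up for this example. On this data I would expect it to keep the four pairs listed in my expected result.

```r
library(stringdist)

# Same example data as above, repeated so the snippet runs on its own
df1 <- data.frame(
  keys = c("Walmart", "Costco"),
  x1 = c(1, 2),
  unique_key1 = paste0("df1_", 1:2)
)
df2 <- data.frame(
  keys = c("Walmar", "Walmart 2", "Costco1", "Costco2"),
  x2 = 1:4,
  unique_key2 = paste0("df2_", 1:4)
)

# Pairwise distances: rows follow df1, columns follow df2
# (Jaro-Winkler with p = 0.1 is an assumption made for this sketch)
d <- stringdistmatrix(df1$keys, df2$keys, method = "jw", p = 0.1)

# Keep every pair within the cutoff, not just the best match per key
max_dist <- 0.5  # illustrative cutoff, same value as maxDist above
idx <- which(d <= max_dist, arr.ind = TRUE)
idx <- idx[order(idx[, "row"], idx[, "col"]), , drop = FALSE]

# Assemble the matched pairs into one data frame
manual <- data.frame(
  keys_1 = df1$keys[idx[, "row"]],
  keys_2 = df2$keys[idx[, "col"]],
  x1 = df1$x1[idx[, "row"]],
  x2 = df2$x2[idx[, "col"]],
  unique_key1 = df1$unique_key1[idx[, "row"]],
  unique_key2 = df2$unique_key2[idx[, "col"]]
)
manual
```

This computes the full distance matrix, so it only scales to modest table sizes; I would still prefer a way to get the same pairs out of merge_plus directly.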