I have two data frames that I wish to merge into one based on the column "Species".
Dataframe 1:
| Species |
|---|
| Themeda australis |
| Themeda avenacea |
| Themeda quadrivalvis |
| Themeda triandra |
| Lomandra juncea |
| Lomandra longifolia |
Dataframe 2:
| Species | Common_name |
|---|---|
| Themeda australis (R.Br.) Stapf | Kangaroo grass |
| Themeda avenacea (F.Muell.) Lugger | Native oat |
| Themeda quadrivalvis (L.) Kuntze. | Grader grass |
| Themeda triandra Forssk. | Kangaroo grass |
| Lomandra juncea (F.Muell.) Ewart | Desert Mat-rush |
| Lomandra longifolia Labill. | Spiny-headed Mat-rush |
    df1 <- data.frame(Species = c(
      "Themeda australis",
      "Themeda avenacea",
      "Themeda quadrivalvis",
      "Themeda triandra",
      "Lomandra juncea",
      "Lomandra longifolia"
    ))
    df2 <- data.frame(Species = c(
      "Themeda australis (R.Br.) Stapf",
      "Themeda avenacea (F.Muell.) Lugger",
      "Themeda quadrivalvis (L.) Kuntze.",
      "Themeda triandra Forssk.",
      "Lomandra juncea (F.Muell.) Ewart",
      "Lomandra longifolia Labill."
    ), Common_name = c(
      "Kangaroo grass",
      "Native oat",
      "Grader grass",
      "Kangaroo grass",
      "Desert Mat-rush",
      "Spiny-headed Mat-rush"
    ))
However, because multiple species contain the same string, I would like to match the data frames on only the first two words of the Species column (e.g. Themeda triandra == Themeda triandra Forssk.). Please keep in mind that I am working with large data: Dataframe 1 has 32,931 rows and Dataframe 2 has 16,185 rows. Rows with no match can be set to NA.
Desired output:
| Species | Common_name |
|---|---|
| Themeda australis | Kangaroo grass |
| Themeda avenacea | Native oat |
| Themeda quadrivalvis | Grader grass |
| Themeda triandra | Kangaroo grass |
| Lomandra juncea | Desert Mat-rush |
| Lomandra longifolia | Spiny-headed Mat-rush |
Is this possible?
I have tried the following:
    output <- df1 %>%
      fuzzy_inner_join(df2, by = "Species", match_fun = str_detect)
You can try this:
Result:
Actually, it seems there is no need for a join. We can just extract the first two words from the Species column. You can then do a simple (not fuzzy) join of the resulting data.frame with any other data.frame that has a two-word Species column.
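As a minimal sketch of that idea, assuming base R only: the helper `first_two_words()` and the intermediate `df2_short` are hypothetical names introduced here for illustration, and the lookup uses `match()` in place of a join function so that the row order of `df1` is kept and unmatched species become NA. It also assumes every name begins with a plain "Genus epithet" pair; hybrids or infraspecific ranks would need extra handling.

```r
# Example data copied from the question
df1 <- data.frame(Species = c(
  "Themeda australis", "Themeda avenacea", "Themeda quadrivalvis",
  "Themeda triandra", "Lomandra juncea", "Lomandra longifolia"
), stringsAsFactors = FALSE)

df2 <- data.frame(Species = c(
  "Themeda australis (R.Br.) Stapf",
  "Themeda avenacea (F.Muell.) Lugger",
  "Themeda quadrivalvis (L.) Kuntze.",
  "Themeda triandra Forssk.",
  "Lomandra juncea (F.Muell.) Ewart",
  "Lomandra longifolia Labill."
), Common_name = c(
  "Kangaroo grass", "Native oat", "Grader grass",
  "Kangaroo grass", "Desert Mat-rush", "Spiny-headed Mat-rush"
), stringsAsFactors = FALSE)

# Hypothetical helper: keep only the first two space-separated words.
# Strings with fewer than two words are returned unchanged (after trimming).
first_two_words <- function(x) {
  sub("^([^ ]+ +[^ ]+).*$", "\\1", trimws(x))
}

# Shortened copy of df2 with two-word species names
df2_short <- data.frame(
  Species = first_two_words(df2$Species),
  Common_name = df2$Common_name,
  stringsAsFactors = FALSE
)

# Exact lookup: match() returns NA where a species in df1 has no
# counterpart in df2_short, and the first hit if there are duplicates.
output <- df1
output$Common_name <- df2_short$Common_name[match(df1$Species, df2_short$Species)]

print(output)
```

Because both the extraction and the lookup are vectorised, this should scale comfortably to data frames of the sizes mentioned in the question. If you prefer a join function, `merge(df1, df2_short, by = "Species", all.x = TRUE)` is another option, though it does not guarantee the original row order.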