I have two dataframes: df1 has 52000 rows and df2 has 24000 rows.
I need to work through each value in df1 column 2 and check, row by row, whether it appears anywhere in df2.
If it does, then add the entire row from df2 into a new dataframe.
I have set up two dummy tables with a small amount of example data:
This is df1
| Year | Drink |
|---|---|
| 1985 | tea |
| 1935 | coffee |
| 2015 | beer |
| 2012 | wine |
| 2017 | tea |
| 1958 | soda |
This is df2
| Year | Country |
|---|---|
| 1985 | USA |
| 1955 | France |
| 2015 | China |
| 2011 | USA |
| 2017 | UK |
| 1958 | UK |
Step 1 - read the col1 cell in row 1 of df1. It reads 1985.
Step 2 - work through the df2 col1 values in turn. If 1985 is there, copy the entire row to a new dataframe. If not, ignore the row and continue.
Repeat Step 1 and Step 2 until every row in df1 has been checked.
I have tried:
YearComparison <- df1[df1$year %like% df2, ]
but I get this warning:
Warning message: In grepl(pattern, vector, ignore.case = ignore.case, fixed = fixed) : argument 'pattern' has length > 1 and only the first element will be used
I also tried :
YearComparison <- df1[df1$year %like% df2,1 ]
which returned:
| Name | Type | Value |
|---|---|---|
| YearComparison | Double [0] | |
I also tried:
YearComparison <- any(grepl('patientdata$status', countries$year,))
Which returned:
| Name | Type | Value |
|---|---|---|
| YearComparison | Logical[1] | False |
I have also tried variations using %in%, but with similar results.
Please bear in mind that my actual data sets have tens of thousands of rows, and the values are complex, non-sequential strings (not dates, which I am only using here to make it easier to get the code right), so something like:
YearComparison <- df1[df1$year %like% df2, c("1985", "1986","Etc"), ] isn't practical.
Can anyone help? Many thanks.
I guess you need the mutating join functions from dplyr.
Extra information: please be aware of the difference between inner_join() and left_join(), right_join(), full_join().
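A minimal sketch of that idea on the example tables from the question, assuming dplyr is installed and that the key column is called Year in both data frames. The names matched, joined and matched_base are only illustrative. It matches on exact values rather than patterns, so it should behave the same way for complex strings as it does for these years.

```r
library(dplyr)

# Example data copied from the question
df1 <- data.frame(
  Year  = c(1985, 1935, 2015, 2012, 2017, 1958),
  Drink = c("tea", "coffee", "beer", "wine", "tea", "soda")
)
df2 <- data.frame(
  Year    = c(1985, 1955, 2015, 2011, 2017, 1958),
  Country = c("USA", "France", "China", "USA", "UK", "UK")
)

# Filtering join: keep the rows of df2 whose Year occurs in df1,
# without adding any columns from df1
matched <- semi_join(df2, df1, by = "Year")

# Mutating join: the same matching rows, plus the Drink column from df1
# (a key that repeats in df1 would give repeated rows here)
joined <- inner_join(df2, df1, by = "Year")

# Base R equivalent of the semi_join, no packages needed
matched_base <- df2[df2$Year %in% df1$Year, ]
```

If you only want the rows of df2, semi_join() (or the %in% line) is probably the closest fit, since inner_join() also pulls in the columns of df1. Also note that column names are case-sensitive in R, so df1$year is not the same as df1$Year.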