Merge two different dataframes when column with similar value has similar row value

59 Views Asked by At

The simple question We have two different dataframes with a similar setup as the output of DF_1 & DF_2

DF_1 = as.data.frame(matrix(c(1,2,3,4,20,55,12,44),nrow=4,ncol=2))
DF_2 = as.data.frame(matrix(c(1,2,3,4,33,32,21,44,66,71),nrow=2,ncol=5))

How does one merge DF_1 and DF_2 to create a combined dataframe call it combined_DF which looks like the output of this

combined_DF = as.data.frame(matrix(c(1,2,1,2,20,55,12,44,3,4,3,4,33,32,33,32,21,44,21,44,66,71,66,71),nrow=4,ncol=6))

What would be a fast way, which creates the same output as combined_DF?

The harder question In reality the two dataframes which needs to be merged is much bigger, in fact

DF1 is 156798 obs. of 9 variables

DF2 is 6441 obs. of 92 variables

Now how does one merge DF_1 and DF_2 to create a combined dataframe call it combined_DF which has 156798 obs. of 99 variables? (same as principle as the simple question, but this time much larger dataframes) Can one use doMC::registerDoMC to speed up this process.

Answers is much appreciated :)

2

There are 2 best solutions below

1
Sotos On BEST ANSWER

via base R,

 cbind(V6 = DF_1$V2, DF_2[rep(rownames(DF_2), nrow(DF_1) - nrow(DF_2)),])

#    V6 V1 V2 V3 V4 V5
#1   20  1  3 33 21 66
#2   55  2  4 32 44 71
#1.1 12  1  3 33 21 66
#2.1 44  2  4 32 44 71
0
Matt W. On

Not 100% on what you're asking. Either you want to just put the table side by side, in which is a column bind, or one below the other which is a row bind, or you're trying to join the tables which is a join.. Could you clarify a bit more? I don't understand your reproducible example.

via dplyr

library(dplyr)
combined_DF <- bind_cols(DF_1, DF_2)

or

combined_DF <- bind_rows(DF_1, DF_2)

or

combined_DF <- left_join(DF_1, DF_2)