I have two dataframes.
df1
col1 var1 var2 var3
X11 NA (for var3)
X12 NA (for var2)
X13 NA (for var1)
df1 has a few columns (float64 type representing some categories) like var1, var2, var3 with values between 1-5 for each and some missing values for the categories.
I want to fill in the missing values (in each of var1, var2, and var3 columns) using another dataframe, df2 such that df2 has a column with the value for the category.
df2
col1 col2 val col4
X11 var1 3 X11-X21
X12 var3 2 X21-X22
X13 var2 1 X13-X32
col4 is the concatenation of col1 and col2 but it did not help much.
How could I do this? Since we need to look up on several columns and also because of the structure of df1, I found it complicated to use pivot or melt or even one-hot encoding (produces 5 columns each with _1 to _5 suffixed. I also though about creating a set but then the pairs must be unique which is not the case. Same when I thought of using dictionary as I cannot think of unique keys.
How could I solve this issue?
Thanks.
The example below works with the small sample you provided.
The code goes through the rows of
df1, reading out each row into the variablerow. It then goes through the values (and column names) ofrow. When a value ispd.NA, it indexes intodf2based on the index ofrow, and returns thedf2["val"]at that index.