I am trying to combine two deedle dataframes in a way similar to merge in pandas.
I will have one bigger frame where there is a year and quarter column with other data beside it, and then a smaller frame with a year, quarter, and ad partner column. I want to add the ad partner to every instance of the same year and quarter in the larger table.
There is some sample data at the link below: https://datascienceparichay.com/article/pandas-merge-dataframes-on-multiple-columns/
Any help would be greatly appreciated!
Thanks!
EDIT
I now have this code, although it throws a keyNotFound exception if there is a key in df1 that is not in df2, and does not return a table.
var df2ByYQ = df2.IndexRowsUsing(row => (row.GetAs<string>("Year".Trim()), row.GetAs<string>("Quarter".Trim())));
var bCol = df1.Rows.SelectOptional(kvp => df2ByYQ.Rows[(kvp.Value.Value.GetAs<string>("Year".Trim()), kvp.Value.Value.GetAs<string>("Quarter".Trim()))].TryGetAs<int>("b"));`
You can do this by reindexing the two data frames so that they both have the same row index formed by a tuple consisting of the year and quarter. The easiest way to do this is to use the
Frame.indexRowsUsingfunction. For example, given two frames withacolumn in one andbin the other:Now you can use
jointo join the two frames - you just have to drop the columns that are shared in both of the frames (the operation does not allow overlapping columns):EDIT: In the scenario discussed in the comments, where
df1has rows with multiple records per year/quarter and we want to add information from a unique row in another data frame, I would reindex the other data frame and then use themapoperation to find the corresponding matching row in the other data frame (this will be quite fast, because lookup by index is done via a hashtable):