df = pd.DataFrame([[100,90,80,70,36,45], [101,78,65,88,55,78], [92,77,42,79,43,32], [103,98,76,54,45,65]], index = pd.date_range(start='2022-01-01' ,periods=4))
df.columns = pd.MultiIndex.from_tuples((("mkf", "Open"), ("mkf", "Close"), ("tdf", "Open"), ("tdf","Close"), ("ghi","Open"), ("ghi", "Close")))
df
mkf tdf ghi
Open Close Open Close Open Close
2022-01-01 100 90 80 70 36 45
2022-01-02 101 78 65 88 55 78
2022-01-03 92 77 42 79 43 32
2022-01-04 103 98 76 54 45 65
The above data frame is composed of three smaller data frames that are identical in structure (i.e. same columns and indices). You can think of it as a large data frame that has been concatenated from three smaller ones. In reality there are thousands of these smaller df's concatenated together but for simplicity I've kept it at three.
I'd like to be able to use the split-apply-combine principle to create a new column called r in each of the three sub data frames.
The split would be based on df.columns.levels[0].
The apply would be np.log(['Close']).diff()
The combined data frame would produce the below:
mkf tdf ghi
Open Close r Open Close r Open Close r
2022-01-01 100 90 NaN 80 70 NaN 36 45 NaN
2022-01-02 101 78 -0.143101 65 88 0.228842 55 78 0.550046
2022-01-03 92 77 -0.012903 42 79 -0.107889 43 32 -0.890973
2022-01-04 103 98 0.241162 76 54 -0.380464 45 65 0.708651
You could use:
or with a
groupbyapproach:output: