Similar to the post "efficient function to find harmonic mean across different pandas dataframes", I have two Pandas DataFrames that are identical in shape, and I want to find the harmonic mean of each pair of elements, one from each DataFrame at the same location. The solution given in that post was to use a Panel, but Panel is now deprecated (it was removed entirely in pandas 1.0).
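For reference, the harmonic mean of n values is n divided by the sum of their reciprocals. For a single pair of elements a and b (one from each DataFrame) it reduces to a closed form:

```latex
H(x_1, \dots, x_n) = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}},
\qquad
H(a, b) = \frac{2}{\frac{1}{a} + \frac{1}{b}} = \frac{2ab}{a + b}
```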
If I do this:
```python
import pandas as pd
import numpy as np
from scipy.stats.mstats import hmean

# Random data, so the numbers you get may differ from the output below
df1 = pd.DataFrame(dict(x=np.random.randint(5, 10, 5), y=np.random.randint(1, 6, 5)))
df2 = pd.DataFrame(dict(x=np.random.randint(5, 10, 5), y=np.random.randint(1, 6, 5)))
dfs_dictionary = {'DF1': df1, 'DF2': df2}
df = pd.concat(dfs_dictionary)
print(df)
```
```
       x  y
DF1 0  9  4
    1  6  4
    2  7  2
    3  5  2
    4  5  2
DF2 0  9  2
    1  7  1
    2  7  1
    3  9  5
    4  8  3
```
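To see what groupby(level=1) actually groups on, here is a small sketch that rebuilds the two frames from the printed output (typed-in values, so it is reproducible, unlike the random data) and looks at the index that concat builds from the dictionary keys:

```python
import pandas as pd

# Values copied from the printed output above, so the sketch is deterministic
df1 = pd.DataFrame({"x": [9, 6, 7, 5, 5], "y": [4, 4, 2, 2, 2]})
df2 = pd.DataFrame({"x": [9, 7, 7, 9, 8], "y": [2, 1, 1, 5, 3]})

# Dict keys become the outer index level (level 0);
# the original row labels 0..4 become the inner level (level 1)
df = pd.concat({"DF1": df1, "DF2": df2})

# Grouping on level 1 collects the rows that share a position in df1 and df2:
# for key 0 that is the rows (DF1, 0) and (DF2, 0)
group0 = df.groupby(level=1).get_group(0)
print(group0)
```

Because the original row labels end up on level 1, grouping on that level pairs up the elements at the same location in df1 and df2, which is exactly the pairing the harmonic mean should be taken over.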
```python
x = df.groupby(level = 1).apply(hmean, axis = None).reset_index()
print(x)
```
```
   index         0
0      0  4.114286
1      1  2.564885
2      2  2.240000
3      3  3.956044
4      4  3.453237
```
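The first value in that output can be checked by hand. In group 0, the rows (DF1, 0) and (DF2, 0) hold x = 9, 9 and y = 4, 2, and the single number is the harmonic mean of all four values pooled together:

```latex
H(9, 4, 9, 2) = \frac{4}{\frac{1}{9} + \frac{1}{4} + \frac{1}{9} + \frac{1}{2}}
             = \frac{4}{35/36} = \frac{144}{35} \approx 4.114286
```

The per-column values the question is after would instead be H(9, 9) = 9 for x and H(4, 2) = 8/3 ≈ 2.666667 for y.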
I only get one column of values. Why? I expected two columns, as in the original df: one for the hmean of the x values and one for the hmean of the y values. How can I get that?
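The reason is that you pass axis=None to hmean, which flattens the data. Remember that when you call groupby().apply(), the function receives the whole group as a DataFrame. With groupby(level=1), the group for key 0 is the two rows (DF1, 0) and (DF2, 0), so with axis=None hmean pools their x and y values into one sample and returns a single number per group. Just remove axis=None: hmean then uses its default axis=0, and each group yields a two-element array, one value for x and one for y.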
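A minimal sketch of the difference on one group, using typed-in copies of the question's frames. It uses scipy.stats.hmean in place of scipy.stats.mstats.hmean (the result is the same for data with no masked values). Because apply() with a function that returns an array gives one array per group rather than separate columns, the last part wraps the result in a labelled pd.Series; hmean_columns is a hypothetical helper name, not something from the answer:

```python
import pandas as pd
from scipy.stats import hmean

# Typed-in copies of the question's frames (the original uses random data)
df1 = pd.DataFrame({"x": [9, 6, 7, 5, 5], "y": [4, 4, 2, 2, 2]})
df2 = pd.DataFrame({"x": [9, 7, 7, 9, 8], "y": [2, 1, 1, 5, 3]})
df = pd.concat({"DF1": df1, "DF2": df2})

# The group that apply() passes to the function for key 0
group0 = df.groupby(level=1).get_group(0).to_numpy(dtype=float)
pooled = hmean(group0, axis=None)   # x and y flattened together: one scalar (144/35)
per_column = hmean(group0, axis=0)  # one value per column: [9.0, 8/3]


def hmean_columns(group):
    """Hypothetical helper: per-column harmonic mean, labelled by column name."""
    values = hmean(group.to_numpy(dtype=float), axis=0)
    return pd.Series(values, index=group.columns)


# Returning a Series with the same labels for every group makes apply()
# assemble a DataFrame with one row per group and columns x and y
result = df.groupby(level=1).apply(hmean_columns)
print(result)
```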
Or you can use agg, which calls hmean separately on each column within each group, so the result keeps the x and y columns.
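A sketch of the agg route on the same typed-in data, again assuming scipy.stats.hmean as a stand-in for the mstats version. Each call receives one column of one group as a 1-D Series and returns a scalar, which is why the column structure survives:

```python
import pandas as pd
from scipy.stats import hmean

# Typed-in copies of the question's frames (the original uses random data)
df1 = pd.DataFrame({"x": [9, 6, 7, 5, 5], "y": [4, 4, 2, 2, 2]})
df2 = pd.DataFrame({"x": [9, 7, 7, 9, 8], "y": [2, 1, 1, 5, 3]})
df = pd.concat({"DF1": df1, "DF2": df2})

# agg() applies hmean to each column of each group (a 1-D Series),
# so every call returns a scalar and the x/y columns are preserved
result = df.groupby(level=1).agg(hmean)
print(result)
```

The index of the result is the level-1 labels 0..4, i.e. one row per element position, matching the shape of df1 and df2.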
In case you have more columns than just x and y:
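A sketch for the wider case, with a hypothetical third column z and assuming all values are positive numbers (as the harmonic mean requires). Selecting a list of columns before agg limits the aggregation to them. The second half is an alternative that is not from the answer above: since the frames share shape and labels, pandas can compute the pairwise harmonic mean element-wise, 2ab/(a+b) for two frames, or n divided by the sum of reciprocals for n frames, with no concat or groupby at all:

```python
import pandas as pd
from scipy.stats import hmean

# Hypothetical frames with an extra column "z"
df1 = pd.DataFrame({"x": [9, 6, 7], "y": [4, 4, 2], "z": [1, 3, 8]})
df2 = pd.DataFrame({"x": [9, 7, 7], "y": [2, 1, 1], "z": [4, 3, 2]})
df = pd.concat({"DF1": df1, "DF2": df2})

# Select the columns to aggregate; agg still works column by column
subset = df.groupby(level=1)[["x", "z"]].agg(hmean)

# Alternative without concat: pandas aligns the frames on index and columns,
# so the harmonic mean of each pair of elements is a single expression
pairwise = 2 * df1 * df2 / (df1 + df2)

# The same idea for any number of same-shaped frames:
# n divided by the element-wise sum of reciprocals
frames = [df1, df2]
n_way = len(frames) / sum(1 / f for f in frames)
print(n_way)
```

The element-wise version covers every column automatically and avoids building the MultiIndex frame, at the cost of only working when the frames are aligned on the same index and column labels.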