Pandas: aggregate and/or apply does not work with a user-defined function


I am trying to get a weighted average from a DataFrame, but neither apply nor agg seems to work. My code raises the following error: 'numpy.float64' object is not callable

I have the following df

import numpy as np
import pandas as pd

df = pd.DataFrame([['RETIRO', 65, 1, 10.7],
                   ['SAN NICOLAS', 116, 1, 23.2],
                   ['RETIRO', 101, 2, 28.7],
                   ['FLORES', 136, 2, 23.5]],
                  columns=['BARRIO', 'HOGARES', 'COMUNA', 'NSE'])

I define the function

def avg_w(dt):
    return np.average(a = dt.NSE, weights = dt.HOGARES)

and now apply it to my df,

df.loc[:,['COMUNA','NSE','HOGARES']].groupby(['COMUNA']).apply(avg_w(df))

and it returns 'numpy.float64' object is not callable

I tried also something similar to the suggestions found in here and here

I changed the function,

def avg_w2(dt):
    return pd.Series({'avg_w2': np.average(a = dt.NSE, weights = dt.HOGARES)})

and the apply

df.loc[:,['COMUNA','NSE','HOGARES']].groupby(['COMUNA']).apply({'avgw': [avg_w2(dt)]})

But it didn't work either. The code returns TypeError: unhashable type: 'dict'

The function works on its own, but something goes wrong when I pass it to apply (or aggregate; I tried both).

For each COMUNA, I expect to obtain the NSE average weighted by HOGARES.


There are 2 best solutions below

1
hashir_k On BEST ANSWER

Seems like what you want is the following:

df = df.iloc[:, 1:].groupby(by="COMUNA").apply(
        lambda grp : np.average(a=grp['NSE'], weights=grp["HOGARES"])
    )

This results in the following Series, with one weighted average per COMUNA:

COMUNA
1    18.711050
2    25.716034

Note: you can use a named function instead of the lambda expression, but you must pass the function itself rather than the result of calling it, i.e. df.apply(avg_w2) and NOT df.apply(avg_w2(df)).
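To make the note concrete, here is a minimal sketch of the named-function version, using the df and avg_w from the question. Selecting the NSE and HOGARES columns after the groupby is my own choice (an assumption, not something the question requires); it keeps the grouping column out of what gets passed to the function.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['RETIRO', 65, 1, 10.7],
                   ['SAN NICOLAS', 116, 1, 23.2],
                   ['RETIRO', 101, 2, 28.7],
                   ['FLORES', 136, 2, 23.5]],
                  columns=['BARRIO', 'HOGARES', 'COMUNA', 'NSE'])


def avg_w(dt):
    # dt is the sub-DataFrame holding the rows of one COMUNA group
    return np.average(a=dt['NSE'], weights=dt['HOGARES'])


# Pass the function object (no parentheses); pandas calls it once per group.
# Column selection after groupby is an assumption made for this sketch.
result = df.groupby('COMUNA')[['NSE', 'HOGARES']].apply(avg_w)
print(result)
```

The result should be indexed by COMUNA, with the same values as the lambda version above.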

1
tim654321 On

You are calling your function when you pass it to apply. You should pass the function itself as an object instead:

df.loc[:,['COMUNA','NSE','HOGARES']].groupby(['COMUNA']).apply(avg_w)

This might not solve all your problems, but should solve your first.
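As a rough illustration of why the first error appears, and of what the second attempt could look like: avg_w(df) is evaluated immediately on the whole frame and yields a single number, which apply then tries to call. Likewise, apply expects one callable, so a dict is not accepted. The sketch below assumes the df, avg_w and avg_w2 from the question; selecting the columns after the groupby is my own addition.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['RETIRO', 65, 1, 10.7],
                   ['SAN NICOLAS', 116, 1, 23.2],
                   ['RETIRO', 101, 2, 28.7],
                   ['FLORES', 136, 2, 23.5]],
                  columns=['BARRIO', 'HOGARES', 'COMUNA', 'NSE'])


def avg_w(dt):
    return np.average(a=dt['NSE'], weights=dt['HOGARES'])


def avg_w2(dt):
    # Returning a Series gives the output column a name
    return pd.Series({'avg_w2': np.average(a=dt['NSE'], weights=dt['HOGARES'])})


# Calling the function up front produces a plain number, not something apply can call.
value = avg_w(df)
print(type(value), callable(value))  # the number itself is not callable
print(callable(avg_w))               # the function object is

# Passing avg_w2 itself (no call, no dict) yields one row per COMUNA.
table = df.groupby('COMUNA')[['NSE', 'HOGARES']].apply(avg_w2)
print(table)
```

Because avg_w2 returns a Series, the grouped result here is a DataFrame with a single avg_w2 column rather than a Series.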