Splitting number of clients into 3 equal intervals

I have a question about splitting the number of clients into three equal groups along a particular dimension. I would like to split the values in the 'N' column (the number of clients, ncl) into three equal parts, grouped by the 'Ea EUR' column. Here is the relevant part of the Python code:

import numpy as np
import pandas as pd
from scipy import stats

def f(df, stab, idn, ndef, expo, pdm, segm):
    # keep only the requested segment; copy so the new column is not written to a view
    df = df.loc[df[stab] == segm].copy()
    df = df.sort_values(by=[expo])
    # cut points at the 1/3 and 2/3 quantiles of the exposure, plus the maximum
    tertiles = df[expo].quantile([1/3, 2/3, 1]).tolist()
    # add a lower edge for the bins in pd.cut
    tertiles = [-0.001] + tertiles
    df['By ex'] = pd.cut(df[expo], bins=tertiles, labels=['Lower', 'Medium', 'Upper'])
    # aggregate per exposure group: mean PD, client count, defaults, total exposure
    dfA = df.groupby(['By ex']).agg(apd=(pdm, 'mean'),
                                    ncl=(idn, 'count'),
                                    ndf=(ndef, 'sum'),
                                    ex=(expo, 'sum'))
    # observed default rate per group
    dfA.loc[:, 'dfr'] = dfA['ndf'] / dfA['ncl']
    # beta CDF evaluated at the mean PD, with parameters D + 1/2 and N - D + 1/2
    dfA.loc[:, 'p_value'] = stats.beta.cdf(dfA['apd'], dfA['ndf'] + 1/2,
                                           dfA['ncl'] - dfA['ndf'] + 1/2)
    dfA = dfA.reset_index()
    dfA = dfA.rename({'ncl': 'N',
                      'ndf': 'D',
                      'ex': 'Ea EUR',
                      'dfr': 'DR',
                      'p_value': 'test p-value'}, axis=1)
    df_temp = pd.DataFrame({"Ex tertile": dfA.loc[:, "By ex"].values,
                            "Ea EUR": dfA.loc[:, "Ea EUR"].values,
                            "AD": dfA.loc[:, "apd"].values,
                            "N": dfA.loc[:, "N"].values,
                            "D": dfA.loc[:, "D"].values,
                            "DR": dfA.loc[:, "DR"].values,
                            "test p-value": dfA.loc[:, "test p-value"].values})
    # return the table with the column names stacked on top as the first row
    return pd.DataFrame(np.vstack([df_temp.columns, df_temp]))

However, the above code produces an unequal distribution of clients across the groups, and despite my efforts to fine-tune it in various ways, I have not been able to correct it. Could you suggest a possible improvement to the code?
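
One possible direction, offered only as a sketch: if the imbalance comes from tied exposure values (an assumption, since no sample data is shown), binning on value-based quantile edges cannot separate the tied rows. Ranking the rows first and cutting the ranks with pd.qcut gives groups whose sizes differ by at most one row. The helper name add_equal_count_tertiles and the toy data below are hypothetical and not part of the original code.

```python
import pandas as pd

def add_equal_count_tertiles(df, expo, label_col="By ex"):
    """Return a copy of df with a tertile label based on the rank of `expo`."""
    out = df.copy()
    # rank(method="first") gives every row a distinct rank, so tied values of
    # `expo` are broken by order of appearance instead of sharing one bin
    ranks = out[expo].rank(method="first")
    # cut the ranks (not the raw values) into three equal-count groups
    out[label_col] = pd.qcut(ranks, q=3, labels=["Lower", "Medium", "Upper"])
    return out

# toy data with many tied exposures (hypothetical values)
toy = pd.DataFrame({"id": range(9), "Ea EUR": [0, 0, 0, 0, 0, 0, 10, 20, 30]})
labelled = add_equal_count_tertiles(toy, "Ea EUR")
counts = labelled["By ex"].value_counts().to_dict()
```

On this toy frame each group receives three rows. The trade-off is that clients with identical exposure can land in different groups, so whether this is acceptable depends on how the tertiles are meant to be interpreted.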
