Best way to combine rows with percentages?

30 Views Asked by At

I'm fairly new to dealing with data analysis/using data frames etc and looking for some advice.

If I had a data frame where each row represents an area, a column for the amount of people in that area then several columns of percentages (eg. % of people who own a car etc)

For example:

Area Population % car owners % employed
A 320 62.5 78.13
B 215 69.77 83.72
C 418 74.16 90.91

Say the areas B & C combine into a single area D - what would be a good way to go about it?

I could rename B & C to D, use groupby and find the mean of the percentages (and sum of population) but the average of percentages isn't always accurate.

Is there a way to calculate the new percentage?

1

There are 1 best solutions below

0
mozway On

One option is to first multiply the percentages by the population, then groupby.sum, finally divide again:

mapper = {'B': 'D', 'C': 'D'}

pct_cols = ['% car owners', '% employed']
# or
# pct_cols = list(df.filter(like='%'))

out = (df[pct_cols]
 .mul(df['Population'], axis=0)
 .combine_first(df).drop(columns='Area')
 .groupby(df['Area'].replace(mapper))
 .sum()
 .reset_index()
)

out[pct_cols] = (out[pct_cols]
                 .div(out['Population'], axis=0)
                 .round(2) # optional
                )

print(out)

Output:

  Area  % car owners  % employed  Population
0    A         62.50       78.13         320
1    D         72.67       88.47         633