I have a function that takes a DataFrame with data about articles on life expectancy in different regions and countries. I want to compute each region's share of all articles, and, within each region, the shares of articles about males, females, and both sexes. My question: how can I replace the for loop in calc_proportion and still build this small DataFrame? The function currently takes all the unique regions in the DataFrame and computes the proportions for each of them.
I want to get this kind of DataFrame from the function calc_proportion:
import pandas as pd

def calc_proportion(df):
    proportions = pd.DataFrame(columns=['Region', 'Proportion_of_all_articles', 'Proportion_male_articles', 'Proportion_female_articles', 'Proportion_bs_articles'])
    Regions = df.Region.unique()
    for region in Regions:
        # a: share of this region among all articles
        a = f"{df.loc[df['Region'] == region].shape[0] / df.shape[0] : .0%}"
        # b, c, d: shares of each sex category within this region
        b = f"{df.loc[(df['Region'] == region) & (df['Sex'] == 'Male')].shape[0] / df.loc[df['Region'] == region].shape[0] : .0%}"
        c = f"{df.loc[(df['Region'] == region) & (df['Sex'] == 'Female')].shape[0] / df.loc[df['Region'] == region].shape[0] : .0%}"
        d = f"{df.loc[(df['Region'] == region) & (df['Sex'] == 'Both sexes')].shape[0] / df.loc[df['Region'] == region].shape[0] : .0%}"
        # append one row per region
        proportions.loc[len(proportions)] = [region, a, b, c, d]
    return proportions

calc_proportion(df)
So I want to get the small DataFrame of proportions as the output, without using a for loop in the function.
Initial data:


Minimal reproducible example
Here's one approach:
- Use df.groupby on "Region" and apply groupby.value_counts with the normalize parameter set to True to get the distribution per region.
- Use df.unstack to pivot the second index level (with the "sexes").
- Get the overall proportions for df["Region"] (Series.value_counts). We use df.join to join the two results.
- Use df.fillna to fill NaN values with 0.
- Use df.rename to change the column names.
- Reorder the columns with df.loc, and reset the index with df.reset_index.
Code
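The steps listed above can be sketched roughly as follows. This is a minimal sketch, not necessarily the exact code of the answer: the toy df is hypothetical (the original data is not shown here), and the sketch assumes that "Sex" only takes the values 'Male', 'Female' and 'Both sexes' and that each of them occurs at least once.

```python
import pandas as pd

# Hypothetical toy data, standing in for the real articles table.
df = pd.DataFrame({
    "Region": ["Europe", "Europe", "Europe", "Asia", "Asia", "Africa"],
    "Sex": ["Male", "Female", "Both sexes", "Male", "Male", "Female"],
})


def calc_proportion(df):
    # Distribution of "Sex" within each region, one column per category.
    by_sex = (
        df.groupby("Region")["Sex"]
        .value_counts(normalize=True)
        .unstack()
    )
    # Share of each region among all rows.
    overall = (
        df["Region"]
        .value_counts(normalize=True)
        .rename("Proportion_of_all_articles")
    )
    # Join on the region index; combinations that never occur become 0.
    out = by_sex.join(overall).fillna(0)
    out = out.rename(columns={
        "Male": "Proportion_male_articles",
        "Female": "Proportion_female_articles",
        "Both sexes": "Proportion_bs_articles",
    })
    column_order = [
        "Proportion_of_all_articles",
        "Proportion_male_articles",
        "Proportion_female_articles",
        "Proportion_bs_articles",
    ]
    # Reorder the columns, drop the leftover columns name, restore "Region".
    return out.loc[:, column_order].rename_axis(columns=None).reset_index()


proportions = calc_proportion(df)
print(proportions)
```

Two differences from the loop version are worth noting: the values stay numeric (floats between 0 and 1) instead of being formatted strings, and groupby sorts the regions alphabetically rather than keeping the order of df.Region.unique().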
Result
Formatted result
Seeing that you are working in Jupyter Notebook, I'd suggest using df.style.format to print the result with the floats as percentages:
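A minimal sketch of that call, assuming the proportions are stored as floats in columns whose names start with "Proportion_". The small proportions frame and the helper name as_percentages are hypothetical, for illustration only. Note that the pandas Styler needs the optional jinja2 dependency.

```python
import pandas as pd

# Hypothetical result frame with float proportions.
proportions = pd.DataFrame({
    "Region": ["Africa", "Europe"],
    "Proportion_of_all_articles": [0.25, 0.75],
    "Proportion_male_articles": [0.5, 0.4],
})

# Only the numeric proportion columns should be shown as percentages.
percent_columns = [c for c in proportions.columns if c.startswith("Proportion_")]


def as_percentages(frame):
    # Returns a Styler: the underlying floats are unchanged,
    # only their display is formatted (0.25 is shown as 25%).
    return frame.style.format("{:.0%}", subset=percent_columns)


# In a notebook, make this the last expression of a cell to render it:
# as_percentages(proportions)
```

Because the Styler only affects display, the DataFrame itself stays numeric and can still be sorted or aggregated afterwards.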