Python Loop to generate multiple pivot tables from one pandas dataframe

83 Views Asked by At

I've selected a set of records from a larger data set that represents a specific organization. I need to iterate over this data and produce a pivot table that compares the demographic information to how that demographic answered the survey questions. There are 5 demographic questions, each with multiple choices for categorization. A pivot table provides a count of how each demographic category answered a survey question. I need to produce 5 pivot tables for each of the remaining 17 questions and store them to a dataframe. I'll stack the resulting dataframes later. The problem I'm having is trying to iterate over the main original dataframe, replacing the two column names for the crosstab. I can easily do one manually with the following code:

table18 = pd.pivot_table(org1_df, values='Business size?', index=['Business type?'], 
                              columns=['Survey Question 1'], 
                              aggfunc=['count'], fill_value=0)

Or

cross = pd.crosstab(org1_df["Business type?"], org1_df["Survey Question 2"])
cross

Here's the pivot table I was able to create:

The point is to determine how a small business answered the survey questions compared with a large business' answers. And so on for the other dozen+ categories we have on this survey.`

Respondents Business Type? Business Size? Survey Question 1 Survey Question 2
Resp_1 2 2 3 2
Resp_2 2 1 4 3

Sample data above.

I stored the demographic column names in a list:

column_names = ["Business type?", "Business size?"]

then added those to a dictionary:

d = {}
for name in column_names:
    d[name] = pd.DataFrame()

I tried a dictionary comprehension based on a similar question and answer:

dfs = {i[1]: pd.pivot_table(org1_df, index=i, values='Business type?', 
                            aggfunc='count', fill_value=0) for i, org1_df in zip(column_names, d)}

I received this error: AttributeError: 'str' object has no attribute 'columns'

I would greatly appreciate help on this task. As an aside, I'm not sure why when I leave values='Business type?' it doesn't matter for the resulting pivot table. The answers I'm getting matched what I get in Excel.

I'm working my way to a diverging stacked bar chart using matplotlib, but I have to get the data organized first.

UPDATE: org1_df output Based on the mock data above, I would produce 4 pivot tables from that.

  1. Survey Question 1 - a count of the answers for "Business type?"
  2. Survey Question 1 - a count of the answers for "Business size?"
  3. Survey Question 2 - a count of the answers for "Business type?"
  4. Survey Question 4 - a count of the answers for "Business size?"
0

There are 0 best solutions below