I need help using pandas to group data from multiple columns into labeled categories

Question

24 Views Asked by Teresa At 27 March 2024 at 16:04

Hello, I am trying to finish a case study for the Google Data Analytics cert, and I saw that in their example (using R) they were able to group users into categories based on their activity levels (4 different columns). In the notebook, they used R to separate the groups as follows:

"Sedentary", for users whose sedentary totals were higher than the mean for the Sedentary column AND whose totals for all other levels were lower than the means for those columns.
"Lightly Active", users who had less than the mean in the Sedentary, FairlyActive, and VeryActive columns, but higher than the mean in the LightlyActive column.
"Fairly Active", users who had less than the mean in the Sedentary, LightlyActive, and VeryActive columns, but higher than the mean in the FairlyActive column.
And lastly "Very Active", users who had less than the mean in the Sedentary, LightlyActive, and FairlyActive columns, but higher than the mean in the VeryActive column.

I am trying to do my case study with Python to help myself better understand it, and I was wondering if anybody could help me group these users in a similar way using pandas. I'm still fairly new to this. I know there's a groupby() function and ways to aggregate data, but I'm not sure exactly how to build the categories in the same way so that I can visualize user usage. I would greatly appreciate any tips at all on how to get this done. Thanks!

I am still very new, so I haven't tried anything yet. I assume I would need to define a function that can group and label each category, but this is not something I've attempted on my own.

Original Q&A

There is 1 best solution below

**Teresa** · Answer 1 · 2024-03-28T18:08:35.047000

I received an answer from a very helpful Kaggle user (www.kaggle.com/aarishasifkhan) that was exactly what I was looking for. I'm sharing it in case others face this problem.

You can achieve similar grouping and summarization in Python using the pandas library. Here's how you can do it. First, import pandas:

import pandas as pd

Assuming you have loaded your data into a DataFrame called daily_data, calculate the mean of each activity type:

mean_sedentary = daily_data['SedentaryMinutes'].mean()
mean_lightly_active = daily_data['LightlyActiveMinutes'].mean()
mean_fairly_active = daily_data['FairlyActiveMinutes'].mean()
mean_very_active = daily_data['VeryActiveMinutes'].mean()
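As a possible shortcut, the four means can also be computed in a single call by selecting the activity columns together. This is only a sketch: the small DataFrame below is made-up sample data standing in for daily_data, and the names activity_cols and means are introduced here for illustration.

```python
import pandas as pd

# Made-up sample data standing in for daily_data (values are illustrative only).
daily_data = pd.DataFrame({
    "SedentaryMinutes": [900, 700, 1100, 650],
    "LightlyActiveMinutes": [150, 300, 100, 250],
    "FairlyActiveMinutes": [10, 20, 5, 45],
    "VeryActiveMinutes": [5, 15, 0, 60],
})

# Hypothetical helper list naming the four activity columns.
activity_cols = [
    "SedentaryMinutes",
    "LightlyActiveMinutes",
    "FairlyActiveMinutes",
    "VeryActiveMinutes",
]

# DataFrame.mean() returns a Series of column means, indexed by column name.
means = daily_data[activity_cols].mean()

# Individual means can still be pulled out under the names used in this answer.
mean_sedentary = means["SedentaryMinutes"]
mean_lightly_active = means["LightlyActiveMinutes"]
mean_fairly_active = means["FairlyActiveMinutes"]
mean_very_active = means["VeryActiveMinutes"]

print(means)
```

Keeping the means in one Series makes it easier to compare every column against its own mean later on.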

Define a function to categorize users based on their activity levels

def categorize_user(row):
    if (row['SedentaryMinutes'] > mean_sedentary
            and row['LightlyActiveMinutes'] < mean_lightly_active
            and row['FairlyActiveMinutes'] < mean_fairly_active
            and row['VeryActiveMinutes'] < mean_very_active):
        return "Sedentary"
    elif (row['SedentaryMinutes'] < mean_sedentary
            and row['LightlyActiveMinutes'] > mean_lightly_active
            and row['FairlyActiveMinutes'] < mean_fairly_active
            and row['VeryActiveMinutes'] < mean_very_active):
        return "Lightly Active"
    elif (row['SedentaryMinutes'] < mean_sedentary
            and row['LightlyActiveMinutes'] < mean_lightly_active
            and row['FairlyActiveMinutes'] > mean_fairly_active
            and row['VeryActiveMinutes'] < mean_very_active):
        return "Fairly Active"
    elif (row['SedentaryMinutes'] < mean_sedentary
            and row['LightlyActiveMinutes'] < mean_lightly_active
            and row['FairlyActiveMinutes'] < mean_fairly_active
            and row['VeryActiveMinutes'] > mean_very_active):
        return "Very Active"
    else:
        return "Unknown"

Apply the categorization function to each row and create a new column for user_type

daily_data['user_type'] = daily_data.apply(categorize_user, axis=1)
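For larger datasets, a row-wise apply can be slow. One alternative (not part of the original answer) is to build the same conditions as boolean columns and pick labels with numpy.select. The sketch below assumes the same four column names; the sample rows and the names cols, labels, above, below, and conditions are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample data: one row per category plus one that fits none of them.
daily_data = pd.DataFrame({
    "SedentaryMinutes": [1200, 600, 600, 600, 1000],
    "LightlyActiveMinutes": [100, 400, 100, 100, 300],
    "FairlyActiveMinutes": [5, 5, 80, 5, 5],
    "VeryActiveMinutes": [5, 5, 5, 90, 5],
})

cols = [
    "SedentaryMinutes",
    "LightlyActiveMinutes",
    "FairlyActiveMinutes",
    "VeryActiveMinutes",
]
labels = ["Sedentary", "Lightly Active", "Fairly Active", "Very Active"]

# Compare every value with the mean of its own column.
col_means = daily_data[cols].mean()
above = daily_data[cols] > col_means
below = daily_data[cols] < col_means

# A row gets a label when it is above the mean in that label's column
# and below the mean in all of the other activity columns.
conditions = [
    above[col] & below.drop(columns=col).all(axis=1)
    for col in cols
]

# Rows matching none of the conditions fall back to "Unknown".
daily_data["user_type"] = np.select(conditions, labels, default="Unknown")

print(daily_data["user_type"].tolist())
```

The last sample row is above the mean in two columns at once, so it matches no category and is labeled "Unknown", just as it would be with categorize_user.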

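Drop rows with NaN values, if any. Note that rows with missing activity values will have been labeled "Unknown" by the function above, because comparisons with NaN evaluate to False: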

daily_data.dropna(inplace=True)

Now that you have a new column 'user_type' with categorized users, you can do further analysis or visualization with it.
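As one example of that further analysis, the categories can be counted with value_counts() or summarized with groupby(), which the question mentions. This is a minimal sketch on made-up data that already has a user_type column; the names counts, percent, and avg_sedentary are introduced here for illustration.

```python
import pandas as pd

# Made-up data that already carries the user_type labels.
daily_data = pd.DataFrame({
    "SedentaryMinutes": [1200, 1000, 600, 800],
    "user_type": ["Sedentary", "Sedentary", "Lightly Active", "Unknown"],
})

# Number of rows in each category.
counts = daily_data["user_type"].value_counts()

# Share of each category as a percentage of all rows.
percent = daily_data["user_type"].value_counts(normalize=True) * 100

# Average sedentary minutes within each category.
avg_sedentary = daily_data.groupby("user_type")["SedentaryMinutes"].mean()

print(counts)
print(percent)
print(avg_sedentary)
```

If matplotlib is installed, counts.plot(kind="bar") should give a quick bar chart of how many rows fall into each category.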
