Hello, I am trying to finish a case study for the Google Data Analytics cert, and I saw that in their example (using R) they were able to group users into categories based on their activity levels (4 different columns). In the notebook, they used R to separate the groups as such:
"Sedentary", for users whose sedentary totals were higher than the mean for the Sedentary column AND whose totals for all other levels were lower than the means for those columns.
"Lightly Active", users who had less than the mean in the Sedentary, FairlyActive, and VeryActive columns, but higher than the mean in the LightlyActive column.
"Fairly Active", users who had less than the mean in Sedentary, LightlyActive, and VeryActive columns but higher than the mean for the FairlyActive column.
And lastly "Very Active", users who had less than the mean in Sedentary, LightlyActive, and FairlyActive columns, but higher than average in the VeryActive column.
I am trying to do my case study with Python to help myself better understand it, and I was wondering if anybody can help me similarly group these using pandas. I'm still fairly new to this and I know there's a groupby() function and ways to aggregate them. But I'm not sure how exactly I could make the categories in the same way so I can visualize user usage. I would greatly appreciate any tips at all on how to get this done. Thanks!
I am still very new, so I haven't tried anything yet. I assume I would need to define a function that can group and label each category, but this is not something I've attempted on my own.
I received an answer from a very helpful Kaggle user (www.kaggle.com/aarishasifkhan) that was exactly what I was looking for. I'm sharing it here in case others face this problem.
You can achieve similar grouping and summarization in Python using the pandas library. Here's how you can do it:
Import pandas. Then, assuming you have loaded your data into a DataFrame called daily_data, calculate the mean of each activity type.
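The code for this step did not survive in the post, so here is a minimal sketch of what it might look like. The variable names (mean_sedentary and so on) and the column names are taken from the function in the answer; the sample values are made up for illustration, and the real daily_data would come from the case-study CSV.

```python
import pandas as pd

# Hypothetical stand-in for the case-study data (values are invented).
daily_data = pd.DataFrame({
    "SedentaryMinutes":     [1200, 600, 650, 700, 1100],
    "LightlyActiveMinutes": [100, 300, 120, 110, 280],
    "FairlyActiveMinutes":  [5, 10, 60, 8, 50],
    "VeryActiveMinutes":    [2, 5, 10, 90, 70],
})

# Mean of each activity column, using the names the categorizing function expects.
mean_sedentary = daily_data["SedentaryMinutes"].mean()
mean_lightly_active = daily_data["LightlyActiveMinutes"].mean()
mean_fairly_active = daily_data["FairlyActiveMinutes"].mean()
mean_very_active = daily_data["VeryActiveMinutes"].mean()

print(mean_sedentary, mean_lightly_active, mean_fairly_active, mean_very_active)
```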
Define a function to categorize users based on their activity levels
    def categorize_user(row):
        if (row['SedentaryMinutes'] > mean_sedentary
                and row['LightlyActiveMinutes'] < mean_lightly_active
                and row['FairlyActiveMinutes'] < mean_fairly_active
                and row['VeryActiveMinutes'] < mean_very_active):
            return "Sedentary"
        elif (row['SedentaryMinutes'] < mean_sedentary
                and row['LightlyActiveMinutes'] > mean_lightly_active
                and row['FairlyActiveMinutes'] < mean_fairly_active
                and row['VeryActiveMinutes'] < mean_very_active):
            return "Lightly Active"
        elif (row['SedentaryMinutes'] < mean_sedentary
                and row['LightlyActiveMinutes'] < mean_lightly_active
                and row['FairlyActiveMinutes'] > mean_fairly_active
                and row['VeryActiveMinutes'] < mean_very_active):
            return "Fairly Active"
        elif (row['SedentaryMinutes'] < mean_sedentary
                and row['LightlyActiveMinutes'] < mean_lightly_active
                and row['FairlyActiveMinutes'] < mean_fairly_active
                and row['VeryActiveMinutes'] > mean_very_active):
            return "Very Active"
        else:
            return "Unknown"

Apply the categorization function to each row and create a new column for user_type.
Drop rows with NaN values, if any.
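The code for the apply and drop steps was also lost, so the following is a self-contained sketch of how they could be written. To keep it short, categorize_user is rewritten here in a compact form that should behave the same way as the function in the answer (exactly one column above its mean, the other three below); the labels dictionary, the means Series, and the sample values are assumptions made for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the case-study data (values are invented).
daily_data = pd.DataFrame({
    "SedentaryMinutes":     [1200, 600, 650, 700, 1100],
    "LightlyActiveMinutes": [100, 300, 120, 110, 280],
    "FairlyActiveMinutes":  [5, 10, 60, 8, 50],
    "VeryActiveMinutes":    [2, 5, 10, 90, 70],
})

# Assumed mapping from each activity column to its category label.
labels = {
    "SedentaryMinutes": "Sedentary",
    "LightlyActiveMinutes": "Lightly Active",
    "FairlyActiveMinutes": "Fairly Active",
    "VeryActiveMinutes": "Very Active",
}

# One mean per activity column, indexed by column name.
means = daily_data[list(labels)].mean()

def categorize_user(row):
    """Return a label when exactly one column is above its mean and the rest are below."""
    above = [col for col in labels if row[col] > means[col]]
    below = [col for col in labels if row[col] < means[col]]
    if len(above) == 1 and len(below) == len(labels) - 1:
        return labels[above[0]]
    return "Unknown"

# Apply the function row by row (axis=1) and store the result in a new column.
daily_data["user_type"] = daily_data.apply(categorize_user, axis=1)

# Drop rows with NaN values, if any.
daily_data = daily_data.dropna()

# Count users per category, e.g. as input for a bar chart.
counts = daily_data["user_type"].value_counts()
print(counts)
```

Rows that are above the mean in more than one column (the last sample row, for instance) fall into "Unknown", so it is worth checking how many users end up there before visualizing the result.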
Now that you have a new column 'user_type' with categorized users, you can do further analysis or visualization with it.