Body:
Problem:
I have a wide DataFram that I'm struggling to reshape and aggregate using pandas.
Sample Data:
| Unnamed: 1 | Unnamed: 2 | Unnamed: 3 |
|---|---|---|
| Bezeichnung (Bezirksregion) | Straftaten \n-insgesamt- | Raub |
| Mitte | 81178 | 823 |
| Tiergarten Süd | 4113 | 37 |
| Regierungsviertel | 6251 | 41 |
| Alexanderplatz | 18999 | 163 |
import pandas as pd
from io import StringIO
# Sample data
data = """
,Unnamed: 1,Unnamed: 2,Unnamed: 3
3,Bezeichnung (Bezirksregion),Straftaten \\n-insgesamt-,Raub
4,Mitte,81178,823
5,Tiergarten Süd,4113,37
6,Regierungsviertel,6251,41
7,Alexanderplatz,18999,163
"""
# Load data into a DataFrame
df = pd.read_csv(StringIO(data), index_col=0)
# Display the DataFrame
print("Loaded DataFrame:")
print(df)
Desired Output:
desired_output = {
'Region': ['Mitte', 'Tiergarten Süd', 'Regierungsviertel', 'Alexanderplatz'],
'Crimes': ['Straftaten -insgesamt-', 'Straftaten -insgesamt-', 'Raub', 'Raub'],
'count': [81178, 4113, 41.33, 163],
}
| Region | Crimes | Count |
|---|---|---|
| Mitte | Straftaten -insgesamt- | 81178 |
| Tiergarten Süd | Straftaten -insgesamt- | 4113 |
| Regierungsviertel | Raub | 41.33 |
| Alexanderplatz | Raub | 163 |
Current Approach:
melted_df = pd.melt(df, id_vars=[df.columns[0]], var_name='crime', value_name='crime_count')
Issues:
Result not as expected:
| crime | crime_count | |
|---|---|---|
| Bezeichnung (Bezirksregion) | Unnamed: 2 | Straftaten \n-insgesamt- |
| Mitte | Unnamed: 2 | 81178 |
| Tiergarten Süd | Unnamed: 2 | 4113 |
| Regierungsviertel | Unnamed: 2 | 6251 |
| Alexanderplatz | Unnamed: 2 | 18999 |
Assistance Requested:
How can I reshape the DataFrame to achieve the desired output?
IIUC, you need to rename the columns using the first row of your data, then melt it on the first column and rename the resultant dataframe's columns:
Output for your sample data: