When pivoting the following dataframe from long to wide, I would like to get "groups" of columns and mark them with a prefix or suffix.
- The groups of elements can have different sizes, i.e. consist of one, two or more grouped elements/rows, I used pairs of two here to keep the example simple.
import pandas as pd
df = pd.DataFrame(
[
{'group': 'group-009297534', 'single_id': 'single-011900051', 'country': 'ESP', 'name': '00000911'},
{'group': 'group-009297534', 'single_id': 'single-000000821', 'country': 'USA', 'name': '00001054'},
{'group': 'group-009280053', 'single_id': 'single-000000002', 'country': 'HUN', 'name': '00000496'},
{'group': 'group-009280053', 'single_id': 'single-000000014', 'country': 'HUN', 'name': '00000795'},
{'group': 'group-009245039', 'single_id': 'single-000001258', 'country': 'NOR', 'name': '00000527'},
{'group': 'group-009245039', 'single_id': 'single-000000669', 'country': 'TWN', 'name': '00000535'}
]
)
My approach of assigning an index to the elements to be grouped and then using this for the column designation is already going in the right direction, but still deviates from the expected view
df['idx'] = df.groupby('group').cumcount()
df.pivot(index='group', columns='idx')
| group | ('single_id', 0) | ('single_id', 1) | ('country', 0) | ('country', 1) | ('name', 0) | ('name', 1) |
|---|---|---|---|---|---|---|
| group-009245039 | single-000001258 | single-000000669 | NOR | TWN | 00000527 | 00000535 |
| group-009280053 | single-000000002 | single-000000014 | HUN | HUN | 00000496 | 00000795 |
| group-009297534 | single-011900051 | single-000000821 | ESP | USA | 00000911 | 00001054 |
However, the expected solution would look like this:
| group | single_id_1 | country_1 | name_1 | single_id_2 | country_2 | name_2 | |
|---|---|---|---|---|---|---|---|
| 0 | group-009245039 | single-000001258 | NOR | 00000527 | single-000000669 | TWN | 00000535 |
| 1 | group-009280053 | single-000000002 | HUN | 00000496 | single-000000014 | HUN | 00000795 |
| 2 | group-009297534 | single-011900051 | ESP | 00000911 | single-000000821 | USA | 00001054 |
I'm not sure whether the approach with the multi-index, which would then have to be sorted and merged somehow, is the right one or whether there is a more elegant option.
Code
out