I have a large dataset that is difficult to investigate without analysis tools. Its general form is this, but with 16 "ItemType0" columns and 16 "ItemType1", "ItemType2", etc. columns.
It represents the properties (many of them) of up to 16 different items recorded at a single timestep, plus the properties of that timestep itself.
| Time | ItemType0[0].property | ItemType0[1].property | Property |
|---|---|---|---|
| 1 | 1 | 0 | 2 |
| 2 | 0 | 1 | 2 |
| 3 | 3 | 3 | 2 |
I'd like to receive:
| Time | ItemType0.property | Property |
|---|---|---|
| 1 | 1 | 2 |
| 2 | 0 | 2 |
| 3 | 3 | 2 |
| 1 | 0 | 2 |
| 2 | 1 | 2 |
| 3 | 3 | 2 |
```python
import pandas as pd

wide_df = pd.DataFrame({
    "Time": [1, 2, 3],
    "ItemType0[0].property": [1, 0, 3],
    "ItemType0[1].property": [0, 1, 3],
    "Property": [2, 2, 2],
})
```
What I've tried:
Melt:
```python
ids = [col for col in wide_df.columns if "[" not in col]
inter_df = pd.melt(wide_df, id_vars=ids, var_name="Source")
```

which fails with:

```
MemoryError: Unable to allocate 28.3 GiB for an array with shape (15, 506831712) and data type uint32
```
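Stacking only the bracketed columns through a MultiIndex keeps the intermediate much smaller than a whole-frame melt, since the id columns are never replicated into a value array. A rough sketch on the toy frame (the regex assumes column names follow the `Name[index].property` pattern exactly):

```python
import re
import pandas as pd

# toy frame mirroring the question's layout
wide_df = pd.DataFrame({
    "Time": [1, 2, 3],
    "ItemType0[0].property": [1, 0, 3],
    "ItemType0[1].property": [0, 1, 3],
    "Property": [2, 2, 2],
})

item_cols = [c for c in wide_df.columns if "[" in c]
sub = wide_df.set_index([c for c in wide_df.columns if "[" not in c])

# split "ItemType0[0].property" into ("ItemType0.property", 0)
pat = re.compile(r"^(\w+)\[(\d+)\]\.(\w+)$")
sub.columns = pd.MultiIndex.from_tuples(
    [(f"{m.group(1)}.{m.group(3)}", int(m.group(2)))
     for m in map(pat.match, item_cols)],
    names=[None, "item"],
)

# stacking the "item" level turns the per-item columns into rows
long_df = sub.stack("item").reset_index()
```

`long_df` then holds one row per (Time, item) pair, with `Property` carried along unchanged.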
I wouldn't even know where to begin with `pd.wide_to_long`, since the column names don't all start with the same stub.
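`pd.wide_to_long` does become usable if the columns are first renamed so each stub ends in a numeric suffix. A sketch under that assumption (on the real data, the `stubnames` list would be built from the column names rather than written out):

```python
import re
import pandas as pd

wide_df = pd.DataFrame({
    "Time": [1, 2, 3],
    "ItemType0[0].property": [1, 0, 3],
    "ItemType0[1].property": [0, 1, 3],
    "Property": [2, 2, 2],
})

# move the bracketed index to the end: "ItemType0[0].property" -> "ItemType0.property_0"
renamed = wide_df.rename(
    columns=lambda c: re.sub(r"^(\w+)\[(\d+)\]\.(\w+)$", r"\1.\3_\2", c)
)

long_df = pd.wide_to_long(
    renamed,
    stubnames=["ItemType0.property"],  # one entry per ItemTypeX.property stub
    i="Time",
    j="item",
    sep="_",
    suffix=r"\d+",
).reset_index()
```

Columns not matching any stub (here `Property`) are carried through untouched, which is exactly the desired layout.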
IIUC, you can try to group the properties by `ItemTypeX` and then explode:
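A minimal sketch of that idea: collect each property's per-item columns into a single list column, then explode all the list columns together (`DataFrame.explode` accepts a list of columns from pandas 1.3 onward):

```python
import re
import pandas as pd

wide_df = pd.DataFrame({
    "Time": [1, 2, 3],
    "ItemType0[0].property": [1, 0, 3],
    "ItemType0[1].property": [0, 1, 3],
    "Property": [2, 2, 2],
})

item_cols = [c for c in wide_df.columns if "[" in c]

# map "ItemType0[0].property" -> "ItemType0.property" and group the columns
groups = {}
for c in item_cols:
    groups.setdefault(re.sub(r"\[\d+\]", "", c), []).append(c)

out = wide_df.drop(columns=item_cols)
for new_col, cols in groups.items():
    out[new_col] = wide_df[cols].values.tolist()  # one list per row

# explode every list column in lockstep
long_df = out.explode(list(groups))
```

Each original row becomes one row per item index, with `Time` and `Property` repeated alongside.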