Pandas - recursively melting many columns while making column names indices

36 Views Asked by At

I have a very wide dataframe where the first column is an index and all subsequent column names I want to become indices too in the melted output. The dataframes have lots of columns so I need a recursive way to do the below on N columns. (As a newbie, this is over my head).

My dataframe is a beginner version of this. I am assuming it is Melt with some complex scope logic, but I am drowning in my confusion right now.

import pandas as pd

df = pd.DataFrame({"GIS": [1, 2, 3], "X0": [100, 200, 300], "X1": [50, 51, 52], "X2": [71, 72, 73], "Xn": [100, 150, 210]})
df = df.set_index(["GIS"])

It produces a df that looks like this: enter image description here

I don't know how to get there programmatically, especially on really wide dataframes, but ultimately I want it to produce a df that looks like this:

new_df = pd.DataFrame({"GIS": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3], "Variable": ["X0", "X1", "X2", "Xn", "X0", "X1", "X2", "Xn", "X0", "X1", "X2", "Xn"], "Value": [100, 50, 71, 100, 200, 51, 72, 150, 300, 52, 73, 200]})
new_df = new_df.set_index(["GIS", "Variable"])

Which should produce a df structured like this. enter image description here

Your suggestions for a bigger are greatly appreciated.

1

There are 1 best solutions below

0
jezrael On BEST ANSWER

Use DataFrame.rename_axis, DataFrame.stack and Series.to_frame:

new_df = df.rename_axis('Variable', axis=1).stack().to_frame('Value')
print (new_df)
              Value
GIS Variable       
1   X0          100
    X1           50
    X2           71
    Xn          100
2   X0          200
    X1           51
    X2           72
    Xn          150
3   X0          300
    X1           52
    X2           73
    Xn          210

Or DataFrame.melt with DataFrame.sort_index and DataFrame.set_index with append=True for MultiIndex:

new_df = (df.melt(ignore_index=False, var_name='Variable', value_name='Value')
            .sort_index()
            .set_index('Variable', append=True))
print (new_df)
              Value
GIS Variable       
1   X0          100
    X1           50
    X2           71
    Xn          100
2   X0          200
    X1           51
    X2           72
    Xn          150
3   X0          300
    X1           52
    X2           73
    Xn          210