Get column number for Python Melt function


I need to unpivot a pandas DataFrame and am using pd.melt() for this. It works as expected, but I now need an additional column, "column_number", in the output. Example below:

name age gender  id
a   18  m       1   
b   20  f       2 

Current Output:

   id   variable   value
    1    name        a
    1    age         18
    1    gender      m
    2    name        b
    2    age         20
    2    gender      f

Expected Output:

id  column_number  variable   value
1    1             name        a
1    2             age         18
1    3             gender      m
2    1             name        b
2    2             age         20
2    3             gender      f

Since the structure of my DataFrame can change, I will not know in advance whether it has three columns or more. How can I generate this column_number column in the melt results?
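The question does not show how the frame was built or how melt was called, so the following is only a sketch of a reproducible setup. The DataFrame constructor call and the stable sort on id are assumptions chosen to reproduce the "Current Output" table above.

```python
import pandas as pd

# Sample frame copied from the question's input table.
df = pd.DataFrame(
    {"name": ["a", "b"], "age": [18, 20], "gender": ["m", "f"], "id": [1, 2]}
)

# Plain melt emits rows grouped by variable: all "name" rows, then "age", then "gender".
melted = df.melt(id_vars="id")

# A stable sort on id regroups the rows per id without reordering the variables.
current = melted.sort_values("id", kind="stable", ignore_index=True)
print(current)
```

With this setup, any answer below can be run against df directly.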


0
Andrej Kesely On

One possible solution is to use .groupby with .cumcount() (shown here after reshaping with .stack() rather than melt):

out = df.set_index("id").stack().to_frame(name="value")
out["column_number"] = out.groupby(level=0).cumcount() + 1

print(out.reset_index().rename(columns={"level_1": "variable"}))

Prints:

   id variable value  column_number
0   1     name     a              1
1   1      age    18              2
2   1   gender     m              3
3   2     name     b              1
4   2      age    20              2
5   2   gender     f              3

Or, if df has already been melted:

df["column_number"] = df.groupby("id").cumcount() + 1
print(df)

If the position of the new column matters, insert it right after id:

df.insert(1, 'column_number', df.groupby("id").cumcount() + 1)
print(df)
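
Because the question says the number of columns may change, the melt-then-cumcount idea can be wrapped in a small function. This is a minimal sketch, not part of the original answer: the helper name melt_with_column_number and the extra "city" column are hypothetical, and it assumes each id value appears in exactly one input row.

```python
import pandas as pd

def melt_with_column_number(df, id_col="id"):
    """Hypothetical helper: melt df and number the variables per id, starting at 1."""
    melted = df.melt(id_vars=id_col)
    # cumcount numbers the rows of each id group in their current order: 0, 1, 2, ...
    melted.insert(1, "column_number", melted.groupby(id_col).cumcount() + 1)
    # Stable sort so rows are grouped by id while keeping the variable order.
    return melted.sort_values(id_col, kind="stable", ignore_index=True)

# Assumed data: the question's frame plus an extra "city" column.
df = pd.DataFrame(
    {"name": ["a", "b"], "age": [18, 20], "gender": ["m", "f"],
     "city": ["x", "y"], "id": [1, 2]}
)
out = melt_with_column_number(df)
print(out)
```

The numbering follows whatever columns are present, so nothing needs to change when a column is added or removed.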
0
Corralien On

You can keep using melt as you already do and chain it with assign:

out = (df.melt(id_vars='id', ignore_index=False)
         # stable sort keeps the name/age/gender order within each original row
         .sort_index(kind='stable')
         .assign(column_number=lambda x: x.groupby('id').cumcount()+1))
print(out)

# Output
   id variable value  column_number
0   1     name     a              1
0   1      age    18              2
0   1   gender     m              3
1   2     name     b              1
1   2      age    20              2
1   2   gender     f              3
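
The repeated 0 and 1 labels in that output come from ignore_index=False, which is what lets a sort on the index regroup the rows. A short sketch of the difference, using the question's sample data (the constructor call is an assumption):

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["a", "b"], "age": [18, 20], "gender": ["m", "f"], "id": [1, 2]}
)

# Default: melt discards the original index and numbers the rows 0..n-1.
print(df.melt(id_vars="id").index.tolist())   # [0, 1, 2, 3, 4, 5]

# ignore_index=False: each output row keeps the label of the input row it came from.
kept = df.melt(id_vars="id", ignore_index=False)
print(kept.index.tolist())                    # [0, 1, 0, 1, 0, 1]
```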
0
Onyambu On

Use row_number within mutate after reshaping the data to long format with pivot_longer from siuba:

from siuba import _, mutate, ungroup, group_by
from siuba.dply.vector import row_number
from siuba.experimental.pivot import pivot_longer

(pivot_longer(df, ~_.id, names_to = 'variable') >>
   group_by(_.id) >>
   mutate(column_number = row_number(_.id))>>
   ungroup())

   id variable value  column_number
0   1     name     a              1
0   1      age    18              2
0   1   gender     m              3
1   2     name     b              1
1   2      age    20              2
1   2   gender     f              3
2
Nick On

One way to achieve this is to create a column MultiIndex that carries the column number, and then melt the result:

out = df.set_index('id')
out.columns = pd.MultiIndex.from_tuples(enumerate(out, 1), names=['column_number', 'variable'])
out = out.melt(ignore_index=False).sort_index().reset_index()

Output:

   id  column_number variable value
0   1              1     name     a
1   1              2      age    18
2   1              3   gender     m
3   2              1     name     b
4   2              2      age    20
5   2              3   gender     f
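
To see why this works, it may help to look at the intermediate steps. The sketch below rebuilds the question's sample frame (an assumption about how it was constructed) and prints the (position, label) pairs and the columns melt produces from them.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["a", "b"], "age": [18, 20], "gender": ["m", "f"], "id": [1, 2]}
)
out = df.set_index("id")

# Iterating a DataFrame yields its column labels, so enumerate pairs
# each label with its 1-based position.
pairs = list(enumerate(out, 1))
print(pairs)  # [(1, 'name'), (2, 'age'), (3, 'gender')]

out.columns = pd.MultiIndex.from_tuples(pairs, names=["column_number", "variable"])

# melt turns every named column level into its own output column.
print(out.melt(ignore_index=False).columns.tolist())
# ['column_number', 'variable', 'value']
```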
0
mozway On

Since melt preserves the original order of the columns, you don't need groupby.cumcount; a simple factorize is sufficient (and more efficient):

out = (df.melt('id')
         .assign(column_number=lambda d: pd.factorize(d['variable'])[0]+1)
         .sort_values(by='id', ignore_index=True)
      )
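
As a small illustration of what factorize contributes here, the sketch below applies it to a hand-written Series shaped like the variable column that melt produces for the sample data (the Series itself is an assumption for demonstration).

```python
import pandas as pd

# Same order as the "variable" column straight out of melt.
variables = pd.Series(["name", "name", "age", "age", "gender", "gender"])

# factorize returns (codes, uniques); codes number the distinct values
# in order of first appearance, starting at 0.
codes, uniques = pd.factorize(variables)
print(codes + 1)       # [1 1 2 2 3 3]
print(list(uniques))   # ['name', 'age', 'gender']
```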

If what you want is the original position of each column in the DataFrame (counting the non-melted columns as well), a simple map is enough:

cols = {k: i for i,k in enumerate(df, start=1)}
# {'name': 1, 'age': 2, 'gender': 3, 'id': 4}

out = (df.melt('id')
         .assign(column_number=lambda d: d['variable'].map(cols))
         .sort_values(by='id', ignore_index=True)
      )

Output (identical for both variants on the sample data, because id is the last column):

   id variable value  column_number
0   1     name     a              1
1   1      age    18              2
2   1   gender     m              3
3   2     name     b              1
4   2      age    20              2
5   2   gender     f              3
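
The two variants only differ when a non-melted column sits before the melted ones. A sketch with an assumed variant of the sample frame, where id is moved to the first position:

```python
import pandas as pd

# Assumed variant of the question's frame with "id" as the first column.
df = pd.DataFrame(
    {"id": [1, 2], "name": ["a", "b"], "age": [18, 20], "gender": ["m", "f"]}
)
melted = df.melt("id")

# factorize numbers only the melted columns: 1, 2, 3.
by_factorize = pd.factorize(melted["variable"])[0] + 1

# map uses the position in the original frame, where "id" occupies slot 1.
positions = {col: i for i, col in enumerate(df, start=1)}
by_map = melted["variable"].map(positions)

print(by_factorize.tolist())  # [1, 1, 2, 2, 3, 3]
print(by_map.tolist())        # [2, 2, 3, 3, 4, 4]
```

Which one is appropriate depends on whether column_number should count among the melted columns only or across the whole original frame.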