Get column number for Python Melt function

Question

Asked by PythonDeveloper on 25 January 2024 at 22:11 · 75 views

I need to unpivot a pandas dataframe, and I am using the pd.melt() function for this. It works as expected, but now I need to add an additional column, "column_number", to the output. Example below:

name  age  gender  id
a     18   m       1
b     20   f       2

Current Output:

   id   variable   value
    1    name        a
    1    age         18
    1    gender      m
    2    name        b
    2    age         20
    2    gender      f

Expected Output:

id  column_number  variable   value
1    1             name        a
1    2             age         18
1    3             gender      m
2    1             name        b
2    2             age         20
2    3             gender      f

Because the structure of my dataframe can change, I won't know in advance whether it will have three columns or more. How can I generate this column_number column in the melt results?
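
For anyone who wants to reproduce this, here is a minimal sketch of the setup. It assumes the example frame is built by hand with the columns in the order shown above; the names `df` and `melted` are only illustrative.

```python
import pandas as pd

# Example frame from the question; column order matters for column_number.
df = pd.DataFrame(
    {
        "name": ["a", "b"],
        "age": [18, 20],
        "gender": ["m", "f"],
        "id": [1, 2],
    }
)

# Plain melt: same rows as the "Current Output", but ordered by variable
# first and by original row second.
melted = df.melt(id_vars="id")
print(melted)
```

Note that melt itself returns the rows grouped by variable, so getting the per-id layout shown in the question takes an additional sort on id.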

There are 5 best solutions below

**Andrej Kesely** · Answer 1 · 2024-01-25T22:19:19.920000

One possible solution is to use .groupby with .cumcount():

out = df.set_index("id").stack().to_frame(name="value")
out["column_number"] = out.groupby(level=0).cumcount() + 1

print(out.reset_index().rename(columns={"level_1": "variable"}))

Prints:

   id variable value  column_number
0   1     name     a              1
1   1      age    18              2
2   1   gender     m              3
3   2     name     b              1
4   2      age    20              2
5   2   gender     f              3

Or if you have already melted df:

df["column_number"] = df.groupby("id").cumcount() + 1
print(df)

If order matters:

df.insert(1, 'column_number', df.groupby("id").cumcount() + 1)
print(df)
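
Since the question is about frames whose number of columns may change, here is a rough end-to-end sketch of the cumcount approach on a wider frame. The extra column `city` is hypothetical and only there to show that nothing needs to be adjusted when a column is added.

```python
import pandas as pd

# Same frame as in the question plus a hypothetical extra column "city".
df = pd.DataFrame(
    {
        "name": ["a", "b"],
        "age": [18, 20],
        "gender": ["m", "f"],
        "city": ["x", "y"],  # assumed column, for illustration only
        "id": [1, 2],
    }
)

melted = df.melt(id_vars="id")
# cumcount numbers the rows within each id in their current order,
# which after melt is the original column order.
melted.insert(1, "column_number", melted.groupby("id").cumcount() + 1)
# A stable sort keeps the per-id order intact.
result = melted.sort_values("id", kind="stable", ignore_index=True)
print(result)
```

This relies on each id appearing on exactly one row of the input frame, as in the question's example.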

**Corralien** · Answer 2 · 2024-01-25T22:29:26.070000

You can use melt as you already do and chain it with assign:

# kind='stable' keeps the melted column order within each original row
out = (df.melt(id_vars='id', ignore_index=False).sort_index(kind='stable')
         .assign(column_number=lambda x: x.groupby('id').cumcount()+1))
print(out)

# Output
   id variable value  column_number
0   1     name     a              1
0   1      age    18              2
0   1   gender     m              3
1   2     name     b              1
1   2      age    20              2
1   2   gender     f              3

**Onyambu** · Answer 3 · 2024-01-25T23:16:17.407000

Use row_number within mutate after reshaping the data to long format with pivot_longer from siuba:

from siuba import _, mutate, ungroup, group_by
from siuba.dply.vector import row_number
from siuba.experimental.pivot import pivot_longer

(pivot_longer(df, ~_.id, names_to = 'variable') >>
   group_by(_.id) >>
   mutate(column_number = row_number(_.id))>>
   ungroup())

   id variable value  column_number
0   1     name     a              1
0   1      age    18              2
0   1   gender     m              3
1   2     name     b              1
1   2      age    20              2
1   2   gender     f              3

**Nick** · Answer 4 · 2024-01-25T23:17:22.227000

One way to achieve this is to create a column MultiIndex that includes the column number and then melt the result:

out = df.set_index('id')
out.columns = pd.MultiIndex.from_tuples(enumerate(out, 1), names=['column_number', 'variable'])
out = out.melt(ignore_index=False).sort_index(kind='stable').reset_index()

Output:

   id  column_number variable value
0   1              1     name     a
1   1              2      age    18
2   1              3   gender     m
3   2              1     name     b
4   2              2      age    20
5   2              3   gender     f

**mozway** · Answer 5 · 2024-01-26T05:47:30.607000

Since melt preserves the original order of the columns, you don't need a groupby.cumcount; a simple factorize is sufficient (and more efficient):

out = (df.melt('id')
         .assign(column_number=lambda d: pd.factorize(d['variable'])[0]+1)
         .sort_values(by='id', kind='stable', ignore_index=True)
      )

If what you want is the original position of the columns (also counting the non-melted columns), then a simple map is enough:

cols = {k: i for i,k in enumerate(df, start=1)}
# {'name': 1, 'age': 2, 'gender': 3, 'id': 4}

out = (df.melt('id')
         .assign(column_number=lambda d: d['variable'].map(cols))
         .sort_values(by='id', kind='stable', ignore_index=True)
      )

Output:

   id variable value  column_number
0   1     name     a              1
1   1      age    18              2
2   1   gender     m              3
3   2     name     b              1
4   2      age    20              2
5   2   gender     f              3
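
One practical difference from the groupby.cumcount approaches shows up when an id is not unique. The sketch below uses a hypothetical frame in which id 1 appears on two rows (this is not the question's data) and compares the two numberings side by side.

```python
import pandas as pd

# Hypothetical input in which id 1 appears on two rows (not from the question).
df = pd.DataFrame(
    {
        "name": ["a", "b", "c"],
        "age": [18, 20, 22],
        "id": [1, 2, 1],
    }
)

melted = df.melt("id")

# factorize numbers each distinct variable in order of first appearance,
# so it depends only on the column, not on how often an id occurs.
by_factorize = pd.factorize(melted["variable"])[0] + 1

# cumcount keeps counting across all rows that share an id.
by_cumcount = melted.groupby("id").cumcount() + 1

print(melted.assign(by_factorize=by_factorize, by_cumcount=by_cumcount))
```

With repeated ids, the factorize numbering still reflects the column position, while cumcount runs past the number of melted columns for id 1.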
