Pandas melt and order by another column


I have a dataframe like this:

import pandas as pd
data = {'Type': ['Fruits', 'Fruits', 'Fruits', 'Fruits'],
        'Name': ['Mango', 'Mango', 'Mango', 'Mango'],
        'Variety': ['Alphonso', 'Dasheri', 'Langra', 'Raspuri'],
        'April':[120,110,90,60],
        'May':[110,80,50,40],
        'June':[80,110,76,65],
        'July':[85,87,55,50]}
df = pd.DataFrame(data)
df=df[['Type','Name','Variety','April','May','June','July']]
     Type   Name   Variety  April  May  June  July
0  Fruits  Mango  Alphonso    120  110    80    85
1  Fruits  Mango   Dasheri    110   80   110    87
2  Fruits  Mango    Langra     90   50    76    55
3  Fruits  Mango   Raspuri     60   40    65    50

When I apply pandas melt to the dataframe above, I get:

ndf=df.melt(id_vars=['Type','Name','Variety'],var_name="Month",value_name="Price")
     Type   Name   Variety  Month  Price
0   Fruits  Mango  Alphonso  April    120
1   Fruits  Mango   Dasheri  April    110
2   Fruits  Mango    Langra  April     90
3   Fruits  Mango   Raspuri  April     60
...........
11  Fruits  Mango   Raspuri   June     65
12  Fruits  Mango  Alphonso   July     85
13  Fruits  Mango   Dasheri   July     87
14  Fruits  Mango    Langra   July     55
15  Fruits  Mango   Raspuri   July     50

But I actually need the dataframe ordered by "Variety" instead of "Month". The expected dataframe is:

     Type   Name   Variety  Month  Price
0   Fruits  Mango  Alphonso  April    120
1   Fruits  Mango  Alphonso    May    110
2   Fruits  Mango  Alphonso   June     80
3   Fruits  Mango  Alphonso   July     85
4   Fruits  Mango   Dasheri  April    110
5   Fruits  Mango   Dasheri    May     80
.................................
13  Fruits  Mango   Raspuri    May     40
14  Fruits  Mango   Raspuri   June     65
15  Fruits  Mango   Raspuri   July     50

What is the solution for this?

Jesse Sealand On BEST ANSWER

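You just need to sort the dataframe by that column. Use ascending order (the default) so the varieties come out as Alphonso, Dasheri, Langra, Raspuri, and a stable sort so the months keep their original order within each variety:

ndf = ndf.sort_values(by=['Variety'], kind='stable')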
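To make the sorting approach concrete, here is a minimal self-contained sketch built on the question's data. It assumes a pandas version where sort_values accepts kind='stable' and ignore_index (both are documented parameters in recent releases); the variable name out is only illustrative.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    'Type': ['Fruits'] * 4,
    'Name': ['Mango'] * 4,
    'Variety': ['Alphonso', 'Dasheri', 'Langra', 'Raspuri'],
    'April': [120, 110, 90, 60],
    'May': [110, 80, 50, 40],
    'June': [80, 110, 76, 65],
    'July': [85, 87, 55, 50],
})

ndf = df.melt(id_vars=['Type', 'Name', 'Variety'],
              var_name='Month', value_name='Price')

# A stable sort keeps rows with the same Variety in the order melt
# produced them (April, May, June, July); ignore_index renumbers the rows.
out = ndf.sort_values(by='Variety', kind='stable', ignore_index=True)
print(out)
```

Without a stable sort, the order of months inside each variety is not guaranteed, even if it happens to look right on a small frame.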
renzo21 On

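After you use pd.melt, you can sort the dataframe by the Variety column. Assign the result back, because calling reset_index(inplace=True) on the sorted copy would leave ndf unchanged:

ndf = ndf.sort_values(by=['Variety'], kind='stable').reset_index(drop=True)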
ndf

Here's the documentation for the sort_values method https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.sort_values.html

Result:

      Type   Name   Variety  Month  Price
0   Fruits  Mango  Alphonso  April    120
1   Fruits  Mango  Alphonso    May    110
2   Fruits  Mango  Alphonso   June     80
3   Fruits  Mango  Alphonso   July     85
4   Fruits  Mango   Dasheri  April    110
5   Fruits  Mango   Dasheri    May     80
6   Fruits  Mango   Dasheri   June    110
7   Fruits  Mango   Dasheri   July     87
8   Fruits  Mango    Langra  April     90
9   Fruits  Mango    Langra    May     50
10  Fruits  Mango    Langra   June     76
11  Fruits  Mango    Langra   July     55
12  Fruits  Mango   Raspuri  April     60
13  Fruits  Mango   Raspuri    May     40
14  Fruits  Mango   Raspuri   June     65
15  Fruits  Mango   Raspuri   July     50

Edit 1: As per mozway's suggestion, you don't need to reset the index afterwards; you can instead pass ignore_index=True to sort_values(). Passing kind='stable' as well keeps the months in their original order within each variety:

ndf.sort_values(by=['Variety'], kind='stable', ignore_index=True, inplace=True)
ndf
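If you would rather not rely on sort stability at all, one possible variation (not part of the answer above) is to make Month an ordered categorical and sort on both columns. This is a sketch under the assumption that the month columns are known up front; the months list is a name introduced here for illustration.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    'Type': ['Fruits'] * 4,
    'Name': ['Mango'] * 4,
    'Variety': ['Alphonso', 'Dasheri', 'Langra', 'Raspuri'],
    'April': [120, 110, 90, 60],
    'May': [110, 80, 50, 40],
    'June': [80, 110, 76, 65],
    'July': [85, 87, 55, 50],
})

# Hypothetical helper list: the month columns in the order they should appear.
months = ['April', 'May', 'June', 'July']

ndf = df.melt(id_vars=['Type', 'Name', 'Variety'], value_vars=months,
              var_name='Month', value_name='Price')

# An ordered categorical sorts by the position in `months`,
# not alphabetically (which would put April, July, June, May).
ndf['Month'] = pd.Categorical(ndf['Month'], categories=months, ordered=True)

out = ndf.sort_values(by=['Variety', 'Month'], ignore_index=True)
print(out)
```

Sorting on both keys makes the intended order explicit, at the cost of turning Month into a categorical column.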
mozway On

Instead of sorting the rows after melt, you could also change the reshaping method.

If you use stack, the original order of rows and columns is preserved directly:

out = (df.rename_axis(columns='Month')
         .set_index(['Type','Name','Variety'])
         .stack().reset_index(name='Price')
      )

Output:

      Type   Name   Variety  Month  Price
0   Fruits  Mango  Alphonso  April    120
1   Fruits  Mango  Alphonso    May    110
2   Fruits  Mango  Alphonso   June     80
3   Fruits  Mango  Alphonso   July     85
4   Fruits  Mango   Dasheri  April    110
5   Fruits  Mango   Dasheri    May     80
6   Fruits  Mango   Dasheri   June    110
7   Fruits  Mango   Dasheri   July     87
8   Fruits  Mango    Langra  April     90
9   Fruits  Mango    Langra    May     50
10  Fruits  Mango    Langra   June     76
11  Fruits  Mango    Langra   July     55
12  Fruits  Mango   Raspuri  April     60
13  Fruits  Mango   Raspuri    May     40
14  Fruits  Mango   Raspuri   June     65
15  Fruits  Mango   Raspuri   July     50
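A related sketch, not taken from the answer above: melt can also keep the original row labels via ignore_index=False, and a stable sort on that index then regroups the rows in the order they had in df. This assumes a pandas version where melt accepts ignore_index and sort_index accepts kind='stable'.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    'Type': ['Fruits'] * 4,
    'Name': ['Mango'] * 4,
    'Variety': ['Alphonso', 'Dasheri', 'Langra', 'Raspuri'],
    'April': [120, 110, 90, 60],
    'May': [110, 80, 50, 40],
    'June': [80, 110, 76, 65],
    'July': [85, 87, 55, 50],
})

out = (df.melt(id_vars=['Type', 'Name', 'Variety'],
               var_name='Month', value_name='Price',
               ignore_index=False)      # keep original row labels 0..3, repeated per month
         .sort_index(kind='stable')     # group by original row, months stay in column order
         .reset_index(drop=True))       # renumber 0..15
print(out)
```

Like stack, this follows the row order of df rather than the alphabetical order of Variety, so it still works when the varieties are not already sorted.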
sammywemmy On

One option is pivot_longer from pyjanitor: pass sort_by_appearance=True and make sure Variety is the first column:

# pip install pyjanitor
import pandas as pd
import janitor  # registers pivot_longer on DataFrame

(df
.pivot_longer(
    index=['Variety','Type','Name'],
    names_to='Month',
    values_to='Price',
    sort_by_appearance=True)
)

     Variety    Type   Name  Month  Price
0   Alphonso  Fruits  Mango  April    120
1   Alphonso  Fruits  Mango    May    110
2   Alphonso  Fruits  Mango   June     80
3   Alphonso  Fruits  Mango   July     85
4    Dasheri  Fruits  Mango  April    110
5    Dasheri  Fruits  Mango    May     80
6    Dasheri  Fruits  Mango   June    110
7    Dasheri  Fruits  Mango   July     87
8     Langra  Fruits  Mango  April     90
9     Langra  Fruits  Mango    May     50
10    Langra  Fruits  Mango   June     76
11    Langra  Fruits  Mango   July     55
12   Raspuri  Fruits  Mango  April     60
13   Raspuri  Fruits  Mango    May     40
14   Raspuri  Fruits  Mango   June     65
15   Raspuri  Fruits  Mango   July     50