How to rank the values across multiple columns per row

Question

Asked by manwong0606 on 12 January 2024 at 08:26 (64 views)

I have a dataframe with 4 columns as follows:

               index 1   index 2    index 3      index 4
date                                                   
2023-07-14     0.0585     0.0775    -0.0289      0.0069
2023-07-17     0.0585     0.0750    -0.0300      0.0065
2023-07-18     0.0590     0.0729    -0.0311      0.0065
2023-07-19     0.0599     0.0711    -0.0309      0.0067
2023-07-20     0.0803     0.0689    -0.0309      0.0071
2023-07-21     0.0613     0.0677     0.0989      0.0083

I want to create two new columns, 'Rank 1' and 'Rank 2', that identify the indexes holding the largest and second-largest values in each row, like below:

               index 1   index 2    index 3      index 4   Rank 1    Rank 2
date                                                   
2023-07-14     0.0585     0.0775    -0.0289      0.0069    index 2    index 1
2023-07-17     0.0585     0.0750    -0.0300      0.0065    index 2    index 1
2023-07-18     0.0590     0.0729    -0.0311      0.0065    index 2    index 1
2023-07-19     0.0599     0.0711    -0.0309      0.0067    index 2    index 1
2023-07-20     0.0803     0.0689    -0.0309      0.0071    index 1    index 2
2023-07-21     0.0613     0.0677     0.0989      0.0083    index 3    index 2

I found the df.rank function, but it appears to rank values only within each column, not across each row.

There are 2 best solutions below

**mozway** · Answer 1 · 2024-01-12T08:34:57.210000

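Use numpy's argpartition, which should be the most efficient approach. Note that argpartition only guarantees that the N largest values of each row end up in the last N positions, with the Nth largest in its sorted place, so the ordering below is reliable for N = 2 but not for larger N:

import numpy as np

N = 2
cols = df.columns.to_numpy()
# argpartition works along the last axis (per row) and moves the indices of the
# N largest values into the last N slots; the reversed slice reads them back to front
df[[f'Rank {x+1}' for x in range(N)]] = cols[np.argpartition(df.to_numpy(),
                                                             -N)[:, :-N-1:-1]]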

Output:

            index 1  index 2  index 3  index 4   Rank 1   Rank 2
date                                                            
2023-07-14   0.0585   0.0775  -0.0289   0.0069  index 2  index 1
2023-07-17   0.0585   0.0750  -0.0300   0.0065  index 2  index 1
2023-07-18   0.0590   0.0729  -0.0311   0.0065  index 2  index 1
2023-07-19   0.0599   0.0711  -0.0309   0.0067  index 2  index 1
2023-07-20   0.0803   0.0689  -0.0309   0.0071  index 1  index 2
2023-07-21   0.0613   0.0677   0.0989   0.0083  index 3  index 2
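
For N larger than 2, one way to guarantee the order is to sort only the N candidates that argpartition selects. The following is a minimal sketch under that assumption, not part of the original answer; `top_n_labels` is a hypothetical helper name, and the sample rows are copied from the question.

```python
import numpy as np
import pandas as pd

# Two sample rows copied from the question.
df = pd.DataFrame(
    {
        "index 1": [0.0585, 0.0613],
        "index 2": [0.0775, 0.0677],
        "index 3": [-0.0289, 0.0989],
        "index 4": [0.0069, 0.0083],
    },
    index=pd.to_datetime(["2023-07-14", "2023-07-21"]).rename("date"),
)


def top_n_labels(frame: pd.DataFrame, n: int) -> pd.DataFrame:
    """Hypothetical helper: column labels of the n largest values per row, largest first."""
    values = frame.to_numpy()
    cols = frame.columns.to_numpy()
    # Positions of the n largest values in each row (unordered among themselves).
    top = np.argpartition(values, -n, axis=1)[:, -n:]
    # Sort just those n candidates in descending order of value.
    top_vals = np.take_along_axis(values, top, axis=1)
    order = np.argsort(-top_vals, axis=1)
    ranked = np.take_along_axis(top, order, axis=1)
    names = [f"Rank {i + 1}" for i in range(n)]
    return pd.DataFrame(cols[ranked], index=frame.index, columns=names)


ranks = top_n_labels(df, 3)
out = df.join(ranks)
```

The extra argsort only touches N values per row, so the cost stays close to that of argpartition alone when N is small relative to the number of columns.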

For a pure pandas approach (but much less efficient), stack, sort_values, filter and reshape with a pivot:

N = 2

out = df.join(df.stack().sort_values(ascending=False)
   .reset_index(-1)[['level_1']]
   .groupby(level=0).head(N)
   .assign(col=lambda d: 'Rank '+d.groupby(level=0).cumcount().add(1).astype(str))
   .pivot(columns='col', values='level_1')
)
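
On the question's remark about df.rank: DataFrame.rank accepts an axis argument, so it can rank across the columns of each row. Below is a minimal sketch of that route, which is not taken from either answer; the sample rows are copied from the question, and `row_ranks`, `labels` and `out` are illustrative names.

```python
import pandas as pd

# Two sample rows copied from the question.
df = pd.DataFrame(
    {
        "index 1": [0.0585, 0.0803],
        "index 2": [0.0775, 0.0689],
        "index 3": [-0.0289, -0.0309],
        "index 4": [0.0069, 0.0071],
    },
    index=pd.to_datetime(["2023-07-14", "2023-07-20"]).rename("date"),
)

# axis=1 ranks across the columns of each row; with ascending=False,
# rank 1.0 marks the largest value. method="first" breaks ties by column order.
row_ranks = df.rank(axis=1, ascending=False, method="first")

# For each requested rank k, pick the column label whose rank equals k.
N = 2
labels = pd.DataFrame(
    {f"Rank {k}": row_ranks.eq(k).idxmax(axis=1) for k in range(1, N + 1)}
)
out = df.join(labels)
```

This ranks every column in every row, so it does more work than a partial selection, but it stays within pandas and makes the full per-row ranking available if it is needed elsewhere.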

**manwong0606** · Answer 2 · 2024-01-15T08:20:29.507000

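This can be done (upon further investigation) by combining apply with a lambda function. The numeric columns are selected once up front, because nlargest raises a TypeError on the object-dtype data that df.T produces once the first rank column (strings) has been added:

    vals = df.select_dtypes('number')
    df['rank1'] = vals.T.apply(lambda x: x.nlargest(1).idxmin())
    df['rank2'] = vals.T.apply(lambda x: x.nlargest(2).idxmin())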
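
The apply-based idea can also be written for an arbitrary N by returning all N labels per row in one pass. This is a sketch under assumptions rather than the answerer's code: the column names `rank1`, `rank2`, ... follow this answer's naming, `value_cols` and `names` are illustrative, and the sample rows are copied from the question.

```python
import pandas as pd

# Two sample rows copied from the question.
df = pd.DataFrame(
    {
        "index 1": [0.0803, 0.0613],
        "index 2": [0.0689, 0.0677],
        "index 3": [-0.0309, 0.0989],
        "index 4": [0.0071, 0.0083],
    },
    index=pd.to_datetime(["2023-07-20", "2023-07-21"]).rename("date"),
)

N = 2
names = [f"rank{i + 1}" for i in range(N)]

# Restrict to the numeric columns so that re-running does not pick up rank columns.
value_cols = df.select_dtypes("number")

# nlargest returns the N largest values in descending order;
# its index holds the matching column labels.
ranks = value_cols.apply(
    lambda row: pd.Series(row.nlargest(N).index.tolist(), index=names),
    axis=1,
)
out = df.join(ranks)
```

Like the original two-liner, this calls a Python function once per row, so it is likely to be slower than the numpy approach on large frames.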
