I have a dataframe with 4 columns as follows:
            index 1  index 2  index 3  index 4
date
2023-07-14   0.0585   0.0775  -0.0289   0.0069
2023-07-17   0.0585   0.0750  -0.0300   0.0065
2023-07-18   0.0590   0.0729  -0.0311   0.0065
2023-07-19   0.0599   0.0711  -0.0309   0.0067
2023-07-20   0.0803   0.0689  -0.0309   0.0071
2023-07-21   0.0613   0.0677   0.0989   0.0083
I want to create two new columns, 'rank 1' and 'rank 2', to identify the indexes with the largest and second-largest value in each row, like below:
            index 1  index 2  index 3  index 4   rank 1   rank 2
date
2023-07-14   0.0585   0.0775  -0.0289   0.0069  index 2  index 1
2023-07-17   0.0585   0.0750  -0.0300   0.0065  index 2  index 1
2023-07-18   0.0590   0.0729  -0.0311   0.0065  index 2  index 1
2023-07-19   0.0599   0.0711  -0.0309   0.0067  index 2  index 1
2023-07-20   0.0803   0.0689  -0.0309   0.0071  index 1  index 2
2023-07-21   0.0613   0.0677   0.0989   0.0083  index 3  index 2
I learnt of the df.rank function but it appears that it only enables ranking values by columns, not by row.
Use numpy's argpartition, which will be the most efficient approach.
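The original code for this step is not shown here, so the following is only a minimal sketch of how the argpartition approach could look. The variable names (N, values, cols, top, ranks, out) are illustrative assumptions, and the frame is rebuilt from the sample data in the question. Because argpartition does not promise any order among the N selected positions, the sketch sorts them explicitly afterwards.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "index 1": [0.0585, 0.0585, 0.0590, 0.0599, 0.0803, 0.0613],
        "index 2": [0.0775, 0.0750, 0.0729, 0.0711, 0.0689, 0.0677],
        "index 3": [-0.0289, -0.0300, -0.0311, -0.0309, -0.0309, 0.0989],
        "index 4": [0.0069, 0.0065, 0.0065, 0.0067, 0.0071, 0.0083],
    },
    index=pd.Index(
        ["2023-07-14", "2023-07-17", "2023-07-18",
         "2023-07-19", "2023-07-20", "2023-07-21"],
        name="date",
    ),
)

N = 2  # number of top columns to report per row (assumed parameter name)
values = df.to_numpy()
cols = df.columns.to_numpy()

# Column positions of the N largest values in each row, in no particular order
top = np.argpartition(values, -N, axis=1)[:, -N:]

# Order those N candidates from largest to smallest value
top_vals = np.take_along_axis(values, top, axis=1)
order = np.argsort(-top_vals, axis=1)
top = np.take_along_axis(top, order, axis=1)

# Map positions back to column names and attach them to the original frame
ranks = pd.DataFrame(
    cols[top],
    index=df.index,
    columns=[f"rank {i + 1}" for i in range(N)],
)
out = df.join(ranks)
print(out)
```

Only the N largest entries per row are sorted, so the cost stays close to linear in the number of columns, which is where the efficiency gain over a full sort should come from.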
For a pure pandas approach (but much less efficient), stack, sort_values, filter and reshape with a pivot.
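The original code for this variant is also missing, so this is one possible sketch of the stack / sort_values / filter / pivot chain rather than the exact code intended. The helper names (long, ranks, out) and the intermediate "column", "value" and "rank" labels are assumptions made for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    {
        "index 1": [0.0585, 0.0585, 0.0590, 0.0599, 0.0803, 0.0613],
        "index 2": [0.0775, 0.0750, 0.0729, 0.0711, 0.0689, 0.0677],
        "index 3": [-0.0289, -0.0300, -0.0311, -0.0309, -0.0309, 0.0989],
        "index 4": [0.0069, 0.0065, 0.0065, 0.0067, 0.0071, 0.0083],
    },
    index=pd.Index(
        ["2023-07-14", "2023-07-17", "2023-07-18",
         "2023-07-19", "2023-07-20", "2023-07-21"],
        name="date",
    ),
)

N = 2  # number of top columns to report per row (assumed parameter name)

# Long format: one row per (date, column) pair, largest values first
long = (
    df.stack()
    .rename_axis(["date", "column"])
    .sort_values(ascending=False)
    .reset_index(name="value")
)

# Position of each value within its date, 1 = largest
long["rank"] = long.groupby("date").cumcount() + 1

# Keep the top N per date and reshape back to one row per date
ranks = (
    long[long["rank"] <= N]
    .pivot(index="date", columns="rank", values="column")
    .add_prefix("rank ")
)
out = df.join(ranks)
print(out)
```

This sorts every value in the frame, which is why it would be expected to scale worse than the argpartition version on wide or long data.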