Column with ranking based on other columns but like tie breaker

62 Views Asked by At

I've got a dataframe:

df = pd.DataFrame({
    "DScore": [2, 2, 3, 4, 5],
    "EScore": [6, 7, 9, 9, 10],
    "Total Score": [17, 15, 15, 23, 25]
})

I want to write the code that will create a ranking column containing the classification of rows in the table based on the 'Total Score' column. If these values ​​are equal - you should pay attention to the values ​​​​of EScore points, if they are equal, then it will be based on the values ​​​​from the DScore column, and if these are also equal - we will assign them the same value. Expected result:

df = pd.DataFrame({
    "DScore": [2, 2, 4, 4, 5],
    "EScore": [6, 7, 9, 9, 10],
    "Total Score": [17, 15, 23, 23, 25],
    "Rank": [3,4,2,2,1]
})
3

There are 3 best solutions below

0
Panda Kim On BEST ANSWER

For example, you can multiply the EScore by 0.01 and the DScore by 0.001 to weight them more lightly. Then, you can add these values to Total Score and calculate the rank.

rank with method=dense & ascending=False

df['Rank'] = df['Total Score'].add(df['EScore'].mul(0.01)).add(df['DScore'].mul(0.0001)).rank(ascending=False, method='dense').astype('int')

df

   DScore  EScore  Total Score  Rank
0       2       6           17     3
1       2       7           15     4
2       4       9           23     2
3       4       9           23     2
4       5      10           25     1

The example you provided is sufficient with 0.01 and 0.0001, but these numbers should be adjusted to fit the dataset.

0
mozway On

Define the order of columns, then sort_values and groupby.ngroup:

# desired order of tie-breakers
cols = ['Total Score', 'EScore', 'DScore']

df['Rank'] = (df.sort_values(by=cols, ascending=False)
                .groupby(cols, sort=False).ngroup().add(1)
             )

Output:

   DScore  EScore  Total Score  Rank
0       2       6           17     3
1       2       7           15     4
2       4       9           23     2
3       4       9           23     2
4       5      10           25     1
0
Laurent B. On

You could also use the pandas rank function

df['Rank']= df['Total Score'].rank(ascending=False).astype(int)
   DScore  EScore  Total Score  Rank
0       2       6           17     4
1       2       7           15     5
2       4       9           23     2
3       4       9           23     2
4       5      10           25     1

Note you can modify the ranking method (5 values) :

method{‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’

And with parameter set to dense it "compresses the counting" :

df['Rank']= df['Total Score'].rank(ascending=False, method='dense').astype(int)
   DScore  EScore  Total Score  Rank
0       2       6           17     3
1       2       7           15     4
2       4       9           23     2
3       4       9           23     2
4       5      10           25     1