List of the (row, col) of the n largest values in a numeric pandas DataFrame?

Question

List of the (row, col) of the n largest values in a numeric pandas DataFrame?

625 Views Asked by user3673 At 16 December 2020 at 17:44

Given a Pandas DataFrame of numeric values how can one produce a list of the .loc cell locations that one can then use to then obtain the corresponding n largest values in the entire DataFame?

For example:

	A	B	C	D	E
X	1.3	3.6	33	61.38	0.3
Y	3.14	2.71	64	23.2	21
Z	1024	42	66	137	22.2
T	63.123	111	1.23	14.16	50.49

An n of 3 would produce the (row,col) pairs for the values 1024, 137 and 111.

These locations could then, as usual, be fed to .loc to extract those values from the DataFrame. i.e.

df.loc['Z','A']
df.loc['Z','D']
df.loc['T','B']

Note: It is easy to mistake this question for one that involves .idxmax. That isn't applicable due to the fact that there may be multiple values selected from a row and/or column in the n largest.

Original Q&A

There are 2 best solutions below

Lior Cohen On 16 December 2020 at 20:47

The trick is to use np.unravel_index on the np.argsort

Example:

import numpy as np
import pandas as pd

N = 5
df = pd.DataFrame([[11, 3, 50, -3],
                   [5, 73, 11, 100],
                   [75, 9, -2, 44]])


s_ix = np.argsort(df.values, axis=None)[::-1][:N]
labels = np.unravel_index(s_ix, df.shape)
labels = list(zip(*labels))

print(labels) # --> [(1, 3), (2, 0), (1, 1), (0, 2), (2, 3)]
print(df.loc[labels[0]])  # --> 100

**Nikolaos Chatzis** · Accepted Answer · 2020-12-16T21:03:15.680000

You could try:

>>> data = {0 : [1.3, 3.14, 1024, 63.123], 1: [3.6, 2.71, 42, 111], 2 : [33, 64, 66, 1.23], 3 : [61.38, 23.2, 137, 14.16], 4 : [0.3, 21, 22.2, 50.49] }
>>> df = pd.DataFrame(data)
>>> df
          0       1      2       3      4
0     1.300    3.60  33.00   61.38   0.30
1     3.140    2.71  64.00   23.20  21.00
2  1024.000   42.00  66.00  137.00  22.20
3    63.123  111.00   1.23   14.16  50.49
>>>
>>> a = list(zip(*df.stack().nlargest(3).index.labels))
>>> a
[(2, 0), (2, 3), (3, 1)]
>>> # then ...
>>> df.loc[a[0]]
1024.0
>>> 
>>> # all sorted in decreasing order ...
>>> list(zip(*df.stack().nlargest(20).index.labels))
[(2, 0), (2, 3), (3, 1), (2, 2), (1, 2), (3, 0), (0, 3), (3, 4), (2, 1), (0, 2), (1, 3), (2, 4), (1, 4), (3, 3), (0, 1), (1, 0), (1, 1), (0, 0), (3, 2), (0, 4)]

Edit: In pandas versions 0.24.0 and above, MultiIndex.labels has been replaced by MultiIndex.codes(see Deprecations in What’s new in 0.24.0 (January 25, 2019)). The above code will throw AttributeError: 'MultiIndex' object has no attribute 'labels' and needs to be updated as follows:

>>> a = list(zip(*df.stack().nlargest(3).index.codes))
>>> a
[(2, 0), (2, 3), (3, 1)]

Edit 2: This question has become a "moving target", as the OP keeps changing it (this is my last update/edit). In the last update, OP's dataframe looks as follows:

>>> data = {'A' : [1.3, 3.14, 1024, 63.123], 'B' : [3.6, 2.71, 42, 111], 'C' : [33, 64, 66, 1.23], 'D' : [61.38, 23.2, 137, 14.16], 'E' : [0.3, 21, 22.2, 50.49] }
>>> df = pd.DataFrame(data, index=['X', 'Y', 'Z', 'T'])
>>> df
          A       B      C       D      E
X     1.300    3.60  33.00   61.38   0.30
Y     3.140    2.71  64.00   23.20  21.00
Z  1024.000   42.00  66.00  137.00  22.20
T    63.123  111.00   1.23   14.16  50.49

The desired output can be obtained using:

>>> a = df.stack().nlargest(3).index 
>>> a
MultiIndex([('Z', 'A'),
            ('Z', 'D'),
            ('T', 'B')],
           )
>>>
>>> df.loc[a[0]]
1024.0

List of the (row, col) of the n largest values in a numeric pandas DataFrame?

There are 2 best solutions below

Related Questions in PANDAS

Related Questions in INDEXING

Related Questions in ARGMAX

Trending Questions

Popular # Hahtags

Popular Questions