Setting non-constant value on a subset of rows and columns in a dataframe


Let's assume I have a dataframe D:

import pandas as pd
D = pd.DataFrame({'a': [1,2,3], 'b': [3,3,4], 'c': [2, 4, 0], 'd': [-1,-1,-1]})
D

    a   b   c   d
0   1   3   2   -1
1   2   3   4   -1
2   3   4   0   -1

I want to sort the values in columns a, b, c within each of, say, the first two rows, so the expected result would be:

    a   b   c   d
0   1   2   3   -1
1   2   3   4   -1
2   3   4   0   -1

To avoid the chained-assignment warning, I tried to do it with D.loc:

D.loc[[0,1], ['a', 'b', 'c']].agg(['min', 'median', 'max'], axis=1)

   min  median  max
0  1.0     2.0  3.0
1  2.0     3.0  4.0

D.loc[[0,1], ['a', 'b', 'c']] = D.loc[[0,1], ['a', 'b', 'c']].agg(['min', 'median', 'max'], axis=1)
D

     a    b    c  d
0  NaN  NaN  NaN -1
1  NaN  NaN  NaN -1
2  3.0  4.0  0.0 -1

For some reason the values are not set. I thought it could be because of the different column names, but neither

D.loc[[0,1], ['a', 'b', 'c']] = D.loc[[0,1], ['a', 'b', 'c']].agg(['min', 'median', 'max'], axis=1).rename(columns={"min": "a", "median": "b", "max": "c"})

nor

D.loc[[0,1], ['a', 'b', 'c']] = D.loc[[0,1], ['a', 'b', 'c']].agg(['min', 'median', 'max'], axis=1).to_numpy()

worked either. Changing the values manually isn't an option, because the real data has about 800k rows instead of 3 and I want to keep the operation vectorised. So what is the correct way of doing it?

There are 2 best solutions below

Best answer

Use numpy.sort:

import numpy as np
D.loc[[0,1], ['a', 'b', 'c']] = np.sort(D.loc[[0,1], ['a', 'b', 'c']], axis=1)
print(D)
   a  b  c  d
0  1  2  3 -1
1  2  3  4 -1
2  3  4  0 -1

If you need to assign the result of an aggregate function, convert the output to a NumPy array with DataFrame.to_numpy:

D.loc[[0,1], ['a', 'b', 'c']] = D.loc[[0,1], ['a', 'b', 'c']].agg(['min', 'median', 'max'], axis=1).to_numpy()
print(D)
   a  b  c  d
0  1  2  3 -1
1  2  3  4 -1
2  3  4  0 -1
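
The NaN values in the question come from label alignment: when the right-hand side of a .loc assignment is a DataFrame, pandas matches it to the target by index and column labels, and min/median/max share no labels with a/b/c. The sketch below illustrates this on the question's data. It uses reindex only to mimic that alignment step (an assumption made for illustration, not the exact internal code path), and the names rows, cols and agg are illustrative as well.

```python
import pandas as pd

D = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 3, 4], 'c': [2, 4, 0], 'd': [-1, -1, -1]})
rows, cols = [0, 1], ['a', 'b', 'c']  # illustrative names, not from the question

agg = D.loc[rows, cols].agg(['min', 'median', 'max'], axis=1)

# The aggregate's columns (min/median/max) share no label with a/b/c ...
print(agg.columns.intersection(cols).empty)
# ... so lining it up against a/b/c by label leaves nothing but NaN.
print(agg.reindex(columns=cols).isna().all().all())

# A bare ndarray carries no labels, so its values are placed by position.
D.loc[rows, cols] = agg.to_numpy()
print(D)
```

Because the array is assigned by position, its shape has to match the selected block (here 2 rows by 3 columns).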

Second answer

To fix your example, convert the result explicitly with to_numpy:

cols = ['a', 'b', 'c']
D.loc[0:1, cols] = D.loc[0:1, cols].agg(['min', 'median', 'max'], axis=1).to_numpy()
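
One detail worth checking when using 0:1 with .loc: label slices include both endpoints, so 0:1 selects rows 0 and 1, whereas the same slice with .iloc selects only row 0. A small sketch on the question's data (the variable names by_label and by_position are illustrative):

```python
import pandas as pd

D = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 3, 4], 'c': [2, 4, 0], 'd': [-1, -1, -1]})
cols = ['a', 'b', 'c']

by_label = D.loc[0:1, cols]   # label-based slice: both endpoints are included
by_position = D.iloc[0:1]     # position-based slice: the end is excluded

print(by_label.shape)     # two rows selected
print(by_position.shape)  # one row selected
```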

If you want to sort, use numpy.sort, which returns a plain array:

import numpy as np
cols = ['a', 'b', 'c']
D.loc[0:1, cols] = np.sort(D.loc[0:1, cols], axis=1)

Output:

   a  b  c  d
0  1  2  3 -1
1  2  3  4 -1
2  3  4  0 -1
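
The question mentions roughly 800k rows. Below is a sketch of how the same assignment might look at that size, assuming the rows to sort are picked with a boolean mask. The random data, the seed and the mask condition (every other row) are made up for illustration and do not come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: 800k rows of random integers.
rng = np.random.default_rng(0)
n = 800_000
D = pd.DataFrame(rng.integers(0, 100, size=(n, 3)), columns=['a', 'b', 'c'])
D['d'] = -1
original = D.copy()

cols = ['a', 'b', 'c']
mask = np.arange(n) % 2 == 0  # illustrative condition: every other row

# Sort each selected row across a, b, c in one vectorised step, then
# write the resulting array back into the same block by position.
D.loc[mask, cols] = np.sort(D.loc[mask, cols].to_numpy(), axis=1)
```

Rows outside the mask and column d are left as they were.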