I am trying to understand how NumPy indexing works.
Take a look at the following code:
import numpy as np

A = np.array([
    [10, 7, 3, 100],
    [1, 7, 0, 3],
])
B = np.argsort(A, axis=1)
# B = [
# [2 1 0 3]
# [2 0 3 1]
# ]
A[B]
Based on pure logic, if I first get the sorted indexes of A using argsort, I should then be able to index A using ITS OWN sorted indexes out of the box, without doing any extra work. Instead I get the following error:
IndexError: index 2 is out of bounds for axis 0 with size 2
Why is this? Can someone explain how this type of indexing works in NumPy?
Even more strange is the following code:
A = np.array([
    [10, 7, 3, 100],
    [1, 7, 0, 3],
])
B = np.argsort(A, axis=1)
# B = [
# [2 1 0 3]
# [2 0 3 1]
# ]
C = np.array(['A', 'B', 'C', 'D'])
C[B] # It works
The code above works fine. I thought this was working because of broadcasting:
B: (2, 4)
C: (4, )
Left alignment of C:
C: (1, 4)
Repetition of C:
C: (2, 4)
But now we are basically back at the first case... So why does this work?
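You've correctly obtained the sorted indices of A along each row (axis 1) using argsort, so B is a 2D array of column indices. However, A[B] passes a single index array, and NumPy applies a single index array to axis 0 only: every value in B is treated as a row number, not as a position within a row. A has only 2 rows, so the value 2 is out of bounds. C[B] works for the same reason: C is one-dimensional with 4 elements, so its axis 0 accepts every value from 0 to 3, and the result simply takes the shape of B. No broadcasting of C is involved.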
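As a rough illustration of that rule, the sketch below reuses A, B, and C from the question. The small index array idx is made up for this example; it only shows that valid row numbers each pull out a whole row of A.

```python
import numpy as np

A = np.array([
    [10, 7, 3, 100],
    [1, 7, 0, 3],
])
B = np.argsort(A, axis=1)
C = np.array(['A', 'B', 'C', 'D'])

# A single integer index array only addresses axis 0, so every entry
# of B is looked up as a row number. A has just 2 rows, so this fails.
try:
    A[B]
    error_message = None
except IndexError as exc:
    error_message = str(exc)
print(error_message)

# C is 1-D with 4 elements, so indices 0..3 are all valid on axis 0.
picked = C[B]
print(picked.shape)  # (2, 4)

# With valid row numbers, each index selects a whole row of A, so the
# result has shape idx.shape + A.shape[1:].
idx = np.array([[1, 0], [0, 1]])  # illustrative index array, not from the question
stacked = A[idx]
print(stacked.shape)  # (2, 2, 4)
```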
To get what you want, you need to index both dimensions of the array with advanced indexing. One way is to use np.indices() to generate index arrays for the row and column positions, then index A with the row index array and B together, as in A[rows, B] where rows = np.indices(A.shape)[0].
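A minimal sketch of this approach, using the arrays from the question. The names rows and sorted_A are just labels chosen here. The last lines cross-check the result against np.take_along_axis, which is a different NumPy function that I believe does the same pairing for you; it is shown as an alternative, not as part of the np.indices() approach.

```python
import numpy as np

A = np.array([
    [10, 7, 3, 100],
    [1, 7, 0, 3],
])
B = np.argsort(A, axis=1)

# np.indices(A.shape) returns one index array per dimension, each with
# the same shape as A. The first holds the row number of every cell.
rows, _ = np.indices(A.shape)

# Pair each row number with the column index from B at the same position.
sorted_A = A[rows, B]
print(sorted_A)
# [[  3   7  10 100]
#  [  0   1   3   7]]

# Alternative: take_along_axis builds the matching row indices internally.
alt = np.take_along_axis(A, B, axis=1)
print(np.array_equal(sorted_A, alt))  # True
```

Since B holds column positions, the row index has to be supplied explicitly; that is the only extra work compared with the 1D case.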