Slicing MultiIndex Pandas Dataframe with integer values incorrect?


We have a MultiIndex DataFrame where the top-level index uses integer values. Slicing for a specific value returns all index values up to the requested value, not just the requested value. Is this a bug, or are we doing it wrong?

Example:

import numpy as np
import pandas as pd
midx = pd.MultiIndex.from_product([[1,2], ['A', 'B']])
df = pd.DataFrame(np.arange(4).reshape((len(midx), 1)), index=midx, columns=['Values'])

df.loc[(slice(1), slice(None)), :]  # Slice for only top index value=1

This first slice returns just the index values = 1, as expected:

        Values
1   A   0
1   B   1

But:

df.loc[(slice(2), slice(None)), :]  # Slice for only top index value=2

returns index value 1 as well as value 2, like this:

        Values
1   A   0
1   B   1
2   A   2
2   B   3

where we expect this:

        Values
2   A   2
2   B   3


Nick On BEST ANSWER

When you call slice(x), x is the stop value (see the manual), so slice(2) is equivalent to slice(None, 2) and returns everything up to and including that value. In your case you can simply supply the desired index value directly:

df.loc[(2, slice(None)), :]

Output:

     Values
2 A       2
  B       3

Note that in calls to .loc, slice end values are inclusive; see the manual and this Q&A.
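To make the difference concrete, here is a minimal sketch that rebuilds the DataFrame from the question and compares the two calls. The variable names s, up_to_two and only_two are made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question.
midx = pd.MultiIndex.from_product([[1, 2], ['A', 'B']])
df = pd.DataFrame(np.arange(4).reshape((len(midx), 1)), index=midx, columns=['Values'])

# A single-argument slice only sets the stop value.
s = slice(2)
print(s.start, s.stop, s.step)  # None 2 None

# With .loc the stop label is inclusive, so this selects top-level values 1 and 2.
up_to_two = df.loc[(slice(2), slice(None)), :]

# Passing the label itself selects only the rows whose top-level value is 2.
only_two = df.loc[(2, slice(None)), :]

print(up_to_two)
print(only_two)
```

The first selection contains all four rows, while the second contains only the two rows under top-level value 2.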

Psidom On

You need slice(2, 2) to extract only index value 2. When you provide a single argument, it is treated as the stop parameter, so everything up to and including that value is returned:

df.loc[(slice(2, 2), slice(None)), :]
#     Values
#2 A       2
#  B       3
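The same start/stop slice can also be written with pd.IndexSlice, which some find easier to read than nested slice() calls. This is only a sketch on the question's DataFrame; the names idx, explicit and shorthand are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question.
midx = pd.MultiIndex.from_product([[1, 2], ['A', 'B']])
df = pd.DataFrame(np.arange(4).reshape((len(midx), 1)), index=midx, columns=['Values'])

# pd.IndexSlice turns the familiar a:b syntax into slice objects.
idx = pd.IndexSlice

# Both forms ask for top-level labels from 2 through 2 (inclusive) and all second-level labels.
explicit = df.loc[(slice(2, 2), slice(None)), :]
shorthand = df.loc[idx[2:2, :], :]

print(shorthand)
```

Both selections should contain only the two rows under top-level value 2.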
Scott Boston On

You can also use pd.DataFrame.xs like this:

df.xs(2, level=0, drop_level=False)

Output:

     Values
2 A       2
  B       3