Pandas slicing by index inconsistency

61 Views Asked by user1537366 At 07 March 2024 at 10:31

I have the following DataFrames/Series exhibiting very surprising [] slicing behaviour:

# slicing by integers forces iloc-like indexing even if the index is integer
In [1]: pd.DataFrame({"a": {3: 1, 1: 2}})["a"][2:]
Out[1]: Series([], Name: a, dtype: int64)

# slicing by index element uses sort order of index,
# and in this case dict insertion order is NOT respected
In [2]: pd.DataFrame({"a": {"d": 1, "b": 2}})["a"]["c":]
Out[2]: 
d    1
Name: a, dtype: int64

# if index is not sorted,
# slicing by index element that is not present
# should trigger an exception
In [3]: pd.DataFrame({"a": [1, 2]}, index=["d", "b"])["a"]["c":]
Out[3]: 
b    2
Name: a, dtype: int64

Isn't the last one a bug in Pandas as it is supposed to trigger an Exception?

Moral of the story: never use [] on a DataFrame or Series, especially with slices...


e-motta On 07 March 2024 at 12:33 BEST ANSWER

Maybe you're overlooking the difference between the two types of selection supported in Pandas:

Selection by position: works like regular zero-based integer indexing. It is used when you select with iloc or pass an integer slice to [], as in Series[:2], even if the index itself holds integers.
Selection by label: if the index is sorted, Pandas includes in the slice everything between the start and stop labels (both endpoints inclusive) and excludes everything else. It is used when you select with loc or pass a label slice to [], as in Series['c':].
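To make the two modes concrete, here is a minimal sketch using the explicit accessors. The Series `s` and its values are made up for illustration; they are not from the question.

```python
import pandas as pd

# Hypothetical example data: a sorted string index
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Position-based: the stop position is excluded, as in a Python list slice
by_pos_stop = s.iloc[0:1]

# Label-based: both the start and the stop label are included
by_label_stop = s.loc["a":"b"]

# Open-ended slices that happen to select the same rows here
by_position = s.iloc[1:]   # positions 1 and 2
by_label = s.loc["b":]     # labels "b" and "c"

print(list(by_pos_stop.index))    # ['a']
print(list(by_label_stop.index))  # ['a', 'b']
```

Spelling out iloc or loc removes any guessing about which mode a bare [] slice will pick.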

Your first example:

pd.DataFrame({"a": {3: 1, 1: 2}})["a"][2:]

You select using [2:].
An integer slice passed to [] is positional, so this selects everything from position 2 onward, counting from zero. Nothing is returned, since the index has only 2 elements (positions 0 and 1).

Series([], Name: a, dtype: int64)

Compare this with selecting from position 1:

pd.DataFrame({"a": {3: 1, 1: 2}})["a"].iloc[1:]

1    2
Name: a, dtype: int64
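As a side-by-side sketch of the same data, the snippet below contrasts iloc (positions) with loc (labels) on the integer index from the question. It builds the Series directly with an explicit index, which is an assumption made here so the row order is fixed regardless of how a dict of dicts would be ordered.

```python
import pandas as pd

# Same labels and values as the first example, with the order made explicit
s = pd.Series([1, 2], index=[3, 1], name="a")

empty = s.iloc[2:]       # position 2 does not exist, so nothing is selected
from_pos_1 = s.iloc[1:]  # position 1 onward: the row labelled 1

from_label_3 = s.loc[3:]  # label 3 onward: both rows
from_label_1 = s.loc[1:]  # label 1 onward: only the last row

print(len(empty))             # 0
print(from_label_3.tolist())  # [1, 2]
print(from_label_1.tolist())  # [2]
```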

Your second and last examples (both give the same result for me):

pd.DataFrame({"a": [1, 2]}, index=["d", "b"])["a"]["c":]

You select a slice beginning at label 'c', using ["c":].
Since 'd' and 'b' are in decreasing order, the index still counts as sorted. Pandas works out where 'c' would fall (between 'd' and 'b') and selects everything from that point up to the end of the index, which is 'b':

b    2
Name: a, dtype: int64

Compare this with an unordered index:

pd.DataFrame({"a": [0, 1, 2]}, index=["a", "d", "b"])["a"]["c":]

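This will raise a KeyError: 'c', because the index is neither increasing nor decreasing, so Pandas has no sort order in which to place the missing label 'c'.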
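A minimal sketch of how one might check this up front: the index attributes is_monotonic_increasing and is_monotonic_decreasing tell you whether a label slice with a missing endpoint can succeed, and sort_index gives a sorted copy if it cannot. The variable names are illustrative only.

```python
import pandas as pd

decreasing = pd.Series([1, 2], index=["d", "b"])
unordered = pd.Series([0, 1, 2], index=["a", "d", "b"])

# A decreasing index is monotonic, so slicing from the absent label "c" works
print(decreasing.index.is_monotonic_decreasing)  # True
dec_result = decreasing.loc["c":]                # only the row labelled "b"

# The unordered index is monotonic in neither direction
is_sorted = (unordered.index.is_monotonic_increasing
             or unordered.index.is_monotonic_decreasing)
print(is_sorted)  # False

try:
    unordered.loc["c":]
    raised = False
except KeyError:
    raised = True  # the absent label cannot be placed in an unsorted index

# Sorting the index first makes the same slice well defined
sorted_result = unordered.sort_index().loc["c":]
print(sorted_result.tolist())  # [1]  (the row labelled "d")
```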
