I have the following DataFrames/Series exhibiting very surprising [] slicing behaviour:
# slicing by integers forces iloc-like indexing even if the index is integer
In [1]: pd.DataFrame({"a": {3: 1, 1: 2}})["a"][2:]
Out[1]: Series([], Name: a, dtype: int64)
# slicing by index element uses sort order of index,
# and in this case dict insertion order is NOT respected
In [2]: pd.DataFrame({"a": {"d": 1, "b": 2}})["a"]["c":]
Out[2]:
d 1
Name: a, dtype: int64
# if index is not sorted,
# slicing by index element that is not present
# should trigger an exception
In [3]: pd.DataFrame({"a": [1, 2]}, index=["d", "b"])["a"]["c":]
Out[3]:
b 2
Name: a, dtype: int64
Isn't the last one a bug in Pandas as it is supposed to trigger an Exception?
Moral of the story: never use [] on a DataFrame or Series, especially with slices...
Maybe you're overlooking the difference between the two types of selection supported in Pandas:
Selection by position: works like regular integer-based indexing. This is used when you select with iloc, or simply with an integer slice such as Series[:2]. Read more in the pandas user guide under "Selection by position".
Selection by label: if the index is sorted, Pandas includes in the slice everything between the start and stop labels, and excludes everything else. This is used when you select with loc, or with a label slice such as Series['c':]. Read more in the pandas user guide under "Selection by label".
Your first example:
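It selects by position with [2:], i.e. everything from position 2 onwards in a zero-based array. Nothing is returned, since the index only has 2 elements (positions 0 and 1). Compare this with selecting from position 1 with [1:], which returns the one remaining row.
Your second and last examples (they both give the same result for me):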
They select by label, starting at 'c', using ["c":]. Since 'd' and 'b' are in decreasing order, the index counts as sorted, so this selects everything from label 'c' (inclusive) up to the end of the index, which is 'b'. Compare this with an unordered index, where the same slice raises a KeyError: 'c'.
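For reference, the three cases discussed here can be reproduced with a short script. This is only a sketch: the variable names are mine, and the three-element index ["d", "b", "e"] is an assumed stand-in for "an unordered index", not something taken from the question.

```python
import pandas as pd

# First example: integer index, integer slice -> selection by position.
s_int = pd.DataFrame({"a": {3: 1, 1: 2}})["a"]
from_pos_2 = s_int[2:]  # nothing at position 2 or later: empty Series
from_pos_1 = s_int[1:]  # the single row at position 1

# Second/last example: labels in decreasing order count as sorted,
# so a label slice works even though "c" is not in the index.
s_desc = pd.Series([1, 2], index=["d", "b"], name="a")
from_label_c = s_desc["c":]  # everything from "c" to the end: just "b"

# Assumed unordered (non-monotonic) index: the same slice fails.
s_unsorted = pd.Series([1, 2, 3], index=["d", "b", "e"], name="a")
try:
    s_unsorted["c":]
    raised = False
except KeyError:
    raised = True

print(len(from_pos_2), len(from_pos_1))
print(from_label_c)
print("KeyError raised:", raised)
```

If you want the behaviour to be explicit rather than inferred from the slice type, spelling out .iloc[...] or .loc[...] avoids the ambiguity.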