Existing Approaches to Structural Subtyping
Abstract classes defined in the collections.abc module are slightly more advanced since they implement a custom __subclasshook__() method that allows runtime structural checks without explicit registration:
from collections.abc import Iterable

class MyIterable:
    def __iter__(self):
        return []

assert isinstance(MyIterable(), Iterable)
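For illustration, here is a simplified sketch of how such a hook can work. The names HasIter and OnlyGetItem are hypothetical, and the real implementation in collections.abc differs in detail; the point is only that the hook looks for __iter__ in the class hierarchy and nothing else:

```python
from abc import ABC, abstractmethod


class HasIter(ABC):  # hypothetical stand-in for collections.abc.Iterable
    @abstractmethod
    def __iter__(self):
        ...

    @classmethod
    def __subclasshook__(cls, C):
        if cls is HasIter:
            # Structural check: is __iter__ defined anywhere in C's MRO?
            if any("__iter__" in B.__dict__ for B in C.__mro__):
                return True
        # Fall back to the normal registration/inheritance check.
        return NotImplemented


class MyIterable:
    def __iter__(self):
        return iter([])


class OnlyGetItem:
    def __getitem__(self, item):
        return []


assert isinstance(MyIterable(), HasIter)
assert not isinstance(OnlyGetItem(), HasIter)
```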
But the Python glossary defines an iterable as follows:
An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as list, str, and tuple) and some non-sequence types like dict, file objects, and objects of any classes you define with an __iter__() method or with a __getitem__() method that implements Sequence semantics.
"or with a __getitem__()"
So I expected this code to run without any AssertionError:
from collections.abc import Iterable

class MyIterable:
    def __getitem__(self, item):
        return []

assert isinstance(MyIterable(), Iterable)
But it doesn't:
Traceback (most recent call last):
  File "file.py", line 7, in <module>
    assert isinstance(MyIterable(), Iterable)
AssertionError
So why, if an iterable may implement either __iter__ or __getitem__, does the isinstance check against Iterable fail for a class that only implements __getitem__?
I also tested with Mypy:
from collections.abc import Iterable

class MyIterable1:
    def __iter__(self):
        return []

class MyIterable2:
    def __getitem__(self, item):
        return []

def foo(bar: Iterable):
    ...

foo(MyIterable1())
foo(MyIterable2())
Type check result:
$ mypy .\scratch_443.py
test_file.py:15: error: Argument 1 to "foo" has incompatible type "MyIterable2"; expected "Iterable[Any]"
Found 1 error in 1 file (checked 1 source file)
While you did cite most of the relevant passages, I would like to add a little bit of additional context and another perspective.
The problem lies (as it often does) in the definitions, of which there are two in this case.
The abstract base Iterable
The collections.abc.Iterable is not flawed, it just leans on a narrower definition of the term. In that definition, if a class implements the __iter__ method, it is considered iterable; plain and simple. Mind you, this does not (and cannot) impose any constraints on what happens inside that method or what it returns.
One of the consequences of this is that technically the method could return something silly, like an integer for example, even though we would reasonably expect the __iter__ method to always return an iterator (i.e. something implementing the __next__ method).
Case in point:
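A minimal sketch of such a case, assuming a hypothetical class Foo whose __iter__ returns an integer. The isinstance check passes, and the problem only surfaces once iter() is actually called:

```python
from collections.abc import Iterable


class Foo:  # hypothetical class, for illustration only
    def __iter__(self):
        return 1  # deliberately not an iterator


foo = Foo()

# The ABC only looks for the presence of __iter__, so this passes.
assert isinstance(foo, Iterable)

error = None
try:
    iter(foo)  # iter() rejects the non-iterator return value
except TypeError as exc:
    error = exc
    print(f"TypeError: {exc}")
```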
The error is only raised inside the iter function, as it presumably checks the existence of __next__ in the __dict__ of the class (!) of the provided object.
This last point is tangential, but still relevant to the discussion IMO.
The loose term "iterable"
The term "iterable" is defined more broadly in the glossary as an object whose class corresponds to the aforementioned Iterable protocol or, as you quoted, one of the "classes you define with [...] a __getitem__() method that implements Sequence semantics".
And you'll notice the last portion of that sentence, "that implements Sequence semantics", which I want to highlight. This part is actually important to understanding the problem at hand. This is unfortunately not expanded on further in the glossary, but if we take a look at the documentation for the built-in iter(), which is (as the docs tell us) the only reliable way of checking if an object is iterable, we find the following clarification. It says the argument must either support the iterable protocol (the __iter__() method) or support the sequence protocol (the __getitem__() method with integer arguments starting at 0).
This qualification is important because simply having the __getitem__ method does not constitute a Sequence. It is a necessary but not sufficient requirement, as e.g. the Mapping protocol also requires the __getitem__ method to be implemented, but neither of those two is a subclass of the other (as the collections.abc documentation shows). __getitem__ merely allows subscripting an instance with a key (i.e. using square brackets [key] with it), and the sequence protocol requires an accepted key to be an integer (or slice).
Why is this relevant?
Because while we can know if an object's class implements __getitem__, it is impossible to know from the outside how it implements it. A Sequence subtype should raise an error if we were to try and call its __getitem__ with a string, for example. But how can we know that it does? Only by calling it.
And since specifically the sequence protocol (and not just any __getitem__ method) is what constitutes an "iterable" in the absence of __iter__ in this broader sense, there is no way to determine from the class alone whether it should or should not be considered iterable.
To top this all off, consider the following example:
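A minimal sketch of such a class, assuming a hypothetical Bar whose __getitem__ only accepts string keys. Calling iter() on an instance succeeds because the method exists, yet the first attempt to fetch an item fails:

```python
class Bar:  # hypothetical class, for illustration only
    def __getitem__(self, key):
        # Only string keys are accepted, so this is not the sequence protocol.
        if not isinstance(key, str):
            raise TypeError("key must be a string")
        return key.upper()


bar = Bar()
assert bar["spam"] == "SPAM"  # subscripting works as intended

iterator = iter(bar)  # no error: iter() only needs __getitem__ to exist

failure = None
try:
    next(iterator)  # internally calls bar[0], which Bar rejects
except TypeError as exc:
    failure = exc
```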
I would argue that Bar is a perfectly valid (albeit not very useful) example of a subscriptable class. An instance even passes the iter() check! Yet should it be considered an iterable? Both the documentation and common sense say no.
Conclusion
Determining whether or not something is "iterable" comes down to what you mean by the term. And I would argue that (if anything) the documentation's suggestion that iter() is reliable in this regard is misleading. The simple subclass check with the ABC Iterable is not sufficient if you consider the sequence protocol to also be a reasonable version of an iterable.
IMHO, the only actually reliable way of determining if an object is iterable is to chain a next() call with an iter() call, which in practice amounts to a plain for-loop. If that raises an error, the object is not iterable.
Final example:
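A rough sketch of that check, under a few assumptions of mine: the helper name is_iterable and the three demo classes are hypothetical, and StopIteration is treated as "iterable but empty" rather than as a failure:

```python
def is_iterable(obj):
    """Try to fetch one item via next(iter(obj)); hypothetical helper."""
    try:
        next(iter(obj))
    except StopIteration:
        return True  # empty, but iteration itself works
    except Exception:
        return False  # iter() or the first item lookup failed
    return True


class ViaIter:
    def __iter__(self):
        return iter("abc")


class ViaGetItem:
    def __getitem__(self, index):
        return "abc"[index]  # IndexError ends the iteration


class StringKeysOnly:
    def __getitem__(self, key):
        if not isinstance(key, str):
            raise TypeError("key must be a string")
        return key


for cls in (ViaIter, ViaGetItem, StringKeysOnly):
    print(cls.__name__, is_iterable(cls()))
```

Keep in mind that a check like this consumes the first item of one-shot iterators such as generators, so it is better suited to illustrating the point than to production use.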
Output:
Related
How to check a class/type is iterable (uninstantiated)