Iterator and Iterable

Introduction

Unfortunately, the versions of Sentence below are bad ideas and not Pythonic.

In practice, to iterate over the words of an English sentence one at a time, the code could be written like this:

import re
import reprlib

RE_WORD = re.compile(r"\w+")  # raw string, so \w is not treated as a string escape


class Sentence(object):
    def __init__(self, text):
        self.text = text  # kept because __repr__ uses it
        self.words = RE_WORD.findall(text)

    def __getitem__(self, idx):
        return self.words[idx]

    def __len__(self):
        return len(self.words)

    def __repr__(self):
        # reprlib.repr abbreviates long strings
        return "Sentence({})".format(reprlib.repr(self.text))

Definition of an Iterable Object

An object is considered iterable if it implements the __iter__ method.
For example:
class Foo(object):
    """docstring for Foo"""
    def __iter__(self):
        pass

>>> from collections import abc
>>> issubclass(Foo, abc.Iterable)
True
>>> isinstance(Foo(), abc.Iterable)
True

However, the most accurate way to check whether an object is iterable is to call iter(x) and handle the TypeError
exception if it isn't. This is more accurate than using isinstance(x, abc.Iterable), because iter(x) also considers
the legacy __getitem__ method, while the Iterable ABC does not. Some objects are iterable only because they implement
__getitem__, such as the Sentence class in the introduction.

In short: for an object x, if iter(x) returns an iterator, x is iterable; if it raises TypeError, x is not iterable.
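
The iter()-based check can be wrapped in a small helper. This is a minimal sketch; the name is_iterable is made up for illustration and is not part of the standard library.

```python
def is_iterable(obj):
    """Return True if iter(obj) succeeds, False if it raises TypeError."""
    try:
        iter(obj)
    except TypeError:
        return False
    return True


print(is_iterable("ABC"))      # a str is a sequence
print(is_iterable([1, 2, 3]))  # a list
print(is_iterable(42))         # an int has neither __iter__ nor __getitem__
```

In most code it is simpler to just iterate and let TypeError propagate; an explicit check like this is mainly useful when the failure has to be reported early.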

Specifically: objects implementing an __iter__ method that returns an iterator are iterable.

Sequences are always iterable, because they all implement __getitem__;

as are objects implementing a __getitem__ method that takes 0-based indexes.

Dictionaries are iterable too: dict implements __iter__, and iterating over a dictionary yields its keys.
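
The __getitem__ fallback can be seen in a small sketch. The Squares class is hypothetical: it defines no __iter__, only a __getitem__ that accepts 0-based indexes and raises IndexError past the end.

```python
from collections import abc


class Squares:
    """Hypothetical class: no __iter__, only a 0-based __getitem__."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, idx):
        if not 0 <= idx < self.n:
            raise IndexError(idx)  # ends the iteration started by iter()
        return idx * idx


sq = Squares(4)
as_list = list(sq)                          # works through __getitem__
passes_abc = isinstance(sq, abc.Iterable)   # the ABC only looks for __iter__
keys = list({"a": 1, "b": 2})               # iterating a dict yields its keys

print(as_list, passes_abc, keys)
```

Here iteration succeeds even though the isinstance test fails, which is the gap between iter(x) and abc.Iterable described above.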

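How the for Loop Works Internally

Because objects are iterable, we can conveniently loop over them with for.

In code terms, the for loop machinery amounts to a while loop plus try...except exception handling.

Specifically: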

# The for machinery by hand, with a while loop.
>>> s = 'ABC'
>>> for e in s:
...     print(e)

This is equivalent to:

>>> s = 'ABC'
>>> it = iter(s)
>>> while True:
...     try:
...         print(next(it))
...     except StopIteration:
...         del it                 # release the reference to the iterator
...         break

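The Relationship Between Iterables and Iterators


Python obtains iterators from iterables.

In other words, an iterable builds iterators.

For example, all sequences are iterable, and the Python interpreter relies on the built-in function iter() to obtain an iterator from a sequence,

as in the call iter(s) above.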


The Standard Iterator Interface

In Python, the standard iterator interface has two methods:

1. __next__: returns the next available item, raising StopIteration when there are no more items.

2. __iter__: returns self; this allows iterators to be used where an iterable is expected, for example, in a for loop.
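
As a minimal sketch of this two-method interface, the hypothetical Countdown class below implements both methods by hand.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # Returning self lets the iterator be used wherever an iterable is expected.
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # no more items
        value = self.current
        self.current -= 1
        return value


result = list(Countdown(3))
print(result)
```

Because __iter__ returns self, a Countdown instance can be passed directly to list() or used in a for loop.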


How abc.Iterator Works Internally


# abc.Iterator class, from Lib/_collections_abc.py

class Iterator(Iterable):
    __slots__ = ()  # the ABC itself declares no instance attributes

    def __iter__(self):
        return self

    @abstractmethod
    def __next__(self):
        raise StopIteration

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Iterator:
            if (any("__next__" in B.__dict__ for B in C.__mro__) and
                    any("__iter__" in B.__dict__ for B in C.__mro__)):
                return True
        return NotImplemented

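__subclasshook__ is what isinstance and issubclass consult when they are called with this ABC.

Clearly, if a class implements neither __next__ nor __iter__ (and is not explicitly registered with the ABC), its instances are not recognized as abc.Iterator instances.

This is also why abc.Iterator is not the tool for deciding whether an object can be iterated: first, it tests for Iterator rather than Iterable; second, its criterion requires both __iter__ and __next__.

Of course, abc.Iterator is the more appropriate tool for checking whether an object is an iterator.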

Iterators in Python aren't a matter of type but of protocol. 

A large and changing number of builtin types implement *some* flavor of iterator.

Don't check the type! Use hasattr to check for both "__iter__" and "__next__" attributes instead.

In fact, that's exactly what the __subclasshook__ method of the abc.Iterator ABC does.

The best way to check if an object x is an iterator is to call isinstance(x, abc.Iterator).

Thanks to Iterator.__subclasshook__, this test works even if the class of x is not a real or virtual subclass of Iterator.
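
The effect of __subclasshook__ can be checked with a short sketch. Ticker and OnlyNext are hypothetical classes; neither inherits from nor registers with abc.Iterator.

```python
from collections import abc


class Ticker:
    """Hypothetical class with both __iter__ and __next__."""

    def __iter__(self):
        return self

    def __next__(self):
        raise StopIteration  # always exhausted; enough for the check


class OnlyNext:
    """Hypothetical class with __next__ but no __iter__."""

    def __next__(self):
        raise StopIteration


checks = (
    isinstance(Ticker(), abc.Iterator),      # both methods present
    isinstance(OnlyNext(), abc.Iterator),    # __iter__ is missing
    isinstance([1, 2], abc.Iterator),        # a list is iterable, not an iterator
    isinstance(iter([1, 2]), abc.Iterator),  # the iterator built from the list
)
print(checks)
```

Only the objects that provide both methods pass, which matches the protocol-based view described above.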

How to Reset an Iterator

Because the only methods required of an iterator are __next__ and __iter__, there is no way to check whether there are remaining items, other than to call next() and catch StopIteration.

Also, it's not possible to 'reset' an iterator: once iteration has started, there is no going back.

If you need to start over, you need to call iter() on the iterable that built the iterator in the first place.

Calling iter() on the original iterator itself won't help, because its __iter__ method returns self.
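
A short sketch of this behavior, using a plain list as the iterable (the variable names are only for illustration):

```python
letters = ["a", "b", "c"]
it = iter(letters)

first_pass = list(it)         # consumes the iterator
second_pass = list(it)        # already exhausted: nothing left
same_object = iter(it) is it  # iter() on an iterator returns that same iterator

# There is no "has next" method; asking for one more item is the only probe.
try:
    next(it)
    exhausted = False
except StopIteration:
    exhausted = True

fresh = list(iter(letters))   # a new iterator from the iterable starts over

print(first_pass, second_pass, same_object, exhausted, fresh)
```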

Definition of an Iterator

An iterator is any object that implements the no-argument __next__ method, which returns the next item in a series or raises StopIteration when there are no more items.

Python iterators also implement the __iter__ method, so they are iterable as well.


Iterable and Iterator Revisited

import re
import reprlib

RE_WORD = re.compile(r"\w+")


class Sentence(object):
    def __init__(self, text):
        self.text = text
        self.words = RE_WORD.findall(text)

    def __iter__(self):
        # Build a new, independent iterator on every call.
        return SentenceIterator(self.words)

    def __repr__(self):
        return "Sentence({})".format(reprlib.repr(self.words))


class SentenceIterator(object):
    def __init__(self, words):
        self.words = words
        self.idx = 0  # position of the next word to return

    def __iter__(self):
        return self

    def __next__(self):
        try:
            word = self.words[self.idx]
        except IndexError:
            raise StopIteration
        self.idx += 1
        return word
# This version has no __getitem__, to make it clear that the class is iterable because it implements __iter__.

Note that implementing the __iter__ method in SentenceIterator is not actually needed for this example to work, but it's the right thing to do, for two reasons:

    Iterators are supposed to implement both __next__ and __iter__; only an object that provides both methods fully satisfies the iterator interface.

    Doing so makes our iterator pass the issubclass(SentenceIterator, abc.Iterator) test.



A common cause of errors in building iterables and iterators is to confuse the two.

To be clear:

Iterables have an __iter__ method that instantiates a new iterator every time. As a special case, for objects that only implement __getitem__, the iter() function itself creates and returns the iterator.

Iterators implement a __next__ method that returns individual items, and an __iter__ method that returns self.

Therefore, iterators are also iterable, but iterables are not iterators.

It may be tempting to add a __next__ method to the Sentence class, making each Sentence instance at the same time an iterable and an iterator over itself. But this is a terrible idea.

It must be possible to obtain multiple independent iterators from the same iterable instance,

and each iterator must keep its own internal state,

so a proper implementation of the pattern requires each call to iter(my_iterable) to create a new, independent iterator.

That is why we need the SentenceIterator class in this example.
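
The difference shows up as soon as the same object is iterated by two loops at once. The sketch below is illustrative only: GoodSentence, BadSentence, WordsIterator, and pairs are hypothetical names, with BadSentence acting as an iterator over itself.

```python
import re

RE_WORD = re.compile(r"\w+")


class WordsIterator:
    """Iterator holding its own position over a list of words."""

    def __init__(self, words):
        self.words = words
        self.idx = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.idx >= len(self.words):
            raise StopIteration
        word = self.words[self.idx]
        self.idx += 1
        return word


class GoodSentence:
    """Iterable: every __iter__ call builds a new iterator."""

    def __init__(self, text):
        self.words = RE_WORD.findall(text)

    def __iter__(self):
        return WordsIterator(self.words)


class BadSentence:
    """Anti-pattern: the iterable is its own iterator, so all loops share one index."""

    def __init__(self, text):
        self.words = RE_WORD.findall(text)
        self.idx = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.idx >= len(self.words):
            raise StopIteration
        word = self.words[self.idx]
        self.idx += 1
        return word


def pairs(sentence):
    """Nested loops need two independent iterators over the same object."""
    return [(a, b) for a in sentence for b in sentence]


good_pairs = pairs(GoodSentence("to be"))
bad_pairs = pairs(BadSentence("to be"))
print(good_pairs)
print(bad_pairs)
```

With GoodSentence all four word pairs are produced. With BadSentence the inner loop advances the shared index, so the outer loop ends early and only one pair comes out.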


Conclusion


An iterable should never act as an iterator over itself.

In other words, an iterable should implement the __iter__ method, but not __next__.

On the other hand, for convenience, iterators should be iterable.

An iterator's __iter__ should just return self.