A collection of computer, gaming and general nerdy things.

Tuesday, September 30, 2014

Iteration 1: The basics

In [1]:
for x in [1,2,3]:
    print(x, end=' ')
1 2 3 

Iteration is something that is used all the time in programming. Python makes it really easy to use.

Parts of Iteration

There are a couple of parts to iteration, consider this a simple glossary for terms used in this post:

  • The Consumer: something that uses iteration to get values from an object: for ...for example
  • The Iteration: act of iterating over an object implementing the iter protocol
  • The Iter Protocol: interface that iteration uses, requires both __iter__ and __next__ to be defined
  • The Iterable: container object that implements __iter__, which in turns returns an iterator
  • The Iterator: object that implements __next__, which in turns returns value to the consumer

Consumers can be anything: for, iter creates an iterable out of a container. The builtin next handles manual advancement over an iterator. Whatever they are, they use iteration to get values from an object. This is the simplest part of the question of iteration.

Iter Protocol, Iterables and Iterators

This part is more complex because there's multiple ways to approach implementing iteration. The iter protocol says that two methods must be implemented:

  • __iter__: On the container this must return an object that implements __next__; on the iterator, it must return it's instance
  • __next__: This only has to be implemented on the iterator, this method is used to return values to the consumer. When no values are left to return, this method must raise StopIteration and continue to do so on each sequential call.

A common way of handling iteration is two split the protocol over two objects. list does this.

In [2]:
print('[] and iter([]) are different:', type([]) != type(iter([])))
print('iter([]) and iter(iter([])) are the same:', type(iter([])) == type(iter(iter([]))))
print('typeof iter([]):', type(iter([])))
print('[] hasattr __iter__:', hasattr([], '__iter__'))
print('[] hasattr __next__:', hasattr([], '__next__'))
print('iter([]) hasattr __iter__:', hasattr(iter([]), '__iter__'))
print('iter([]) hasattr __next__:', hasattr(iter([]), '__next__'))
[] and iter([]) are different: True
iter([]) and iter(iter([])) are the same: True
typeof iter([]): <class 'list_iterator'>
[] hasattr __iter__: True
[] hasattr __next__: False
iter([]) hasattr __iter__: True
iter([]) hasattr __next__: True

It's very common for an iterable and an iterator to be different; however, it's not uncommon for them to be the same object (as defined, every iterator is both an iterable and it's iterator).

Defining our own iterators

Python makes it super easy to define our own iteratables and iterators. Just implement a couple of methods and you're done!

In [3]:
class MyIterable:
    '''Simple implementation of the iter protocol
    __iter__ returns an instance of MyIterator
    '''
    def __init__(self, upper=0):
        self.upper = upper
    
    def __iter__(self):
        return MyIterator(upper=self.upper)

class MyIterator:
    '''Simple implementaion of the iter protocol
    When iterated, incremental numbers are returned
    until the upper bound is reached.
    '''
    
    def __init__(self, upper=0):
        self.upper = upper
        self.__current = 0
    
    def __iter__(self):
        return self

    def __next__(self):
        if self.__current >= self.upper:
            raise StopIteration("Upper bound reached.")
        else:
            self.__current += 1
            return self.__current - 1

for x in MyIterable(4):
    print(x)
0
1
2
3

The advantage to splitting the iterable and the iterator into two classes is that you can maintain multiple iterators all at different states. If an object handled it's own iteration, you can maintain only one state.

In [4]:
test = MyIterator(4)

for x in test:
    print(x, end=' ')

for x in test:
    print(x, end=' ')

test = MyIterable(4)
it = iter(test)
print('\n', next(it), sep='')

for x in test:
    print(x, end=' ')

print('\n', next(it), sep='')
print(next(it))
print(next(it))
0 1 2 3 
0
0 1 2 3 
1
2
3

However, something that's a little sneaky is returning a something else completely from the __iter__ so long as it is an iterator.

In [5]:
class SneakyIter(MyIterable):
    
    def __iter__(self):
        return iter(range(self.upper))

for x in SneakyIter(4):
    print(x, end=' ')
0 1 2 3 

Returning just range wouldn't work because range is iterable but it's not an iterator.

Iters huh what are they good for?

Well, a lot more than just counting. To borrow an example from Dave Beazly's fantastic Python Cookbook, 3rd Edition (I seriously cannot recommend this book enough), traversing nodes.

In [6]:
class Node:
    def __init__(self, value):
        self._value = value
        self._children = []

    def __repr__(self):
        return 'Node(%r)' % self._value

    def add_child(self, other_node):
        self._children.append(other_node)
 
    def __iter__(self):
        return iter(self._children)

    def depth_first(self):
        return DepthFirstIterator(self)

class DepthFirstIterator(object):
    '''
    Depth-first traversal
    '''
    def __init__(self, start_node):
        self._node = start_node
        self._children_iter = None
        self._child_iter = None

    def __iter__(self):
        return self

    def __next__(self):
        # Return myself if just started. Create an iterator for children
        if self._children_iter is None:
            self._children_iter = iter(self._node)
            return self._node

        # If processing a child, return its next item
        elif self._child_iter:
            try:
                nextchild = next(self._child_iter)
                return nextchild
            except StopIteration:
                self._child_iter = None
                return next(self)

        # Advance to the next child and start its iteration
        else:
            self._child_iter = next(self._children_iter).depth_first()
            return next(self)
        
root = Node(0)
child1 = Node(1)
child2 = Node(2)
root.add_child(child1)
root.add_child(child2)
child1.add_child(Node(3))
child1.add_child(Node(4))
child2.add_child(Node(5))

for ch in root.depth_first():
    print(ch, end=' ')

print('\n')
for ch in root:
    print(ch, end=' ')
Node(0) Node(1) Node(3) Node(4) Node(2) Node(5) 

Node(1) Node(2) 

Iteration shortcuts, tips and tricks

There are shortcuts to iteration, the most commonly known are comphrensions. Comphrensions are types of literals that do more than simply create, say, a list.

In [7]:
from string import ascii_lowercase as lowercase

# list comp.
test = [ord(c) for c in lowercase]
print(test)

# dict comp.
test = {c:ord(c) for c in lowercase}
print(test)

#you can use comps. in place of iterables in function arguments
print(sum([ord(c) for c in lowercase]))

#something else you can do is drop the brackets if the comp is the only argument
print(sum(ord(c) for c in lowercase))
[97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122]
{'x': 120, 'i': 105, 'n': 110, 'c': 99, 'b': 98, 'h': 104, 's': 115, 'j': 106, 'p': 112, 'f': 102, 'e': 101, 'a': 97, 'r': 114, 'd': 100, 'u': 117, 'g': 103, 'l': 108, 'k': 107, 'w': 119, 'o': 111, 'y': 121, 'm': 109, 'q': 113, 't': 116, 'v': 118, 'z': 122}
2847
2847

But there are more of manipulating iters, such as the builtin filter and map, the itertools module (which all return a special type of iterable called a generator, but that's another post, just know they aren't lists). But the use of filter and map are lessened some by list comps, which I often find easier to write. Some would argue that comphrensions should be used for creating new objects, whereas map, filter, et. al. should be used for manipulating existing objects. However, the important thing is consistency: if you use filter and map all over the place, don't suddenly throw a comphrension that transforms an existing structure.

In [8]:
filt = filter(lambda c: not ord(c) % 3, lowercase)
filt = list(filt) # transform filt into a list
comp = [c for c in lowercase if not ord(c)%3]
print(filt, comp, sep='\n')

mapd = map(ord, lowercase)
mapd = list(mapd)
comp = [ord(c) for c in lowercase]
print(mapd, comp, sep='\n')

mapfilt = map(ord, filter(lambda c: not ord(c) % 3, lowercase))
mapfilt = list(mapfilt)
comphre = [ord(c) for c in lowercase if not ord(c)%3]
print(mapfilt, comphre, sep='\n')
['c', 'f', 'i', 'l', 'o', 'r', 'u', 'x']
['c', 'f', 'i', 'l', 'o', 'r', 'u', 'x']
[97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122]
[97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122]
[99, 102, 105, 108, 111, 114, 117, 120]
[99, 102, 105, 108, 111, 114, 117, 120]

There's also unpacking, which is a great tool for pulling items out of an iterable without using a regular form of iteration. The way it works is that Python transparently creates a tuple -- if you didn't know the tuple operator is the ,, not (); parens are only needed if the tuple is not the only argument in a function -- and then unpacking the tuple into the curret namespace. Seems complicated, but it really isn't.

In [9]:
a, b, c = lowercase[:3]
print(a, b, c)

# tuple unpacking is also useful for variable switching
a, b, c = c, b, a
print(a, b, c)

# similar to how we can splat a list into a function, 
# we can splat iterables into unpacking
# _ becomes a list
a, *_, z = lowercase
print(a, z, _)
a b c
c b a
a z ['b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y']

What's next?

Well, there are other types of iterators called generators which iterate in a peculiar way. And without having a base understanding of them, understanding the real benefits of iters and itertools is difficult.

But other than that, go and iterate all the things. But just remember to iterate responsibly.

Resources

No comments:

Post a Comment