A collection of computer, gaming and general nerdy things.

Sunday, October 12, 2014

Holy crap it's live: JSONConfigParser 0.1.0

It only took about two hours of fiddle faddling and there are probably more commits from trying to get it to do the thing than anything else, but I finally pushed my first package to PyPI. :) I still need to add

jsonconfigparser 0.1.0 [pypi] [github]

Technically it's 0.1.1, but that's only because I didn't know about MANIFEST.in. Pull requests, patches and feature requests are all more than welcome!
I suppose I should take the time to describe it a little. The tag line is "Quick and easy editing of JSON files," which is probably a bit of a misnomer, but it beats editing files by hand, in my opinion. There are three ways of interacting with it:
  • CLI through your shell
  • An interactive prompt
  • Through Python itself

CLI Implementation

Getting here is as simple as typing jsonconf. It expects two positional arguments:
  • The first is a path to a json file.
  • The second is the action you want to take on the file. They're detailed on the GitHub and PyPI pages, but I'll go over the actions and flags a little.

Actions

These are the main actions used by the command line interface
  • addfile:
    • Uses the -o/--other flag to update the specified json file with another. Using this command will overwrite existing keys with the values in the appended file if there are shared keys.
    • Ex: jsonconf path/to/conf.json addfile -o path/to/other.json
  • addfield:
    • Adds a field to the specified json file at a specified end point, optionally converts the value to a non-string type
    • Ex: jsonconf path/to/conf.json addfield -p $.age -v 25 -c int
  • append:
    • Adds a value to a JSON collection at the specified endpoint, optionally converts the value to a non-string type, and optionally affects every path found.
    • Ex: jsonconf path/to/conf.json append -p $.packages -v jsonconfigparser
    • Ex: jsonconf path/to/conf.json append -p$.authors.[*].name -v "Fred Arminsen" -m
    • Ex: jsonconf path/to/conf.json append -p $.ids -v 113 -c int
  • delete:
    • Deletes the specified endpoint. The path is optional here; if it's not passed, the whole document is deleted.
    • Ex: jsonconf path/to/conf.json delete
    • Ex: jsonconf path/to/conf.json delete $.age
  • edit:
    • Changes the value at the specified endpoint, optionally converting the value to a non-string type.
    • Ex: jsonconf path/to/conf.json edit -p $.age -v 22 -c int
  • shell:
    • Drops you into an interactive prompt working on the specified JSON file
    • Ex: jsonconf path/to/conf.json shell

Flags

These are the flags used by the command line interface
  • -p/--path: The path flag. This is a JSONPath expression that the script uses to walk the JSON representation.
  • -o/--other: Used with the addfile command to specify a second JSON file to update the current one with.
  • -v/--value: Used to denote the value or values being passed to the underlying command.
  • -m/--multi: Used with the append command to specify that multiple paths should be written to, defaults to False.
  • -c/--convert: A string used to denote how to convert the passed-in value to its final form. Acceptable values to pass to -c are any of str, bool, int, float, list, dict. You can also compose these values into a composite type as simple or complex as needed.

Interactive Prompt

Using the prompt is largely the same as using the CLI. All the commands and flags are the same; however, all actions are applied to the file specified when the prompt was launched. There is also a write command available (technically available on the CLI as well, though the CLI autosaves) to save the current state of the document.

About -c/--convert and -v/--value

-c takes a space delimited string and uses it to build a converter. This converter can be as simple as just int or as complex as dict int dict int list, which would be a dictionary with integer keys whose values are sub dictionaries with integer keys and lists for values. Of course, with increasingly complex data types comes increasingly complicated input. For that last example, the input might look like 4=4=value, which is confusing to look at until you break it down.
  • -c list will convert a space delimited string into a list of values using Python's shlex.split function. This is smart enough to know that values encased in subquotes are to be considered one value. -c list -v "1 2 '3 4'" would output ['1', '2', '3 4'] in Python.
  • -c dict will convert a space delimited string into sub strings delimited by = and then into dictionaries. -c dict -v "key=value other=something" creates {"key":"value", "other":"something"} inside Python.
  • There is also -c bool, which will return True for everything other than False (regardless of case), 0 and 0.0.
The real power lies in stringing the data types together. If you need a list of integers, you'd specify -c "list int". If you need a dictionary with integer keys and list values, you'd specify -c "dict int list". But -c is also pretty smart: if you need a dictionary with sub dictionary values, you might reach for -c "dict str dict", which would work, but you can simply use -c "dict dict" because -c knows that lists and dictionaries can't be used as keys, so it doesn't even bother asking. It's also smart enough to handle nonsense like -c wut by falling back to strings, and to recognize that -c int float doesn't make sense, so it just returns strings there too. Actually, I might alias -c to --PHP in the next release, because whenever it's faced with something where throwing an error might be a good idea, it returns strings instead.
The basic grammar of -c is this:
  • Do I know what all the types are? If no, return string handler. Else:
  • If there's just one type, return it. Else:
  • If the first type isn't a list or dict, return a string handler because -c doesn't know what to do. Else:
    • If the first type is a list, build a secondary converter from the remaining parts of the original converter. Else:
    • If the first type is a dict and there's only one following type or the next type is a list or dict, build a converter that will only manipulate the values of the dictionary from the rest of the original converter. Else:
    • If the first type is a dict and there's multiple remaining types and the next one is not a list or dict, build a converter that will manipulate both the key and the value from the remaining types. The next immediate type becomes the key type and the remaining types become the value type.
As you can imagine, this is recursive, so something like dict int list float is easily constructed and the value can be passed as -v "0='1 2 3'" since Python's shlex intelligently groups matching quotes from the outside in.
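To make that grammar a bit more concrete, here's a rough sketch of a converter builder that follows those rules. This is illustrative only and not jsonconfigparser's actual code; the names build_converter, base, containers, split_pairs and to_bool are invented for the example, and the real parser may well differ.

import shlex

def to_bool(value):
    # Per the description above: False for "false" (any case), 0 and 0.0.
    return value.strip().lower() not in ('false', '0', '0.0')

base = {'str': str, 'int': int, 'float': float, 'bool': to_bool}
containers = ('list', 'dict')

def split_pairs(value):
    # "key=value other=thing" -> [['key', 'value'], ['other', 'thing']]
    return [item.split('=', 1) for item in shlex.split(value)]

def build_converter(spec):
    parts = spec.split()
    # Any unknown type means falling back to plain strings.
    if not all(p in base or p in containers for p in parts):
        return str
    if len(parts) == 1:
        head = parts[0]
        if head == 'list':
            return shlex.split
        if head == 'dict':
            return lambda v: dict(split_pairs(v))
        return base[head]
    head, rest = parts[0], parts[1:]
    if head == 'list':
        inner = build_converter(' '.join(rest))
        return lambda v: [inner(item) for item in shlex.split(v)]
    if head == 'dict':
        # Keys can't be lists or dicts, so a container next (or a single
        # trailing type) means "convert the values only".
        if len(rest) == 1 or rest[0] in containers:
            val = build_converter(' '.join(rest))
            return lambda v: {k: val(x) for k, x in split_pairs(v)}
        key, val = base[rest[0]], build_converter(' '.join(rest[1:]))
        return lambda v: {key(k): val(x) for k, x in split_pairs(v)}
    # A scalar followed by more types makes no sense -> strings again.
    return str

# e.g. build_converter('dict int list')("0='1 2 3'") -> {0: ['1', '2', '3']}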

Bugs and explosions

Due to the way -c is parsed, it can be buggy at times; if you see something, say something. A copy of your conversion list and your value entry would be extremely helpful, as I can't cover all possible permutations someone might use.
-c also currently has no qualms about blowing up if you try to feed 3 4 through int.

Interacting through Python

You can also import the module into your script to interact with it. The conversion sequences aren't really needed here since you can specify the exact type you'd like to use instead of having to bend your mind around converting a flat value into nested dictionaries. The biggest tools here are JSONConfigParser and the commands module. The commands are largely the same, although in the Python API addfile and addfield become add_file and add_field respectively.

Adding custom types

Adding custom scalar types (most likely scalar type transformations) is as easy as importing jsonconfigparser.utils.fieldtypes and updating the dictionary with your own mapping. For example, if you needed to have every value in a list upper case, you might add {'upper': str.upper}. You can also override any existing converter, as long as you support the expected input (or don't, that's your call). Of course, you'll need to ensure that your type modifications get added to the runtime of the shell prompt.
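As a minimal sketch (assuming fieldtypes is a plain dictionary mapping names to callables, which is how the description above reads), registering or overriding a converter might look like this:

# Assumes jsonconfigparser.utils.fieldtypes is a plain name -> callable dict;
# adjust if the actual structure differs.
from jsonconfigparser.utils import fieldtypes

# Register a new scalar transformation, usable as e.g. -c upper.
fieldtypes.update({'upper': str.upper})

# Or override an existing converter, as long as it handles the expected input.
fieldtypes['bool'] = lambda v: v.strip().lower() not in ('false', '0', '0.0')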

Adding custom commands

Adding custom commands is as simple as decorating a function with jsonconfigparser.command. Currently, there's not an easy way to add arguments to the command line and shell prompt: the decorator scans the function's argument signature for named arguments (i.e. everything that's not *args/**kwargs), and the calling function attempts to pull them from the argparse object (or the simple storage object for the prompt). Both of these need to be extended and possibly combined so that adding arguments is easy as well.
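Purely as a hypothetical sketch of the shape of it (the command name, its arguments and how they're wired to the parsers are all assumptions on my part, not the package's documented API):

# Hypothetical: `summarize`, `json_file` and `path` are invented names. The
# decorator is described as scanning the signature for named arguments and
# the dispatcher as pulling them from the argparse (or prompt storage) object.
from jsonconfigparser import command

@command
def summarize(json_file, path):
    '''Print whatever the parser currently holds for the given path.'''
    print(path, '->', json_file)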

Questions, Comments and Criticisms

Are always welcome. My email is available here. There's the github issue tracker. I'm not hard to track down, despite what some may lead you to believe. Let me know what you think about it all.

Thursday, October 9, 2014

Code Reuse in multiple forms

Reusing Code, or: How I Learned to Stop Repeating Myself

One of the best things about coding is not having to do the same thing over and over again. You automate. You work things into functions and objects and have them worry about completing a series of actions for you. Why wouldn't you do the same thing when actually writing code?

There are times when you find yourself repeating code; when this happens, you should consider whether it's possible to refactor and break the issue into a reusable piece of code. Generally, the rule of three comes into play:

There are two "rules of three" in [software] reuse:

* It is three times as difficult to build reusable components as single use components, and
* a reusable component should be tried out in three different applications before it will be sufficiently general to accept into a reuse library.

Facts and Fallacies of Software Engineering #18 Credit to Jeff Atwood's Coding Horror post about the Rule of Three for bringing it to my attention.

About This Post

This post is just going to be a brief overview of common techniques and patterns to avoid writing the same thing over and over again, starting with functions and moving into objects, inheritance, mixins, composition, decorators and context managers. There are plenty of other techniques, patterns and idioms that I don't touch on, but this post isn't meant to be an exhaustive list.

Functions

Functions are a great way to ensure that a piece of code is always executed the same way. This could be as simple as a small expression like (a + b) * x or something that performs a complicated piece of logic. Functions are the most basic form of code reuse in Python.

In [1]:
def calc(a, b, x):
    """Our business crucial algorithm"""
    return (a + b) * x

calc(1,2,3)
Out[1]:
9

Python also offers a limited form of anonymous functions called lambda. They're limited to just a single expression with no statements in them. A lot of the time, they serve as basic callbacks or as key functions for a sort or group method. The syntax is simple and the return value is the outcome of the expression.

In [2]:
sorted([(1,2), (3,-1), (0,0)], key=lambda x: x[1])
Out[2]:
[(3, -1), (0, 0), (1, 2)]

While lambdas are incredibly useful in many instances, it's generally considered bad form to assign them to variables (since they're supposed to be anonymous functions), not that I've never done that when it suited my needs. ;)

Objects

Objects are really the poster child for code reuse. Essentially, an object is a collection of data and functions that interrelate. Many in the Python community are fond of calling them a pile of dictionaries -- because that's what they essentially are in Python.

Objects offer all sorts of possibilities such as inheritance and composition, which I'll briefly touch upon here. For now, a simple example will suffice: take our business critical algorithm and turn it into a spreadsheet row

In [3]:
class SpreadsheetRow:
    
    def __init__(self, a, b, x):
        self.a = a
        self.b = b
        self.x = x
    
    def calc(self):
        return calc(self.a, self.b, self.x)
    
row = SpreadsheetRow(1,2,3)
print(row.calc())
9

Notice how we're already reusing code to find our business critical total of 9! If later, someone in accounting realizes that we should actually be doing a * (b + x), we simply change the original calculation function.

Inheritance

Inheritance is simply a way of giving access of all the data and methods of a class to another class. It's commonly called "specialization," though Raymond Hettinger aptly describes it as "delegating work." If later, accounting wants to be able to label all of our spreadsheet rows, we could go back and modify the original class or we could design a new one that does this for us.

Accessing information in the inherited class is done through super(). I won't delve into its details here, but it is quite super.

In [4]:
class LabeledSpreadsheetRow(SpreadsheetRow):
    
    def __init__(self, label, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.label = label
        
row = LabeledSpreadsheetRow(label='1', a=1, b=2, x=3)
print("The total for {} is {}".format(row.label, row.calc()))
The total for 1 is 9

Mixins

Mixins are a type of multiple inheritance, which I won't fully delve into here because it's a complicated and touchy subject. However, Python supports it. Because of this and its support for duck typing, we can completely forego the use of Interfaces and Traits, which are common in single inheritance languages.

Mixins are a way of writing logic that is common to many objects and placing it in a single location. Mixins are classes that aren't meant to be instantiated on their own, since they represent a small piece of a puzzle rather than the whole picture. A common problem I use mixins for is creating a generic __repr__ method for objects.

In [5]:
class ReprMixin:
    
    def __repr__(self):
        name = self.__class__.__name__
        attrs = ', '.join(["{}={}".format(k,v) for k,v in vars(self).items()])
        return "<{} {}>".format(name, attrs)
        
class Row(LabeledSpreadsheetRow, ReprMixin):
    pass

row = Row(label='1', a=1, b=2, x=3)
repr(row)
Out[5]:
'<Row b=2, x=3, a=1, label=1>'

This showcases the power of inheritance and mixins: composing complex objects from smaller parts into what you're wanting. The actual class we're using implements no logic of its own, but we're now provided with:

  • A repr method
  • A calculation method
  • A label attribute
  • Data points to calculate

Composition

Composition is a fancy way of saying we're going to build an object using other objects, in other words: composing them from parts. It's a similar idea to inheritance, but instead the objects we're using are stored as attributes on the main object. We have spreadsheet rows, why not a spreadsheet to hold them?

In [6]:
class Spreadsheet(ReprMixin):
    
    def __init__(self, name):
        self.name = name
        self.rows = []
        
    def show_all(self):
        for row in self.rows:
            print("The total for {} is {}".format(row.label, row.calc()))
            
    def total(self):
        return sum(r.calc() for r in self.rows)
        
sheet = Spreadsheet("alec's totals")
sheet.rows.extend([Row(label=1, a=1, b=2, x=3), Row(label=2, a=3, b=5, x=8)])
sheet.show_all()
print(sheet.total())
repr(sheet)
The total for 1 is 9
The total for 2 is 64
73

Out[6]:
"<Spreadsheet name=alec's totals, rows=[<Row b=2, x=3, a=1, label=1>, <Row b=5, x=8, a=3, label=2>]>"

Here we're not only reusing the ReprMixin so we can have accurate information about our Spreadsheet object, we're also reusing the Row objects to provide that logic for free, leaving us to just implement the show_all and total methods.

Decorators

Decorators are a way of factoring logic out of a class or function and into another class or function, or of adding extra logic to it. That sounds confusing, but it's really not. I've written about them elsewhere, so if you're unfamiliar with them I recommend reading that first. Here, we're going to use two decorators Python provides: total_ordering from the standard library, so we can sort our Row objects, and the property decorator, which allows us to treat a function as if it were an attribute (via the descriptor protocol, which is a fantastic code reuse ability that I won't explore here).

In [7]:
from functools import total_ordering

@total_ordering
class ComparableRow(Row):
    
    @property
    def __key(self):
        return (self.a, self.b, self.x)
    
    def __eq__(self, other):
        return self.__key == other.__key
    
    def __lt__(self, other):
        return self.__key < other.__key
    
rows = sorted([ComparableRow(label=1, a=3, b=5, x=8), ComparableRow(label=2, a=1, b=2, x=3)])
print(rows)
[<ComparableRow b=2, x=3, a=1, label=2>, <ComparableRow b=5, x=8, a=3, label=1>]

What total_ordering does is provide all the missing rich comparison operators for us. Meaning even though we only defined __lt__ and __eq__ here, we also have __le__, __gt__, __ge__, and __ne__ available to us.

Decorators are an incredibly powerful way to modify your regular Python functions and objects.

Context Managers

Context managers are a way of handling operations you typically do in pairs: open a file, close a file; start a timer, end a timer; acquire a lock, release a lock; start a transaction, end a transaction. Really, anything you do in pairs should be a candidate for context managers.

Writing context managers is pretty easy, depending on which approach you take. I'll likely explore them in a future post. For now, I'm going to stick to using the generator context manager form as an example:

In [8]:
from contextlib import contextmanager

@contextmanager
def greeting(name=None):
    print("Before the greeting.")
    yield "Hello {!s}".format(name)
    print("After the greeting.")
    
with greeting("Alec") as greet:
    print(greet)
Before the greeting.
Hello Alec
After the greeting.

We won't be writing a context manager here, but rather using one to implement an "alternate constructor" for our Spreadsheet class. Alternate constructors are a way of initializing an object in a specific way. These are especially handy if you find yourself occasionally creating an object under certain conditions. Consider dict.fromkeys which lets you fill a dictionary with keys from an iterable that all have the same value:

In [9]:
print(dict.fromkeys(range(5), None))
{0: None, 1: None, 2: None, 3: None, 4: None}

In our case, we'll probably want to draw our information from a CSV file occasionally. If we do it often enough, rewriting the setup logic all over the place could become tedious.

In [10]:
import csv

class CSVSpreadsheet(Spreadsheet):
    
    @classmethod
    def from_csv(cls, sheetname, filename):
        sheet = cls(sheetname)
        with open(filename) as fh:
            reader = csv.reader(fh.readlines())
            sheet.rows = [ComparableRow(*map(int, row)) for row in reader]
        
        return sheet
    
sheet = CSVSpreadsheet.from_csv('awesome', 'row.csv')
sheet.show_all()
The total for 1 is 9
The total for 2 is 64
The total for 3 is 16

Fin

Hopefully this gives you an idea for reusing code in your own projects. Maybe you'll write your own crappy spreadsheet object as well.

Wednesday, October 1, 2014

Iteration 2: Generators

This is a follow up post to the iteration post, which covered what I consider the basics of iteration. This post covers generators, which are a special type of iterator. I'm going to try to avoid, for the most part, using generators to calculate infinite streams of integers (Fibonacci, et al).

I also need to admit being a bit of a David Beazley fan boy. His tutorials, writings and talks on generators were extremely influential on my understanding of generators, their uses and composition. As a consequence, I tend to quote him a lot.

What makes a generator a generator

The biggest, most obvious difference between generators and other things that make iterators is that generators yield values rather than return them. Consider the "canonical" implementation of the Fibonacci sequence:

In [1]:
def gen_fib():
    a, b = 0, 1
    yield a
    yield b
    while True:
        yield a+b
        a, b = b, a+b

The way a lot of Fibonacci sequences are implemented is with a list, essentially saying, "I want the first ten Fibonacci numbers":

In [2]:
def list_fib(n):
    fibs = []
    if n < 1:
        pass
    elif n == 1:
        fibs.append(0)
    elif n == 2:
        fibs.extend([0,1])
    else:
        a, b = 0, 1
        fibs.extend([a,b])
        while len(fibs) < n:
            fibs.append(a+b)
            a, b = b, a+b
    return fibs

print(list_fib(0))
print(list_fib(1))
print(list_fib(2))
print(list_fib(10))
[]
[0]
[0, 1]
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

list_fib is a mess: we have to check what was passed in, we need to monitor the size of the list (even using collections.deque doesn't quite solve this problem), and there's a while loop we might hit. But all of this is needed to make sure we correctly construct the list of Fibonacci numbers.

By contrast, the gen_fib function is simple, clean, there's only one form of flow control. Well, it looks like there's only one form of flow control, but the logic inside of it is much more complicated.

Under the hood

So what happens when we call gen_fib?

In [3]:
print(gen_fib())
<generator object gen_fib at 0x7f61c427cb88>

A generator object pops out. Odd. I don't see any object instantiation going on inside of the function. Let's take a closer look using the inspect module; in particular, there are two functions of interest there: isgenerator and isgeneratorfunction. Of course, there are also the usual tools of type and hasattr.

In [4]:
from inspect import isgenerator, isgeneratorfunction

f = gen_fib()

print("isgeneratorfunction(gen_fib):", isgeneratorfunction(gen_fib))
print("isgenerator(f):", isgenerator(gen_fib()))
print("type(f):", type(f))
print("type(iter(gen_fib())):", type(iter(f)))
print("Are generators their own iterators?", f is iter(f))
print('f hasattr __iter__:', hasattr(f, '__iter__'))
print('f hasattr __next__:', hasattr(f, '__next__'))
isgeneratorfunction(gen_fib): True
isgenerator(f): True
type(f): <class 'generator'>
type(iter(gen_fib())): <class 'generator'>
Are generators their own iterators? True
f hasattr __iter__: True
f hasattr __next__: True

Under the hood, when Python sees yield inside of a function, it turns that function into a factory for generator objects built around the logic inside of it. Calling gen_fib is very similar to what happens when you call MyFibClass: out pops an object. The actual implementation of generators is beyond the scope of this post, however.

Actually using generators

So, now we have this object. How can we get values out of it? The obvious answer is iteration! However, if you notice, gen_fib's while loop never exits: it's infinite. Attempting to consume the "whole" sequence will exhaust time (though honestly, it'll probably consume all your memory first; Fibonacci numbers get really big really quickly). But just like with other iterators, next can be used to manually pull values out of it.

In [5]:
f = gen_fib()
print(next(f))
print(next(f))
print(next(f))
print(next(f))
0
1
1
2

However, I mentioned that the flow control is actually more complicated than list_fib. Here's why, and where the biggest difference between yield and return becomes apparent:

  • Assign the result of gen_fib() (a generator object) to f
  • Call next on f and print the value
    • This runs the code down until the first yield statement which is:
    • Assign 0 to a and 1 to b
    • yield a (which is 0)
  • Call next on f and print the value
    • yield b (which is 1)
  • Call next on f and print the value
    • create a while loop on a true conditional
    • yield a+b (1)
  • Call next on f and print the value
    • assign b to a, assign a+b to b (this happens via tuple unpacking)
    • while loop condition is still true
    • yield a+b (2)

If it's not obvious, a generator function is a function that "pauses" its execution when it hits yield, which also causes it to spit out a value. Regular functions get one chance to run and return a value: they take some arguments, do something and then return a value. A generator takes some arguments, does something, yields a value and waits around to be called again. By not building a list that is potentially massive, generators can be incredibly memory efficient.

This opens up a lot of possibilities:

  • Creating infinite sequences (perhaps the most mundane application, although incredibly handy)
  • Processing a large amount of data.
  • Creating a long sequence that you might not need all the values from (say I have a function that returns a list of twenty items but I only need the first ten)
  • Creating a long sequence where you don't need all the data at once (memory concerns, etc)
  • And others!

And what's really cool: if you have a simple generator -- something like "for each of these lines, yield one if it meets this simple requirement" -- you can write a generator expression instead. Here's a very contrived example.

In [6]:
nums = (num for num in range(1, 100) if not num%3)
print(list(nums))
[3, 6, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93, 96, 99]

Gotchas

The biggest, most important gotcha is that generators are forgetful. What I mean by that is that when a generator hits yield, it sends a value out; unless the internal state that created that value is reached again, you can't get it back from the generator. Unlike lists and other types of iterables, where you can iterate over the same object multiple times, generators are one time use. You can, however, create multiple generator objects from the same function and iterate over each (and each will maintain its own state).
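For example, two generators built from the same gen_fib function advance independently:

f1, f2 = gen_fib(), gen_fib()
next(f1), next(f1)   # advance f1 past 0 and 1
print(next(f1))      # 1 -- f1 has moved on
print(next(f2))      # 0 -- f2 still has its own, untouched state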

As a consequence of this, searching a generator with in or by explicitly hitting __contains__ will partially or wholly consume the generator. This is because in asks the generator, "Hey, do you ever yield this value?" and the generator gets to work yielding values out until one matches. All the values it yielded along the way are gone. I suppose this could potentially be helpful in some situations, but I do want this caveat to be known.
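A quick illustration:

nums = (n for n in range(10))
print(5 in nums)     # True, but the generator had to yield 0 through 5 to find out
print(list(nums))    # [6, 7, 8, 9] -- everything up to and including the match is gone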

Another thing that will catch people off guard the first few times is that generators don't have a __len__ method. Essentially, a generator has no idea how many values it will yield; it can't tell you that. Generators are also non-indexable for the same reason.
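Both limitations are easy to see:

nums = (n for n in range(10))

try:
    len(nums)
except TypeError as e:
    print(e)    # object of type 'generator' has no len()

try:
    nums[0]
except TypeError as e:
    print(e)    # 'generator' object is not subscriptable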

Delegating

Up until now, these examples have been compatible between Python 2 and 3. However, in 3.3 a few changes were made to yield that took it from awesome to amazing: yield gained an optional keyword, from. yield from delegates access from the current generator to the one it's calling. The simplest way to understand this is to know that the next two code snippets output the same thing:

In [7]:
def my_chain_1(*iters):
    '''This is actually the example code in the 
    itertools module for chain
    '''
    for it in iters:
        for item in it:
            yield item

def my_chain_2(*iters):
    '''This example completely removes a loop
    '''
    for it in iters:
        yield from it

a = range(2)
b = range(5,7)

print(list(my_chain_1(a,b)))
print(list(my_chain_2(a,b)))
[0, 1, 5, 6]
[0, 1, 5, 6]

The true power of yield from is that when you call my_chain_2 you aren't simply being fed values from an inner generator; you are interacting directly with the inner generator. The impact of this is profound. However, you don't need to construct an event loop to make use of it.
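As a small illustration of that direct interaction, values sent to the delegating generator travel straight through to the inner one:

def inner():
    while True:
        received = yield
        print("inner got:", received)

def outer():
    # outer never touches the sent values; yield from wires the caller
    # directly to inner's send(), throw() and close().
    yield from inner()

g = outer()
next(g)           # prime the generator up to inner's first yield
g.send("hello")   # prints: inner got: hello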

Real world use

My canonical example is walking my ~/Music directory and doing something with...

In [8]:
%%bash
find ~/Music -type f -name '*.mp3' | wc -l
18868

...that number of files. To be honest, I'm not concerned about creating a list in memory of 19k file paths (typically a one time operation that I run every now and then). What I'm concerned with is processing 19k file paths in a timely fashion, let alone opening up the files, pulling information out of them and handling them. For the time being, I'm only going to operate on a very small subset of my library. I'm also going to show off how to build "generator pipelines" as Dave Beazley calls them.

I do use a third party library here: mutagenx, a Python 3 implementation of mutagen.

In [9]:
import os

from pprint import pprint

from mutagenx import File

valid_types=('m4a', 'flac', 'mp3', 'ogg', 'oga')

def find(basedir, valid_types=valid_types):
    '''Utilize os.walk to only select out the files we'd like to potentially
    parse and yield them one at a time.'''
    basedir = os.path.abspath(basedir)
    for current, dirs, files in os.walk(basedir):
        files = filter(lambda f: f.endswith(valid_types), sorted(files))
        files = [os.path.join(current, f) for f in files]

        if files:
            yield from files

def adaptor(track):
    '''Take in a Mutagen track object and
    parse it into a dictionary.
    '''
    return dict(
        artist=track['artist'][0],
        album=track['album'][0],
        position=int(track['tracknumber'][0].split('/')[0]),
        length=int(track.info.length),
        location=track.filename,
        name=track['title'][0],
        )

def process_directory(basedir, valid_types=valid_types):
    '''Hook up find and adaptor into a pipeline'''
    
    files = find(basedir, valid_types)
    tracks = (File(f, easy=True) for f in files)
    yield from (adaptor(t) for t in tracks)

tracks = process_directory('/home/justanr//Music/At the Drive-In/Relationship of Command')
pprint(next(tracks))
{'album': 'Relationship of Command',
 'artist': 'At the Drive‐In',
 'length': 177,
 'location': '/home/justanr/Music/At the Drive-In/Relationship of Command/01 '
             'Arcarsenal.mp3',
 'name': 'Arcarsenal',
 'position': 1}

Because of the nature of generators, only one item gets pulled down the line at a time. So, instead of processing the whole directory at once, we can process the directory one file at a time.

yield and yield from also make processing trees easy as well. In the post on iteration, I showed a code example from the Python Cookbook that used a very complex iterator to maintain the state of iterating depth first over a tree. This is the same class built with generators:

In [10]:
class Node:
    def __init__(self, value):
        self._value = value
        self._children = []

    def __repr__(self):
        return 'Node({!r})'.format(self._value)

    def add_child(self, node):
        self._children.append(node)

    def __iter__(self):
        return iter(self._children)

    def depth_first(self):
        yield self
        for c in self:
            yield from c.depth_first()
            
root = Node(0)
child1 = Node(1)
child2 = Node(2)
root.add_child(child1)
root.add_child(child2)
child1.add_child(Node(3))
child1.add_child(Node(4))
child2.add_child(Node(5))

for ch in root.depth_first():
    print(ch)
Node(0)
Node(1)
Node(3)
Node(4)
Node(2)
Node(5)

The entire Node class is 18 lines. The DepthFirstIterator alone is 30 lines. The logic is less complex.

And of course, you can do bad things, too.

In [11]:
from itertools import chain

class IterInt(int):
    
    def __iter__(self):
        yield self

def countdown(n):
    n = IterInt(n)
    if n < 0:
        # wait... why are we returning here?
        # I thought generators never returned!
        return "Countdown finished"
    else:
        yield from chain(n, countdown(n-1))


print(list(countdown(10)))
try:
    next(countdown(-1))
except StopIteration as wut:
    # no seriously...wut
    print(wut)
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
Countdown finished

Consider it incentive for me to write more about generators.

Further Reading

Generators are a really broad topic. For something that ranges from creating infinite sequences to being the backbone of event loops, there's quite a bit of ground to cover. There are a lot of really smart people out there writing things about them.