A collection of computer, gaming and general nerdy things.

Tuesday, September 30, 2014

Iteration 1: The basics

In [1]:
for x in [1,2,3]:
    print(x, end=' ')
1 2 3 

Iteration is something that is used all the time in programming. Python makes it really easy to use.

Parts of Iteration

There are a couple of parts to iteration, consider this a simple glossary for terms used in this post:

  • The Consumer: something that uses iteration to get values from an object: a for loop, for example
  • The Iteration: the act of iterating over an object implementing the iter protocol
  • The Iter Protocol: the interface that iteration uses; requires both __iter__ and __next__ to be defined
  • The Iterable: a container object that implements __iter__, which in turn returns an iterator
  • The Iterator: an object that implements __next__, which in turn returns values to the consumer

Consumers can be anything: for loops consume iterables directly, the builtin iter produces an iterator from a container, and the builtin next handles manual advancement over an iterator. Whatever they are, they use iteration to get values from an object. This is the simplest part of the question of iteration.
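To make that concrete, here's a minimal sketch of roughly what a for loop does for us using only iter and next:

```python
# Roughly what "for x in items" does behind the scenes:
items = [1, 2, 3]
it = iter(items)                     # ask the iterable for an iterator
collected = []
while True:
    try:
        collected.append(next(it))   # advance the iterator manually
    except StopIteration:            # the iterator signals it's exhausted
        break
print(collected)                     # [1, 2, 3]
```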

Iter Protocol, Iterables and Iterators

This part is more complex because there are multiple ways to approach implementing iteration. The iter protocol says that two methods must be implemented:

  • __iter__: On the container, this must return an object that implements __next__; on the iterator, it must return its own instance
  • __next__: This only has to be implemented on the iterator; this method is used to return values to the consumer. When no values are left to return, this method must raise StopIteration and continue to do so on each subsequent call.
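The second requirement is easy to verify with the builtins; a quick sketch:

```python
it = iter([1])
first = next(it)      # consumes the only value

# once exhausted, every further next() must raise StopIteration
raised = 0
for _ in range(3):
    try:
        next(it)
    except StopIteration:
        raised += 1
print(first, raised)  # 1 3
```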

A common way of handling iteration is to split the protocol over two objects. list does this.

In [2]:
print('[] and iter([]) are different:', type([]) != type(iter([])))
print('iter([]) and iter(iter([])) are the same:', type(iter([])) == type(iter(iter([]))))
print('typeof iter([]):', type(iter([])))
print('[] hasattr __iter__:', hasattr([], '__iter__'))
print('[] hasattr __next__:', hasattr([], '__next__'))
print('iter([]) hasattr __iter__:', hasattr(iter([]), '__iter__'))
print('iter([]) hasattr __next__:', hasattr(iter([]), '__next__'))
[] and iter([]) are different: True
iter([]) and iter(iter([])) are the same: True
typeof iter([]): <class 'list_iterator'>
[] hasattr __iter__: True
[] hasattr __next__: False
iter([]) hasattr __iter__: True
iter([]) hasattr __next__: True

It's very common for an iterable and an iterator to be different objects; however, it's not uncommon for them to be the same object (by definition, every iterator is both an iterable and its own iterator).
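You can see this with a plain list iterator: calling iter on it hands back the very same object, so there's only one iteration state no matter how many times you re-iter it:

```python
it = iter([1, 2, 3])
# iter() called on an iterator returns the same object...
print(iter(it) is it)             # True
# ...so "both" share one position in the underlying list
print(next(it), next(iter(it)))   # 1 2
```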

Defining our own iterators

Python makes it super easy to define our own iterables and iterators. Just implement a couple of methods and you're done!

In [3]:
class MyIterable:
    '''Simple implementation of the iter protocol
    __iter__ returns an instance of MyIterator
    '''
    def __init__(self, upper=0):
        self.upper = upper
    
    def __iter__(self):
        return MyIterator(upper=self.upper)

class MyIterator:
    '''Simple implementation of the iter protocol
    When iterated, incremental numbers are returned
    until the upper bound is reached.
    '''
    
    def __init__(self, upper=0):
        self.upper = upper
        self.__current = 0
    
    def __iter__(self):
        return self

    def __next__(self):
        if self.__current >= self.upper:
            raise StopIteration("Upper bound reached.")
        else:
            self.__current += 1
            return self.__current - 1

for x in MyIterable(4):
    print(x)
0
1
2
3

The advantage of splitting the iterable and the iterator into two classes is that you can maintain multiple iterators, all at different states. If an object handles its own iteration, you can maintain only one state.

In [4]:
test = MyIterator(4)

for x in test:
    print(x, end=' ')

for x in test:
    print(x, end=' ')

test = MyIterable(4)
it = iter(test)
print('\n', next(it), sep='')

for x in test:
    print(x, end=' ')

print('\n', next(it), sep='')
print(next(it))
print(next(it))
0 1 2 3 
0
0 1 2 3 
1
2
3

However, something that's a little sneaky is returning something else entirely from __iter__, so long as it is an iterator.

In [5]:
class SneakyIter(MyIterable):
    
    def __iter__(self):
        return iter(range(self.upper))

for x in SneakyIter(4):
    print(x, end=' ')
0 1 2 3 

Returning just range wouldn't work because range is iterable but it's not an iterator.
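This is easy to check: range implements __iter__ but not __next__, so calling next on it fails:

```python
r = range(4)
print(hasattr(r, '__iter__'), hasattr(r, '__next__'))   # True False
try:
    next(r)          # range is not an iterator, so this fails
except TypeError:
    print('TypeError: range is not an iterator')
```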

Iters, huh, what are they good for?

Well, a lot more than just counting. To borrow an example from David Beazley's fantastic Python Cookbook, 3rd Edition (I seriously cannot recommend this book enough): traversing nodes.

In [6]:
class Node:
    def __init__(self, value):
        self._value = value
        self._children = []

    def __repr__(self):
        return 'Node(%r)' % self._value

    def add_child(self, other_node):
        self._children.append(other_node)
 
    def __iter__(self):
        return iter(self._children)

    def depth_first(self):
        return DepthFirstIterator(self)

class DepthFirstIterator(object):
    '''
    Depth-first traversal
    '''
    def __init__(self, start_node):
        self._node = start_node
        self._children_iter = None
        self._child_iter = None

    def __iter__(self):
        return self

    def __next__(self):
        # Return myself if just started. Create an iterator for children
        if self._children_iter is None:
            self._children_iter = iter(self._node)
            return self._node

        # If processing a child, return its next item
        elif self._child_iter:
            try:
                nextchild = next(self._child_iter)
                return nextchild
            except StopIteration:
                self._child_iter = None
                return next(self)

        # Advance to the next child and start its iteration
        else:
            self._child_iter = next(self._children_iter).depth_first()
            return next(self)
        
root = Node(0)
child1 = Node(1)
child2 = Node(2)
root.add_child(child1)
root.add_child(child2)
child1.add_child(Node(3))
child1.add_child(Node(4))
child2.add_child(Node(5))

for ch in root.depth_first():
    print(ch, end=' ')

print('\n')
for ch in root:
    print(ch, end=' ')
Node(0) Node(1) Node(3) Node(4) Node(2) Node(5) 

Node(1) Node(2) 

Iteration shortcuts, tips and tricks

There are shortcuts to iteration, the most commonly known being comprehensions. Comprehensions are literals that do more than simply create, say, a list.

In [7]:
from string import ascii_lowercase as lowercase

# list comp.
test = [ord(c) for c in lowercase]
print(test)

# dict comp.
test = {c:ord(c) for c in lowercase}
print(test)

#you can use comps. in place of iterables in function arguments
print(sum([ord(c) for c in lowercase]))

#something else you can do is drop the brackets if the comp is the only argument
print(sum(ord(c) for c in lowercase))
[97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122]
{'x': 120, 'i': 105, 'n': 110, 'c': 99, 'b': 98, 'h': 104, 's': 115, 'j': 106, 'p': 112, 'f': 102, 'e': 101, 'a': 97, 'r': 114, 'd': 100, 'u': 117, 'g': 103, 'l': 108, 'k': 107, 'w': 119, 'o': 111, 'y': 121, 'm': 109, 'q': 113, 't': 116, 'v': 118, 'z': 122}
2847
2847

But there are more ways of manipulating iterables, such as the builtins filter and map and the itertools module (these all return lazy iterators rather than lists, but that's another post; just know they aren't lists). The usefulness of filter and map is lessened somewhat by list comprehensions, which I often find easier to write. Some would argue that comprehensions should be used for creating new objects, whereas map, filter, et al. should be used for manipulating existing objects. However, the important thing is consistency: if you use filter and map all over the place, don't suddenly throw in a comprehension that transforms an existing structure.

In [8]:
filt = filter(lambda c: not ord(c) % 3, lowercase)
filt = list(filt) # transform filt into a list
comp = [c for c in lowercase if not ord(c)%3]
print(filt, comp, sep='\n')

mapd = map(ord, lowercase)
mapd = list(mapd)
comp = [ord(c) for c in lowercase]
print(mapd, comp, sep='\n')

mapfilt = map(ord, filter(lambda c: not ord(c) % 3, lowercase))
mapfilt = list(mapfilt)
comphre = [ord(c) for c in lowercase if not ord(c)%3]
print(mapfilt, comphre, sep='\n')
['c', 'f', 'i', 'l', 'o', 'r', 'u', 'x']
['c', 'f', 'i', 'l', 'o', 'r', 'u', 'x']
[97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122]
[97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122]
[99, 102, 105, 108, 111, 114, 117, 120]
[99, 102, 105, 108, 111, 114, 117, 120]

There's also unpacking, which is a great tool for pulling items out of an iterable without using a regular form of iteration. The way it works is that Python transparently creates a tuple -- if you didn't know, the tuple operator is the comma, not the parentheses; parens are only needed when the tuple isn't the only argument in a function call -- and then unpacks the tuple into the current namespace. Seems complicated, but it really isn't.

In [9]:
a, b, c = lowercase[:3]
print(a, b, c)

# tuple unpacking is also useful for variable switching
a, b, c = c, b, a
print(a, b, c)

# similar to how we can splat a list into a function, 
# we can splat iterables into unpacking
# _ becomes a list
a, *_, z = lowercase
print(a, z, _)
a b c
c b a
a z ['b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y']

What's next?

Well, there are other types of iterators called generators which iterate in a peculiar way. And without having a base understanding of them, understanding the real benefits of iters and itertools is difficult.
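As a quick teaser (a sketch only; how this actually works is the subject of that other post), a generator function can replace the whole MyIterable/MyIterator pair from earlier in a few lines:

```python
def count_to(upper):
    current = 0
    while current < upper:
        yield current      # suspends here until the next next() call
        current += 1

print(list(count_to(4)))   # [0, 1, 2, 3]
```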

But other than that, go and iterate all the things. But just remember to iterate responsibly.

Resources

Iterable Integers, or: Why?

In [1]:
from itertools import chain

class IterInt(int):
    
    def __iter__(self):
        yield self

def countdown(n):
    n = IterInt(n)
    if n < 0:
        return "Countdown finished"
    else:
        yield from chain(n, countdown(n-1))

list(countdown(10))
Out[1]:
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

Today I learned that setting or overriding methods and attributes on built-in types raises a TypeError. Maybe it's to make iterating over an integer feel wrong. But if this is wrong, I don't want to be right.
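A quick demonstration of that TypeError:

```python
# attempting to patch a built-in type fails loudly
try:
    int.__iter__ = lambda self: iter([self])
    patched = True
except TypeError:
    patched = False
print(patched)   # False: built-in types reject new attributes
```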

Monday, September 29, 2014

Describing Descriptors

An Aside: Just like my post on decorators, I've decided to rewrite this post as well because it suffered from the same issue: "Look at all this code...and hey, there are explanations as well." Instead of exploring patterns, like I did in the decorator post, I'm going to focus on one example that explores several aspects of descriptors at once. Of course, I'll step through it piece by piece. There are actually going to be two major sections to this post:

  • Python Object Attribute Access
  • Writing Our First Descriptor

Also, this post was built with Python 3.4 in mind. While it's foolish to think that everyone everywhere is using the latest and greatest Python release, it's what I've been using primarily lately. That's me.

Updated Nov. 8th, 2014: Added concrete examples of behind the scenes action of descriptors as well as a brief explanation of what __delete__ does.

Controlling Attribute Access

Before digging into descriptors, it's important to talk about attribute access, because at the end of the day, that's what descriptors do for us. There are really two ways of going about this without explicitly building our own descriptor.

Getters and Setters

This is what you'll see in many languages: explicit getters and setters. They're methods that handle attributes for us. This is very common in Java and PHP (or at least as of the last time I seriously used PHP). Essentially, the idea is to always expect to interact with a method instead of an attribute itself. There's nothing wrong with this if it's what your language of choice supports and you need to control access.

In [1]:
# it's a contrived example
# but bear with me here
# pretend this is *important business logic*
def to_lower(value):
    return value.lower()

class Person:
    def __init__(self, name):
        self.__name = None
        self.set_name(name)
    
    def get_name(self):
        return self.__name
    
    def set_name(self, name):
        self.__name = to_lower(name)

monty = Person(name="John Cleese")
print(monty.get_name())
monty.set_name("Eric Idle")
print(monty.get_name())
john cleese
eric idle

That's fine and dandy, if that's what your language supports. Python, in my opinion, handles this better.

@property

The real way you'd write this in Python is by using @property. Which is a decorator, which we are familiar with. I won't go into details about what's going on behind the scenes yet.

In [2]:
class Person:
    
    def __init__(self, name):
        self.__name = None
        self.name = name
    
    @property
    def name(self):
        return self.__name
    
    @name.setter
    def name(self, name):
        self.__name = to_lower(name)

monty = Person(name="Graham Chapman")
print(monty.name)
monty.name = "Terry Gilliam"
print(monty.name)
graham chapman
terry gilliam

For now, don't worry that I have two methods called name. It'll become apparent in a little bit.

That is a much cleaner interface to the class. As far as the calling code is concerned, name is just another attribute. This is fantastic if you designed an object and later realized that you need to control how an attribute is returned or set (or deleted, but I'm not going to delve into that aspect of descriptors at all in this post).

To many people, this is where they'd stop with controlling attribute access. And frankly, I don't really blame them. @property seems magic enough behind the scenes already. Why does it disappear into name? Where does setter come from? How does it work? These are questions that might go unasked or, worse: unanswered.
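One way to start answering them is to remember that the decorator syntax is just sugar for a plain call to property. Here's a sketch of the earlier Person class rewritten that way; the _get_name/_set_name names are my own, not part of the original example:

```python
def to_lower(value):
    return value.lower()

class Person:
    def __init__(self, name):
        self.__name = None
        self.name = name        # goes through the property's setter

    def _get_name(self):
        return self.__name

    def _set_name(self, name):
        self.__name = to_lower(name)

    # the @property / @name.setter form is sugar for this plain call
    name = property(_get_name, _set_name)

monty = Person(name="John Cleese")
print(monty.name)   # john cleese
```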

@property suffices in most situations, especially when you only need to control one attribute in a specific way. But imagine if we had to ensure two or three attributes were all lower case? You might be tempted to replicate the code all the way down. You don't mind a little repetition, do you?

In [3]:
class Person:
    
    def __init__(self, email, firstname, lastname):
        self.__f_name = None
        self.__l_name = None
        self.__email = None
        self.f_name = firstname
        self.l_name = lastname
        self.email = email
    
    @property
    def f_name(self):
        return self.__f_name
    
    @f_name.setter
    def f_name(self, value):
        self.__f_name = to_lower(value)

    @property
    def l_name(self):
        return self.__l_name
    
    @l_name.setter
    def l_name(self, value):
        self.__l_name = to_lower(value)

    @property
    def email(self):
        return self.__email
    
    @email.setter
    def email(self, value):
        self.__email = to_lower(value)

monty = Person(firstname='Michael', lastname='Palin', email='MichaelPalin@montypython.com')
print(monty.f_name, monty.l_name, monty.email)
michael palin michaelpalin@montypython.com

Like I said, it's a contrived example. But instead of ensuring things are lower cased, imagine you're attempting to keep text fields in a GUI synchronized with the state of an object, or you're working with a database ORM. Things will very quickly get out of hand if you have to write properties for a ton of attributes with the logic repeated except for the names.

Behind the Scenes

I'm not going to 100% faithfully recreate @property here, frankly I don't see a point. But I do want to dissect it from what we can observe on the surface.

  • property is a decorator. As a decorator it's a function that takes a function and returns a callable. This we know.
  • The callable created by property is an object
  • The object has at least one method on it, setter, that somehow controls how a variable is set

However, if we inspect property (my preferred way is with IPython's ? and ?? magics) we learn that there is one more method and three attributes that aren't immediately obvious to us.

  • deleter is the missing method, which handles how an attribute is deleted with del
  • fget, fset and fdel are the attributes, which are unsurprisingly the original functions for getting, setting and deleting attributes.

For the above example, fget and fset are our two name methods. They get hidden away inside the property object, which is how we can have two methods with the same name without worry.
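We can poke at this directly. A quick sketch: the property object lives in the class __dict__ and carries the original functions around as fget and fset:

```python
class Person:
    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        self._name = value.lower()

# the property object sits in the class dict, holding both functions
prop = Person.__dict__['name']
print(type(prop).__name__)                      # property
print(prop.fget.__name__, prop.fset.__name__)   # name name
```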

Data model

This is about as far as we can get without understanding how Python accesses attributes on objects. I won't attempt to give a complete, in-depth analysis of how Python actually accesses and sets attributes on objects, but here is a simplified, high-level view of what's going on (ignoring the existence of __slots__, which replaces the underlying __dict__ with a set of descriptors; what's going on there I'm not 100% sure of).

  1. Call __getattribute__
  2. Look up in the object's dictionary
  3. Look up in the class's dictionary
  4. Walk the MRO and look up in those dictionaries
  5. If the name resolves to a descriptor, return the value of its __get__ method
  6. If all other lookups have failed and __getattr__ is present, call that method
  7. If all else fails, raise an AttributeError
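The sixth point can be demonstrated with a small sketch: __getattr__ only fires once the normal lookup has come up empty:

```python
class Fallback:
    class_attr = 'from class'

    def __init__(self):
        self.inst_attr = 'from instance'

    def __getattr__(self, name):
        # only called once the normal lookup has failed
        return 'from __getattr__'

f = Fallback()
print(f.inst_attr)    # from instance
print(f.class_attr)   # from class
print(f.missing)      # from __getattr__
```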

That fifth point is the most pertinent to us. An object that defines a __get__ method is known as a descriptor, and property is actually an object that does this. In effect, its __get__ method resembles this:

In [4]:
def __get__(self, instance, type=None):
    return self.fget(instance)

Similarly, the resolution for setting an attribute looks like this:

  1. If present, call __setattr__
  2. If the name resolves to a descriptor, call its __set__ method
  3. Stuff the value into the object's dict

So, property's __set__ looks like this:

In [5]:
def __set__(self, instance, value):
    self.fset(instance, value)

I'm glossing over raising attribute errors for attributes that don't support reading or writing, but property does that. The reason I'm positive this is how these two methods look is that I'm familiar with the descriptor protocol and Raymond Hettinger wrote about it here.

Descriptor Protocol

There are three parts to the descriptor protocol: __get__, __set__ and __delete__. Like I said, I'm not delving into deleting attributes here, so we'll remain unconcerned with that. But any object that defines at least one of these methods is a descriptor.

The best way to dissect a descriptor is to provide an example implementation. This example isn't actually going to do anything except emulate regular attribute access.

In [6]:
class Descriptor:
    
    def __init__(self, name):
        self.name = name
    
    def __get__(self, instance, cls):
        return instance.__dict__.get(self.name, None)
    
    def __set__(self, instance, value):
        instance.__dict__[self.name] = value
        
    def __delete__(self, instance):
        del instance.__dict__[self.name]

class Thing:
    frob = Descriptor(name='frob')
    
    def __init__(self, frob):
        self.frob = frob

t = Thing(frob=4)
print(t.frob)
4

get

def __get__(self, instance, type):

  • self is the instance of the descriptor itself, just like any other object.
  • instance is an instance of the class the descriptor is attached to
  • type is the class the descriptor is attached to. I typically prefer to use cls as the name here because it's slightly clearer to me; owner is another common name, but slightly more confusing to me.

Above, when we request t.frob, what's actually happening behind the scenes is that Python calls Thing.frob.__get__(t, Thing) rather than handing the descriptor object itself to the print function. The reason the actual class is passed as well is "to give you information about what object the descriptor is part of", to quote Chris Beaumont. While I've not made use of inspecting which class the descriptor is part of, it could be valuable information.

You could also call Thing.frob.__get__(t, Thing) explicitly if you'd like, but Python's data model will handle this for us.
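Here's a self-contained sketch of that explicit call. Note that I've added an `instance is None` guard to __get__ so class-level access like Thing.frob returns the descriptor itself; the version above doesn't have this guard:

```python
class Descriptor:
    def __init__(self, name):
        self.name = name

    def __get__(self, instance, cls):
        if instance is None:      # class-level access, e.g. Thing.frob
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value

class Thing:
    frob = Descriptor(name='frob')

    def __init__(self, frob):
        self.frob = frob

t = Thing(frob=4)
# the explicit call and normal attribute access agree
print(Thing.frob.__get__(t, Thing), t.frob)   # 4 4
```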

set

def __set__(self, instance, value):

  • self: again, no surprises here; this is the instance of the descriptor
  • instance: an instance of the class the descriptor is attached to
  • value: the value you're passing in; if you've used property before, there's no surprise here.

Again, what's happening behind the scenes when we set t.frob to something (in this case, just in Thing.__init__) is that Python passes information to Thing.frob.__set__: the instance of Thing and the value we're setting.

delete

def __delete__(self, instance):

No surprises here. Even though I said I wasn't going to go into deleting attributes with descriptors, I've included it for completeness' sake. The delete method handles what happens when we call del t.frob.

Data vs Non-Data Descriptor

This is something you're going to encounter when reading about and working with descriptors: the difference between a data and non-data descriptor and how Python treats both when looking up an attribute.

Data Descriptor

A data descriptor is a descriptor that defines both a __get__ and a __set__ method. These descriptors receive higher priority if Python finds both a descriptor and a __dict__ entry for the attribute being looked up. Already, you can see that attribute access isn't as clear cut as we thought it was.

Non-Data Descriptor

A non-data descriptor is a descriptor that defines only a __get__ method. These descriptors receive a lower priority if Python finds both the descriptor and a __dict__ entry.
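A sketch that makes the priority difference visible; the class and attribute names here are made up for the demonstration:

```python
class DataDesc:
    def __get__(self, instance, cls):
        return 'data descriptor'

    def __set__(self, instance, value):
        pass    # swallow writes so the instance dict never wins

class NonDataDesc:
    def __get__(self, instance, cls):
        return 'non-data descriptor'

class C:
    d = DataDesc()
    nd = NonDataDesc()

c = C()
# plant competing entries directly in the instance dict
c.__dict__['d'] = 'instance value'
c.__dict__['nd'] = 'instance value'
print(c.d)    # data descriptor: beats the instance dict
print(c.nd)   # instance value: the instance dict wins here
```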

What does this mean?

By using descriptors we can create reusable properties, as Chris Beaumont calls them, which I find to be an incredibly apt description. But there are quite a few pits we can fall into. For the rest of this post, I'm going to focus on rebuilding our lower case properties as a reusable descriptor. In another post, I'll more fully explore some of the power these put at our fingertips.

Our First Descriptor

So far, we know a descriptor needs to define at least one of __get__, __set__, or __delete__. Let's try our hand at building a LowerString descriptor.

In [7]:
class LowerString:
    
    def __init__(self, value=None):
        self.value = value
    
    def __get__(self, instance, cls):
        return self.value
    
    def __set__(self, instance, value):
        self.value = to_lower(value)

class Person:
    f_name = LowerString()
    l_name = LowerString()
    email = LowerString()
    
    def __init__(self, firstname, lastname, email):
        self.f_name = firstname
        self.l_name = lastname
        self.email = email
        
monty = Person(firstname="Terry", lastname="Jones", email="TerryJONES@montyPython.com")
print(monty.f_name, monty.l_name, monty.email)
terry jones terryjones@montypython.com

While this isn't as perfectly clean as we might like, it's certainly a lot prettier than using a series of property decorators and way nicer than defining explicit getters and setters. However, there's a big issue here. If you can't spot it, I'll point it out.

In [8]:
me = Person(firstname="Alec", lastname="Reiter", email="alecreiter@fake.com")
print(me.f_name, me.l_name, me.email)
print(monty.f_name, monty.l_name, monty.email)
alec reiter alecreiter@fake.com
alec reiter alecreiter@fake.com

...oh. Well, that happened. The reason for this, and this is what tripped me up when I began reading about descriptors, is that every instance of Person shares the same instances of LowerString for the three properties. Descriptors enforce a shared state by virtue of being instances attached to a class rather than to instances. So instead of composing an instance of an object with other objects (say, a Person object composed of Job, Nationality and Gender instances), we compose a class out of object instances.

If we examine the __dict__ of both the class and the instance, it becomes apparent where Python finds these values:

In [9]:
print(Person.__dict__)
print(monty.__dict__)
{'__doc__': None, '__weakref__': <attribute '__weakref__' of 'Person' objects>, '__init__': <function Person.__init__ at 0x7f337e2e58c8>, 'f_name': <__main__.LowerString object at 0x7f337ce4a5c0>, 'l_name': <__main__.LowerString object at 0x7f337ce4a550>, 'email': <__main__.LowerString object at 0x7f337ce4a630>, '__module__': '__main__', '__dict__': <attribute '__dict__' of 'Person' objects>}
{}

Since the descriptors aren't attached at the instance level, Python moves up to the class level where it finds the attribute we're requesting, sees it's a descriptor and then calls the __get__ method.

If you attempt to attach these descriptors at the instance level instead, you end up with this:

In [10]:
class Person:
  
    def __init__(self, firstname, lastname, email):
        self.f_name = LowerString(firstname)
        self.l_name = LowerString(lastname)
        self.email = LowerString(email)
        
me = Person(firstname="Alec", lastname="Reiter", email="alecreiter@fake.com")
print(me.f_name, me.l_name, me.email)
<__main__.LowerString object at 0x7f337ce4e390> <__main__.LowerString object at 0x7f337ce4e400> <__main__.LowerString object at 0x7f337ce4e438>

So explicitly attaching them to the instances won't work. But remember, Python passes the instance for us automatically. Let's try storing the value on the underlying object by accessing its __dict__ attribute:

In [11]:
class LowerString:
    
    def __init__(self, label):
        self.label = label
    
    def __get__(self, instance, cls):
        return instance.__dict__[self.label]
    
    def __set__(self, instance, value):
        instance.__dict__[self.label] = to_lower(value)
        
class Person:
    f_name = LowerString('f_name')
    l_name = LowerString('l_name')
    email = LowerString('email')
    
    def __init__(self, firstname, lastname, email):
        self.f_name = firstname
        self.l_name = lastname
        self.email = email
        
monty = Person(firstname="Carol", lastname="Cleaveland", email="seventh@montypython.com")
print(monty.f_name, monty.l_name, monty.email)
carol cleaveland seventh@montypython.com

And sure enough, this works, but we've run into the issue of repeating ourselves again. It'd be nice if we could do something to automatically fill in the label for us. David Beazley addressed this problem in the 3rd Edition of the Python Cookbook.

In [12]:
class checkedmeta(type):
    def __new__(cls, clsname, bases, methods):
        # Attach attribute names to the descriptors
        for key, value in methods.items():
            if isinstance(value, Descriptor):
                value.name = key
        return type.__new__(cls, clsname, bases, methods)

Of course, this means we need to make two small changes to our descriptor: changing label to name and inheriting from a base Descriptor class.

In [13]:
class Descriptor:
    
    def __init__(self, name=None):
        self.name = name

class LowerString(Descriptor):
    
    def __get__(self, instance, cls=None):
        return instance.__dict__[self.name]
    
    def __set__(self, instance, value):
        instance.__dict__[self.name] = to_lower(value)
        
class Person(metaclass=checkedmeta):
    f_name = LowerString()
    l_name = LowerString()
    email = LowerString()
    
    def __init__(self, firstname, lastname, email):
        self.f_name = firstname
        self.l_name = lastname
        self.email = email
        
monty = Person(firstname="Carol", lastname="Cleaveland", email="seventh@montypython.com")
print(monty.f_name, monty.l_name, monty.email)
carol cleaveland seventh@montypython.com

And this is very nice and handy. If, later, we wanted to create an EmailValidator descriptor, so long as we adhere to the pattern laid out here, we can attach it to any class that uses the checkedmeta metaclass and it'll behave as expected.
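Sticking with that thought, here's a sketch of what such an EmailValidator might look like; the descriptor is hypothetical and the '@' check is deliberately naive:

```python
class Descriptor:
    def __init__(self, name=None):
        self.name = name

class checkedmeta(type):
    def __new__(cls, clsname, bases, methods):
        # attach attribute names to the descriptors, as before
        for key, value in methods.items():
            if isinstance(value, Descriptor):
                value.name = key
        return type.__new__(cls, clsname, bases, methods)

# a hypothetical validating descriptor following the same pattern
class EmailValidator(Descriptor):
    def __set__(self, instance, value):
        if '@' not in value:
            raise ValueError('not an email address')
        instance.__dict__[self.name] = value.lower()

class Person(metaclass=checkedmeta):
    email = EmailValidator()

p = Person()
p.email = 'Someone@Example.com'
print(p.email)   # someone@example.com
```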

But there's something still very annoying going on, and it's one of my biggest gripes with property: a getter has to be defined even if I'm only interested in the setter. If you set fget to None, you end up getting an AttributeError saying the attribute is write-only. If we examine our current implementation, we'll notice something else as well:

In [14]:
print(Person.__dict__)
print(monty.__dict__)
{'__doc__': None, '__weakref__': <attribute '__weakref__' of 'Person' objects>, '__init__': <function Person.__init__ at 0x7f337ce38c80>, 'f_name': <__main__.LowerString object at 0x7f337ce31588>, 'l_name': <__main__.LowerString object at 0x7f337ce31550>, 'email': <__main__.LowerString object at 0x7f337ce316a0>, '__module__': '__main__', '__dict__': <attribute '__dict__' of 'Person' objects>}
{'email': 'seventh@montypython.com', 'f_name': 'carol', 'l_name': 'cleaveland'}

There's now the descriptors living at the class level and the values living at the instance level. Let's add some "debugging" print calls to see what's happening on the inside.

In [15]:
class LowerString(Descriptor):
    
    def __init__(self, name=None):
        self.name = name
    
    def __get__(self, instance, cls=None):
        print("Calling LowerString.__get__")
        return instance.__dict__[self.name]
    
    def __set__(self, instance, value):
        print("Calling LowerString.__set__")
        instance.__dict__[self.name] = to_lower(value)
        
class Person(metaclass=checkedmeta):
    f_name = LowerString()
    l_name = LowerString()
    email = LowerString()
    
    def __init__(self, firstname, lastname, email):
        self.f_name = firstname
        self.l_name = lastname
        self.email = email
        
monty = Person(firstname="Carol", lastname="Cleaveland", email="seventh@montypython.com")
print(monty.f_name, monty.l_name, monty.email)
Calling LowerString.__set__
Calling LowerString.__set__
Calling LowerString.__set__
Calling LowerString.__get__
Calling LowerString.__get__
Calling LowerString.__get__
carol cleaveland seventh@montypython.com

Python gives special preference to data descriptors (as described before). However, we can remove this special preference by simply removing the __get__ method. Arguably, this is the most useless part of this descriptor anyway: it's not transforming the result or providing a lazy calculation, it's simply aping what Person.__getattribute__ would do in the first place: find the value in the object's dictionary. If we remove it, then we're left with only a setter, which is what we really wanted in the first place:

In [16]:
class LowerString(Descriptor):
    
    def __init__(self, name=None):
        self.name = name
    
    def __set__(self, instance, value):
        print("Calling LowerString.__set__")
        instance.__dict__[self.name] = to_lower(value)
        
class Person(metaclass=checkedmeta):
    f_name = LowerString()
    l_name = LowerString()
    email = LowerString()
    
    def __init__(self, firstname, lastname, email):
        self.f_name = firstname
        self.l_name = lastname
        self.email = email
        
monty = Person(firstname="Carol", lastname="Cleaveland", email="seventh@montypython.com")
print(monty.f_name, monty.l_name, monty.email)
monty.f_name = "Cheryl"
print(monty.f_name)
Calling LowerString.__set__
Calling LowerString.__set__
Calling LowerString.__set__
carol cleaveland seventh@montypython.com
Calling LowerString.__set__
cheryl

And this has to do with how Python sets attributes, which we examined above. Again, it gives special precedence to the descriptor, which is what we want in the first place. However, when we access the attribute, Python sees there's an entry in both the object's __dict__ and the class's __dict__, but the latter doesn't have a __get__ method, which causes it to fall back to the object's entry.

In [17]:
print(Person.__dict__)
print(monty.__dict__)
{'__doc__': None, '__weakref__': <attribute '__weakref__' of 'Person' objects>, '__init__': <function Person.__init__ at 0x7f337ce43730>, 'f_name': <__main__.LowerString object at 0x7f337ce55710>, 'l_name': <__main__.LowerString object at 0x7f337ce55748>, 'email': <__main__.LowerString object at 0x7f337ce55780>, '__module__': '__main__', '__dict__': <attribute '__dict__' of 'Person' objects>}
{'email': 'seventh@montypython.com', 'f_name': 'cheryl', 'l_name': 'cleaveland'}

Leaving us with just the setter and no unnecessary data or method duplication.

Going forward

In the original post, I also explored building an observer pattern with descriptors, something Chris Beaumont also touches upon briefly but leaves a lot on the table as far as registering callbacks on every instance and on specific instances of classes. I plan on touching on this again in a future post.

But for now, I'm hoping this leaves a much better impression of descriptors than my original post. Again, this isn't meant to be a tell all about descriptors but hopefully serves to clarify a lot of the magic that appears to happen behind the scenes when you're using SQLAlchemy and defining models.

Further Learning