An Aside: Just like my post on decorators, I've decided to rewrite this post as well because it suffered from the same issue: "Look at all this code...and hey, there's explanations as well." Instead of exploring patterns, like I did in the decorator post, I'm going to focus on one example that explores several aspects of descriptors all at once. Of course, I'll step through it piece by piece. There are two major sections to this post:
- Python Object Attribute Access
- Writing Our First Descriptor
Also, this post was built with Python 3.4 in mind. While it's foolish to think that everyone everywhere is using the latest and greatest Python release, it's what I've been using primarily lately. That's me.
Updated Nov. 8th, 2014: Added concrete examples of the behind the scenes action of descriptors as well as a brief explanation of what __delete__ does.
Controlling Attribute Access¶
Before digging into descriptors, it's important to talk about attribute access, because at the end of the day that's what descriptors do for us. There are really two ways of going about this without explicitly building our own descriptor.
Getters and Setters¶
This is what you'll see in many languages: explicit getters and setters. They're methods that handle attributes for us. This is very common in Java and PHP (or at least as of the last time I seriously used PHP). Essentially, the idea is to always expect to interact with a method instead of an attribute itself. There's nothing wrong with this if it's what your language of choice supports and you need to control access.
# it's a contrived example
# but bear with me here
# pretend this is *important business logic*
def to_lower(value):
    return value.lower()

class Person:
    def __init__(self, name):
        self.__name = None
        self.set_name(name)

    def get_name(self):
        return self.__name

    def set_name(self, name):
        self.__name = to_lower(name)

monty = Person(name="John Cleese")
print(monty.get_name())
monty.set_name("Eric Idle")
print(monty.get_name())
That's fine and dandy, if that's what your language supports. Python, in my opinion, handles this better.
@property¶
The real way you'd write this in Python is by using @property, which is a decorator, and decorators are something we are already familiar with. I won't go into details about what's going on behind the scenes yet.
class Person:
    def __init__(self, name):
        self.__name = None
        self.name = name

    @property
    def name(self):
        return self.__name

    @name.setter
    def name(self, name):
        self.__name = to_lower(name)

monty = Person(name="Graham Chapman")
print(monty.name)
monty.name = "Terry Gilliam"
print(monty.name)
For now, don't worry that I have two methods called name. Why that works will become apparent in a little bit.
That is a much cleaner interface to the class. As far as the calling code is concerned, name is just another attribute. This is fantastic if you designed an object and later realized that you need to control how an attribute is returned or set (or deleted, but I'm not going to delve into that aspect of descriptors at all in this post).
To many people, this is where they'd stop with controlling attribute access. And frankly, I don't really blame them. @property seems to be magic enough behind the scenes already. Why is it disappearing into name, where does setter come from, how does it work? These are questions that might go unasked or, worse, unanswered.
@property suffices in most situations, especially when you only need to control one attribute in a specific way. But imagine if we had to ensure two or three attributes were all lower case? You might be tempted to replicate the code all the way down. You don't mind a little repetition, do you?
class Person:
    def __init__(self, email, firstname, lastname):
        self.__f_name = None
        self.__l_name = None
        self.__email = None
        self.f_name = firstname
        self.l_name = lastname
        self.email = email

    @property
    def f_name(self):
        return self.__f_name

    @f_name.setter
    def f_name(self, value):
        self.__f_name = to_lower(value)

    @property
    def l_name(self):
        return self.__l_name

    @l_name.setter
    def l_name(self, value):
        self.__l_name = to_lower(value)

    @property
    def email(self):
        return self.__email

    @email.setter
    def email(self, value):
        self.__email = to_lower(value)

monty = Person(firstname='Michael', lastname='Palin', email='MichaelPalin@montypython.com')
print(monty.f_name, monty.l_name, monty.email)
Like I said, it's a contrived example. But instead of ensuring things are lower cased, imagine you're attempting to keep text fields in a GUI synchronized with the state of an object or you're working with a database ORM. Things will very quickly get out of hand if you have to write a property for a ton of attributes with the logic repeated except for the names.
Behind the Scenes¶
I'm not going to 100% faithfully recreate @property here; frankly, I don't see a point. But I do want to dissect it from what we can observe on the surface.
- property is a decorator. As a decorator it's a callable that takes a function and returns a callable. This we know.
- The callable created by property is an object
- The object has at least one method on it, setter, that somehow controls how a variable is set
However, if we inspect property (my preferred way is with IPython's ? and ?? magics) we learn that there is one more method and three attributes that aren't immediately obvious to us.
- deleter is the missing method, which handles how an attribute is deleted with del
- fget, fset and fdel are the attributes, which are unsurprisingly the original functions for getting, setting and deleting attributes.
For the above example, fget and fset are our two name methods above. They actually get hidden away inside the property object, which is how we can have two methods with the same name without worry.
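To make that observation concrete, here is a minimal sketch that pulls the property object out of the class and pokes at its attributes. It reuses the single-attribute Person from earlier; the prop variable name is just for illustration.

```python
def to_lower(value):
    return value.lower()


class Person:
    def __init__(self, name):
        self.__name = None
        self.name = name

    @property
    def name(self):
        return self.__name

    @name.setter
    def name(self, name):
        self.__name = to_lower(name)


# Grab the raw object stored under "name" in the class namespace.
prop = Person.__dict__["name"]
print(type(prop).__name__)                      # property
print(prop.fget.__name__, prop.fset.__name__)   # name name
print(prop.fdel)                                # None, no deleter was defined

monty = Person(name="Graham Chapman")
# Calling the stored getter by hand does what monty.name does for us.
print(prop.fget(monty))                         # graham chapman
```

Both functions are called name, but they are two distinct function objects tucked away on the one property object.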
Data model¶
This is about as far as we can get without understanding how Python accesses attributes on objects. I won't attempt to give a complete in depth analysis of how Python actually accesses and sets attributes on objects, but this is a simplified, high level view of what's going on (ignoring the existence of __slots__, which actually replaces the underlying __dict__ with a set of descriptors, and what's going on there I'm not 100% sure of).
- Call __getattribute__
- Look up in the object's dictionary
- Look up in the class's dictionary
- Walk the MRO and look up in those dictionaries
- If the name resolves to a descriptor, return the value of its __get__ method (a descriptor that also defines __set__ is actually consulted before the object's dictionary; see Data vs Non-Data Descriptor further down)
- If all other look ups have failed and __getattr__ is present, call that method
- If all else fails, raise an AttributeError
That fifth point is the most pertinent to us. An object that defines a __get__ method is known as a descriptor. property is actually an object that does this. In effect, its __get__ method resembles this:
def __get__(self, instance, type=None):
    return self.fget(instance)
Similarly, the resolution for setting an attribute looks like this:
- If present, call __setattr__
- If the name resolves to a descriptor, call its __set__ method
- Stuff the value into the object's dict
So, property's __set__ looks like this:
def __set__(self, instance, value):
    self.fset(instance, value)
I'm glossing over raising attribute errors for attributes that don't support reading or writing, but property does that. There are two reasons I'm positive this is how these two methods look: I'm familiar with the descriptor protocol, and Raymond Hettinger wrote about it in his Descriptor How To.
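Putting the two methods together, a toy stand-in for property might look like the sketch below. SimpleProperty is a made-up name, the error messages are placeholders, and this is only an approximation for illustration; the real property is implemented in C and also handles deleters and docstrings.

```python
class SimpleProperty:
    """Toy stand-in for property (hypothetical, for illustration only)."""

    def __init__(self, fget=None, fset=None):
        self.fget = fget
        self.fset = fset

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed on the class itself: hand back the descriptor.
            return self
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(instance)

    def __set__(self, instance, value):
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(instance, value)

    def setter(self, fset):
        # Build a new descriptor that keeps the getter and adds the setter.
        return type(self)(self.fget, fset)


class Person:
    def __init__(self, name):
        self.name = name

    @SimpleProperty
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        self._name = value.lower()


monty = Person("Terry Gilliam")
print(monty.name)  # terry gilliam
```

The setter method is the piece that answers "where does setter come from": it is an ordinary method on the descriptor that returns a fresh descriptor carrying both functions, and that new object is what ends up bound to name in the class body.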
Descriptor Protocol¶
There are three parts to the descriptor protocol: __get__, __set__ and __delete__. Like I said, I'm not delving into deleting attributes here, so we'll remain unconcerned with that. But any object that defines at least one of these methods is a descriptor.
The best way to dissect a descriptor is to provide an example implementation. This example isn't actually going to do anything except emulate regular attribute access.
class Descriptor:
    def __init__(self, name):
        self.name = name

    def __get__(self, instance, cls):
        return instance.__dict__.get(self.name, None)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value

    def __delete__(self, instance):
        del instance.__dict__[self.name]

class Thing:
    frob = Descriptor(name='frob')

    def __init__(self, frob):
        self.frob = frob

t = Thing(frob=4)
print(t.frob)
__get__¶
def __get__(self, instance, type):
- self is the instance of the descriptor itself, just like any other object.
- instance is an instance of the class it's attached to
- type is the actual class that it's attached to. I typically prefer to use cls as the name here because it's slightly more clear to me. owner is another common name, but slightly more confusing to me.
Above, when we request t.frob, what's actually happening behind the scenes is that Python calls type(t).__dict__['frob'].__get__(t, type(t)) and hands the result to the print function, instead of passing the descriptor object itself. The reason the actual class is passed as well is "to give you information about what object the descriptor is part of", to quote Chris Beaumont. While I've not made use of inspecting which class the descriptor is part of, this could be valuable information.
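You could also call Thing.__dict__['frob'].__get__(t, Thing) explicitly if you'd like (going through Thing.__dict__ fetches the descriptor without triggering it a second time with no instance), but Python's data model will handle this for us.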
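As a quick check of that equivalence, the sketch below calls __get__ by hand on the same Descriptor and Thing classes and compares the result with plain attribute access. The descriptor and explicit variable names are only for illustration.

```python
class Descriptor:
    def __init__(self, name):
        self.name = name

    def __get__(self, instance, cls):
        return instance.__dict__.get(self.name, None)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


class Thing:
    frob = Descriptor(name="frob")

    def __init__(self, frob):
        self.frob = frob


t = Thing(frob=4)

# Fetch the raw descriptor from the class namespace so that no
# attribute lookup machinery runs on our behalf.
descriptor = type(t).__dict__["frob"]
explicit = descriptor.__get__(t, type(t))

print(t.frob, explicit)  # 4 4
```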
__set__¶
def __set__(self, instance, value):
- self: again no surprises here, this is the instance of the descriptor
- instance: this is an instance of the class it's attached to
- value is the value you're passing in. If you've used property before, there's no surprise here.
Again, what's happening behind the scenes when we set t.frob to something (in this case, just in Thing.__init__) is that Python passes information to the __set__ method of the descriptor stored on Thing, the information just being the instance of Thing and the value we're setting.
__delete__¶
def __delete__(self, instance):
No surprises here. And despite saying I wasn't going to go into deleting attributes with descriptors, I've included it for completeness' sake. The __delete__ method handles what happens when we call del t.frob.
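For the sake of seeing all three hooks fire, here is a small sketch using the same Descriptor and Thing classes. The after_set and after_delete variables are only there to capture the intermediate results.

```python
class Descriptor:
    def __init__(self, name):
        self.name = name

    def __get__(self, instance, cls):
        return instance.__dict__.get(self.name, None)

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value

    def __delete__(self, instance):
        del instance.__dict__[self.name]


class Thing:
    frob = Descriptor(name="frob")

    def __init__(self, frob):
        self.frob = frob


t = Thing(frob=4)
descriptor = Thing.__dict__["frob"]

# Same effect as writing t.frob = 5
descriptor.__set__(t, 5)
after_set = t.frob
print(after_set)      # 5

# del is routed to descriptor.__delete__(t), which removes the stored value
del t.frob
after_delete = t.frob
print(t.__dict__)     # {}
print(after_delete)   # None, because __get__ falls back to .get(..., None)
```

Note that the descriptor itself survives the del; only the value stored on the instance goes away.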
Data vs Non-Data Descriptor¶
This is something you're going to encounter when reading about and working with descriptors: the difference between a data and non-data descriptor and how Python treats both when looking up an attribute.
Data Descriptor¶
A data descriptor is a descriptor that defines a __set__ (or __delete__) method, usually alongside __get__. These descriptors receive higher priority if Python finds a descriptor and a __dict__ entry for the attribute being looked up. Already, you can see that attribute access isn't as clear cut as we thought it was.
Non-Data Descriptor¶
A non-data descriptor is a descriptor that defines only a __get__ method. These descriptors receive a lower priority if Python finds both the descriptor and a __dict__ entry.
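The difference in priority is easy to observe. In the sketch below, NonData, Data and Thing are hypothetical names; both attributes get a competing entry written straight into the instance's __dict__, and only the data descriptor wins the lookup.

```python
class NonData:
    def __get__(self, instance, owner=None):
        return "from non-data descriptor"


class Data:
    def __get__(self, instance, owner=None):
        return "from data descriptor"

    def __set__(self, instance, value):
        raise AttributeError("read-only in this sketch")


class Thing:
    plain = NonData()
    guarded = Data()


t = Thing()

# Write directly into the instance dictionary, bypassing any __set__.
t.__dict__["plain"] = "from instance dict"
t.__dict__["guarded"] = "from instance dict"

print(t.plain)    # from instance dict
print(t.guarded)  # from data descriptor
```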
What does this mean?¶
By using descriptors we can create reusable properties, as Chris Beaumont calls them, which I find to be an incredibly apt definition. But there are quite a few pits we can fall into. For the rest of this post, I'm going to focus on rebuilding our lower case properties as a reusable descriptor. In another post, I'm going to more fully explore some of the power these put at our fingertips.
Our First Descriptor¶
So far, we know a descriptor needs to define at least one of __get__, __set__, or __delete__. Let's try our hand at building a LowerString descriptor.
class LowerString:
    def __init__(self, value=None):
        self.value = value

    def __get__(self, instance, cls):
        return self.value

    def __set__(self, instance, value):
        self.value = to_lower(value)

class Person:
    f_name = LowerString()
    l_name = LowerString()
    email = LowerString()

    def __init__(self, firstname, lastname, email):
        self.f_name = firstname
        self.l_name = lastname
        self.email = email

monty = Person(firstname="Terry", lastname="Jones", email="TerryJONES@montyPython.com")
print(monty.f_name, monty.l_name, monty.email)
While this isn't as perfectly clean as we might like, it's certainly a lot prettier than using a series of property decorators and way nicer than defining explicit getters and setters. However, there's a big issue here. If you can't spot it, I'll point it out.
me = Person(firstname="Alec", lastname="Reiter", email="alecreiter@fake.com")
print(me.f_name, me.l_name, me.email)
print(monty.f_name, monty.l_name, monty.email)
...oh. Well that happened. And the reason for this, and this is what tripped me up when I began reading about descriptors, is that each instance of Person shares the same instances of LowerString for the three properties. Descriptors enforce a shared state by virtue of being instances attached to a class rather than to instances. So instead of composing an instance of an object with other objects (say a Person object composed of Job, Nationality and Gender instances), we compose a class out of object instances.
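A trimmed-down sketch of the problem, using only one attribute so the effect is easy to see: the second Person overwrites the value that the first one reads back, because both go through the single LowerString stored on the class.

```python
def to_lower(value):
    return value.lower()


class LowerString:
    def __init__(self, value=None):
        self.value = value

    def __get__(self, instance, cls):
        return self.value

    def __set__(self, instance, value):
        # The value lives on the descriptor, which is shared by every Person.
        self.value = to_lower(value)


class Person:
    f_name = LowerString()

    def __init__(self, firstname):
        self.f_name = firstname


monty = Person("Terry")
me = Person("Alec")

print(monty.f_name, me.f_name)  # alec alec
```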
If we examine the __dict__ for both the class and the instance, it becomes apparent where Python finds these values:
print(Person.__dict__)
print(monty.__dict__)
Since the descriptors aren't attached at the instance level, Python moves up to the class level where it finds the attribute we're requesting, sees it's a descriptor and then calls the __get__ method.
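To spell out what those two print calls reveal, the sketch below checks the dictionaries directly rather than dumping them. It assumes the same value-on-the-descriptor LowerString as above, cut down to one attribute.

```python
def to_lower(value):
    return value.lower()


class LowerString:
    def __init__(self, value=None):
        self.value = value

    def __get__(self, instance, cls):
        return self.value

    def __set__(self, instance, value):
        self.value = to_lower(value)


class Person:
    f_name = LowerString()

    def __init__(self, firstname):
        self.f_name = firstname


monty = Person("Terry")

# The descriptor is an entry in the class namespace...
print("f_name" in Person.__dict__)       # True
# ...while the instance holds nothing at all.
print(monty.__dict__)                    # {}
# The lower-cased value sits on the shared descriptor object.
print(Person.__dict__["f_name"].value)   # terry
```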
If you attempt to attach these descriptors at the instance level instead, you end up with this:
class Person:
    def __init__(self, firstname, lastname, email):
        self.f_name = LowerString(firstname)
        self.l_name = LowerString(lastname)
        self.email = LowerString(email)

me = Person(firstname="Alec", lastname="Reiter", email="alecreiter@fake.com")
print(me.f_name, me.l_name, me.email)
So explicitly attaching them to the instances won't work. But remember, Python passes the instance for us automatically. Let's try storing the value on the underlying object by accessing its __dict__ attribute:
class LowerString:
    def __init__(self, label):
        self.label = label

    def __get__(self, instance, cls):
        return instance.__dict__[self.label]

    def __set__(self, instance, value):
        instance.__dict__[self.label] = to_lower(value)

class Person:
    f_name = LowerString('f_name')
    l_name = LowerString('l_name')
    email = LowerString('email')

    def __init__(self, firstname, lastname, email):
        self.f_name = firstname
        self.l_name = lastname
        self.email = email

monty = Person(firstname="Carol", lastname="Cleaveland", email="seventh@montypython.com")
print(monty.f_name, monty.l_name, monty.email)
And surely this works, but we've run into the issue of repeating ourselves again. It'd be nice if we could simply do something to automatically fill in the label for us. David Beazley addressed this problem in the 3rd Edition of the Python Cookbook.
class checkedmeta(type):
    def __new__(cls, clsname, bases, methods):
        # Attach attribute names to the descriptors
        for key, value in methods.items():
            if isinstance(value, Descriptor):
                value.name = key
        return type.__new__(cls, clsname, bases, methods)
Of course, this means we need to make two small changes to our descriptor: changing label to name and inheriting from a base Descriptor class.
class Descriptor:
    def __init__(self, name=None):
        self.name = name

class LowerString(Descriptor):
    def __get__(self, instance, cls=None):
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        instance.__dict__[self.name] = to_lower(value)

class Person(metaclass=checkedmeta):
    f_name = LowerString()
    l_name = LowerString()
    email = LowerString()

    def __init__(self, firstname, lastname, email):
        self.f_name = firstname
        self.l_name = lastname
        self.email = email

monty = Person(firstname="Carol", lastname="Cleaveland", email="seventh@montypython.com")
print(monty.f_name, monty.l_name, monty.email)
And this is very nice and handy. If later we wanted to create an EmailValidator descriptor, so long as we adhere to the pattern laid out here, we can attach it to any class that uses the checkedmeta metaclass and it'll behave as expected.
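As a rough idea of what such an EmailValidator could look like, here is a sketch that follows the same pattern. The validation rule (just checking for an "@") and the Account class are assumptions made up for this example, not anything a real validator should rely on.

```python
class Descriptor:
    def __init__(self, name=None):
        self.name = name


class checkedmeta(type):
    def __new__(cls, clsname, bases, methods):
        # Attach attribute names to the descriptors
        for key, value in methods.items():
            if isinstance(value, Descriptor):
                value.name = key
        return type.__new__(cls, clsname, bases, methods)


class EmailValidator(Descriptor):
    def __get__(self, instance, cls=None):
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        # Deliberately naive check, purely for illustration.
        if "@" not in value:
            raise ValueError("not an email address: {!r}".format(value))
        instance.__dict__[self.name] = value.lower()


class Account(metaclass=checkedmeta):
    email = EmailValidator()

    def __init__(self, email):
        self.email = email


account = Account("Seventh@MontyPython.com")
print(account.email)  # seventh@montypython.com

try:
    account.email = "not-an-address"
except ValueError as exc:
    print("rejected:", exc)
```

Because the metaclass filled in the name, EmailValidator never had to be told it was attached as email.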
But there's something still very annoying going on, and it's one of my biggest gripes with property: a getter has to be defined even if I'm only interested in the setter. If you set fget to None, you end up getting an attribute error when you try to read the attribute. If we examine our current implementation, we'll notice something else as well:
print(Person.__dict__)
print(monty.__dict__)
The descriptors now live at the class level and the values live at the instance level. Let's add some "debugging" print calls to see what's happening on the inside.
class LowerString(Descriptor):
    def __init__(self, name=None):
        self.name = name

    def __get__(self, instance, cls=None):
        print("Calling LowerString.__get__")
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        print("Calling LowerString.__set__")
        instance.__dict__[self.name] = to_lower(value)

class Person(metaclass=checkedmeta):
    f_name = LowerString()
    l_name = LowerString()
    email = LowerString()

    def __init__(self, firstname, lastname, email):
        self.f_name = firstname
        self.l_name = lastname
        self.email = email

monty = Person(firstname="Carol", lastname="Cleaveland", email="seventh@montypython.com")
print(monty.f_name, monty.l_name, monty.email)
Python gives special preference to data descriptors (as described before). However, we can sidestep this special preference on reads by simply removing the __get__ method. Arguably, this is the most useless part of this descriptor anyway: it's not transforming the result or providing a lazy calculation, it's simply aping what Person.__getattribute__ would do in the first place, which is to find the value in the object's dictionary. If we remove this, then we're left with only a setter, which is what we really wanted in the first place:
class LowerString(Descriptor):
    def __init__(self, name=None):
        self.name = name

    def __set__(self, instance, value):
        print("Calling LowerString.__set__")
        instance.__dict__[self.name] = to_lower(value)

class Person(metaclass=checkedmeta):
    f_name = LowerString()
    l_name = LowerString()
    email = LowerString()

    def __init__(self, firstname, lastname, email):
        self.f_name = firstname
        self.l_name = lastname
        self.email = email

monty = Person(firstname="Carol", lastname="Cleaveland", email="seventh@montypython.com")
print(monty.f_name, monty.l_name, monty.email)
monty.f_name = "Cheryl"
print(monty.f_name)
And this has to do with how Python sets attributes, which we examined above. Again, it's giving special precedence to the descriptor, which is what we want in the first place. However, when we access the attribute, it sees there's an entry in the object's __dict__ and the class's __dict__, but the latter doesn't have a __get__ method, which causes it to default back to the object's entry.
print(Person.__dict__)
print(monty.__dict__)
Leaving us with just the setter and no unnecessary data or method duplication.
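One caveat for readers on newer interpreters: this post targets Python 3.4, but from Python 3.6 onward the descriptor protocol also includes a __set_name__ hook that Python calls when the owning class is created. Assuming 3.6 or later, the same setter-only descriptor can be sketched without the metaclass or the Descriptor base class. This is an alternative to the approach above, not a change to it.

```python
def to_lower(value):
    return value.lower()


class LowerString:
    def __set_name__(self, owner, name):
        # Called automatically at class creation time (Python 3.6+)
        # with the attribute name the descriptor was assigned to.
        self.name = name

    def __set__(self, instance, value):
        instance.__dict__[self.name] = to_lower(value)


class Person:
    f_name = LowerString()
    l_name = LowerString()
    email = LowerString()

    def __init__(self, firstname, lastname, email):
        self.f_name = firstname
        self.l_name = lastname
        self.email = email


monty = Person(firstname="Carol", lastname="Cleaveland", email="Seventh@MontyPython.com")
print(monty.f_name, monty.l_name, monty.email)  # carol cleaveland seventh@montypython.com

monty.f_name = "Cheryl"
print(monty.f_name)  # cheryl
```

Reads still come straight out of the instance's __dict__, since the descriptor defines no __get__.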
Going forward¶
In the original post, I also explored building an observer pattern with descriptors, something Chris Beaumont also touches upon briefly but leaves a lot on the table as far as registering callbacks on every instance and on specific instances of classes. I plan on touching on this again in a future post.
But for now, I'm hoping this leaves a much better impression of descriptors than my original post. Again, this isn't meant to be a tell all about descriptors but hopefully serves to clarify a lot of the magic that appears to happen behind the scenes when you're using SQLAlchemy and defining models.
Further Learning¶
- Chris Beaumont's Python Descriptors Demystified
- David Beazley's Python 3 Metaprogramming: this presentation explores far more than just descriptors and delves into a lot of advanced concepts
- Luciano Ramalho's Encapsulation with Descriptors
- Python.org Descriptor How To by Raymond Hettinger
- And a whole host of SO questions