Recap
In the last post about descriptors I introduced the concept of building an observer pattern with descriptors, something Chris Beaumont almost teases in his Python Descriptors Demystified. But I feel he left a lot on the table with that concept.
Before delving deep into the code (and this post is going to be very code heavy), let's recap what we learned last time:
- How Python handles attribute access on objects.
- What the descriptor protocol is and how to implement it briefly.
- How to store the data on the object's __dict__.
- How to use a metaclass to handle registering the descriptors for us.
And now for a little bit of a code dump, both to get it active in this notebook and to remind us what it looks like:
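The listing from the previous post isn't reproduced here, so the following is only a reconstruction under assumptions: the names Descriptor, DescriptorMeta and Thing are stand-ins, and the details may differ from the original. It sketches a descriptor that stores its value in the instance's __dict__ and a metaclass that tells each descriptor which name it was assigned to.

```python
class Descriptor:
    """Data descriptor that keeps its value in the instance's __dict__."""

    def __init__(self, name=None):
        # The metaclass below fills this in when it is left as None.
        self.name = name

    def __get__(self, instance, cls):
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value

    def __delete__(self, instance):
        del instance.__dict__[self.name]


class DescriptorMeta(type):
    """Registers each descriptor under the attribute name it was bound to."""

    def __new__(mcls, clsname, bases, namespace):
        for attr, value in namespace.items():
            if isinstance(value, Descriptor) and value.name is None:
                value.name = attr
        return super().__new__(mcls, clsname, bases, namespace)


class Thing(metaclass=DescriptorMeta):
    frob = Descriptor()  # hypothetical attribute, named by the metaclass


thing = Thing()
thing.frob = 4
print(thing.frob, thing.__dict__)
```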
Callbacks
Callbacks are simply actions that run in response to something. They allow external code to react to and hook into your code. This style of programming is very common in, for example, Node.js, and it can be used in Python as well. For now, I'm going to stick with my business-crucial to_lower as our callback to give an example before moving on to actually working with the observer pattern.
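As a minimal sketch of the idea, assuming a hypothetical process function that does its own work and then hands the result to whatever callback it was given:

```python
def to_lower(value):
    """The callback: external code that reacts to our result."""
    return value.lower()


def process(value, callback):
    """Do our own work first, then hand the result to the callback."""
    cleaned = value.strip()
    return callback(cleaned)


print(process("  HELLO World ", to_lower))
```

The process function knows nothing about to_lower beyond the fact that it can be called with one argument.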
Now, the callback could have done anything, like updating a database, sending a tweet or simply plugging into a grander processing framework. Node.js uses callbacks for things like error handling on view functions. This is just to give an idea of what's happening in a basic sense: your code runs and then sends a request to the callback for more action. Implementing callback descriptors is pretty easy.
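A callback descriptor along these lines might look like the sketch below. CallbackAttribute and report are hypothetical names, and passing the attribute name by hand is a simplification made for the example.

```python
class CallbackAttribute:
    """Descriptor that runs a single callback every time it is set."""

    def __init__(self, name, callback):
        self.name = name
        self.callback = callback

    def __get__(self, instance, cls):
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value
        # Hand control to the external code once our own work is done.
        self.callback(instance, self.name, value)


def report(instance, name, value):
    print("Set {} on {!r} to {}".format(name, instance, value))


class Thing:
    frob = CallbackAttribute("frob", report)


thing = Thing()
thing.frob = 4
```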
Set frob on <__main__.Thing object at 0x7f0d2c2c6358> to 4
Of course, this is an incredibly limited callback descriptor: we're limited to only one callback, and it's set at class definition time. But it's merely meant to serve as an example of what's to come.
Observers
According to Wikipedia,
The Observer Pattern is a software design pattern in which an object, called the subject, maintains a list of its dependents, called observers, and notifies them automatically of any state changes, usually by calling one of their methods. It is mainly used to implement distributed event handling systems.
And according to the Gang of Four:
Define a one-to-many dependency between objects so that when one object changes state, all its dependents are notified and updated automatically.
The Gang of Four goes on to state that observers and subjects shouldn't be tightly coupled because it reduces the ability to reuse them elsewhere. Put plainly, your subject shouldn't have hard-coded logic to call specific observers. Rather, you should be able to register instances of observers onto an object (or class) and have it call out to them programmatically.
You might run into other names such as Event Handler, PubSub/Publisher-Subscriber, or Signals. These are all variations (to the best of my understanding) on the pattern with minute but important differences. I won't delve into them, but the takeaway is that all four of these follow the same basic pattern: an object hooks callbacks which run when they're notified of something.
An easy implementation of this would look like this:
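The sketch below is one way to write it, not necessarily the original listing. ObserverMixin and the observers keyword are assumptions made for illustration; SubjectMixin is the name used later in this post.

```python
class ObserverMixin:
    """Base class for observers; subclasses must implement update."""

    def update(self, instance):
        raise NotImplementedError


class SubjectMixin:
    """Keeps a list of observers and notifies them on request."""

    def __init__(self, observers=None):
        self._observers = []
        for observer in observers or ():
            self.add_observer(observer)

    def add_observer(self, observer):
        if observer not in self._observers:
            self._observers.append(observer)

    def remove_observer(self, observer):
        if observer in self._observers:
            self._observers.remove(observer)

    def notify(self):
        # The subject only knows that observers have an update method.
        for observer in self._observers:
            observer.update(self)
```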
An initial attempt at this pattern will use inheritance (or interfaces if you're using something like PHP or Java where single inheritance is the only option). The pattern is simple:
- We store observers in a private (or at least as private as Python allows) list
- When we need to notify the observers, we do so explicitly by calling the update method on each of them
Observers are free to implement update in whatever way, but they must implement it. A simple implementation might look like this.
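As a hedged example of a concrete subject and observer, with NamePrinter and rename as hypothetical names and a trimmed-down SubjectMixin repeated so the block runs on its own:

```python
class SubjectMixin:
    def __init__(self):
        self._observers = []  # private by convention only

    def add_observer(self, observer):
        self._observers.append(observer)

    def notify(self):
        for observer in self._observers:
            observer.update(self)


class NamePrinter:
    """Observer: the only requirement is an update method."""

    def update(self, instance):
        print("Name is now {}".format(instance.name))


class Person(SubjectMixin):
    def __init__(self, name):
        super().__init__()
        self.name = name

    def rename(self, name):
        self.name = name
        self.notify()  # explicit: the subject decides when to notify


person = Person("Ada")
person.add_observer(NamePrinter())
person.rename("Grace")
```

Assigning to person.name directly would not notify anyone here; only rename does.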
This is generally how it's implemented -- at least in most of the articles I read. It's also possible to automate the notification via property. Say we wanted to notify the observers every time we change the name attribute on a Person instance. We could write that logic everywhere. Maybe apply it with a context manager or decorator. But tying it to the object makes the most sense.
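A sketch of the property approach, assuming the same minimal SubjectMixin as before (repeated here so the block is self-contained) and a hypothetical Recorder observer:

```python
class SubjectMixin:
    def __init__(self):
        self._observers = []

    def add_observer(self, observer):
        self._observers.append(observer)

    def notify(self):
        for observer in self._observers:
            observer.update(self)


class Person(SubjectMixin):
    def __init__(self, name):
        super().__init__()
        self._name = name  # backing field set directly: no notification

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        self._name = value
        self.notify()  # every assignment to name announces itself


class Recorder:
    def __init__(self):
        self.names = []

    def update(self, instance):
        self.names.append(instance.name)


recorder = Recorder()
person = Person("Ada")
person.add_observer(recorder)
person.name = "Grace"
print(recorder.names)
```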
If we're concerned about automatically notifying the observers any time an attribute is changed, we could just override __setattr__ to handle this for us, which circumvents the need to write properties for every attribute if this is the only action we're concerned with. It's super easy to implement as well.
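One possible shape for this, as a sketch. Skipping names that start with an underscore is my assumption, made so that setting up _observers itself does not trigger a notification.

```python
class SubjectMixin:
    def __init__(self):
        self._observers = []

    def add_observer(self, observer):
        self._observers.append(observer)

    def notify(self):
        for observer in self._observers:
            observer.update(self)


class Person(SubjectMixin):
    def __init__(self, name, age):
        super().__init__()  # creates _observers before anything else is set
        self.name = name
        self.age = age

    def __setattr__(self, name, value):
        super().__setattr__(name, value)
        # Skip private bookkeeping such as _observers itself.
        if not name.startswith("_"):
            self.notify()


class Recorder:
    def __init__(self):
        self.seen = []

    def update(self, instance):
        self.seen.append((instance.name, instance.age))


recorder = Recorder()
person = Person("Ada", 36)
person.add_observer(recorder)
person.age = 37
person.name = "Grace"
print(recorder.seen)
```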
And that's all well and good. Not to mention a good deal less complicated than what I'm about to delve into. But it's also less fun for me. I'm not going to advocate for one of these implementations over the other except to say the one I'm going to focus on will offer a much finer grain of control.
Watching specific attributes
However, if we're concerned with monitoring specific attributes for changes, descriptors are the correct way to handle this. Why bother emitting an event every time age is changed if we only care about name or email?
The first step is to identify the logic we'd end up repeating in each property and move that into a separate object. We'll call this new class WatchedAttribute.
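A first pass at WatchedAttribute could look like the sketch below. Passing the attribute name explicitly (instead of having a metaclass register it) and the Counter observer are simplifications I'm assuming for the example.

```python
class SubjectMixin:
    def __init__(self):
        self._observers = []

    def add_observer(self, observer):
        self._observers.append(observer)

    def notify(self):
        for observer in self._observers:
            observer.update(self)


class WatchedAttribute:
    """Stores the value on the instance and asks the instance to notify."""

    def __init__(self, name):
        self.name = name

    def __get__(self, instance, cls):
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value
        instance.notify()


class Person(SubjectMixin):
    name = WatchedAttribute("name")
    email = WatchedAttribute("email")

    def __init__(self, name, email, age):
        super().__init__()
        self.name = name
        self.email = email
        self.age = age  # plain attribute: changes are not announced


class Counter:
    def __init__(self):
        self.count = 0

    def update(self, instance):
        self.count += 1


counter = Counter()
person = Person("Ada", "ada@example.com", 36)
person.add_observer(counter)
person.age = 37          # silent
person.name = "Grace"    # notifies
print(counter.count)
```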
Now we can add multiple watched attributes without rewriting the property each time just to change the variable name. If we split the name attribute into first and last names, or add an email attribute, it's easy. Just add another WatchedAttribute entry on the class level and set it in __init__.
But I feel we can improve on this pattern as well. There are two big things I'm not a fan of with this implementation:
- We manipulate the underlying dictionary to store the values.
- The Subject is responsible for notifying the Observers.
We can fix both of these things, but the first will take us down a side road.
Alternative Data Store
The first issue is trickier. We need to relate instances to values without creating a mess we'll have to clean up later, or creating a memory leak that will absolutely murder a long-running process. The most effective way of handling both of these is using weak references.
References
CPython (the implementation I'm using) utilizes reference counting to determine if an object should be garbage collected. When an object's reference count drops to 0, its space in memory can be reclaimed by Python for use elsewhere. Sometimes we want to hold a reference to an object, but not so tightly that it won't be garbage collected if we forget about it. Consider this:
<__main__.Person object at 0x7f0d2c2c64e0>
Storing instances in a dictionary as keys or values (or in a list or set) as a form of caching is extremely common. But if we remove all the other references to the object lying around...
...that reference is left hanging around:
<__main__.Person object at 0x7f0d2c2c64e0>
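The situation described above can be reproduced with a sketch like this one, where Person and cache are stand-in names. The regular dict holds a strong reference to its key, so the object survives after the only name pointing at it is deleted.

```python
import gc


class Person:
    def __init__(self, name):
        self.name = name


cache = {}
person = Person("Ada")
cache[person] = "some cached value"

del person    # drop the only name that pointed at the instance
gc.collect()  # make the point explicit: nothing gets collected

# The dict's key is still a strong reference, so the instance lives on.
print(len(cache), list(cache))
```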
Before this gets too sidetracked into weak references, I want to note that they're not a silver bullet and require a little more knowledge about Python to use efficiently. You can still shoot your foot off with them. In this case, we're not using them to prevent cycles but instead to maintain a cache.
Peter Parente wrote about weak references on his blog and while some of the information is outdated (the new module was deprecated in 2.6 and replaced with types), it's still relevant to understanding what weak references are. And Doug Hellmann explored the weakref module in his Python Module of the Week series.
But the short of it is that an instance of WeakKeyDictionary, WeakValueDictionary or WeakSet will prevent this. Most things can be weak referenced -- the documentation goes into detail about what can be: "class instances, functions written in Python (but not in C), instance methods, sets, frozensets, some file objects, generators, type objects, sockets, arrays, deques, regular expression pattern objects, and code objects."
When you're attempting to use WeakKeyDictionary or WeakSet, the object must meet one more requirement: it must be hashable. So objects like list or dict, even if they were implemented in Python, can't take advantage of these structures. However, outside of a few corner cases, this constraint won't affect us here.
Implementing it is incredibly easy.
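A minimal sketch of such a descriptor, assuming the cache lives in an attribute called values and that unset attributes fall back to a default:

```python
import gc
from weakref import WeakKeyDictionary


class Descriptor:
    """Keeps values in a per-descriptor cache keyed weakly by instance."""

    def __init__(self, default=None):
        self.default = default
        self.values = WeakKeyDictionary()

    def __get__(self, instance, cls):
        return self.values.get(instance, self.default)

    def __set__(self, instance, value):
        self.values[instance] = value

    def __delete__(self, instance):
        del self.values[instance]


class Person:
    name = Descriptor()

    def __init__(self, name):
        self.name = name


person = Person("Ada")
store = Person.__dict__["name"].values  # reach the descriptor without __get__
print(person.name, len(store))

del person
gc.collect()  # not strictly needed on CPython, but harmless
print(len(store))  # the entry disappears along with the instance
```

Nothing is written to the instance's own __dict__, and the cache entry goes away on its own once the instance does.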
You'll notice the metaclass we were using before is also gone now. That's because we're storing the information in a cache inside the descriptor, so it no longer needs to worry about what name it's being held under.
Descriptors as Subjects
The next issue was moving the publishing of events out of the main object. The main reason for this would be to notify only certain subscribers when an attribute changes but not all of them. This explores what happens when we access a descriptor through the class and not an instance. That means answering, "What does cls (or type) do on __get__?"
Accessing the Descriptor
Descriptors are just objects that happen to follow a certain protocol; that doesn't mean they can't have other methods on them, or even follow multiple protocols. An object could be both a descriptor and an iterator, for example. However, getting to these other methods can be tricky. We obviously can't do it through an instance: Python resolves that access to the __get__ method and returns a value.
This means we have to go through the class. But the way our descriptor is set up, it'll blow up when an instance isn't passed to it. We could simply return the instance of the descriptor when an instance isn't passed...would it work? Spoilers: It does. So we can fully move the registration of observers and notification out into the descriptors and our SubjectMixin can be redefined to work with our descriptor.
Actually, we end up redefining the Descriptor and WatchedAttribute classes as well. Forewarning, this is a bit of a code dump.
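The original listing isn't shown here, so treat the following as a sketch of what that redefinition could look like rather than the author's exact code. The cooperative **kwargs plumbing and the Recorder observer are assumptions made for the example.

```python
from weakref import WeakKeyDictionary, WeakSet


class Descriptor:
    def __init__(self, default=None, **kwargs):
        self.default = default
        self.values = WeakKeyDictionary()
        super().__init__(**kwargs)

    def __get__(self, instance, cls):
        if instance is None:
            return self  # class access hands back the descriptor itself
        return self.values.get(instance, self.default)

    def __set__(self, instance, value):
        self.values[instance] = value


class SubjectMixin:
    def __init__(self, observers=None, **kwargs):
        self._observers = WeakSet(observers or ())
        super().__init__(**kwargs)

    def add_observer(self, observer):
        self._observers.add(observer)

    def remove_observer(self, observer):
        self._observers.discard(observer)

    def notify(self, instance):
        # instance is the object that owns the attribute, not the descriptor
        for observer in self._observers:
            observer.update(instance)


class WatchedAttribute(Descriptor, SubjectMixin):
    def __set__(self, instance, value):
        super().__set__(instance, value)
        self.notify(instance)


class Person:
    name = WatchedAttribute()
    age = Descriptor()

    def __init__(self, name, age):
        self.name = name
        self.age = age


class Recorder:
    def __init__(self):
        self.seen = []

    def update(self, instance):
        self.seen.append(instance.name)


recorder = Recorder()  # keep a strong reference, or the WeakSet drops it
Person.name.add_observer(recorder)
person = Person("Ada", 36)
person.age = 37        # silent: age is a plain Descriptor
person.name = "Grace"
print(recorder.seen)   # ['Ada', 'Grace']
```

Observers are registered through Person.name, which only works because __get__ returns the descriptor when no instance is passed.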
There are some subtle changes going on here that you might miss unless you explicitly diff the preceding implementation of SubjectMixin with this one.
The observer container is changed from a list to a WeakSet. Both are iterable, which means notify doesn't change (at least due to this). WeakSet plays off both the strengths of sets (which only contain unique items) and weak references. The only things that WeakSet won't handle are keeping the observers in any particular order and dealing with unhashable types -- neither of which affects us. You'll notice that adding elements to a set is slightly different from adding them to a list, so it's not a complete drop-in replacement.
I will note I went back and forth between using WeakSet and a regular set, mostly because of one question: if we remove all other references to an observer, do we intend for the observer to still process requests? My thought on the matter is no, the observer should be considered dead. In other cases, the goal could be to have "anonymous" observers -- objects that are created and immediately injected into the framework rather than assigned to a name and passed in. If this is the desire, then WeakSet wouldn't keep the object from being immediately garbage collected. I'll leave the pros and cons of both approaches as an exercise to the reader. ;)
The next subtle difference is that SubjectMixin.notify now accepts an instance explicitly. Since we've displaced this logic to the descriptor, passing self to it ends up passing the instance of the descriptor rather than an instance of the class it's attached to.
Other than that, it's just a matter of knowing how multiple inheritance works, which is a completely separate matter best left to another time. It involves liberal use of super, to say the least.
The short of it is that WatchedAttribute combines the methods and data from both Descriptor and SubjectMixin. That means Descriptor can worry about being a descriptor that stores information in a weak ref dictionary, and SubjectMixin can worry about being the basis for observed subjects -- it's applicable for both descriptors and other objects. WatchedAttribute just overrides how Descriptor.__set__ operates (or rather extends it, if you want to split hairs) to combine the two fully.
Going Further
We could, of course, go further and register observers both for every instance of an object with the WatchedAttribute and for specific instances. Implementing this is just a mite trickier, but not terribly so. The first step is to imitate the behavior of collections.defaultdict in WeakKeyDictionary. Emulating defaultdict is pretty straightforward and just depends on defining __missing__, setting a hook for it in __getitem__ and providing a constructor.
The reason for building this is to utilize WeakSet as a way to hold onto observers for us that are local to a particular instance.
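One way to sketch this, assuming the name WeakKeyDefaultDict and a constructor that mirrors defaultdict by taking the factory first (Owner is just a stand-in key type):

```python
import gc
from weakref import WeakKeyDictionary, WeakSet


class WeakKeyDefaultDict(WeakKeyDictionary):
    """WeakKeyDictionary that builds missing values like defaultdict does."""

    def __init__(self, default_factory=None, mapping=None):
        self.default_factory = default_factory
        super().__init__(mapping)

    def __getitem__(self, key):
        try:
            return super().__getitem__(key)
        except KeyError:
            # The hook: WeakKeyDictionary never calls __missing__ on its own.
            return self.__missing__(key)

    def __missing__(self, key):
        if self.default_factory is None:
            raise KeyError(key)
        value = self[key] = self.default_factory()
        return value


class Owner:
    """Stand-in for any hashable, weak-referenceable instance."""


owner = Owner()
local_observers = WeakKeyDefaultDict(WeakSet)
local_observers[owner]  # first access creates an empty WeakSet
print(len(local_observers))

del owner
gc.collect()
print(len(local_observers))  # the entry goes away with its key
```

As with defaultdict, only item access triggers the factory; get still returns its default without creating an entry.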
With that built, we can reconstruct WatchedAttribute to hold both "global" and "local" observers.
The real question now is, how does it handle? It should handle the same as previous iterations of WatchedAttribute except for the specific behavior we've overridden here. I'm also going to add some convenience methods to the Person class to make it slightly easier to interact with the observers.
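Here is a hedged sketch of the whole arrangement. To keep it short, WatchedAttribute is flattened into a single class instead of inheriting from Descriptor and SubjectMixin, and the observer classes, the shared log list and the observe_name / observe_all_names helpers are all hypothetical names.

```python
from weakref import WeakKeyDictionary, WeakSet


class WeakKeyDefaultDict(WeakKeyDictionary):
    def __init__(self, default_factory):
        super().__init__()
        self.default_factory = default_factory

    def __getitem__(self, key):
        try:
            return super().__getitem__(key)
        except KeyError:
            value = self[key] = self.default_factory()
            return value


class WatchedAttribute:
    def __init__(self, observers=None):
        self.values = WeakKeyDictionary()
        self._global = WeakSet(observers or ())    # every instance
        self._local = WeakKeyDefaultDict(WeakSet)  # one set per instance

    def __get__(self, instance, cls):
        if instance is None:
            return self
        return self.values[instance]

    def __set__(self, instance, value):
        self.values[instance] = value
        self.notify(instance)

    def add_observer(self, observer, instance=None):
        target = self._global if instance is None else self._local[instance]
        target.add(observer)

    def notify(self, instance):
        # Class-wide observers first, then the ones local to this instance.
        for observer in list(self._global):
            observer.update(instance)
        for observer in list(self._local.get(instance, ())):
            if observer not in self._global:
                observer.update(instance)


class UpperCase:
    def __init__(self, log):
        self.log = log

    def update(self, instance):
        self.log.append(instance.name.upper())


class NoVowels:
    def __init__(self, log):
        self.log = log

    def update(self, instance):
        kept = (c for c in instance.name if c.lower() not in "aeiou")
        self.log.append("".join(kept))


class Person:
    name = WatchedAttribute()

    def __init__(self, name):
        self.name = name

    def observe_name(self, observer):
        """Subscribe an observer to this instance only."""
        type(self).name.add_observer(observer, instance=self)

    @classmethod
    def observe_all_names(cls, observer):
        """Subscribe an observer to every instance."""
        cls.name.add_observer(observer)


log = []
shout, strip = UpperCase(log), NoVowels(log)
first, second = Person("Ada"), Person("Grace")
Person.observe_all_names(strip)
first.observe_name(shout)
first.name = "Alan"
second.name = "Edsger"
print(log)  # ['ln', 'ALAN', 'dsgr']
```

The observers append to a list instead of printing so the result is easy to inspect.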
As we can see, the observer that prints the value of Person.name in upper case is bound only to the first instance of Person, whereas the one that strips out the vowels and prints that result is bound to all instances. It's also possible to create an ignore method that would allow specific instances to ignore certain observers as well. Or even better, create a set of rules that can be followed: "Only invoke this observer if the value doesn't change."
Something I've curiously ignored is pre-subscribing observers. That is to say, when we create the class we attach a predetermined list of observers to the attribute. This is a feature of the original SubjectMixin class and is inherited by WatchedAttribute (or as Raymond Hettinger would put it: WatchedAttribute delegates the work to SubjectMixin).
Fin
This method of implementing the observer pattern allows a lot of very fine-grained control. I'm not advocating it as a good solution -- or even a workable solution on its own. There's plenty that's left on the table as far as details and issues go. For example, how would this expand to using a messaging queue (ZeroMQ or Redis) to publish events to? Or how does it interact with asyncio or twisted? Integrating this pattern with an existing framework (blinker, for example) would probably be the best solution.
Rather, it's meant as an introduction to the true power of what you can do with descriptors beyond just making sure a string is all lower case or normalizing floating point numbers to decimal.Decimal instances. Those are valid uses of descriptors; don't take that the wrong way.
Some of the concepts introduced here -- manipulating descriptors on both the instance and class levels -- are used to build tremendously flexible systems. Ever wonder how SQLAlchemy seems to magically treat class attributes as parameters in search queries, yet fills them with data on the instance level? Descriptors and that if instance is None check.
Further Reading