
Saturday, November 8, 2014

Observer Pattern through Descriptors

Recap

In the last post about descriptors I introduced the idea of building an observer pattern with descriptors, something Chris Beaumont teases in his Python Descriptors Demystified. I feel he left a lot on the table with that concept, though.
Before delving deep into the code (and this post is going to be very code heavy), let's recap what we learned last time:
  • How Python handles attribute access on objects
  • What the descriptor protocol is and, briefly, how to implement it
  • How to store the data in the object's __dict__
  • How to use a metaclass to register the descriptors for us
And now for a small code dump to get those pieces active in this notebook and to remind us what they look like:
In [1]:
class Descriptor:
    def __init__(self, name=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.name = name

class checkedmeta(type):
    def __new__(cls, clsname, bases, methods):
        # Attach attribute names to the descriptors
        for key, value in methods.items():
            if isinstance(value, Descriptor):
                value.name = key
        # delegate to super() instead of calling type.__new__ directly
        return super().__new__(cls, clsname, bases, methods)
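
As an aside, on Python 3.6 and later the same name registration can be sketched without a metaclass, because the interpreter calls __set_name__ on each descriptor when the owning class is created. This is an alternative to the metaclass above, not what the rest of this post uses, and NamedDescriptor and Thing are made-up names for illustration.

```python
class NamedDescriptor:
    """Hypothetical descriptor that learns its attribute name via __set_name__."""

    def __init__(self, name=None):
        self.name = name

    def __set_name__(self, owner, name):
        # Called automatically while the owning class is being created
        self.name = name

    def __set__(self, instance, value):
        # Same storage strategy as before: the instance's own __dict__
        instance.__dict__[self.name] = value


class Thing:
    frob = NamedDescriptor()


thing = Thing()
thing.frob = 4
print(Thing.__dict__['frob'].name)  # frob
print(thing.frob)  # 4
```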

Callbacks

Callbacks are simply actions that run in response to something. They allow external code to react to and hook into your code. This style of programming is very common in Node.js, for example, and it can be used in Python as well. For now, I'm going to stick with my business-critical to_lower as our callback to give an example before moving on to actually working with the observer pattern.
In [2]:
# pretend this lives in a package called critical
# and actually does something really useful
def to_lower(value):
    return value.lower()

def print_lower(value):
    print(to_lower(value))
    
#from critical import print_lower
def my_business_logic(value, callback):
    remove = 'aeiou'
    
    value = ''.join([c for c in value if c.lower() not in remove])
    callback(value)
    return value

my_business_logic('Alec Reiter', callback=print_lower)
lc rtr

Out[2]:
'lc Rtr'
Now, the callback could have done anything: updating a database, sending a tweet or simply plugging the value into a grander processing framework. Node.js uses callbacks for things like error handling on view functions. This is just to give an idea of what's happening in a basic sense: your code runs and then hands off to the callback for more action. Implementing callback descriptors is pretty easy.
In [3]:
class CallbackAttribute(Descriptor):
    
    def __init__(self, callback=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.callback = callback
    
    def __set__(self, instance, value):
        instance.__dict__[self.name] = value
        if self.callback:
            self.callback(instance, self.name, value)

def frobed_callback(instance, name, value):
    print("Set {} on {!s} to {}".format(name, instance, value))
            
class Thing(metaclass=checkedmeta):
    frob = CallbackAttribute(callback=frobed_callback)
    
    def __init__(self, frob):
        self.frob = frob

foo = Thing(frob=4)
Set frob on <__main__.Thing object at 0x7f0d2c2c6358> to 4

Of course, this is an incredibly limited callback descriptor: we get only one callback, and it's set at class definition time. But it's merely meant as an example of what's to come.

Observers

According to Wikipedia,
The Observer Pattern is a software design pattern in which an object, called the subject, maintains a list of its dependents, called observers, and notifies them automatically of any state changes, usually by calling one of their methods. It is mainly used to implement distributed event handling systems.
And according to the Gang of Four:
Define a one-to-many dependency between objects so that when one object changes state, all its dependents are notified and updated automatically.
The Gang of Four goes on to state that observers and subjects shouldn't be tightly coupled because it reduces the ability to reuse them elsewhere. Put plainly, your subject shouldn't have hard-coded logic to call specific observers. Rather, you should be able to register instances of observers onto an object (or class) and have it call out to them programmatically.
You might run into other names such as Event Handler, PubSub/Publisher-Subscriber, or Signals. These are all variations (to my best understanding) on the pattern with minute but important differences. I won't delve into them, but the takeaway is that all four of these follow the same basic pattern: an object hooks callbacks which run when they're notified of something.
An easy implementation of this would look like this:
In [4]:
from abc import ABCMeta, abstractmethod

class SubjectMixin:
    """Mixin that will allow an object to notify observers about changes to itself."""
    
    def __init__(self, observers=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._observers = []
        if observers:
            self._observers.extend(observers)
    
    def notify(self):
        for observer in self._observers:
            observer.update(self)
    
    def add_observer(self, observer):
        if observer not in self._observers:
            self._observers.append(observer)
    
    def remove_observer(self, observer):
        if observer in self._observers:
            self._observers.remove(observer)
        
class ObserverMixin(metaclass=ABCMeta):
    """Mixin that will allow an object to observe and report on other objects."""
    
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
    
    @abstractmethod
    def update(self, instance):
        raise NotImplementedError
An initial attempt at this pattern will use inheritance (or interfaces if you're using something like PHP or Java where single inheritance is the only option). The pattern is simple:
  • We store observers in a private (or at least as private as Python allows) list
  • When we need to notify the observers, we do so explicitly by hitting all of them and their update method
Observers are free to implement update in whatever way, but they must implement it. A simple implementation might look like this.
In [5]:
class Person(SubjectMixin):
    
    def __init__(self, name, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.name = name

class PrintLowerNoVowels(ObserverMixin):
    
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
    
    def update(self, instance):
        remove = 'aeiou'
        value = instance.name.lower()
        value = ''.join([c for c in value if c not in remove])
        print(value)

plnv = PrintLowerNoVowels()
me = Person(name="Alec Reiter", observers=[plnv])
me.notify()
lc rtr

This is generally how it's implemented -- at least in most of the articles I read. It's also possible to automate the notification via a property. Say we wanted to notify the observers every time we change the name attribute on a Person instance. We could write that logic everywhere we set the name, or maybe apply it with a context manager or decorator. But tying it to the object makes the most sense.
In [6]:
class Person(SubjectMixin):
    
    def __init__(self, name, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.__name = None
        self.name = name
    
    @property
    def name(self):
        return self.__name
    
    @name.setter
    def name(self, value):
        if value != self.__name:
            self.__name = value
            self.notify()

me = Person(name="Alec Reiter", observers=[plnv])
lc rtr

If we're concerned about automatically notifying the observers any time an attribute is changed, we could just override __setattr__ to handle this for us. That removes the need to write a property for every attribute if this is the only action we're concerned with. It's super easy to implement as well.
In [7]:
class Person(SubjectMixin):
    
    def __init__(self, name, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.name = name
    
    def __setattr__(self, name, value):
        super().__setattr__(name, value)
        self.notify()
        
me = Person(name="Alec Reiter", observers=[plnv])
lc rtr

And that's all well and good. Not to mention a good deal less complicated than what I'm about to delve into. But it's also less fun for me. I'm not going to advocate for one of these implementations over the other except to say the one I'm going to focus on will offer a much finer grain of control.

Watching specific attributes

However, if we're concerned with monitoring specific attributes for changes, descriptors are the correct way to handle this. Why bother emitting an event every time age is changed if we only care about name or email?
The first step is to identify the logic we'd end up repeating in each property and move that into a separate object. We'll call this new class WatchedAttribute.
In [8]:
class WatchedAttribute(Descriptor):
    def __init__(self, name=None, *args, **kwargs):
        super().__init__(name, *args, **kwargs)
    
    def __set__(self, instance, value):
        if self.name not in instance.__dict__ or value != instance.__dict__[self.name]:
            instance.__dict__[self.name] = value
            instance.notify()

class Person(SubjectMixin, metaclass=checkedmeta):
    name = WatchedAttribute()
    
    def __init__(self, name, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.name = name
        
me = Person(name="Alec Reiter", observers=[plnv])
me.name = "Alec Reiter"
lc rtr

Now we can add multiple watched attributes without rewriting the property each time just to change the variable name. If we split the name attribute into first and last names, or add an email attribute, it's easy: just add another WatchedAttribute entry at the class level and set it in __init__.
But I feel we can improve on this pattern as well. There are two big things I'm not a fan of with this implementation:
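
To make that concrete, here's a minimal sketch of a class with three watched attributes. Contact is a hypothetical class, and its notify method just counts calls as a stand-in for SubjectMixin so the block runs on its own; Descriptor, checkedmeta and WatchedAttribute are trimmed copies of the versions above.

```python
class Descriptor:
    def __init__(self, name=None):
        self.name = name


class checkedmeta(type):
    def __new__(cls, clsname, bases, methods):
        # Attach attribute names to the descriptors
        for key, value in methods.items():
            if isinstance(value, Descriptor):
                value.name = key
        return super().__new__(cls, clsname, bases, methods)


class WatchedAttribute(Descriptor):
    def __set__(self, instance, value):
        # Only store and notify when the value is new or different
        if self.name not in instance.__dict__ or value != instance.__dict__[self.name]:
            instance.__dict__[self.name] = value
            instance.notify()


class Contact(metaclass=checkedmeta):
    # One line per watched attribute, no property boilerplate
    first_name = WatchedAttribute()
    last_name = WatchedAttribute()
    email = WatchedAttribute()

    def __init__(self, first_name, last_name, email):
        self.notifications = 0  # stand-in for a real observer list
        self.first_name = first_name
        self.last_name = last_name
        self.email = email

    def notify(self):
        self.notifications += 1


contact = Contact("Alec", "Reiter", "alec@example.com")
contact.email = "alec@example.com"   # unchanged, so no notification
contact.email = "reiter@example.com"  # changed, so one more notification
print(contact.notifications)  # 4
```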
  • We manipulate the underlying dictionary to store the values.
  • The Subject is responsible for notifying the Observers.
We can fix both of these things, but the first will take us down a side road.

Alternative Data Store

The first issue is trickier. We need to relate instances to values without creating a mess we'll have to clean up later, or creating a memory leak that will absolutely murder a long running process. The most effective way of handling both of these is using weak references.

References

CPython (the implementation I'm using) utilizes reference counting to determine if an object should be garbage collected. When an object's reference count drops to 0, its space in memory can be reclaimed by Python for use elsewhere. Sometimes we only want to hold a reference to an object but not so tightly it won't be garbage collected if we forget about it. Consider this:
In [9]:
print(me)
registry = {"me" : me}
<__main__.Person object at 0x7f0d2c2c64e0>

Storing instances in a dictionary as keys or values (or in a list or set) as a form of caching is extremely common. But if we remove all the other references to the object lying around...
In [10]:
del me
...that reference is left hanging around:
In [11]:
me = registry['me']
print(me)
<__main__.Person object at 0x7f0d2c2c64e0>

Before this gets too sidetracked into weak references, I want to note that they're not a silver bullet and require a little more knowledge about Python to use efficiently. You can still shoot your foot off with them. In this case, we're not using them to prevent cycles but instead to maintain a cache.
Peter Parente wrote about weak references on his blog and, while some of the information is outdated (the new module was deprecated in 2.6 and replaced with types), it's still relevant to understanding what weak references are. And Doug Hellmann explored the weakref module in his Python Module of the Week series.
But the short of it is that an instance of WeakKeyDictionary, WeakValueDictionary or WeakSet will prevent this. Most things can be weak referenced -- the documentation goes into detail about what can be: "class instances, functions written in Python (but not in C), instance methods, sets, frozensets, some file objects, generators, type objects, sockets, arrays, deques, regular expression pattern objects, and code objects."
When you're attempting to use WeakKeyDictionary or WeakSet, the object must meet one more requirement: it must be hashable. So unhashable objects such as a list or dict, even if they were implemented in Python, can't take advantage of these structures. However, outside of a few corner cases, this constraint won't affect us here.
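
Here's a small sketch of the difference, replaying the registry example from above with a WeakValueDictionary. The Person class is a bare stand-in, and the gc.collect() call is only there for implementations that don't use reference counting; on CPython the entry disappears as soon as the last strong reference is deleted.

```python
import gc
from weakref import WeakValueDictionary, ref


class Person:
    def __init__(self, name):
        self.name = name


registry = WeakValueDictionary()
me = Person("Alec Reiter")
registry["me"] = me
print("me" in registry)  # True

del me          # drop the only strong reference
gc.collect()    # not needed on CPython, harmless elsewhere
print("me" in registry)  # False

# Built-in lists can't be weakly referenced at all
try:
    ref([])
except TypeError:
    list_is_weakrefable = False
else:
    list_is_weakrefable = True
print(list_is_weakrefable)  # False
```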
Implementing it is incredibly easy.
In [12]:
from weakref import WeakKeyDictionary

class WatchedAttribute(Descriptor):
    
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.values = WeakKeyDictionary()
    
    def __get__(self, instance, cls):
        return self.values[instance]
    
    def __set__(self, instance, value):
        if instance not in self.values or value != self.values[instance]:
            self.values[instance] = value
            instance.notify()

class Person(SubjectMixin):
    name = WatchedAttribute()
    
    def __init__(self, name, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.name = name
        
me = Person(name="Alec Reiter", observers=[plnv])
lc rtr

You'll notice the metaclass we were using before is also gone now. Since we're storing the values in a cache inside the descriptor, the descriptor no longer needs to know what name it's being held under.

Descriptors as Subjects

The next issue was moving the publishing of events out of the main object. The main reason for this would be to notify only certain subscribers when an attribute changes rather than all of them. This explores what happens when we access a descriptor through the class and not an instance. That means answering, "What does cls (or type) do on __get__?"

Accessing the Descriptor

Descriptors are objects that just happen to follow a certain protocol, but that doesn't mean they can't have other methods as well, or even follow multiple protocols. An object could be both a descriptor and an iterator, for example. However, getting to these other methods can be tricky. We obviously can't do it through an instance, because Python resolves that access to the __get__ method and returns a value.
This means we have to go through the class. But the way our descriptor is set up, it'll blow up when an instance isn't passed to it. We could simply return the instance of the descriptor when an instance isn't passed...would it work? Spoilers: It does. So we can fully move the registration of observers and notification out into the descriptors and our SubjectMixin can be redefined to work with our descriptor.
Actually, we end up redefining the Descriptor and WatchedAttribute classes as well. Forewarning, this is a bit of a code dump.
In [13]:
from weakref import WeakSet

class SubjectMixin:
    def __init__(self, observers=None, *args, **kwargs):
        self._observers = WeakSet()
        super().__init__(*args, **kwargs)
        
        if observers:
            self._observers.update(observers)
    
    def notify(self, instance):
        for observer in self._observers:
            observer.update(instance)
    
    def add_observer(self, observer):
        self._observers.add(observer)
    
    def remove_observer(self, observer):
        if observer in self._observers:
            self._observers.remove(observer)
In [14]:
class CachingDescriptor(Descriptor):
    
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._instances = WeakKeyDictionary()
    
    def __get__(self, instance, cls):
        if instance is None:
            return self
        return self._instances[instance]
    
    def __set__(self, instance, value):
        self._instances[instance] = value
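
Before moving on, here's a quick sketch of what that instance is None check buys us. The CachingDescriptor below is a trimmed copy of the one above (no Descriptor base), and describe is a hypothetical extra method added only to show that class-level access reaches the descriptor object itself.

```python
from weakref import WeakKeyDictionary


class CachingDescriptor:
    def __init__(self):
        self._instances = WeakKeyDictionary()

    def __get__(self, instance, cls):
        if instance is None:
            # Accessed through the class: hand back the descriptor itself
            return self
        return self._instances[instance]

    def __set__(self, instance, value):
        self._instances[instance] = value

    def describe(self):
        """Hypothetical helper, only reachable through the class."""
        return "caching {} value(s)".format(len(self._instances))


class Thing:
    frob = CachingDescriptor()


thing = Thing()
thing.frob = 4
print(thing.frob)  # 4
print(Thing.frob is Thing.__dict__['frob'])  # True
print(Thing.frob.describe())  # caching 1 value(s)
```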
In [15]:
class WatchedAttribute(CachingDescriptor, SubjectMixin):
    def __init__(self, observers=None, *args, **kwargs):
        super().__init__(observers=observers, *args, **kwargs)
    
    def __set__(self, instance, value):
        super().__set__(instance, value)
        self.notify(instance)
            
class Person:
    name = WatchedAttribute()
    
    def __init__(self, name, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.name = name
        
Person.name.add_observer(plnv)
me = Person(name="Alec Reiter")
lc rtr

There are some subtle changes going on here that you might miss unless you explicitly diff the preceding implementation of SubjectMixin with this one.
The observer container is changed from a list to a WeakSet. Both are iterable, which means notify doesn't change (at least due to this). WeakSet plays off both the strengths of sets (which only contain unique items) and weak references. The only thing that WeakSet won't handle is keeping the observers in any particular sort of order and dealing with unhashable types -- neither of which affect us. You'll notice that adding elements to a set is slightly different from a list, so it's not a complete drop in replacement.
I will note I went back and forth between using WeakSet and a regular set, mostly because of one question: if we remove all other references to an observer, do we intend for the observer to still process requests? My thought on the matter is no, the observer should be considered dead. In other cases, the goal could be to have "anonymous" observers -- objects that are created and immediately injected into the framework rather than assigned to a name and passed in. If this is the desire, then WeakSet wouldn't keep the object from being immediately garbage collected. I'll leave the pros and cons of both approaches as an exercise to the reader. ;)
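
A rough sketch of that trade-off, using a bare Observer class made up for the example. The behavior shown is what I'd expect on CPython, where the anonymous observer is reclaimed as soon as nothing else references it; gc.collect() is there for other implementations.

```python
import gc
from weakref import WeakSet


class Observer:
    def __init__(self, label):
        self.label = label


weak_observers = WeakSet()
strong_observers = set()

named = Observer("named")
weak_observers.add(named)                    # survives while `named` exists
weak_observers.add(Observer("anonymous"))    # nothing else holds this one
strong_observers.add(Observer("anonymous"))  # a regular set keeps it alive

gc.collect()
print(sorted(o.label for o in weak_observers))  # ['named']
print(len(strong_observers))  # 1
```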
The next subtle difference is that SubjectMixin.notify now accepts an instance explicitly. Since we've displaced this logic to the descriptor, passing self to it ends up passing the instance of the descriptor rather than an instance of the class it's attached to.
Other than that, it's just a matter of knowing how multiple inheritance works. Which is a completely separate matter best left to another time. It involves liberal use of super to say the least.
The short of it is that WatchedAttribute combines the methods and data from both CachingDescriptor and SubjectMixin. CachingDescriptor can worry about being a descriptor that stores information in a weak ref dictionary, and SubjectMixin can worry about being the basis for observed subjects -- it's applicable to both descriptors and other objects. WatchedAttribute just overrides how CachingDescriptor.__set__ operates (or rather extends it, if you want to split hairs) to combine the two fully.
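
Without going down the multiple inheritance rabbit hole, here's a stripped down sketch of how the cooperative super calls chain together. Storage, Subject and Watched are stand-in names that mirror the shape of CachingDescriptor, SubjectMixin and WatchedAttribute; the log list exists only to record the order in which each __init__ runs.

```python
class Storage:
    def __init__(self, *args, **kwargs):
        self.log = getattr(self, "log", [])
        self.log.append("Storage")
        # Pass along whatever we don't consume to the next class in the MRO
        super().__init__(*args, **kwargs)


class Subject:
    def __init__(self, observers=None, *args, **kwargs):
        self.log = getattr(self, "log", [])
        self.log.append("Subject")
        super().__init__(*args, **kwargs)
        self.observers = list(observers or [])


class Watched(Storage, Subject):
    def __init__(self, *args, **kwargs):
        self.log = ["Watched"]
        super().__init__(*args, **kwargs)


w = Watched(observers=["plnv"])
print([cls.__name__ for cls in Watched.__mro__])
# ['Watched', 'Storage', 'Subject', 'object']
print(w.log)  # ['Watched', 'Storage', 'Subject']
print(w.observers)  # ['plnv']
```

Storage never mentions observers, yet the keyword still reaches Subject because every __init__ forwards what it doesn't use.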

Going Further

We could, of course, go further and register observers both for every instance of a class with the WatchedAttribute and for specific instances. Implementing this is just a mite trickier, but not terribly so. The first step is to imitate the behavior of collections.defaultdict in WeakKeyDictionary. Emulating defaultdict is pretty straightforward and just depends on defining __missing__, setting a hook for it in __getitem__ and providing a constructor.
The reason for building this is to utilize WeakSet as a way to hold onto observers for us that are local to a particular instance.
In [16]:
class WeakKeyDefaultDict(WeakKeyDictionary):
    
    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory
    
    def __getitem__(self, key):
        try:
            return super().__getitem__(key)
        except KeyError:
            return self.__missing__(key)
    
    def __missing__(self, key):
        if not self.default_factory:
            raise KeyError(key)
        value = self.default_factory()
        super().__setitem__(key, value)
        return value
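
A quick sanity check of how this behaves, as a sketch. The class is repeated so the block runs on its own, and Person and Observer are empty placeholder classes. The final count assumes CPython's reference counting, with gc.collect() added for other implementations.

```python
import gc
from weakref import WeakKeyDictionary, WeakSet


class WeakKeyDefaultDict(WeakKeyDictionary):

    def __init__(self, default_factory=None, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.default_factory = default_factory

    def __getitem__(self, key):
        try:
            return super().__getitem__(key)
        except KeyError:
            return self.__missing__(key)

    def __missing__(self, key):
        if not self.default_factory:
            raise KeyError(key)
        value = self.default_factory()
        super().__setitem__(key, value)
        return value


class Person:
    pass


class Observer:
    pass


local_observers = WeakKeyDefaultDict(WeakSet)
me = Person()
watcher = Observer()

# The first lookup creates an empty WeakSet for this instance
local_observers[me].add(watcher)
print(watcher in local_observers[me])  # True
print(len(local_observers))  # 1

# Once the instance goes away, so does its entry
del me
gc.collect()
print(len(local_observers))  # 0
```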
With that built, we can reconstruct WatchedAttribute to hold both "global" and "local" observers.
In [17]:
class WatchedAttribute(CachingDescriptor, SubjectMixin):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._local_observers = WeakKeyDefaultDict(WeakSet)
    
    def __set__(self, instance, value):
        super().__set__(instance, value)
        self.notify(instance)
    
    def add_observer(self, observer, instance=None):
        if instance is None:
            super().add_observer(observer)
        else:
            self._local_observers[instance].add(observer)
            
    def remove_observer(self, observer, instance=None):
        if instance is None:
            super().remove_observer(observer)
        else:
            if observer in self._local_observers[instance]:
                self._local_observers[instance].remove(observer)
            
    def notify(self, instance):
        observers = self._observers | self._local_observers[instance]
        for observer in observers:
            observer.update(instance)
The real question, now, is how does it handle? It should handle the same as previous iterations of WatchedAttribute except for the specific behavior we've overridden here. I'm also going to add some convenience methods to the Person class to make it slightly easier to interact with the observers.
In [18]:
class Person:
    name = WatchedAttribute()
    
    def __init__(self, name, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.name = name
    
    def access_watched(self, attr):
        return getattr(self.__class__, attr)
    
    def attach(self, attr, observer, global_=False):
        watched = self.access_watched(attr)
        inst = None if global_ else self
        watched.add_observer(observer, inst)
    
    def detach(self, attr, observer, global_=False):
        watched = self.access_watched(attr)
        inst = None if global_ else self
        watched.remove_observer(observer, inst)
        
class PrintUpper(ObserverMixin):
    
    def update(self, instance):
        print(instance.name.upper())
        

pu = PrintUpper()
me = Person(name=None)
me.attach('name', plnv, global_=True)
me.attach('name', pu)
me.name = "Alec Reiter"
lc rtr
ALEC REITER

In [19]:
other = Person(name="Ol' Long Johnson")
l' lng jhnsn

As we can see, the observer that prints the value of Person.name in upper case is bound only to the first instance of Person, whereas the one that strips out the vowels and prints that result is bound to all instances. It's also possible to create an ignore method that would allow specific instances to ignore certain observers as well. Or even better, create a set of rules that can be followed: "Only invoke this observer if the value doesn't change."
Something I've curiously ignored is pre-subscribing observers. That is to say, when we create the class we attach a predetermined list of observers to the attribute. This is a feature of the original SubjectMixin class and is inherited by WatchedAttribute (or as Raymond Hettinger would put it: WatchedAttribute delegates the work to SubjectMixin).
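
As one possible take on such a rule, here's a sketch of a descriptor that only notifies when the value actually changes, which is the check the earlier __dict__-based WatchedAttribute performed. ChangeWatchedAttribute and Recorder are hypothetical names, and the descriptor is written standalone rather than on top of CachingDescriptor and SubjectMixin to keep the block short.

```python
from weakref import WeakKeyDictionary, WeakSet

_MISSING = object()  # sentinel for "no value stored yet"


class ChangeWatchedAttribute:
    """Hypothetical variant that notifies only when the value changes."""

    def __init__(self):
        self._values = WeakKeyDictionary()
        self._observers = WeakSet()

    def __get__(self, instance, cls):
        if instance is None:
            return self
        return self._values[instance]

    def __set__(self, instance, value):
        old = self._values.get(instance, _MISSING)
        self._values[instance] = value
        # The rule: skip observers when nothing actually changed
        if old is _MISSING or old != value:
            for observer in self._observers:
                observer.update(instance)

    def add_observer(self, observer):
        self._observers.add(observer)


class Recorder:
    """Observer that remembers every name it was notified about."""

    def __init__(self):
        self.seen = []

    def update(self, instance):
        self.seen.append(instance.name)


class Person:
    name = ChangeWatchedAttribute()

    def __init__(self, name):
        self.name = name


recorder = Recorder()
Person.name.add_observer(recorder)
me = Person("Alec Reiter")
me.name = "Alec Reiter"  # same value, no notification
me.name = "Alec"         # changed, observers run
print(recorder.seen)  # ['Alec Reiter', 'Alec']
```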
In [20]:
class Person:
    name = WatchedAttribute(observers=[plnv])
    
    def __init__(self, name, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.name = name
    
    def access_watched(self, attr):
        return getattr(self.__class__, attr)
    
    def attach(self, attr, observer, global_=False):
        watched = self.access_watched(attr)
        inst = None if global_ else self
        watched.add_observer(observer, inst)

me = Person(name="Alec Reiter")
lc rtr

Fin

This method of implementing the observer pattern allows a lot of very fine grained control. I'm not advocating it as a good solution -- or even a workable solution on its own. There's plenty that's left on the table as far as details and issues go. For example, how would this expand to using a messaging queue (ZeroMQ or Redis) to publish events to? Or how does it interact with asyncio or Twisted? Integrating this pattern with an existing framework (blinker, for example) would probably be the best solution.
Rather, it's meant as an introduction to the true power of what you can do with descriptors beyond just making sure a string is all lower case or normalizing floating point numbers to decimal.Decimal instances. Which are valid uses of descriptors, don't take that the wrong way.
Some of the concepts introduced here -- manipulating descriptors on both the instance and class levels -- are used to build tremendously flexible systems. Ever wonder how SQLAlchemy seems to magically treat class attributes as parameters in search queries but then magically they're filled with data on the instance level? Descriptors and that if instance is None check.
