One of my favorite shows is "How It's Made" -- my enjoyment mostly stems from learning how stuff is made, but the narrator's cheeky puns and jokes certainly add to it. But something I enjoy more than knowing how stuff is put together is knowing how things work. I don't know what it is, but I have this childlike fascination with opening things up and learning how they fit together and what each part does. That was one of my favorite things about my brief stint (a whopping six months!) in the automotive service industry: understanding, a little better, how cars work. It certainly opened my eyes to all the work that goes into even simple automotive repairs.
Sadly, I no longer work on or with cars. I do still fiddle some with mine, though, and if anyone has a good link to how a transmission -- manual or automatic -- actually works, I'd be thrilled! But this has left me with a hole in my life, one I've recently begun to fill with how Python operates under the hood -- so to speak. While my skills with C -- which basically amount to printf and for loops -- leave me woefully unprepared to examine much of the source, I can examine the surface parts.
To use a car analogy, if reading the C source for Python is repairing a damaged block or transmission, examining how Python works is more like replacing motor mounts and broken belts (something I'm regretfully too familiar with on my CRV), whereas reading someone else's Python is like doing your own fluid changes. Flawed analogies aside, I'd like to more fully examine how Python objects work and what it really means to call foo.bar().
As a forewarning, this knowledge is great for understanding what's happening, but it isn't crucial to working with classes and objects in the regular sense. Everything I discuss here deals with how Python 3 handles them. Python 2 is slightly different.
Building a Class
To talk about Python's data model and how it relates to classes and objects, we should first write a class. It's so basic that you might wonder why we're doing it. The point is, rather than examine some fictional class or object, why not have one of our own to open up and poke at?
class Baz:
    def __init__(self, thing):
        self.thing = thing

    def bar(self):
        print(self.thing)
That's an extremely basic object. The initializer takes a single argument and stores it on the instance, and a method prints it out. Of course, we need to instantiate it to get any use out of it.
foo = Baz(1)
Already, there are some mechanisms at work for us. I don't want to get too deep into class creation, but the short takeaway is that the implicit __new__ that classes inherit from object handles object creation, and __init__ simply sets the initial state of the object for us. Delving into __new__ leads into dealing with metaclasses, which is a topic for another time. What I want to focus on today is what happens when we call foo.bar().
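As a brief aside before moving on, the split between __new__ and __init__ can be made visible by doing the two steps by hand. This is only a sketch of roughly what Baz(1) does for a plain class like ours; the names manual, before, and after are mine, and the real machinery lives in the type's call logic.

```python
class Baz:
    def __init__(self, thing):
        self.thing = thing

    def bar(self):
        print(self.thing)


# Step 1: __new__ (inherited from object) creates a bare instance.
manual = Baz.__new__(Baz)
before = dict(manual.__dict__)  # no state has been set yet

# Step 2: __init__ fills in the initial state.
manual.__init__(1)
after = dict(manual.__dict__)

print(before)
print(after)
manual.bar()
```

The instance exists before __init__ ever runs; __init__ only decorates it with state.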
Classes and Objects
You'll often hear that objects and classes in Python are simply nothing more than a pile of dictionaries with dotted access. This obtuse phrasing confused me for a long time, and it wasn't until I began asking, "How the heck does self actually get passed?" that I began to understand. Asking this started me down a rabbit hole that led me to descriptors and __getattribute__ and what they do.
The Dict
All classes in Python have an underlying __dict__ and nearly every instance does as well. The first step to foo.bar() is understanding that methods live at the class level.
print('bar' in Baz.__dict__)  # True
print('bar' in foo.__dict__)  # False
Methods are entries in the class's underlying __dict__ but not in the instance's. Because of this, most Python objects can remain relatively small: they simply store their state rather than all of their available methods as well. What does this method look like in the dictionary?
from inspect import isfunction, ismethod
print(isfunction(Baz.__dict__['bar']))  # True
print(ismethod(Baz.__dict__['bar']))  # False
print(Baz.__dict__['bar'])  # <function Baz.bar at 0x...>
We can see that in the class's dictionary, methods are stored as functions and not as methods. It's reasonable to infer that methods are actually functions that operate on class instances. From here, we can imagine that behind the scenes, foo.bar() looks something like this:
Baz.__dict__['bar'](foo)
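To check that guess, here is a small self-contained sketch that pulls the function out of the class dictionary and hands it the instance by hand. The variable names func and instance_state are just for illustration.

```python
class Baz:
    def __init__(self, thing):
        self.thing = thing

    def bar(self):
        print(self.thing)


foo = Baz(1)

# The plain function stored in the class's dictionary.
func = Baz.__dict__['bar']

# Passing the instance ourselves has the same effect as foo.bar().
func(foo)
foo.bar()

# The instance's dictionary holds only state, no methods.
instance_state = dict(foo.__dict__)
print(instance_state)
```

Both calls print the same value, which is what we'd expect if a method is just a function that receives the instance as its first argument.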
Attribute Access
The next piece of the puzzle is how Python handles attribute access. If you're not familiar with how Python attribute look up happens, in short, it looks like this:
- Call __getattribute__
- Is the attribute in the object's __dict__?
- No? Is the attribute in the class's __dict__?
- No? Is the attribute in any of the parent classes' __dict__?
- No? Call __getattr__ if present.
- Else, raise an AttributeError
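The steps above can be sketched as a plain function. This is a deliberately simplified model that mirrors the list and nothing more: simplified_lookup is a hypothetical helper of my own, it ignores descriptors entirely, and the real logic is implemented in C.

```python
class Baz:
    def __init__(self, thing):
        self.thing = thing

    def bar(self):
        print(self.thing)


def simplified_lookup(obj, name):
    """Hypothetical helper: follows the simplified steps, no descriptors."""
    # Is the attribute in the object's own __dict__?
    instance_dict = getattr(obj, '__dict__', {})
    if name in instance_dict:
        return instance_dict[name]

    # Is it in the class's __dict__, or any parent class's __dict__?
    # __mro__ lists the class first, then its parents in lookup order.
    for klass in type(obj).__mro__:
        if name in klass.__dict__:
            return klass.__dict__[name]

    # Call __getattr__ if the class defines one.
    fallback = getattr(type(obj), '__getattr__', None)
    if fallback is not None:
        return fallback(obj, name)

    # Otherwise, give up.
    raise AttributeError(name)


foo = Baz(1)
print(simplified_lookup(foo, 'thing'))
print(simplified_lookup(foo, 'bar'))
```

Note that looking up bar this way hands back the raw function from the class dictionary rather than something already tied to foo.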
Python starts at the top of that list, calling __getattribute__. This is what actually allows the dotted access. You can think of the . in foo.bar as an implicit call to this method. It translates dotted access into dictionary lookups and invokes the rest of the chain. Since we already know that methods live in the class's __dict__ and methods are functions that act on the instance, we'll fast forward to there and extrapolate.
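One way to watch the dot turn into a __getattribute__ call is to override the method and record what passes through it. This is a small sketch; the Traced subclass and the seen list are made up for the demonstration.

```python
class Baz:
    def __init__(self, thing):
        self.thing = thing

    def bar(self):
        print(self.thing)


seen = []  # names that went through __getattribute__


class Traced(Baz):
    def __getattribute__(self, name):
        # Record the lookup, then defer to the normal machinery.
        seen.append(name)
        return super().__getattribute__(name)


t = Traced(1)
t.thing            # dotted access on the instance
t.bar              # looking up a method is attribute access too
print(seen)

# The same lookup spelled three ways.
via_dot = t.thing
via_builtin = getattr(t, 'thing')
via_dunder = type(t).__getattribute__(t, 'thing')
```

Every dotted read on the instance, whether it ends at data or at a method, goes through the same hook.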
Since methods are functions that live in the class's dictionary and act on instances, and __getattribute__ is an implicit transformation from attribute to dictionary look up, we can infer that method calls look like this behind the scenes:
Baz.bar(foo)
Methods vs Functions
So far so good. All this is pretty easy to grasp. But there's still the burning question of how the heck self (or rather foo) is being passed to our methods. If we examine both Baz.bar and foo.bar, we can see there's a transformation going on somewhere.
print(Baz.bar)  # <function Baz.bar at 0x...>
print(foo.bar)  # <bound method Baz.bar of <__main__.Baz object at 0x...>>
Python is somehow transforming our function that lives in Baz's dictionary into a method tied to our instance foo. The answer lies in the descriptor protocol. I've written about it elsewhere, and it's probably time to revise it again with my recent understanding. But essentially, descriptors add another rule to our attribute look up, just before the __getattr__ call: if we received a descriptor, call the __get__ method on the descriptor.
This is our missing link. When a function is declared in the class, not only is it placed in the class's dictionary, it also acts as a descriptor (strictly speaking, function objects implement __get__ themselves rather than being wrapped by something else). Or more accurately, a non-data descriptor, because it only defines the special __get__ method. The way descriptors work is by intercepting lookup of specific attributes.
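It's easy to check that an ordinary function carries __get__ but none of the methods that would make it a data descriptor. A minimal sketch, with plain as a throwaway function of my own:

```python
def plain(self):
    return self


# Functions implement __get__, which is what makes them descriptors...
has_get = hasattr(plain, '__get__')

# ...but not __set__ or __delete__, so they are non-data descriptors.
has_set = hasattr(plain, '__set__')
has_delete = hasattr(plain, '__delete__')

print(has_get, has_set, has_delete)
```

Nothing here involves a class yet: any function, even one defined at module level, is ready to be bound the moment it's looked up through an instance.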
The descriptor logic likely bears a passing resemblance to this (of course, the real thing is implemented in C):
from types import MethodType

class MethodDescriptor:
    def __init__(self, method):
        self.method = method

    def __get__(self, instance, cls):
        if instance is None:
            return self.method
        return MethodType(self.method, instance)
So, our initial thought of what foo.bar() looks like under the covers was wrong. It more accurately resembles:
Baz.__dict__['bar'].__get__(foo, Baz)()
# if we inspect it we see the truth
print(Baz.__dict__['bar'].__get__(foo, Baz))
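To see what that __get__ call hands back, we can poke at the bound method's own attributes. This is a small sketch; bound and same are names I've picked for illustration, while __self__ and __func__ are the standard attributes on bound methods.

```python
class Baz:
    def __init__(self, thing):
        self.thing = thing

    def bar(self):
        print(self.thing)


foo = Baz(1)

# Do by hand what the dot does for us.
bound = Baz.__dict__['bar'].__get__(foo, Baz)

# The bound method remembers both halves of the pairing.
print(bound.__self__ is foo)
print(bound.__func__ is Baz.__dict__['bar'])

# It compares equal to what ordinary dotted access produces.
same = (foo.bar == bound)
print(same)

bound()  # no argument needed, foo is already attached
```

So a bound method is little more than the original function plus the instance it should be called with.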
And in fact, if we put our imitation method descriptor into action, it works similarly to how object methods do.
def monty(self, x):
    print(x)

class Spam:
    eggs = MethodDescriptor(monty)

    # of course, it's also usable as a decorator
    @MethodDescriptor
    def bar(self):
        return 4

ham = Spam()  # a lie if I ever saw one
print(Spam.eggs)
print(ham.eggs)
ham.eggs(1)
print(ham.bar())
The reason we see a plain function when we access a method such as bar through the class is that the descriptor has already run and decided that, with no instance to bind to, it should simply return the function itself.
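The same thing can be observed on a real function by calling __get__ with None in place of the instance, which is what access through the class amounts to. A short sketch, again using the Baz class from earlier; func and via_class are illustrative names.

```python
class Baz:
    def __init__(self, thing):
        self.thing = thing

    def bar(self):
        print(self.thing)


func = Baz.__dict__['bar']

# Access through the class: there is no instance to bind to.
via_class = func.__get__(None, Baz)

print(via_class is func)   # the function comes back untouched
print(Baz.bar is func)     # which is exactly what Baz.bar gives us
```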