A collection of computer, gaming and general nerdy things.

Tuesday, September 9, 2014

ORMs: Fighting against or working with.

Quite a while ago I posted about struggling with a complex join using SQLAlchemy -- more specifically, the Flask extension for SQLAlchemy. My original surmise, "The problem is how I'm trying to proxy, not with the relations", was wrong. My entire problem lay with the relations. Here's the relevant bits from the models:

In []:
class Member(db.Model):
    id = db.Column('id', db.Integer, primary_key=True)
    name = db.Column('name', db.String(64))

class Artist(db.Model):
    id = db.Column('id', db.Integer, primary_key=True)
    name = db.Column('name', db.String(64))

class Tag(db.Model):
    id = db.Column('id', db.Integer, primary_key=True)
    name = db.Column('name', db.String(64))
    
class MemberTaggedArtists(db.Model):
    id = db.Column('id', db.Integer, primary_key=True)
    member_id = db.Column('member_id', db.Integer, db.ForiegnKey('members.id'))
    artist_id = db.Column('artist_id', db.Integer, db.ForiegnKey('artists.id'))
    tag_id = db.Column('tag_id', db.Integer, db.ForiegnKey('tags.id'))

    member = db.relationship(Member, backref='tagged_artists')
    artist = db.relationship(Artist, backref='member_tags')
    tag = db.relationship(Tag, backref='applied_tags')

And here's what I orignally wanted:

In []:
kurashu89.tagged_artists = {'Gorguts': ['Death Metal', 'Progressive Death Metal', 'Weird Metal']}
##Include just the tag and the number of times it was applied.
Gorguts.tags = {'Death Metal':1, 'Progressive Death Metal':1, 'Weird Metal':1}
##Include just the artist and the number of times they were tagged 
Death_Metal.applied_to = {'Gorguts':1} 

And then I forgot everything I knew about SQL. Well...not really. I knew how I'd do it in SQL. And I was hoping there'd be a magic bullet in SQLAlchemy that'd answer this for me. So, I fretted about it for months: I originally ran head first into this issue in April (actually, a little before when I began mocking this application up in my notebook and whiteboard). It wasn't until recently that I've been seriously digging in and writing a lot of code and learning more and more. I watched a talk earlier today that was a major break through: Flexing SQLAlchemy's Relational Power by Brandon Rhodes.

In the presentation he talks about having an "O" problem -- focusing too much on the Object portion of an ORM and ignoring the Relational Mapper portion. SQLAlchemy doesn't write queries for you. It makes them easier. I was hoping for a magic bullet, and in that I was too blind to my present knowledge.

SQL is Magic

Many programmers are familiar with several types of programming and application structures: Object-oriented, procedural and functional are the three that seem to get the most amount of love these days. But when dealing with data, few things will beat declarative. SQL is unusual because you don't tell it how to do something, you tell it what you want.

I'd argue few things are less understood than SQL when it comes to new programmers. Maybe Assembly. Or Perl.

But it's saying, "I know you're totally in the mindset of $LANGUAGE but you gotta get your data from the database so do a complete mental shift into SQL." People start to feel at home with PHP but when databases come into the pictures it's just...uh, what...I don't remember what I was doing.

I've been lucky that when I started learning programming, my dad was something of a Swiss army knife web developer/IT...thingperson (there was this period where he had the title SME but no one knew what it meant). He'd lead me on and give me just enough that I'd know where to look at piecing the rest of it together. I was forced to learn SQL. Maybe not well enough to be a DBA, but better than some. And I'll tell you what, "Thank you dad!" Knowing just enough SQL to scrap by and how to parse the Postgres documentation (seriously, read it some time, it's equal parts confusing and informative) has saved my butt more times than I can count.

Here's a small snippet from a PHP script thrown together to display a galleria.js website:

In []:
SELECT g.g_id, g.g_url, g.g_name,
(SELECT p.thumb FROM pictures p WHERE p.g_id = g.g_id ORDER BY p.p_id DESC LIMIT 1) AS cover,
(SELECT count(c.p_id) AS count FROM pictures c WHERE c.g_id = g.g_id) AS count
FROM galleries g
ORDER BY g.g_name ASC;

It's not the most beautiful or wonderful query ever written, but I'm not professing to be a genius either. It worked and it did exactly what I needed it to do: get a gallery's id, url and name as well as a URL to a picture inside the gallery and the number of pictures inside of it. Would I do it differently now? Yes, I'd join pictures and use count() as an aggregate function, I'd probably still retrieve the cover photo the same way because, frankly, I'm a little slack when it comes to joins.

But look at that. One second you're looking at PHP code, being like "Alright. Okay. ...really? I mean I guess." Then you hit this block of SQL. And you have to enter a different mindset. There's very little SQL has in common with PHP -- or frankly, many popular programming languages.

When I started really digging into Python and discovered SQLA, I was in love. Mike Bayer and his team really pulled out all the stops to create a wonderful toolbox. I remember thinking "I'll never write another query again!"

I was wrong: ORMs as Vietnam

ORMs have been infamously declared "The Vietnam of Computer Science". Is it a fair comparison? Honestly: I don't know. In 2006 I was 17 and barely managing to keep a D&D campaign site running, I wrote PHP and SQL and HTML and CSS together in one file. I wouldn't have known what an ORM was if I was beaten with one -- keep in mind this was still the day of PHP's mysql/pgsql functions, I think PDO had been introduced but my memory is hazy.

Eight years later, one and a half with using an ORM (poorly), I feel differently. I fundamentally agree wih Ted Neward: O/R mapping starts as something great but as you go on with half truths and misinformation, not really knowing what your goal is, things get really bad really quickly. I won't delve into extreme details, ORMs have been discussed to death in this context.

There's this great Mythos built up around ORMs. Mostly, they're half truths and lies of omission. But there's one told time and time again that fools even the wisest people I know: "You'll never write another query again!" Or as Jeff Atwood put it: "There are a lot of magic bullets out there, and no shortage of naive developers." I was looking for my magic bullet.

The Light

I can't profess to other ORMs, I've been tangentially exposed to Django's ORM and I'm aware that Rails and PHP have their own implementations. But SQLAlchemy is an absolute charm to work with. When you know what you're doing. Again, I'm not professing to be a genius. But watching Bradon Rhodes' presentation on flexing SQLA's relational power, I realized I was hiding from joins and queries and instead trying to tell SQLA "Just do it for me. Pleeeease." I'd been fighting against SQLA. SQLA is designed not to hide SQL from the developer. It's meant to say, "You're writing SQL but you're doing it with Python instead of inside Python." There's no hiding from SQL with SQLA -- it's even in the name.

I'd written a query in my project notebook, quite a while ago. Because I knew how to answer my question using pure SQL:

In []:
SELECT tags.name AS tag,  tags.id AS id, count(mta.member_id) AS count
FROM tags, membertaggedartists AS mta
WHERE tags.id = mta.tag_id AND mta.artist_id = :artist_id
GROUP BY id
ORDER BY count DESC, name ASC

It wasn't until I realized what I was not doing. Applying this knowledge to SQLA. I had also been lulled into complacency by Flask-SQLA's query object. Don't get me wrong, it's fantastic for simple queries and joins but for anything remotely complex all bets are off. Looking at what I wanted as output it's simply: The tag's id and name and a count of how many member attached it to a particular artist This translates almost 1:1 into SQLA's query model:

In []:
q = db.session.query(
    models.Tag.id.label('id'), 
    models.Tag.name.label('tag'), 
    db.func.count(models.MemberTaggedArtist.member_id).label('count')
    )
q = q.join(models.MemberTaggedArtist, models.Tag.id == models.MemberTaggedArtist.tag_id)
q = q.filter(models.MemberTaggedArtist.artist_id == artist.id)
q = q.group_by(models.MemberTaggedArtist.tag_id)
# order by tag count first, then tag name if two tags happen to have the same count
# there's probably a much better way to get this value without recalculating the result
q = q.order_by(db.desc(db.func.count(models.MemberTaggedArtist.member_id)))
q = q.order_by(models.Tag.name)

It's more verbose, but it's mostly because I'm going through my models module instead of just importing the models into the current namespace. A very nice thing about SQLA is you can view the query that it'll generate from this by print the statement (q in this context). What's it look like?

In []:
SELECT tags.id AS id, tags.name AS tag, count(membertaggedartists.member_id) AS count
FROM tags 
JOIN membertaggedartists ON tags.id = membertaggedartists.tag_id
WHERE membertaggedartists.artist_id = :artist_id_1 
GROUP BY membertaggedartists.tag_id 
ORDER BY count(membertaggedartists.member_id) DESC, tags.name

Exactly the same as mine (well, there's an explicit join instead of my implicit one).

When I stopped fighting SQLA and expecting it to be a magic bullet and realized, "Hey, I know SQL" my problems with this issue just...went away.

ORMs can be painful to use. Especially when they hide away important details and decide things for you. SQLA does very little of either (and will usually defer to the database it's connected to when it does). But when you work with a good one, and not against it, they are wonderful tools.

No comments:

Post a Comment