For the sloooooow Thanksgiving holiday... Happy TG, everyone US!
Object-relational stuff, revisited
After I dissed on Jeff Watkin's ORM assumptions & logic,
Sean Jensen-Gray staged an intervention & basically told me I was acting
like a git. He's undoubtedly right, and I'm continuing that part
of the conversation off-line where it belongs.
However, in the name of separating the smoke from the fire, here's some
more discussion about ORMs.
First of all, here's
my ORM, cucumber2, just so y'all know where I'm coming from. I
make no claims about generality or quality or goodness, except to say
that I like it & have been using its predecessor for over 4 yrs now.
Works great. cucumber2 is some of the nicer bits of concept code I've
ever written; it's definitely on my refrigerator. (YMMV...)
Based on my relatively minimal experience with ORMs, then, here are
some of my own beliefs about ORM writing in Python.
- Use "magic".
Properties, metaclasses, introspection, dynamic code generation, and
"under-cover twiddling" can all help make a clean piece of code. Not
using them can hurt by making your code over-verbose and cluttering
your APIs with information not relevant to the task at hand.
Document your use, test your use, sure -- but use them.
- Object-relational
impedance mismatch is a big issue.
Do I need to say more? Just think: how do you encode a collection in
a database? (Make sure you're maintaining referential integrity in
your answer...) How do you encode an inheritance hierarchy? These
are simple examples of a serious mismatch between the relational model
and the object model. This is the problem that new ORMs should try to
solve, IMO.
- Don't start out to write a database-generic ORM.
There's lots of discussion about using database-specific features in the
SQL world (although my google-fu is failing me...), so I won't rehash
that. I come down solidly on the side of committing yourself to a
specific database. I think it's particularly important in the case of
an ORM, which may use *very* database-specific stuff to work its magic
(e.g. cucumber2 and the PostgreSQL ORDBMS features). Porting this
magic between databases is likely to get very hairy & involve lots of
additional complexity.
The attempt to make your ORM generic to multiple databases may well be
a specific case of premature optimization (below); it seems like
over-reaching oneself by attempting to encompass database-generic
issues prior to settling on a good, clean API.
- Make sure you can still use straight SQL.
Do you have specialized metainfo that will break SQL
queries/inserts/etc. that don't know about this information? If so,
this seriously reduces the utility of your databases: you can't
use external tools any more, without adding in ORM-specific awareness.
Even if you can hack this in with triggers and VIEWs, you're adding a
whole 'nother layer of complexity. Bad.
- Premature optimization is the root of all evil. (Hoare via Knuth)
(Ironically, the first
few
google hits seem to be dedicated to discussing when this rule doesn't
apply...)
This covers things like caching and cache invalidation code, which in
my experience is difficult to handle generically (although possible,
esp. if you only allow transaction-wrapped access). Also, SQL query
optimization is tough to do in SQL, much less in a layer wrapped
around SQL. In many cases, you should consider optimizing by writing
app/data-model-specific SELECT statements that integrate with your ORM
interface.
Most of all, think of your ORM like an object database.
Layering a procedural interface on top of an SQL database isn't
building an ORM -- it's building a library that talks to an SQL
database. Useful, but probably not new. If you solve a hard problem --
even poorly -- that's new.
For example, one of my absolute requirements: can you determine the
class of an "object" (row, tuple, whatever) in the database without
using metadata that's stored external to the database (like, say, in
your Python object)? I think that's a pretty ORMy requirement,
myself, and it helps to not violate condition #4 (straight SQL) above.
Another requirement: can you store object hierarchies
straightforwardly? Again, seems ORMy to me, but it speaks to the
impedance mismatch problem -- it's a tough requirement.
Looking over this list, I think these are all pretty tough
requirements. You would be justified in asking "well, why not just
use an object database, then?"
There are a few obvious reasons.
- Requirements. Maybe you have to (or really really
prefer to) use an SQL database. Your support staff only understands
SQL; your SQL backups are automated; you really like SELECT queries
and the command-line interface; or your boss tells you you have to.
- Language neutrality. Say what you will, but SQL databases
are admirably language neutral... suppose you have to access
the database from multiple languages, like Java, Python, Ruby, and
Perl. Most object databases are language-specific (for obvious
reasons...) so you're stuck with a relational DBMS.
- Maturity. I personally dislike this argument, but: SQL
databases like Oracle, PostgreSQL, MySQL, etc. have a long history and
the flaws are well known. Not so with ODBMSs.
- Teamwork. You work with people that only grok SQL. I am
sympathetic to this argument, coming from an academic environment with
moderately high turnover and people who have relatively little
software engineering background.
- Query performance. If a lot of your data is fundamentally
organized in relational ways, I bet your SELECT statements can be
heavily optimized in ways that no object database can match.
- Support. Lots of companies support SQL databases. Not so
many support object databases.
OK, so what use is an ORM? I'm assuming anybody who's made it this
far is already sold on ORMs, but just in case, here are a couple of
my reasons:
- Impedance mismatch. Object-oriented languages organize
data differently than the normal SQL data-model. You really want
to be able to take advantage of both. (Or at least I do.)
- Programming reliability and security. There are a number
of mistakes -- some obvious, some not so much -- that can be made by
SQL programmers. Hell, you're generating SQL code in another language
-- how can this not be problematic? (It's largely solved by using
appropriate libraries for SQL access, mind you.)
- Joins. I don't know about you, but I'm not smart enough to
understand LEFT OUTER JOINs. (Could someone else please write a library
to do it for me, intelligently?)
You would now be even more justified in calling me somewhat nuts. I
have strict requirements for an ORM that are nigh impossible to meet,
and lots of reasons why you might be stuck with an SQL database. Yet
I've also given a few good reasons to use an ORM. What to do?
My first point: it's not an easy problem. That's why seriously smart
people -- much smarter than me -- have thought deeply on the
matter and come up with very little.
My second point: it's worth tackling. 'nuff said, here; I think the
benefits are obvious.
My third point: I personally guess that there are solutions to most of
the problems that I lay out for ORMs, and these solutions lie in the
dynamic nature of languages like Python (and probably languages like
Ruby and Perl). Certainly I can easily do interesting things in
Python that are tremendously difficult to do in Java, although many of
these things use the "black magic" of metaclasses.
OK, there's no real conclusion here.
I'm at least minimally satisfied with the approach I've taken in
cucumber2. Again, YMMV. Apart from polishing and optimizing the code
a bit, I'm thinking about taking a pyparsing-style approach to
SELECTs. More on that next time I get the yen to hack on something
other than twill ;).
I hope you're at least mildly entertained by my wild-eyed ORM discussion,
and I look forward to the horde of disapproving comments. (Luckily, I've
disabled comments on this blog, so I won't have to make them public if
I don't want to. [0])
cheers,
--titus
p.s. apologies for the weird formatting... advogato *shrug*
[0] I feel compelled to point out that this sentence is a joke.