25 Feb 2009 (updated 25 Feb 2009 at 04:55 UTC)
»
Serializable Snapshot Isolation
This semester at Berkeley, I'm
taking CS286, the graduate-level data management course. In today's
class, we discussed a paper that I thought might
be of particular interest to Postgres hackers: "Serializable
Isolation for Snapshot Databases", by Cahill, Rohm and Fekete from
SIGMOD 2008.
The paper addresses a well-known
problem
with
snapshot
isolation (SI),
which is the isolation level that Postgres actually provides when you ask for
"SERIALIZABLE" isolation. SI basically means that a transaction sees a
version of database state that corresponds to the effects of all the
transactions that were committed before it began; it also sees the effects of
its own updates. This is not equivalent to true serializability,
however: that is, the database system can provide snapshot isolation, and yet
still allow a concurrent transaction schedule that is not equivalent to
some serial (one-at-a-time) transaction schedule.
To see why this is true, consider
two
concurrent
transactions
that
both
examine the state of the database, and then perform a write operation that
reflects the values that they just read. The paper provides a simple example:
suppose we have a database that describes the doctors in a hospital. We have
a program that wants to move doctors from "on-call" to "off-duty", as long as
there is at least one other doctor that is on-call. It's easy to see that if there
are two doctors and we run two instances of the program concurrently, under
SI rules we could end up with zero doctors on duty. This violates
serializability: there's no serial schedule of these two transactions that could
yield this erroneous database state.
The paper proposes a relatively
simple
modification
to
snapshot
isolation that
avoids this situations, by detecting a superset of the dangerous situations
and aborting one of the transactions involved. I'll leave the details of their
technique and the underlying theory to the paper, but it's very readable.
So, should we implement their
technique
in
Postgres?
It's an
interesting
idea,
but the implementation cost would be very non-trivial. Despite the paper's
claim that it imposes relatively little overhead on a traditional SI
implementation, it would require basically tracking the set of rows each
transaction has read, and keeping that information around for a bounded
time
period after the transaction has committed. I think that the performance
costs of
doing that naively would be too expensive for this to be feasible. Perhaps a
cheaper implementation is possible (e.g. by tracking page-level reads rather
than record-level reads)?
As an aside, it is somewhat bogus
for
PostgreSQL
to
provide
snapshot
isolation when the user asks for serializability; it is also a violation of the SQL
standard, I believe. That said, Oracle does the
same thing, so at least we're not alone, and it's hard to see a practical
improvement. The relevant section of the docs could
certainly make this point clearer, however.