We need a consistent, highly-available data store.1
There is no shortage of tools that provide one of these properties or the other. For example, mysql, postgres, and redis provide consistency but not high availability; riak, couchdb, and redis cluster (when it’s ready) provide high availability but not consistency. The only freely-available tool that attempts to provide both of these is zookeeper, but zookeeper is specialized, focused on locking and server management. We need a general-purpose tool that provides the same guarantees in a clean, well-designed package.
That’s why I’m excited to present doozer, a consistent, highly-available data store.
Soon after I started working at Heroku, Blake Mizerany got me involved in designing a “distributed init” system, something that could manage processes across multiple machine instances and recover gracefully from single instance failures and network partitions. One necessary building block for this is a network service that can provide synchronization for its clients – in other words, locking.
When we started building this distributed process monitor, we quickly realized that the lock service should be packaged as a separate tool. In true Unix style, it strives to do one thing well. In fact, doozer doesn’t actually provide locks directly. Its “one thing” is consistent, highly-available storage; locks are a separate thing, that clients can implement on top of doozer’s primitives. This makes doozer both simpler and more powerful, as we’ll see in a moment.
Locks Are Not Primitive Enough
Other similar systems (Chubby, Zookeeper) provide locks as a primitive operation, alongside data storage. But that requires the lock service to decide when the holder of a lock has died, in order to release the lock. In practice this means that every client that wants to obtain a lock must establish a session and send periodic heartbeat messages to the server. This works well enough for some clients, but it’s not always the most appropriate way to determine liveness. It’s particularly troublesome when dealing with off-the-shelf software that wasn’t designed to send heartbeat messages.
Doozer takes a different approach. It provides only data storage and a single, fundamental synchrionization primitive, compare-and-set. This operation is complete (you can build any other synchronization operation using it), but it’s simpler than a higher-level lock, and it doesn’t require the server to have any notion of liveness of its clients.
In the future, doozer might ship with companion tools that provide higher-level synchronization for users that want it, but these tools will operate as regular doozer clients, and they’ll be completely optional. If you don’t need their services, just don’t run them; if you need something slightly different from what they provide, you are free to make your own.
What is it good for?
How does doozer compare to other data stores out there? Redis is amazingly fast, with lots of interesting data structures to play with. HBase is amazingly scalable. Doozer isn’t particularly fast or scalable; its claim to fame is high availability.
Doozer is where you put the family jewels.
The doozer readme has a few concrete examples, but consider this one: imagine you have three redis servers – one master and two slaves. If the master goes down, wouldn’t it be nice to promote one of the slaves to be the new master? Imagine trying to automate that promotion. You’ll have to get all the clients to agree which one to use. It doesn’t sound easy, and it’s even harder than it sounds. If there is a network partition, some of the clients might disagree about which slave to promote, and you’d wind up with the classic “split-brain” problem. If you want this promotion to work reliably, you have to use a tool like doozer, which guarantees consistency, to coordinate the fail-over.
We have client drivers for Go (doozer/client) and Ruby (fraggle), with an Erlang driver in the works. The doozer protocol documentation gives the nitty-gritty of talking to doozer, but most of you will be interested in the interface provided by the driver you’re using; they each have documentation.
The words “consistency” and “high availability” have several reasonable definitions, and people too often use them without saying exactly what they mean. When I speak of consistency, I mean absolute prevention of inconsistent writes. The phrase “eventual consistency” is a synonym for “inconsistency”. When I speak of high availability, I mean primarily the ability to face a network partition and continue providing write service to all clients transitively connected to a majority of servers. Secondarily, I mean the ability to continue providing read-only service to all clients connected to any server. Update: This is not the same as being “available” in the sense of the frequenetly-cited and almost-as-frequently-misunderstood CAP theorem. Coda Hale has a good discussion of the CAP theorem’s applicability to real-world systems. In those terms, we choose consistency over availability.↩