23 Aug 2014 joolean   » (Journeyer)

gzochi

gzochi 0.7 is out. Get it here.

This was a tough release to get out, in no small part because I'd decided to provide a suite of data manipulation tools to support the practical administration of a gzochid server. I wanted to make it easy to export and import data, for the purposes of backup and restore, and to perform large-scale schema transformations of serialized data, to support changes to data structures that occur as part of the evolution of a game application.

The first two were (relatively) easy. I looked for prior art and found some in db_dump and db_load, the utilities that ship with Berkeley DB, the most mature (and, as of last year, the most appealingly licensed) of the storage engines the server supports. It was pretty easy to write higher-level analogues that do the same thing (read and write database rows in a portable format) in terms of gzochid's abstract storage API: gzochi-dump and gzochi-load.
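
To make the dump format concrete, here's a minimal, self-contained C toy in the spirit of db_dump's "bytevalue" format: a short header, each key and value emitted as a line of hex digits, and a trailing marker. The cursor interface and the record contents are invented for illustration; this isn't gzochid's actual storage API:

```c
#include <stdio.h>
#include <stddef.h>

/* Toy cursor over a fixed set of rows; stands in for a cursor on an
   abstract storage API. */
typedef struct
{
  const unsigned char **keys;
  const unsigned char **values;
  const size_t *key_lens;
  const size_t *value_lens;
  size_t count;
  size_t pos;
} toy_cursor;

/* Returns nonzero and fills in the next row while rows remain. */
static int
toy_cursor_next (toy_cursor *c, const unsigned char **key, size_t *key_len,
                 const unsigned char **value, size_t *value_len)
{
  if (c->pos >= c->count)
    return 0;
  *key = c->keys[c->pos];
  *key_len = c->key_lens[c->pos];
  *value = c->values[c->pos];
  *value_len = c->value_lens[c->pos];
  c->pos++;
  return 1;
}

/* Emit a byte buffer as one line of lowercase hex, db_dump-style. */
static void
dump_hex (FILE *out, const unsigned char *buf, size_t len)
{
  fputc (' ', out);
  for (size_t i = 0; i < len; i++)
    fprintf (out, "%02x", buf[i]);
  fputc ('\n', out);
}

int
main (void)
{
  /* One made-up row; the value's first byte plays the role of the
     type-registry prefix discussed below. */
  const unsigned char *keys[] = { (const unsigned char *) "o:1" };
  const unsigned char *values[] = { (const unsigned char *) "\x02hello" };
  const size_t key_lens[] = { 3 };
  const size_t value_lens[] = { 6 };
  toy_cursor c = { keys, values, key_lens, value_lens, 1, 0 };

  const unsigned char *key, *value;
  size_t key_len, value_len;

  puts ("format=bytevalue\nHEADER=END");
  while (toy_cursor_next (&c, &key, &key_len, &value, &value_len))
    {
      dump_hex (stdout, key, key_len);
      dump_hex (stdout, value, value_len);
    }
  puts ("DATA=END");
  return 0;
}
```

A loader just reverses the process, reading hex lines back into rows; because both tools speak to the abstract storage API, the same dump can be restored into any of the supported storage engines.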

Migrating data is a different story. It's not enough to process the stream of rows as they're emitted from the object store: the objects that make up the state of a game form a graph, so you need to traverse that graph, marking nodes as they're visited. The graph structure also means that migrations must be done mostly in place, since fully transforming any single reachable node may require adding new nodes, removing existing ones, or changing the structure of other parts of the graph.

The nature of the transformation itself complicates the process further. Each row of data is effectively a plain byte array; the only metadata is a prefix that provides a key into the application's registry of types, and the rest of the data has no structure outside of what the application's serialization code applies to it. To transform the data but preserve its "type," two different serializers must be used -- one to read the original data in, the other to write the transformed data back out. This is indeed a problem RedDwarf Server faced because of its reliance on Java's built-in object serialization mechanism; some rather pointed discussion of it can be found in this forum thread. Someone on that same thread mentions ICE and its sub-project, Freeze, which apparently solves a similar problem.
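
Here's a deliberately tiny, self-contained C sketch of that shape: a worklist traversal from a root object, a visited set to cope with cycles, and a per-node rewrite that reads with the old serializer and writes with the new one. The store layout and the serializers are toys invented for illustration, not gzochid's migration machinery:

```c
#include <stdio.h>
#include <stdbool.h>

#define MAX_OBJECTS 8

/* Toy object store: one byte of payload per object, plus up to two
   outgoing references to other object ids (-1 means none). */
static unsigned char payload[MAX_OBJECTS];
static int refs[MAX_OBJECTS][2];
static bool visited[MAX_OBJECTS];

/* Stand-in for the old serializer: reads a node's legacy form. */
static unsigned char
read_old (int id)
{
  return payload[id];
}

/* Stand-in for the new serializer: here, the "schema change" is just
   that payloads are now stored offset by one. */
static void
write_new (int id, unsigned char value)
{
  payload[id] = (unsigned char) (value + 1);
}

/* Walk the graph from a root, marking nodes as visited so cycles don't
   cause rewrites to loop, and rewrite each node in place by reading
   with the old serializer and writing with the new one. */
static void
migrate (int root)
{
  int stack[2 * MAX_OBJECTS], top = 0;
  stack[top++] = root;

  while (top > 0)
    {
      int id = stack[--top];
      if (id < 0 || visited[id])
        continue;
      visited[id] = true;

      write_new (id, read_old (id));

      stack[top++] = refs[id][0];
      stack[top++] = refs[id][1];
    }
}

int
main (void)
{
  /* A little graph with a cycle: 0 -> 1 -> 2 -> 0. */
  payload[0] = 10; payload[1] = 20; payload[2] = 30;
  refs[0][0] = 1; refs[0][1] = -1;
  refs[1][0] = 2; refs[1][1] = -1;
  refs[2][0] = 0; refs[2][1] = -1;

  migrate (0);

  for (int id = 0; id < 3; id++)
    printf ("object %d: payload %d\n", id, payload[id]);
  return 0;
}
```

The visited set is what keeps a cyclic object graph from being rewritten twice -- and that bookkeeping is exactly the sort of migration state worth keeping somewhere durable, as described below.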

I did a little reading on these technologies, and while they didn't turn out to be -- nor did I expect them to be -- a fit for my needs, they got me thinking: migrating a data set like this one is a complex enough operation that it might need some first-class modeling, beyond just defining the logic of the transformation. The solution I wound up with involves XML-based descriptions of the input and output (read: deserialization and serialization) configurations, plus real storage provisioned for the state of the migration itself. Doesn't feel like too much; hope it's enough.
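
For illustration, a descriptor along these lines might pair each serialized type with the code that reads its old form and the code that writes its new one. This is a purely hypothetical sketch -- every element, attribute, and module name below is invented, not gzochi's actual schema:

```xml
<?xml version="1.0" encoding="UTF-8"?>

<!-- Hypothetical migration descriptor; all names are invented for
     illustration.  It pairs an input (deserialization) configuration
     with an output (serialization) configuration and names the
     transformation to apply to each visited object. -->

<migration application="mygame">

  <!-- How to read existing rows. -->
  <input>
    <serialization type="player" module="(mygame serializers v1)" />
  </input>

  <!-- How to write transformed objects back out. -->
  <output>
    <serialization type="player" module="(mygame serializers v2)" />
  </output>

  <!-- The per-object transformation. -->
  <transform type="player" module="(mygame migrations player)" />

</migration>
```

The migration state itself -- which objects have been visited, which are still pending -- then lives in a store of its own, presumably so an interrupted migration can pick up where it left off instead of starting over.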
