Recent blog entries for joolean

gzochi

gzochi 0.11 is out. Enjoy it in good health.

The major innovation over the previous release is that the client side of the distributed storage engine now releases the locks it requests from the meta server. This wasn't easy to orchestrate, so I want to say a little bit about how it works.

Some context: The distributed storage engine is based on a paper by Tim Blackman and Jim Waldo, and it works a bit like a cache: The client requests an intentional lock (read or write) on a particular key from the server, and if the server grants the client's request, it serves up the value for the key along with a temporary lease - essentially a timeout. For the duration of the lease, the client is guaranteed that its lock intentions will be honored. If it's holding a read lock, no other clients can have a write lock; if it's got a write lock, no other clients can obtain a read or write lock. Within the client, the key is added to a transactional B+tree (a special instance of the in-memory storage engine) and game application threads can execute transactions that access or modify the data in the B+tree just as they would in a single-node configuration. When a transaction completes, new and modified values are transmitted back up to the meta server, but they also remain in the local B+tree for access by subsequent transactions. When the lease for a key expires - and the last active transaction using the affected key either commits or rolls back - its lock is released, and the client must petition the meta server to re-obtain the lock before it can do anything else with that key.
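
To make that lifecycle concrete, here's a minimal sketch in C of the per-key state a client node might track; the names and fields are invented for illustration and don't correspond to gzochi's actual structures.

#include <stdbool.h>
#include <time.h>

enum lock_type { LOCK_READ, LOCK_WRITE };

/* Hypothetical per-key lock state on the client side of the distributed
   storage engine. */
struct key_lock
{
  enum lock_type type;       /* intention granted by the meta server */
  struct timespec lease_end; /* when the lease granted with the lock expires */
  int active_txns;           /* live transactions that have touched this key */
};

/* A lock may only be released once its lease has expired and the last
   transaction using the key has committed or rolled back. */
static bool
can_release (const struct key_lock *lock, const struct timespec *now)
{
  bool lease_expired = now->tv_sec > lock->lease_end.tv_sec
    || (now->tv_sec == lock->lease_end.tv_sec
        && now->tv_nsec >= lock->lease_end.tv_nsec);

  return lease_expired && lock->active_txns == 0;
}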

The tricky parts happen at the edges of this lifecycle; that is, when a lock is obtained and when it is released. In both cases, the client's view of available data from the perspective of active transactions must change. When the client obtains a lock, it gains access to a new key-value pair, and when it releases a lock, it loses that access. These changes occur asynchronously with respect to transaction execution: The arrival of a message from the meta server notifies the client that a lock has been granted (or denied), and lock release is governed by a local timeout. It's tempting to try to implement these processes as additional transactions against the B+tree, such that when a new key is added or an expired key is removed, the modification to the B+tree occurs in a transaction executed alongside whatever other transactions are being processed at the time. Unfortunately, this can lead to contention and even deadlock, since adding or removing keys can force modifications to the structure of the B+tree itself, in the form of node splitting or merging. What to do, then, given that these "system" transactions can't be allowed to fail, since they're responsible for maintaining the integrity of the local cache? You could adjust the deadlock resolution algorithm to always rule in favor of system transactions when it comes to choosing a transaction to mark for rollback, but since lock acquisition and release are relatively frequent, this just transfers the pain to "userland" transactions, which would in turn see an undue amount of contention and rollback.

The answer, I think, involves going "around" the transactional store. For newly arriving keys, this is straightforward: When a new key-value pair is transmitted from the meta server as part of a lock acquisition response, don't add it to the B+tree store transactionally; instead, add it to a non-transactional data structure like a hash table that'll serve as a cache. Transactions can consult the cache if they find that the key doesn't exist in the B+tree. The first transaction to modify the value can write it to the B+tree, and subsequent transactions will see this modified value since they check the B+tree before the cache.
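
A minimal sketch of that read path, with hypothetical btree_get and cache_get accessors standing in for the real storage interfaces:

#include <stddef.h>

/* Opaque handles for the transactional B+tree and the non-transactional
   cache of values received from the meta server; illustrative only. */
typedef struct btree btree;
typedef struct hash_cache hash_cache;

extern const void *btree_get (btree *, const void *, size_t, size_t *);
extern const void *cache_get (hash_cache *, const void *, size_t, size_t *);

/* The B+tree is consulted first, so a value written by a local transaction
   shadows the unmodified copy delivered by the meta server. */
static const void *
effective_get (btree *tx_store, hash_cache *incoming, const void *key,
               size_t key_len, size_t *value_len)
{
  const void *value = btree_get (tx_store, key, key_len, value_len);

  if (value != NULL)
    return value;

  return cache_get (incoming, key, key_len, value_len);
}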

Orchestrating the removal of expired keys is more challenging. Because the B+tree takes precedence over the cache of incoming values, it's critical that the B+tree reflect key expiration accurately, or else new modifications from elsewhere in the cluster may be ignored, as in the following pathological case:
  1. Transaction on node 1 obtains write lock on key A, value is stored in cache
  2. Transaction on node 1 modifies key A, writing new value to the B+tree
  3. Transaction on node 1 releases lock on key A
  4. Transaction on node 2 obtains write lock on key A, modifies it, and commits
  5. Transaction on node 2 releases lock on key A
  6. Transaction on node 1 obtains read lock on key A, new value stored in cache
  7. Transaction on node 1 attempts to read key A, but sees old value from B+tree

After some consideration, I borrowed a page from BigTable, and attacked the problem using the concept behind tombstone records. Every record written to either the B+tree or the incoming value cache - even a deletion - is prefixed with a timestamp. To find the effective value for a key, both the B+tree and the cache must be consulted; the version with the most recent timestamp "wins." In most BigTable implementations (e.g., HBase and Cassandra) there's a scheduled, asynchronous "compaction" process that sweeps stale keys. I didn't want to run more threads, so I prevent stale keys from piling up by keeping a running list of released keys. Once that list's length exceeds a configurable threshold, the next transaction commit or rollback triggers a stop-the-world event in which no new transactions can be initiated, and a single thread sweeps any released keys that haven't since been refreshed. With a judiciously configured threshold, the resulting store performs well, since the sweep is quite fast when only a single transaction is active.
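
A rough sketch of the "most recent timestamp wins" resolution; the record layout is invented for illustration:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative record layout: every value written to the B+tree or the
   incoming value cache carries a timestamp, and a deletion is written as a
   timestamped tombstone with no payload. */
struct versioned_value
{
  uint64_t timestamp;
  bool tombstone;
  const void *data;
  size_t data_len;
};

/* Consult both stores; the record with the most recent timestamp wins, and
   a winning tombstone means the key is effectively absent. */
static const struct versioned_value *
resolve (const struct versioned_value *from_btree,
         const struct versioned_value *from_cache)
{
  const struct versioned_value *winner;

  if (from_btree != NULL && from_cache != NULL)
    winner = from_btree->timestamp >= from_cache->timestamp
      ? from_btree : from_cache;
  else
    winner = from_btree != NULL ? from_btree : from_cache;

  return (winner == NULL || winner->tombstone) ? NULL : winner;
}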

This was a real head-scratcher, and I feel proud of having figured it out despite having skipped all the database courses during my undergraduate coursework. Download my project and try it out for yourself!
gzochi

gzochi 0.10 is out, and it's a big release, befitting its double-digit minor version, I hope. In particular, it includes an initial version of the meta server, which, as its name suggests, provides services to a cluster of gzochi application servers. In this first iteration, the meta server manages R/W locks on portions of a game's object graph, and feeds temporary copies of the relevant object data to the application server nodes to fuel task execution. In the future, it'll also be responsible for client load balancing and coordinating channel message delivery across the cluster.

I've been working on this release since the beginning of the year, although much of that time was spent brooding over the way these systems work in Project Darkstar and factoring out some core components of the gzochid container to make them available to the meta server (including a kind of minimal dependency injection framework built on top of GObject - surprised no one's done that before). In the design I ultimately settled on, the native, memory-backed storage engine plays a key role, acting as a cache for object data received from the meta server, which maintains the canonical stores of game data. And so I also spent a lot of cycles examining the performance and concurrency profile of that system, tweaking lock orderings and tree traversal algorithms.

One of the more surprising things I discovered was that locks in the memory storage engine as of 0.9 were too granular, and that this made the store unusually prone to deadlock. As I mentioned in an earlier post, the architecture of the memory storage engine was based on my reading of the Berkeley DB source code - especially the B+tree implementation. There's a lot of stuff in BDB, though, and it's got plenty of features that I was pretty sure I could ignore while building my own transactional data store. One thing I decided to leave out of my implementation was pages. After all, if I didn't need to transfer this data to or from durable storage, then why bother partitioning it into disk bandwidth-friendly chunks? It turns out, though, that pages serve another valuable purpose: They limit the granularity of locks and force concurrent transactions to serialize their writes when accessing keys within some threshold distance, especially with respect to insertions of new key-value pairs. Take the following example:
  1. Transaction 1 attempts to update key a1 (write lock a1)
  2. Transaction 2 attempts to read key a2 (read lock a2)
  3. Transaction 2 attempts to insert key a3 (write lock a3)
  4. Transaction 1 attempts to update key a2 (write lock a2)

In principle, there's nothing about this lock ordering that should lead to deadlock. The two transactions are accessing key a2 in incompatible ways, but that just means that the second transaction might need to wait for the first to complete before it can get a write lock on a2. However, in a B+tree, the interstitial structural nodes - not just the "leaf" key-values - are subject to read/write locking when structural modifications such as inserts are made to the tree. So the sequence of locks above actually means:
  1. Transaction 1 attempts to update key a1 (write lock a1, read lock parent a)
  2. Transaction 2 attempts to read key a2 (read lock a2, read lock parent a)
  3. Transaction 2 attempts to insert key a3 (write lock a3, write lock parent a)
  4. Transaction 1 attempts to update key a2 (write lock a2, read lock parent a)

With this additional bit of context, it becomes clear how the contention between these transactions can lead to deadlock: The first transaction can't get its write lock on key a2 until the second transaction releases its read lock on a2, which won't happen until the second transaction can get a write lock on interstitial parent node a to add the key a3 to it - a lock blocked, in turn, by the first transaction's read lock on a. This pattern of clustered reads and insertions is quite common for gzochi application tasks in which some data from the game object graph is read or updated and then another task is durably scheduled for execution. But this is also where pages can help! By grouping regions of an ordered keyspace into "chunks" of keys that have to be locked in bulk, the granularity of interleaved access between concurrent transactions is effectively capped, and transactions attempting to update or insert proximal keys are forced to serialize. Bad for concurrency, but ultimately good for performance in a latency-sensitive environment where retrying a transaction from scratch hurts. Page locking makes the following change to the lock sequence above:
  1. Transaction 1 attempts to update key a1 (write lock page a)
  2. Transaction 2 attempts to read key a2 (read lock page a)
  3. Transaction 2 attempts to insert key a3 (write lock page a)
  4. Transaction 1 attempts to update key a2 (write lock page a)

In this scenario, because the first transaction's update attempt requires a write lock on the entire page, the second transaction can't acquire that toxic read lock. The practical result is that the two transactions execute in serial, like so:
  1. Transaction 1 attempts to update key a1 (write lock page a)
  2. Transaction 1 attempts to update key a2 (write lock page a)

[Transaction 1 commit: unlock page a]
  1. Transaction 2 attempts to read key a2 (read lock page a)
  2. Transaction 2 attempts to insert key a3 (write lock page a)

After making the switch to pages in the memory storage engine, the rate of deadlock for these kinds of transaction workflows dropped sharply, and is now roughly the same as that of the BDB-based storage engine. ...Which stands to reason.
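
For illustration, here's a sketch of page-granular locking; the page layout, the find_leaf_page helper, and the use of a plain rwlock are simplifications of what the memory storage engine actually does:

#include <pthread.h>
#include <stddef.h>

#define KEYS_PER_PAGE 16

/* Hypothetical leaf page: a page owns a contiguous run of keys, and the
   lock lives on the page rather than on individual keys. */
struct page
{
  pthread_rwlock_t lock;
  size_t num_keys;
  const void *keys[KEYS_PER_PAGE];
  const void *values[KEYS_PER_PAGE];
  struct page *next; /* leaf pages form an ordered chain */
};

/* Locate the leaf page whose key range covers `key' (assumed helper). */
extern struct page *find_leaf_page (const void *key, size_t key_len);

/* Any update or insert within a page's key range takes the page's write
   lock, so transactions touching nearby keys serialize instead of
   interleaving their way into deadlock on interstitial node locks. */
static struct page *
lock_page_for_write (const void *key, size_t key_len)
{
  struct page *p = find_leaf_page (key, key_len);

  pthread_rwlock_wrlock (&p->lock);
  return p;
}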

Check out the release! I'm proud of it.
23 May 2015 (updated 23 May 2015 at 22:30 UTC)
gccgo and Autotools

As part of a personal project, I wrote a small Go program recently to marshal some data from a MySQL database as JSON over HTTP. I hadn't written a lot of (read: any) Go before this, and I found the process relatively painless and the implementation much more concise than the alternatives in Java, PHP, or Python. However, when I went to integrate my program with the rest of the Autotools build for my project, I ran into some obstacles, mostly related to the incomplete / broken support for Go in the current stable version of Autoconf (2.69). There are apparently fixes for the most obvious bugs coming in Autoconf 2.70, but that release has been in development for years; and even once released, to the best of my knowledge, it won't include important features like tests for available Go libraries. So I spent a few days working on a better Autotools Go integration, which I'm attaching below.

A few notes to make what follows a bit clearer:

First, Go already has a build system, more or less - it's called go. If you've got a library or a program called "mypackage/myproject/myprog," and you've put the source files in the right directory, you can run...

go build mypackage/myproject/myprog

...and wind up with a working, statically-linked executable. What is the right directory? It's the "src" directory under one of the directories in $GOPATH. The Go project has a good amount of documentation on the topic of code organization, but in brief, the layout of a directory that forms a component of your GOPATH should be:

  • pkg/[COMPILER_]$GOOS_$GOARCH/: Compiled library files go here
  • bin/: Compiled executable files go here
  • src/: Source code goes here, grouped by package

The go command is a front-end for the native go compilers (6g, 6l, 8g, 8l, etc.) as well as for gccgo (via the -compiler flag). It figures out where all the external dependencies are in your $GOPATH and passes flags and libraries to the compilers and linkers. If you run gccgo directly - that is, without using the go front-end - you have to assemble these arguments and paths yourself.

`go build' is the mainstream, nicely literate way of executing a build for .go files, and it's how most people familiar with the language will go about it. However, Autotools' existing support for Go unsurprisingly depends on direct interaction with gccgo, and I wouldn't expect that to change in any near-term releases. `go build' is convenient for fast, iterative builds; I find Autotools-based builds useful for packaging a source distribution for delivery to environments that need to be interrogated to locate dependencies. I wanted my project's build to work both for people doing `./configure && make' as well as for people running `go build'.

The files below provide:

  • A behind-the-scenes patch for the broken `AC_PROG_GO' in Autoconf 2.69
  • A new macro implementation - AC_CHECK_GOLIB - that finds and tests dependencies for Go programs and libraries, and which behaves similarly to pkg-config's `PKG_CHECK_MODULES'.
  • A working example of an Autotools build for a small project that depends on some common Go web service and database libraries.
m4/golib.m4

Provides the patch and macro implementation. Bundle this file with your project to apply the changes locally, or put it in `/usr/local/share/aclocal' to make it available system-wide.

# Undefine the broken _AC_LANG_IO_PROGRAM from autoconf/go.m4...

m4_ifdef([_AC_LANG_IO_PROGRAM(Go)], m4_undefine([_AC_LANG_IO_PROGRAM(Go)]))

# ...and redefine it to use a snippet of Go code that compiles properly.

m4_define([_AC_LANG_IO_PROGRAM(Go)],
[AC_LANG_PROGRAM([import ( "fmt"; "os" )],
[f, err := os.OpenFile("conftest.out", os.O_CREATE | os.O_WRONLY, 0777)
if err != nil {
fmt.Println(err)
os.Exit(1)
}
if err = f.Close(); err != nil {
fmt.Println(err)
os.Exit(1)
}
os.Exit(0)
])])

#
# Support macro to check that a program that uses LIB can be linked.
#
# _AC_LINK_GOLIB(VARIABLE, LIB, [ACTION-IF-FOUND], [ACTION-IF-NOT-FOUND])

AC_DEFUN([_AC_LINK_GOLIB],[

# This little "embedded" shell function outputs a list of dependencies for a
# specified library beyond the set of standard imports.

AS_REQUIRE_SHELL_FN([ac_check_golib_nonstd_deps],
[AS_FUNCTION_DESCRIBE([ac_check_golib_nonstd_deps], [LINENO],
[Find the non-standard dependencies of the target library.])],
[for i in `$ac_check_golib_go list -f {{.Deps}} $[]1 | tr -d [[]]`; do
$ac_check_golib_go list -f {{.Standard}} $i | grep -q false
if [[ $? = 0 ]]; then
echo $i
fi
done])

ac_check_golib_paths=`$ac_check_golib_go list -compiler gccgo \
-f {{.Target}} $2`
ac_check_golib_seeds=`ac_check_golib_nonstd_deps $2`
ac_check_golib_oldseeds=""

# Compute the transitive closure of non-standard imports.

for ac_check_golib_seed in $ac_check_golib_seeds; do
ac_check_golib_oldseeds="$ac_check_golib_oldseeds $ac_check_golib_seed"
ac_check_golib_newseeds=`ac_check_golib_nonstd_deps $ac_check_golib_seed`

for ac_check_golib_newseed in $ac_check_golib_newseeds; do
if ! echo "$ac_check_golib_oldseeds" | grep -q "$ac_check_golib_newseed"
then
ac_check_golib_oldseeds="\
$ac_check_golib_oldseeds $ac_check_golib_newseed"
fi
done

ac_check_golib_seeds="$ac_check_golib_seeds $ac_check_golib_newseeds"
ac_check_golib_path=`$ac_check_golib_go list -compiler gccgo \
-f {{.Target}} $ac_check_golib_seed`
ac_check_golib_paths="$ac_check_golib_paths $ac_check_golib_path"
done

ac_check_golib_save_LIBS="$LIBS"
LIBS="-Wl,-( $LIBS $ac_check_golib_paths -Wl,-)"

AC_LINK_IFELSE([],
[$1[]_GOFLAGS="-I $ac_check_golib_root"
$1[]_LIBS="$ac_check_golib_paths"
AC_MSG_RESULT([yes])
$3
LIBS="$ac_check_golib_save_LIBS"
break],
[AC_MSG_RESULT([no])
m4_default([$4], AC_MSG_ERROR([Go library ($2) not found.]))
LIBS="$ac_check_golib_save_LIBS"])
])

#
# Attempts to locate a Go library LIB somewhere under $GOPATH that can be used
# to compile and link a program that uses it, optionally referencing SYMBOL.
# Calls ACTION-IF-FOUND if a usable library is found, in addition to setting
# VARIABLE_GOFLAGS and VARIABLE_LIBS to the requisite compiler and linker flags.
#
# AC_CHECK_GOLIB(VARIABLE, LIB, [SYMBOL], [ACTION-IF-FOUND],
# [ACTION-IF-NOT-FOUND], [GO])

AC_DEFUN([AC_CHECK_GOLIB],[
AC_ARG_VAR([$1][_GOFLAGS], [Go compiler flags for $2])
AC_ARG_VAR([$1][_LIBS], [linker flags for $2])

AC_MSG_CHECKING([for Go library $2])

ac_check_golib_go="$6"
if test -z "$ac_check_golib_go"; then
ac_check_golib_go="go"
fi

# The gccgo compiler needs the `pkg/gccgo_ARCH` part of the GOPATH entry that
# contains the target library, so use the `go' command to compute the full
# target install directory and then subtract out the library-specific suffix.
# E.g., /home/user/gocode/pkg/gccgo_linux_amd64/foo/bar/libbaz.a ->
# /home/user/gocode/pkg/gccgo_linux_amd64

ac_check_golib_root=`$ac_check_golib_go list -compiler gccgo \
-f {{.Target}} $2`
ac_check_golib_root=`dirname $ac_check_golib_root`
ac_check_golib_path=`dirname $2`

ac_check_golib_root="${ac_check_golib_root%$ac_check_golib_path}"

# Save the original GOFLAGS and add the computed root as an include path.

ac_check_golib_save_GOFLAGS=$GOFLAGS
GOFLAGS="$GOFLAGS -I $ac_check_golib_root"

AS_IF([test -n "$3"],
[AC_COMPILE_IFELSE([AC_LANG_PROGRAM([import ("os"; "$2")],[
if $3 == nil {
os.Exit(1)
} else {
os.Exit(0)
}])],

# Did it compile? Then try to link it.

[_AC_LINK_GOLIB([$1], [$2], [$4], [$5])],

# Otherwise report an error.

[AC_MSG_RESULT([no])
m4_default([$5], AC_MSG_ERROR([Go library ($2) not found.]))])],

# If there was no SYMBOL argument provided to this macro, take that to mean
# this library needs to be imported but won't be referenced, so craft a test
# that exercises that kind of import clause (i.e., one with the `_'
# modifier).

[AC_COMPILE_IFELSE([AC_LANG_PROGRAM([import ("os"; _ "$2")],
[os.Exit(0)])],
[_AC_LINK_GOLIB([$1], [$2], [$4], [$5])],
[AC_MSG_RESULT([no])
m4_default([$5], AC_MSG_ERROR([Go library ($2) not found.]))])])

# Restore the original GOFLAGS.

GOFLAGS="$ac_check_golib_save_GOFLAGS"
])


configure.ac

Food for Autoconf. Note the call to `AC_CONFIG_MACRO_DIR' to make golib.m4 visible.

AC_INIT([My Cool Go Program], [0.1], [me@example.com], [myprog], [])
AC_CONFIG_MACRO_DIR([m4])
AC_CONFIG_SRCDIR([src/mypackage/myproject/myprog.go])

AM_INIT_AUTOMAKE(1.6)

AC_LANG_GO
AC_PROG_GO

AC_CHECK_GOLIB([MARTINI], [github.com/go-martini/martini], [martini.Classic])
AC_CHECK_GOLIB([GORM], [github.com/jinzhu/gorm], [gorm.Open])
AC_CHECK_GOLIB([RENDER], [github.com/martini-contrib/render], [render.Renderer])
AC_CHECK_GOLIB([MYSQL], [github.com/go-sql-driver/mysql])

AC_CONFIG_FILES([Makefile])
AC_OUTPUT


Makefile.am

Food for Automake. From what I can tell, Automake has no explicit support for building Go programs, but it does include general support for defining build steps for arbitrary source languages. Note the use of the ".go.o" suffix declaration to specify the compilation steps for .go source files and the "LINK" variable definition to specify a custom link step. The `FOO_GOFLAGS' and `FOO_LIBS' variables are created by the expansion of `AC_CHECK_GOLIB([FOO]...)' in configure.ac.

bin_PROGRAMS = myprog

myprog_SOURCES = src/mypackage/myproject/myprog.go
myprog_GOFLAGS = $(GOFLAGS) @MARTINI_GOFLAGS@ @RENDER_GOFLAGS@ \
@GORM_GOFLAGS@ @MYSQL_GOFLAGS@

myprog_DEPENDENCIES = builddir
myprog_LDADD = @MARTINI_LIBS@ @RENDER_LIBS@ @GORM_LIBS@ @MYSQL_LIBS@
myprog_LINK = $(GOC) $(GOFLAGS) -o bin/$(@F)

builddir:
if [ ! -d bin ]; then mkdir bin; fi

.go.o:
$(GOC) -c $(myprog_GOFLAGS) -o $@ $<

CLEANFILES = bin/myprog

Here's a snippet from the output of configure when the integration is working:

checking for gccgo... gccgo
checking whether the Go compiler works... yes
checking for Go compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking for go... /usr/bin/go
checking for Go library github.com/go-martini/martini... yes
checking for Go library github.com/jinzhu/gorm... yes
checking for Go library github.com/martini-contrib/render... yes
checking for Go library github.com/go-sql-driver/mysql... yes

To apply this code to your own project, copy / adapt the snippets above along with your own Go code into the following directory layout.

configure.ac
Makefile.am
m4/golib.m4
src/[package]/[project]/*.go
bin/

If you add this project root to your GOPATH, you should be able to run `go build package/project' in addition to `./configure && make'.

Problems? Let me know.
16 Feb 2015 (updated 16 Feb 2015 at 18:11 UTC)
Transactional B-trees

The next version of gzochi is going to include a new storage engine implementation that keeps data entirely in memory while providing transactional guarantees around atomicity of operations and visibility of data. There are a couple of motivations for this feature. The primary reason is to prepare the architecture for multi-node operation, which, as per Blackman and Waldo's technical report on Project Darkstar, requires that the game server -- which becomes, in a distributed configuration, more like a task execution server -- maintain a transient cache of game data delivered from a remote, durable store of record. The other is to offer an easier mechanism for "quick-starting" a new gzochi installation, and to support users who, for political or operational reasons, prefer not to bundle or install any of the third-party databases that support the persistent storage engines.

That first motivation wouldn't bias me much in either direction on the build-vs-buy question; Project Darkstar's authors almost certainly planned to implement this using Berkeley DB's "private" mode (example here). However, gzochi is intentionally agnostic when it comes to storage technology. The database that underlies a storage engine implementation needs only to support serializably isolated cursors and reasonable guarantees around durability; requiring purely in-memory operation would be a heavy constraint. And I feel too ambivalent about Oracle to pin the architecture to what BDB supports, AGPL or no. (The Darkstar architects should have been a bit warier themselves.) So I settled on the "build" side of the balance. ...Although my first move was to look for some source code to steal. And I came up weirdly short. The following is a list of the interesting bits and dead ends I came across while searching for transactional B-tree implementations.

Some more specific requirements: There are two popular flavors of concurrency for the kind of data structure I wanted to build with the serializable level of transactional isolation I wanted to provide. Pessimistic locking requires that all access to the structural or data content of the tree by different agents (threads, transactions) be done under the protection of an explicit read or write lock, depending on the nature of the access. Optimistic locking often comes in the form of multi-version concurrency control, and offers each agent a mutable snapshot of the data over the lifetime of a transaction, mostly brokering resolutions to conflicts only at commit time. Each approach has its advantages: MVCC transactions never wait for locks, which usually makes them faster. Pessimistic locking implementations typically detect conflicting access patterns earlier than optimistic implementations, which wait until commit to do so. Because gzochi transactions are short and fine-grained, and the user experience is sensitive to latency, I believe that the time wasted by unnecessary execution of "doomed" transactional code is better avoided via pessimistic locking. (I could be wrong.)

Here's what I found:
  • Apache Mavibot - Transactional B-tree implementation in Java using MVCC. Supports persistent and in-memory storage. Hard for me to tell from reading their source code how their in-memory mode could possibly work for multi-operation transactions.
  • Masstree - Optimistically concurrent, in-memory non-transactional B+tree implementation designed to better exploit SMP hardware.
  • Silo - Optimistically concurrent, in-memory transactional store that uses Masstree as its B-tree implementation.
  • SQLite - Lightweight SQL implementation with in-memory options, with a transaction-capable B-tree as the underlying storage. Their source code is readable, but the model of threads, connections, transactions, and what they call "shared cache" is hard to puzzle out. The B-tree accumulates cruft without explicit vacuuming. The B-tree code is enormous.
  • eXtremeDB - Commercial in-memory database with lots of interesting properties (pessimistic and MVCC modes, claimed latencies extremely low) but, you know, no source code. So.
Because I was unable to find any pessimistic implementations with readily stealable source code, I struck out on my own. It took me about a week to build my own pessimistic B+tree, using Berkeley DB's code and architecture as a surprisingly helpful guide. My version is significantly slower than BDB (with persistence to disk enabled) but I'm going to tune it and tweak it and hopefully get it to holler if not scream.
gzochi

gzochi 0.7 is out. Get it here.

This was a tough release to get out, in no small part because I'd decided to provide a suite of data manipulation tools to support the practical administration of a gzochid server. I wanted to make it easy to export and import data, for the purposes of backup and restore, and to perform large-scale schema transformations of serialized data, to support changes to data structures that occur as part of the evolution of a game application.

The first two were (relatively) easy. I looked for prior art and found some in the utilities that ship with Berkeley DB, the most mature (and as of last year, the most appealingly licensed) of the storage engines the server supports, db_dump and db_load. It was pretty easy to make some higher-level ones that do the same thing (read and write database rows in a portable format) in terms of gzochid's abstract storage API: gzochi-dump and gzochi-load.

Migrating data is a different story. It's not enough to process the stream of rows as they're emitted from the object store. The objects that make up the state of a game form a graph, so you need to traverse that graph, marking nodes as they're visited. It also means that migrations must be done mostly in-place; fully transforming any single reachable node in the graph may require adding new nodes, removing existing ones, or changing the structure of other parts of the graph. Furthermore, the nature of the transformation to be done complicates the process: Each row of data is effectively a plain byte array, the only metadata being a prefix that provides a key into the application's registry of types; the rest of the data has no structure outside of what the application's serialization code applies to it. To transform the data but preserve its "type," two different serializers must be used -- one to read the original data in, the other to write the transformed data back out. This is indeed a problem RedDwarf Server faced because of its reliance on Java's built-in object serialization mechanism. Some rather pointed discussion of the problem can be found in this forum thread. Someone on that same thread mentions ICE and its sub-project, Freeze, which apparently solves a similar problem.
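
Here's a sketch of the two-serializer idea, with invented function-pointer types standing in for gzochid's real serialization interfaces, which carry more context (transactions, type registries, error reporting):

#include <stddef.h>

/* Hypothetical serializer signatures: one decodes the old on-disk form of
   an object, the other encodes the new form. */
typedef void *(*deserializer_fn) (const unsigned char *data, size_t len);
typedef unsigned char *(*serializer_fn) (const void *obj, size_t *len_out);

struct migration
{
  deserializer_fn read_old;  /* built from the "input" XML description */
  serializer_fn write_new;   /* built from the "output" XML description */
};

/* Transform a single row: decode it with the old serializer, apply the
   caller-supplied transformation to the resulting object, then re-encode
   it with the new serializer so it keeps its type prefix but gains its new
   structure. */
static unsigned char *
migrate_row (const struct migration *m, void *(*transform) (void *),
             const unsigned char *row, size_t row_len, size_t *out_len)
{
  void *obj = m->read_old (row, row_len);

  return m->write_new (transform (obj), out_len);
}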

I did a little reading on these technologies, and while they didn't turn out to be -- nor did I expect them to be -- a fit for my needs, they got me thinking about how migrating a data set like this one is a complex enough operation that it might need some first-class modeling, beyond just defining the logic of the transformation. The solution I wound up with involves XML-based descriptions of input and output (read: deserialization and serialization) configurations and provisioning real storage for the state of the migration itself. Doesn't feel like too much; hope it's enough.

I've been working on game-related stuff, time permitting. I'm at a point where I can roughly synchronize the movement of a little naked guy walking around a green field (thanks, Liberated Pixel Cup!) between the server and connected clients, and I wanted to add some spatial occlusion to the mix: Areas of the map that both the client and the server understand to be blocked. I knew this wasn't a trivial problem to solve efficiently, so I started doing research on spatial indexing, and found out about...

R-trees

An R-tree is a container structure for rectangles and associated user data. You search the tree by specifying a target rectangle and a visitor function that gets called for every rectangle in the tree that overlaps your target. Like all tree-based structures, the advantage you get when searching an R-tree derives from the use of branches to hierarchically partition the search space. R-trees use intermediate, covering rectangles to recursively group clusters of spatially-related rectangles. If your target rectangle overlaps a given covering rectangle, it may also overlap one of its covered leaf rectangles; if it doesn't overlap that rectangle, you can safely prune that branch from the search. The secret sauce of a particular R-tree implementation is in the rebalancing algorithm, which generates these covering nodes. A common approach seems to be to iteratively generate some number of covering rectangles that partition their underlying set of constituent rectangles as evenly as possible while minimizing the overlap of the covering set.
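
For illustration, a rough sketch of that pruning search - the node layout here is invented and isn't the implementation mentioned below:

#include <stdbool.h>

struct rect { double x0, y0, x1, y1; };

/* Minimal illustrative R-tree node: an interior node's rectangle covers all
   of its children; a leaf carries user data. */
struct rtree_node
{
  struct rect bounds;
  bool leaf;
  void *user_data; /* meaningful only for leaves */
  struct rtree_node *children[8];
  int num_children;
};

static bool
overlaps (const struct rect *a, const struct rect *b)
{
  return a->x0 <= b->x1 && b->x0 <= a->x1
    && a->y0 <= b->y1 && b->y0 <= a->y1;
}

/* Visit every leaf rectangle that overlaps the target, pruning any branch
   whose covering rectangle misses the target entirely. */
static void
rtree_search (const struct rtree_node *node, const struct rect *target,
              void (*visit) (const struct rect *, void *))
{
  int i;

  if (!overlaps (&node->bounds, target))
    return;

  if (node->leaf)
    visit (&node->bounds, node->user_data);
  else for (i = 0; i < node->num_children; i++)
    rtree_search (node->children[i], target, visit);
}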

I whipped up a couple of implementations -- one in C with GLib dependencies, one in Scheme in terms of gzochi managed records -- based on my reading of the source code by Melinda Green, available here.

r6rs-protobuf

My own usage of this library uncovered another embarrassing issue: Deserializing a message with an embedded message field in r6rs-protobuf 0.6 doesn't work reliably, on account of the way the Protocol Buffers wire protocol directs the deserializer to handle what it perceives as unknown fields (throw 'em away). The solution is that you have to tell a delegate message deserializer exactly how much of a stream it's allowed to read, either explicitly (by passing the delimited length) or by preemptively consuming those bytes and wrapping an in-memory port around them -- which is what I did, to get a patch out as quickly as possible. Find version 0.7 here, if you need it.

14 Jan 2014 (updated 25 Jan 2014 at 13:18 UTC)
gzochi

Happy new year, everyone. I've just released gzochi version 0.5. Get it here!

As part of the fixes that went into this version, I made several adjustments to the error-handling behavior of the data storage layer, mostly to enable better communication about the result of a query to the database and the ensuing changes in transaction state. Prior to this, I'd taken the approach that any "failure" in data access -- such as a transaction deadlock -- could be indicated by the return value of the data access function, in part to smooth out differences in API between the various storage engines that gzochi supports. This had the fairly obvious disadvantage that it was impossible to tell the difference between a lookup for a non-existent key and an error (well, I figured, application code just shouldn't be doing that), and also that there was no way at all to indicate an error for functions that return void; and so after tracking down and fixing enough hard-to-fix bugs, I decided to fix the behavior of this API. In C, your options for returning multiple bits of information to a caller or distinguishing between, say, an "empty" response and an error are limited. You can make the type of data you return more complex by wrapping it inside of a data structure that also includes metadata about the invocation; or you can pass pointers to "outvalues" as arguments to your function, and have the callee modify them to indicate errors or other out-of-band responses. I like this latter approach because it allows you to preserve for the most part the intended interface between the function and its caller. You can, after all, allow people to pass NULL for those outvalues if they don't care about anything besides the return value. It does require, however, that your function reliably handle the intricacies of checking whether the outvalues are non-NULL and possibly allocating storage for them. GLib's GError type and its associated helper functions and macros are very convenient here. Pass a GError ** to your function and use g_set_error to set it conditionally.
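
For example, a simplified sketch of the pattern - not gzochid's actual data access API; the error domain and code are invented for illustration:

#include <glib.h>

#define MY_STORAGE_ERROR my_storage_error_quark ()

static GQuark
my_storage_error_quark (void)
{
  return g_quark_from_static_string ("my-storage-error-quark");
}

enum { MY_STORAGE_ERROR_TX_FAILURE };

/* Returns the value for `key', or NULL. A NULL return with *err unset means
   "not found"; a NULL return with *err set means the transaction failed. */
static gchar *
storage_get (const gchar *key, GError **err)
{
  gboolean deadlocked = FALSE; /* stand-in for a real storage engine result */

  if (deadlocked)
    {
      g_set_error (err, MY_STORAGE_ERROR, MY_STORAGE_ERROR_TX_FAILURE,
                   "transaction failed while reading key '%s'", key);
      return NULL;
    }

  return NULL; /* not found, and no error */
}

A caller that only cares about the return value can pass NULL for the final argument; g_set_error does nothing in that case.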

The problem I was trying to solve with the foolishness above still exists, though: gzochi still supports several different storage engines (or, at least, it will -- support for GDBM was removed in this release; future versions will support HamsterDB) and each supported database has its own set of error codes and ways of returning them. So I created a kind of error code independence layer to convert implementation-specific values to values that are part of that layer. For example, in BerkeleyDB:

  • DB_LOCK_DEADLOCK is translated to GZOCHID_STORAGE_ETXFAILURE
  • DB_NOTFOUND is translated to GZOCHID_STORAGE_ENOTFOUND
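
In code, the translation can be as simple as a switch over the engine's return values; the enum values and the catch-all code in this sketch are illustrative, not the real gzochid definitions:

#include <db.h>

enum
{
  GZOCHID_STORAGE_ENOTFOUND = -1,
  GZOCHID_STORAGE_ETXFAILURE = -2,
  GZOCHID_STORAGE_EUNKNOWN = -3
};

/* Map Berkeley DB result codes onto the engine-independent set. */
static int
translate_bdb_error (int ret)
{
  switch (ret)
    {
    case 0: return 0;
    case DB_NOTFOUND: return GZOCHID_STORAGE_ENOTFOUND;
    case DB_LOCK_DEADLOCK: return GZOCHID_STORAGE_ETXFAILURE;
    default: return GZOCHID_STORAGE_EUNKNOWN;
    }
}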

Back to the release: There's also a new "managed" data structure that employs the principles behind Project Darkstar's ScalableList and provides SRFI-44 collection semantics (one of the more contentious SRFI discussions I've read); and enhancements to the web monitoring console. Like I said, go get it!
Clutter

Further adventures in game development: I've been working with Clutter to create a primitive client-side game engine to integrate with gzochi. Clutter's a 2-D scene graph library for writing OpenGL applications in C. It's the canvas library behind Mutter and, from what I can tell, a bunch of next-level GTK+ stuff. The documentation says it's for building "visually rich graphical user interfaces," but I don't see why you couldn't also use it to, say, manage the layout of a game screen, which is what I'm trying to use it for: There are a lot of things I don't miss from my brief career as a professional Flash developer, but computing dirty rectangles and figuring out the draw order for actors in a scene aren't things I was looking forward to writing myself. Finding Clutter felt serendipitous.

One thing Clutter doesn't come with, though, is a component that can render frame-based animations, which I'm modeling as sets of raster images that get painted on the stage in sequence at a specified rate. I've built some components that can load image atlases from disk or over the web and carve them up into frames in memory, and I've been trying to get Clutter to turn those frames into a flipbook. It took a lot of meditating over the documentation and some coaching from Clutter developers on IRC for me to get that doing animations this way wasn't going to be a misuse of the library -- but it would be something I'd have to write myself.

Clutter does ship with some data structures you can use for displaying images, so that's what I tried first. There are two: ClutterTexture, which is a GObject subtype of ClutterActor; and ClutterImage, which implements the ClutterContent interface and can thus be used to render ClutterActors that don't know how to render themselves. I saw some indication online that ClutterTexture was probably on its way out, so the first technique I used to implement my animation system was to have a handler for the "new-frame" signal on an animation-specific ClutterTimeline update the internal state of the animated actor's ClutterImage content with the pixel data for a new frame (via clutter_image_set_data). To my surprise, it actually worked (go Clutter!) but it also drove the CPU utilization on my netbook up to about 60%. Here's why: Since Clutter gets its drawing done via OpenGL (actually via Cogl, a GL/GLES compatibility layer), before you can display an image to the framebuffer, Clutter wants to get a handle to a CoglTexture, a block of allocated (and filled-out) GPU texture memory. Pushing textures to the GPU is expensive, so doing it every frame is pretty much a non-starter.

The next thing I tried was maintaining a roster of ClutterImage objects, each with static, pre-uploaded data for a single frame of animation, and updating the content of my sprite actor with the requisite image (via clutter_actor_set_content) on every new frame. This was an improvement, but only got me down to about 40% utilization. I think the problem with this approach might have been that updating the content at the actor level triggers some unnecessary invalidations of the scene graph that are expensive to recalculate, but I didn't investigate deeply.

At the suggestion of someone on IRC, the next thing I tried was creating my own implementation of the ClutterContent interface such that the paint_content function would use a pre-uploaded Cogl texture to paint each required frame. The paint_content function has the prototype:

void (*) (ClutterContent *, ClutterActor *, ClutterPaintNode *);

...where the ClutterPaintNode argument is a local root of the render graph that corresponds to the actor to which the content is attached (content can be shared across multiple actors). To get your content painted, you need to attach a child paint node to that root. The ClutterPaintNode interface doesn't expose any primitive drawing callbacks, so I don't think they want you to write your own, but there's a provided implementation (ClutterTexturePaintNode) that paints the contents of the texture you give it. I built my ClutterContent implementation around this paint node type -- and it worked, but I was still at around 30% CPU. When I profiled my application, I found that the invalidation signal was being fired much more frequently than the new-frame signal, and the default mechanism for handling invalidation -- which I wasn't sure I wanted to override -- queues a paint operation.

I was feeling a bit down in the mouth about the prospects of animating my sprite at 20% CPU or less. I started Googling various combinations of "clutter" and "animation," and somehow arrived at the Gitorious page for Rob Staudinger's clutter-sprite project, which promises obliquely to provide "A sprite actor for clutter." It didn't look like much, but I started rifling the source files to see if I could learn something. Sure enough, I found that Rob cleverly overrides ClutterActor's paint function and uses cogl_rectangle to paint the frame of animation directly from an image stored as a CoglMaterial, skirting the Clutter render tree entirely. Using a variation on his technique, I was able to get my utilization down to about 25%, which seems like it might be as low as I'm going to get with my netbook's GPU and this version of Clutter.
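
Here's a sketch of that technique against the Clutter/Cogl 1.x API of the time - a hypothetical MySprite actor subclass, not Rob's code or mine; frame loading and everything beyond the paint override is omitted:

#include <clutter/clutter.h>

typedef struct _MySprite MySprite;
typedef struct _MySpriteClass MySpriteClass;

struct _MySprite
{
  ClutterActor parent_instance;
  CoglHandle *frames;   /* one pre-uploaded texture per animation frame */
  guint current_frame;
};

struct _MySpriteClass
{
  ClutterActorClass parent_class;
};

G_DEFINE_TYPE (MySprite, my_sprite, CLUTTER_TYPE_ACTOR);

/* Paint the current frame directly with cogl_rectangle, bypassing the
   content / paint node machinery entirely. */
static void
my_sprite_paint (ClutterActor *actor)
{
  MySprite *self = (MySprite *) actor;
  ClutterActorBox box;

  clutter_actor_get_allocation_box (actor, &box);

  cogl_set_source_texture (self->frames[self->current_frame]);
  cogl_rectangle (0, 0, box.x2 - box.x1, box.y2 - box.y1);
}

static void
my_sprite_class_init (MySpriteClass *klass)
{
  CLUTTER_ACTOR_CLASS (klass)->paint = my_sprite_paint;
}

static void
my_sprite_init (MySprite *self)
{
  self->frames = NULL;
  self->current_frame = 0;
}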
gzochi

Having at least temporarily put that security fuss to bed, I decided to implement a few of the remaining missing features of the gzochi server container; specifically, periodic task rescheduling and some light application metrics reporting. The result is a fresh release, 0.4. It's been sync'd to Savannah's trusty network of mirrors by now -- so go get it! Software's never, you know, done, but I think with this release the framework might be feature-complete, at least in terms of its v1 functionality. (A "distributed mode" might be in the works for v2, if I ever get around to reading how RedDwarf Server went about that.) So what's left to work on? Bugs, for sure. In particular, there are several worrisome memory leaks. Unit test coverage. Performance, because there's no reason the container shouldn't really scream, especially with recent GNU Guile builds.

But wow: the rare joy of getting to be a user.

Like a lot of programmers, I think, I developed a mode of thinking about and designing a software system as a set of mostly independent components, each with a limited, discrete function, working in concert to produce a complex epiphenomenal behavior. Until relatively recently, though, I didn't think of these systems as potentially spanning multiple processes or machines. It may seem like a trivial observation, but I've come to find it useful to think of complex systems as appliances that use some set of computing hardware to host one or more processes whose combined behavior forms the behavior of the whole system. The benefit of this kind of thinking is that you no longer need to figure out a reasonable way to wedge a web server into, say, your spreadsheet application process code. Instead, you've got your web server, and you've got your spreadsheet. The difficulty is that you may need to launch and coordinate several processes -- or machines -- to get the complete appliance into the right state, such that its different parts are relaying data back and forth and responding to requests properly.

krb5

...Which brings me to my plodding, ongoing experiments in writing an online game. I'd invested quite a bit of time attempting to model the concept of downloadable assets of different types from within my gzochi application code before ultimately deciding that the game server had no business manipulating asset data. That kind of thing, I figure, is the rightful purview of some kind of independent asset management system that's aware of user authorization but not necessarily game state. So I set about figuring out how to manage authorization across processes, and, naturally, Kerberos came to mind. Everything you read about Kerberos steers you in the direction of using it via GSS, the Generic Security Services API. A lot of what you read about GSS suggests that perhaps you ought to consider using SASL, the Simple Authentication & Security Layer. So I did. At first glance, SASL looked like a bad fit -- your SASL-ized applications get to enter into negotiations over which of a set of mutually-supported authentication mechanisms will be used to initiate a session. I guess the idea is that you want secure authentication and you don't care how it happens. But I did care how it happened. So I dropped down to GSS, and found that at first it sort of made sense: Everything is a principal and has credentials, and two principals can create a security context with each other through which they can securely exchange information. But the GSS API designers seemed desperate to avoid explicit representation of anything that might remotely suggest that it's a wrapper around any particular security implementation, much less Kerberos -- no ticket-granting tickets, no service tickets, no distinction between user and service principals. I spent weeks trying to figure out how to model the authentication and authorization flow I had in mind: A client application would obtain a TGT for a user with a password, and then use it to obtain tickets to authenticate with the asset server and game server.

When, out of frustration, I dug into the verboten krb5 API, I found it easy to understand -- in the course of trying to get GSS to work I'd figured out the details of key tables and credential caches -- and had something approximating my desired architecture working in an evening. And it's, like, ten lines of code. So I'm kind of on board with what Simon Josefsson says in the appendix of the GNU GSS manual:
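
For a sense of how direct the krb5 API is, here's a minimal sketch of obtaining initial credentials (a TGT) for a user principal with a password - placeholder names, no error checking, and not the actual client code:

#include <krb5.h>
#include <stdio.h>

int
main (void)
{
  krb5_context context;
  krb5_principal client;
  krb5_creds creds;

  krb5_init_context (&context);
  krb5_parse_name (context, "user@EXAMPLE.COM", &client);

  /* Ask the KDC for an initial (ticket-granting) ticket; real code would
     check the return value and store the result in a credential cache. */
  krb5_get_init_creds_password
    (context, &creds, client, "password", NULL, NULL, 0, NULL, NULL);

  printf ("obtained initial credentials for the client principal\n");

  krb5_free_cred_contents (context, &creds);
  krb5_free_principal (context, client);
  krb5_free_context (context);
  return 0;
}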

...GSS may not be the simplest solution available to solve actual problems, since otherwise more projects would have chosen to take advantage of the work that went into GSS instead of using another framework (or designing their own solution).
I'm with him right up until he says the only circumstance under which you should use GSS is when you're sure you want a Kerberos 5 implementation. Bzzzht!
