Tried various elevator settings and profiled a dbench 48 run. Depending on settings, the raw dump from a single dbench run takes between 13 and 16MB of disk space! Getting a detailed ASCII dump consumed as much as 60MB of disk space. The app also prints a useful one-page summary of I/O activity, which is what I've been using. I haven't had much time to investigate the logs yet, but it looks as if the write bomb logic is what hurts performance the most, especially because the bomb segments are set as low as 4 sectors! Much better results are achieved with a bomb segment of 64 and a new max_read_pending member in the elevator, so that we don't unconditionally reduce segments for a single pending read. I will put up detailed info tomorrow along with a possible good setup for the elevator.
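To make the max_read_pending idea concrete, here is a rough sketch of the decision it implies. This is not the kernel code: the function name, the default max_segments of 128, the threshold of 4 pending reads, and the strict "greater than" comparison are all assumptions made up for illustration. Only the bomb segment value of 64 comes from the numbers above.

```python
def allowed_write_segments(pending_reads,
                           max_segments=128,      # hypothetical normal limit
                           bomb_segments=64,      # value that tested well above
                           max_read_pending=4):   # hypothetical threshold
    """Return the segment limit for a write request.

    Assumed behaviour: the limit is only cut down to bomb_segments once
    more than max_read_pending reads are waiting, instead of being cut
    as soon as a single read is pending.
    """
    if pending_reads > max_read_pending:
        return min(max_segments, bomb_segments)
    return max_segments


if __name__ == "__main__":
    for reads in (0, 1, 4, 5, 20):
        print(reads, "pending reads ->", allowed_write_segments(reads), "segments")
```

With these made-up defaults a single pending read leaves the write limit alone, and the reduction only applies once reads start to pile up.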
The request accounter does not seem to impose as big an overhead as I had expected. I keep an in-kernel 2MB ring buffer which can hold 87381 entries. The blklog app reads 128 entries at a time and writes them out in intervals of 1MB. A dbench run consumes about 500,000-650,000 requests and the miss rate is about 0.02%, which is tolerable. This gives dbench about a 3% performance hit. If blklog is not running (it enables logging when it opens /dev/blklog), the performance hit is the overhead of a function call per request, which doesn't even show up in performance testing.
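A quick back-of-the-envelope check of these numbers. The 24-byte entry size is not stated anywhere; it is inferred from 2MB divided by 87381 entries, and the helper names are only for illustration.

```python
import math

RING_BYTES = 2 * 1024 * 1024   # 2MB in-kernel ring buffer
RING_ENTRIES = 87381           # capacity quoted above
READ_BATCH = 128               # entries blklog reads at a time
MISS_RATE = 0.0002             # about 0.02%

# Inferred, not stated: bytes per entry implied by the two figures above.
entry_size = RING_BYTES // RING_ENTRIES

# Number of 128-entry reads needed to drain a completely full buffer.
reads_to_drain = math.ceil(RING_ENTRIES / READ_BATCH)


def missed_requests(total_requests, miss_rate=MISS_RATE):
    """Approximate number of requests that never make it into the log."""
    return round(total_requests * miss_rate)


if __name__ == "__main__":
    print("entry size:", entry_size, "bytes")
    print("reads to drain a full buffer:", reads_to_drain)
    print("missed per run:", missed_requests(500_000), "to", missed_requests(650_000))
```

So a 0.02% miss rate works out to roughly 100 to 130 lost entries per dbench run.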