Found an SMP race in the IDE code of the block patch (pretty stupid) and a more subtle one in the SCSI mid level. I've been doing benchmarks with the new queue stuff to get a feel for performance. A decent RAID setup with a couple of SCSI disks would do nicely here...
Worked with davem to improve the elevator in 2.3. David has a nice description of the problem on his page, but the quick recap is that the elevator will not coalesce adjacent buffers if it thinks doing so will hurt interactiveness. Instead, a new request is grabbed and the buffer is added to that. While interactiveness is a must for a desktop machine, this behavior was hurting I/O performance quite badly.
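To make the trade-off concrete, here is a toy model of the merge decision. This is not the 2.3 elevator code, only a sketch: the `Request` class, `queue_buffer`, `submit`, and the `max_merges` cap are all hypothetical names, and the cap is an assumed stand-in for whatever heuristic the real elevator uses to protect interactiveness.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Toy stand-in for a block request (hypothetical, not the kernel struct)."""
    start: int        # first sector covered by the request
    nr: int           # number of sectors covered
    merges: int = 0   # how many buffers have been coalesced into it

    @property
    def end(self):
        return self.start + self.nr


def queue_buffer(queue, sector, nr, max_merges):
    """Queue one buffer, back-merging into a contiguous request if allowed.

    max_merges is an assumed latency guard: once a request has absorbed
    that many buffers, further adjacent buffers get a fresh request.
    """
    for req in queue:
        if req.end == sector and req.merges < max_merges:
            req.nr += nr
            req.merges += 1
            return req
    req = Request(sector, nr)   # no merge possible: grab a new request
    queue.append(req)
    return req


def submit(buffers, max_merges):
    """Feed (sector, nr) buffers through the toy elevator, return the queue."""
    queue = []
    for sector, nr in buffers:
        queue_buffer(queue, sector, nr, max_merges)
    return queue


# Eight contiguous 8-sector buffers, as a sequential write would produce.
buffers = [(i * 8, 8) for i in range(8)]
cautious = submit(buffers, max_merges=1)    # refuses most merges
eager = submit(buffers, max_merges=64)      # coalesces everything
print(len(cautious), len(eager))            # prints "4 1"
```

With the cautious setting the same 64 sectors go to the driver as four requests instead of one, which is the kind of overhead that shows up in throughput numbers.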
Good news is that the loopback driver works with my queueing changes. Other good news is that I'm currently seeing a 14% performance increase with those changes, and that is on a single disk. Multiple-disk I/O should benefit even more.