Older blog entries for axboe (starting at number 15)

Thanks to all the folks who wrote to me with suggestions regarding the Quantum hard drive. Unfortunately Pricewatch doesn't list this model, but I then discovered that Quantum's web site still lists it as a current model, so it can't be too bad. Tomorrow I think I'll walk from store to store until I find one.

Picked up tickets for my trip. I'll be away from 14/5 to 21/5, which I'm really looking forward to!

Put a 'CD-ROM patch FAQ' on kernel.dk. Hopefully that will save me from a lot of questions. Find it here.

Put up a new block patch with a couple of fixes. SCSI needs to clean up the queue if it doesn't find a device, otherwise we leak requests. Other minor stuff as well. Find it here. It also includes the IDE multiple-HWIF request changes mentioned a couple of days ago.

Who was the ATA genius who thought that making the (Mt Fuji deprecated) GET_MEDIA_STATUS command only half ATAPI-like was a good idea? Fire the packet and then examine the IDE registers to get the results, bravo.

Sent Andre some IDE fixes that eliminate the whole ide_doX_request type thing. The IDE layer has a per-drive queue, and the whole #if MAX_HWIFS > X business was a bit nasty. All that is now gone, IDE request handling is IMHO much cleaner, and Andre reports "the modifications to the request list jumped performance in a HUGE way", which is always nice.

Talked to Alan on IRC. It's nice to finally find someone who agrees with me that magicdev is quite possibly the most horrible thing that ever happened. So I started hacking on a magicdev that does things The Right Way (tm). For Mt Fuji style drives we can use the GET_EVENT / SEND_EVENT interface, which is very nice. We can do cool stuff such as automounting on CD insert, and safely unmounting and popping the CD out when the user presses the eject button. For older drives we have to use the 'Microsoft Media Status Notification', which is much nastier. I'm sure this is going to be really cool :-)

Played with Andrea's classzone VM patch. It works very nicely for me: I/O performance is extremely good and it doesn't cause much swapping out. According to riel, it is fundamentally broken. A good mm solution must show up soon!

Haven't registered for OLS yet, I should get around to doing this soon...

Had dinner with friends (Sandra is going to be really mad if I don't mention her here, so please indulge me) and watched our cats fight.

2.2.15 finally came out; I was starting to think this would never happen :-). Merged the DVD fixes for 2.2.16-pre2 with Alan, so now it should be ready to go. It's going to be nice to just tell people to use 2.2.16-pre2 or later for DVD, instead of helping them patch their kernels and answering questions about why distro XXX's kernel won't work with my patches. I get a LOT of those questions.

Found a bug in the IDE layer: ide_spin_wait_hwgroup was passed flags saved by a spin_lock_irqsave() in its caller and restored them itself. Apparently this is not news to the sparc32 folks, where this breaks horribly. It worked on ia32 hardware, but it just looked so wrong. Jeff Garzik noted that if the function is never called with interrupts disabled, just using spin_lock_irq() is a win in terms of speed and stack usage. That made the patch even nicer. Now I just need to push it to Andre.

Went to a friend's apartment for dinner (hi Thomas, I know you are reading this! Have a good first day at work).

I believe the erratic performance davem saw is now fixed. Instead of having two request_freelist heads (one for reads and one for writes, where reads can steal from the write list if necessary), I use a single list head to hold the requests and let a counter keep track of when writes should block in get_request_wait. Writes can consume 2/3 of the queue, just like before. This approach has a couple of advantages, although it does not "seem" as clean: we save a bit of space in request_queue_t and struct request, and get_request is simpler than before. I'm waiting to hear from davem claiming his free beer before submitting this.
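To make the single-freelist idea concrete, here is a minimal userspace sketch, not the actual ll_rw_blk.c code. It assumes a 12-slot pool; QUEUE_NR_REQUESTS, WRITE_LIMIT, queue_init and put_request are hypothetical names, and get_request here simply returns NULL where the kernel would sleep in get_request_wait. Reads and writes share one free list, and a write counter stops writes once they hold 2/3 of the slots.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool size; writes may hold at most 2/3 of it. */
#define QUEUE_NR_REQUESTS 12
#define WRITE_LIMIT (QUEUE_NR_REQUESTS * 2 / 3)

enum rw { READ_REQ, WRITE_REQ };

struct request {
    enum rw cmd;
    struct request *next;          /* link on the shared free list */
};

struct request_queue {
    struct request pool[QUEUE_NR_REQUESTS];
    struct request *free_list;     /* one list head for reads and writes */
    int writes_pending;            /* counter deciding when writes must wait */
};

static void queue_init(struct request_queue *q)
{
    q->free_list = NULL;
    q->writes_pending = 0;
    for (int i = 0; i < QUEUE_NR_REQUESTS; i++) {
        q->pool[i].next = q->free_list;
        q->free_list = &q->pool[i];
    }
}

/* Returns NULL where the real code would block in get_request_wait. */
static struct request *get_request(struct request_queue *q, enum rw cmd)
{
    if (q->free_list == NULL)
        return NULL;               /* pool exhausted: everyone waits */
    if (cmd == WRITE_REQ && q->writes_pending >= WRITE_LIMIT)
        return NULL;               /* keep the remaining slots for reads */
    struct request *rq = q->free_list;
    q->free_list = rq->next;
    rq->cmd = cmd;
    if (cmd == WRITE_REQ)
        q->writes_pending++;
    return rq;
}

static void put_request(struct request_queue *q, struct request *rq)
{
    if (rq->cmd == WRITE_REQ) {
        assert(q->writes_pending > 0);
        q->writes_pending--;
    }
    rq->next = q->free_list;
    q->free_list = rq;
}
```

With one list there is no "stealing" logic at all: a read only fails when the whole pool is empty, and a write additionally checks the counter.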

Removed BROKEN_CAP_PAGE from ide-cd and let a simple ID scan decide whether to include the full mode page capabilities size or not. This should make all drives happy, old ones as well as the ACER 50 and similar.

Started a document detailing the Linux block driver stuff.

Davem is seeing inconsistent results with my new elevator stuff, which is very strange. I haven't been home much today, I'll investigate this matter tomorrow and hopefully be able to offer an explanation of what is going on. Right now I'm puzzled.

And the weather today wasn't really as nice as I expected; compared to the weekend, it was kind of cold. A good time was still had, though :-)

Finished and cleaned up the block queueing and elevator changes. The type of elevator is selectable with elevator_init(); blk_init_queue() selects ELEVATOR_DEFAULT for you, which is the elevator we have in 2.3 now. The only difference is that max_bomb_segments is increased to 32, for much better performance. The other "elevator" implemented is called noop, since it always stores incoming requests at the back and always coalesces. Give it a shove: apply the patch and change ELEVATOR_DEFAULT to ELEVATOR_NOOP in ll_rw_blk.c. It's in Linus' inbox.
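A rough sketch of what a noop elevator does, under stated assumptions: this is userspace C, not the patch itself, and the struct layout and noop_add_request are hypothetical. It assumes "always coalesces" means the whole queue is scanned for a back or front merge, and it ignores read/write direction and request size limits, which real code would have to check. A request that can't be merged goes at the tail, with no sorting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct request {
    unsigned long sector;       /* first sector */
    unsigned long nr_sectors;   /* length in sectors */
    struct request *next;
};

struct queue {
    struct request *head, *tail;
};

/* Returns true if rq was merged into an existing request, false if queued. */
static bool noop_add_request(struct queue *q, struct request *rq)
{
    for (struct request *cur = q->head; cur; cur = cur->next) {
        if (cur->sector + cur->nr_sectors == rq->sector) {
            cur->nr_sectors += rq->nr_sectors;      /* back merge */
            return true;
        }
        if (rq->sector + rq->nr_sectors == cur->sector) {
            cur->sector = rq->sector;               /* front merge */
            cur->nr_sectors += rq->nr_sectors;
            return true;
        }
    }
    /* No merge possible: always store at the back, never reorder. */
    rq->next = NULL;
    if (q->tail)
        q->tail->next = rq;
    else
        q->head = rq;
    q->tail = rq;
    return false;
}
```

Skipping the sort makes insertion cheap, which is why this kind of policy suits devices that reorder requests themselves.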

Apparently the DVD stuff is going into 2.2.16-pre2. I've got a couple of changes I need to send to Alan, mostly backports from current 2.3. Interesting to see how this goes... It's been ages since I've gotten a bug report for 2.2 + DVD patches, so I think we are fine. In addition, 2.3 has had this stuff since 2.3.16 (or thereabouts) and seems to be doing great.

Tomorrow is May 1st, which means beer and great weather! And in two weeks I'm going on vacation, life is great.

Never forget a semicolon; it could cost you many hours of restoring data on your partitions after corruption. Did I mention the missing ; was in the merge code? Oh well, this is the first time I've experienced (self-inflicted) corruption. I could kick myself. Seems forgetting that character was the theme of the day, eh Rik :-)

Modularized the elevator code so it is easy to write a new elevator plugin, or just choose which of the available ones you want for a low-level driver. Arjan is playing with some of his own ideas; it will be interesting to see how they turn out.
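The post doesn't show the plugin interface, so the following is only a sketch of one common way to do it: an ops table of function pointers that a driver's queue selects by name. struct elevator_ops, queue_set_elevator and both example elevators are hypothetical names, not the real 2.3 elevator structures, and pending requests are reduced to bare sector numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PENDING 32

struct queue;

/* Hypothetical plugin interface: each elevator supplies its own hooks. */
struct elevator_ops {
    const char *name;
    void (*add_request)(struct queue *q, unsigned long sector);
};

struct queue {
    unsigned long pending[MAX_PENDING];   /* queued start sectors */
    int count;
    const struct elevator_ops *elv;       /* elevator chosen for this queue */
};

static void noop_add(struct queue *q, unsigned long sector)
{
    assert(q->count < MAX_PENDING);
    q->pending[q->count++] = sector;      /* always at the back */
}

static void ascending_add(struct queue *q, unsigned long sector)
{
    assert(q->count < MAX_PENDING);
    int i = q->count++;
    while (i > 0 && q->pending[i - 1] > sector) {   /* insertion sort */
        q->pending[i] = q->pending[i - 1];
        i--;
    }
    q->pending[i] = sector;
}

static const struct elevator_ops elevators[] = {
    { "noop", noop_add },
    { "ascending", ascending_add },
};

/* A low-level driver picks an elevator by name; returns 0 on success. */
static int queue_set_elevator(struct queue *q, const char *name)
{
    for (size_t i = 0; i < sizeof(elevators) / sizeof(elevators[0]); i++) {
        if (strcmp(elevators[i].name, name) == 0) {
            q->elv = &elevators[i];
            q->count = 0;
            return 0;
        }
    }
    return -1;
}

static void queue_add(struct queue *q, unsigned long sector)
{
    q->elv->add_request(q, sector);       /* dispatch through the plugin */
}
```

Adding a new elevator then means writing its hooks and adding one table entry; the queue code that calls queue_add stays untouched.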

Didn't do a whole lot more, it is Saturday after all.

What do you know, edit a diary entry and the date changes!

After having studied many different types of I/O schedulers, I've come to the conclusion that simple ascending request sorting is the best choice in most circumstances. It has decent insertion cost and good average seek time. Combine that with something to limit starvation and you've got yourself a decent disk elevator - and behold - this is what we have in current 2.3. After trying the BSD-style elevator, I implemented one that always returns the request closest to the one the drive is servicing right now. In terms of runtime it is expensive, and the gains over the simple ascending sort were just not worth it. So my work yet again degenerates into getting good performance by tweaking elevator defaults, how boring. Well, almost: I want to modularize the current elevator so that it is possible to select which one you want. For IDE drives the current one is pretty good; for SCSI it does some work that is really unneeded but does not harm performance (well, we do take a small hit because of the unneeded work, but it is negligible). For intelligent devices (high-end SCSI HBAs/disks, I2O) that claim to do their own elevating, we shouldn't need to do much.
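Ascending sorting with a starvation limit can be sketched as follows. The entry only says "some stuff for limiting starvation", so the mechanism here is an assumption: each queued request carries a hypothetical passes_left budget, a new request may only be inserted ahead of requests that still have budget, and each request it jumps ahead of loses one pass. MAX_PASSES and all names are illustrative, and index 0 is the next request to be serviced.

```c
#include <assert.h>

#define MAX_PENDING 32
#define MAX_PASSES 2     /* hypothetical: times a request may be overtaken */

struct pending {
    unsigned long sector;
    int passes_left;     /* remaining times this request may be overtaken */
};

struct queue {
    struct pending rq[MAX_PENDING];   /* rq[0] is serviced next */
    int count;
};

static void elevator_insert(struct queue *q, unsigned long sector)
{
    assert(q->count < MAX_PENDING);
    /* Walk back from the tail to the ascending-order slot, but never
       move in front of a request whose pass budget is used up. */
    int pos = q->count;
    while (pos > 0 &&
           q->rq[pos - 1].sector > sector &&
           q->rq[pos - 1].passes_left > 0)
        pos--;
    /* Shift the overtaken requests back; each loses one pass. */
    for (int i = q->count; i > pos; i--) {
        q->rq[i] = q->rq[i - 1];
        q->rq[i].passes_left--;
    }
    q->rq[pos].sector = sector;
    q->rq[pos].passes_left = MAX_PASSES;
    q->count++;
}
```

Inserting 500, 100, 300 and then 400 leaves the queue as 100, 300, 500, 400: sector 500 was overtaken twice, so 400 has to queue behind it instead of sorting in front.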

At least one person is having problems caused by changing the default mode page size in ide-cd to the size the standard specifies. So it looks like we are reverting to the old behaviour again. The only case I know of that fails with the smaller mode page is the ACER 50 drive, which is not enough to justify changing the old default.

I want to get a new workstation PC. Contemplating getting a K7, but I'm not sure I really need it. Oh well.

Implemented a BSD-style elevator to see what effect it would have on I/O behaviour. The BSD elevator is different in that it keeps two lists of requests that the device must service when it gets unplugged (I handle this automatically, though: queue_head always points to the list that needs work). We start by adding requests to the first list in strict sector order until a request comes in that lies before the last active request. Then we switch lists and start adding to the other list. This gives good I/O ordering and also limits how long we risk waiting for a specific request to finish. Performance is not quite determined yet. It feels pretty good though, and initial benchmarks suggest it is.
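A userspace sketch of the two-list scheme, under assumptions: "lies before the last active request" is read as "has a lower sector than the request most recently handed to the drive", and the names (bsd_queue, bsd_add_request, bsd_next_request) are hypothetical rather than taken from the patch. Requests at or beyond the head position join the current sweep in sorted order; anything behind it waits, sorted, on the other list, and the lists swap when the current sweep drains.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PENDING 32

struct sorted_list {
    unsigned long sector[MAX_PENDING];   /* ascending sector order */
    int count;
};

struct bsd_queue {
    struct sorted_list lists[2];
    int active;                  /* index of the list being serviced */
    unsigned long last_sector;   /* last request handed to the drive */
};

static void list_insert_sorted(struct sorted_list *l, unsigned long sector)
{
    assert(l->count < MAX_PENDING);
    int i = l->count++;
    while (i > 0 && l->sector[i - 1] > sector) {
        l->sector[i] = l->sector[i - 1];
        i--;
    }
    l->sector[i] = sector;
}

static void bsd_add_request(struct bsd_queue *q, unsigned long sector)
{
    /* Ahead of the head: this sweep. Behind it: the next sweep. */
    int which = sector >= q->last_sector ? q->active : !q->active;
    list_insert_sorted(&q->lists[which], sector);
}

/* Pops the next request for the drive; false if nothing is queued. */
static bool bsd_next_request(struct bsd_queue *q, unsigned long *sector)
{
    struct sorted_list *cur = &q->lists[q->active];
    if (cur->count == 0) {
        q->active = !q->active;          /* sweep done: switch lists */
        cur = &q->lists[q->active];
        if (cur->count == 0)
            return false;
    }
    *sector = cur->sector[0];
    for (int i = 1; i < cur->count; i++)
        cur->sector[i - 1] = cur->sector[i];
    cur->count--;
    q->last_sector = *sector;
    return true;
}
```

Because a request behind the head can only be delayed by one full sweep, this bounds waiting time in a way the single ascending list needs an explicit starvation counter for.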

Decided to give the nvidia XFree86-4.0 drivers a go. The kernel driver needed porting to 2.3 first, though, but that was fairly trivial. Seems to run well. Soon it is time for the Q3 test to see how well the OpenGL performs! XFree86-3.3 performance with the nvidia glx sucked big time, I truly hope the new one is much better. According to the Linux Games site it is, sweet.
