I helped set up a 20-machine distcc+ccache compile farm. It's fun seeing the compile time go from 80 minutes down to just 20. The tricky part was modifying all the Makefiles for our product so they build in parallel. It's a bit like a game of "whack-a-mole" -- setting up the Makefiles to go faster, and then fixing all the missing dependencies that turn up as random build failures. Now that I know what I'm doing, I'll have to see if I can get some other things (e.g. Kaffe, gcj) to build on the cluster. It should really speed up some regression testing investigations. :-)
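For anyone curious, the client-side setup is roughly like this. This is just a sketch -- the host names and job count are made up for illustration, and your farm's machine list will obviously differ:

```shell
# Tell distcc which machines can accept compile jobs.
# "localhost" keeps some work local; the buildNN names are hypothetical.
export DISTCC_HOSTS="localhost build01 build02 build03 build04"

# Route compiles through ccache first; on a cache miss, ccache hands
# the actual compilation off to distcc via CCACHE_PREFIX.
export CCACHE_PREFIX="distcc"
export CC="ccache gcc"
export CXX="ccache g++"

# Run plenty of parallel jobs so the farm stays busy while some jobs
# are blocked on local preprocessing or network I/O.
make -j20 CC="$CC" CXX="$CXX"
```

The `-j` count is the part that exposes all the missing Makefile dependencies -- a target that silently relied on build order will start failing randomly once jobs run concurrently.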
I encountered a bizarre GNU make bug where a particular Makefile I have drains all the jobserver tokens, so the "make -j20" build degrades into a serial build. I can reproduce it, but only after running the build for 20 minutes. It's also a "heisenbug" -- it goes away when I try to add some printf statements. Ugh. It was easy to fix up the Makefile to avoid it, so I'm not sure I'll ever have enough time to fully debug it and file a bug report/patch.
I finally found some time to start working on the new server for Kaffe.org. It turns out the built-in Intel ICH5R SATA RAID is really just a joke -- it's not really hardware RAID at all (I knew this when I bought it). I ended up turning it off in the BIOS and going with Linux software RAID in the kernel, which is really quite cool -- this is the first time I've ever gotten to play with it (all the other servers I use have hardware RAID). I got so enthused, I wanted to try out RAID-5, so I went out and bought a 3rd 250GB hard drive, so now the machine has 0.75TB of storage. The new drive is a Western Digital just like the others, but it's an EIDE drive, whereas the others are SATA. I ran the bonnie++ benchmark, and there wasn't much difference in speed at all. I partitioned each drive into 15 partitions (the max for the SCSI subsystem), and set up about half of them as RAID-5. I set up the other half as RAID-0 (striped) sets -- there's no disk space overhead for parity, and it's theoretically faster, but it has no redundancy, so I'll have downtime if a disk fails. I'm running LVM2 on top of the RAID sets, so I can easily create and move around logical volumes. I also moved the boot partition to a RAID-1 (mirrored) volume. And I did lots of GRUB magic, all remotely over the network using IPMI2. It's all very cool.
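For the record, the commands involved look roughly like this. The device names, partition numbers, and volume names below are hypothetical placeholders, not the actual layout on the new box:

```shell
# RAID-5 across one partition from each of the three drives.
# /dev/sda1 and /dev/sdb1 are the SATA disks; /dev/hda1 is the EIDE drive.
mdadm --create /dev/md0 --level=5 --raid-devices=3 \
    /dev/sda1 /dev/sdb1 /dev/hda1

# RAID-0 (striped): full capacity, no parity overhead, but no redundancy.
mdadm --create /dev/md1 --level=0 --raid-devices=3 \
    /dev/sda2 /dev/sdb2 /dev/hda2

# RAID-1 (mirrored) for the boot partition, so the machine can still
# boot off the surviving disk if one fails.
mdadm --create /dev/md2 --level=1 --raid-devices=2 \
    /dev/sda3 /dev/sdb3

# Layer LVM2 on top of the RAID sets, so logical volumes can be
# created, resized, and moved around later without repartitioning.
pvcreate /dev/md0 /dev/md1
vgcreate vg_raid5 /dev/md0
vgcreate vg_raid0 /dev/md1
lvcreate -L 20G -n www vg_raid5
```

With 3 x 250GB drives, the RAID-5 sets give two drives' worth of usable space (one drive's worth goes to parity), while the RAID-0 sets use all three -- that's the trade-off between the two halves of the partitioning scheme.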
The machine came with Fedora Core 2 -- the x86_64 version. I spent a lot of time removing RPMs, so I now have just a minimal set. I think I'll attempt to upgrade that to Fedora Core 3 somehow. I'm not too keen to run their installer though - I really only want a minimal setup. I think I'll also try out Debian's AMD64 port -- however, that hasn't been released yet. The name for their port seems wrong, since I'm running Intel Xeon EM64T chips. I'm somewhat leery of it, since I don't know what decisions they've made with regards to the porting issues. My loyalties are split - I maintain RPMs all day long at work, but I'm also a Debian "emeritus" developer. It's hard to make any decisions without having hard benchmark numbers for performance. I've got access to some benchmark suites myself (SPEC CPU 2000, SPEC JVM 98), so maybe I'll have to do some experiments.
Anyways, I'm almost ready to set up a separate subnet for the machine, and tunnel it to the existing kaffe.org server, so I can start migrating some services over. :-)
I'm way behind on my other kaffe.org commitments -- e.g. getting testing going for the release process, some website reorganization, the JIT4 merge, etc. Hopefully, I'll get this server stuff tucked away soon, and then I can move on to the fun stuff. :-)