or: The Mylex saga
On Wednesday, Jan 16th, freshmeat suffered a major database server outage. Both quad CPU database servers crashed hard, one after the other. As we knew we had a swap problem on these machines anyway, we upgraded them from Red Hat 7.1 to 7.2. Since 7.2 shipped with a 2.4.7 kernel, we tried to upgrade it to 2.4.9 from the Red Hat updates repository. Sad but true, this kernel didn't like our Mylex DAC960 controller and panicked on boot. The 2.4.7 kernel didn't survive the load MySQL was putting on it, so freshmeat went down for 6 hours straight.
When we finally got it up and running again, it was in a very flaky state: we had to disable searches to make the machines survive on kernel 2.4.7, since we couldn't find a single kernel besides the one from the 7.2 install CDs that would boot on these machines. Going back to 2.2 wasn't an option either, since it has a 2GB memory limit and the machines are equipped with 4GB each.
Repeated attempts to compile our own kernels also failed, even with an updated DAC960 driver we found on the net. We therefore decided to replace the quad machines with dual CPU machines without that dubious DAC960 controller.
After two days of testing with three dual CPU machines with 2GB of memory each, we decided to switch the site over to these machines exclusively the following day.
When I woke up that day, I noticed a security advisory from Red Hat in my inbox with a cute little note appended to it saying "* Updated DAC960 driver". I contacted OSDN's netop staff to talk them into another kernel upgrade attempt on the quad machines, which happened just yesterday. And... who would've guessed, they're booting!
We threw away the dual CPU plans and brought the site back up on the quad machines after resyncing the replicated databases, and that whole mess seems to have come to an end now. *phew*
Kudos go out to Karl and Yazz and the rest of OSDN's netop staff for spending several days in the OSDN cage at Exodus (days without a fleece in a wind-tunnel-like environment!) without sleep, without food, without TV, with only my poor self trying to keep them awake. Good job!