freshmeat's quad database servers finally booting
or: The Mylex saga
On Wednesday, Jan 16th, freshmeat suffered a
major database server outage.
Both quad CPU database servers crashed hard, one after the
other. As we knew we had a swap problem on these machines
anyway, we upgraded them from Red Hat 7.1 to 7.2.
Since 7.2 shipped with a 2.4.7 kernel, we tried to
upgrade it to 2.4.9 from the Red Hat updates repository.
Sad but true, this kernel didn't like our Mylex DAC960
controller and panicked on boot. The 2.4.7 kernel didn't
survive the load MySQL was putting on it, so freshmeat went
down for six hours straight.
When we finally got it up and running again, it was in
a very flaky state: we had to disable searches to make
the machines survive with kernel 2.4.7, because the one
from the 7.2 install CDs was the only kernel we could find
that would boot on these machines. Going back to 2.2
wasn't an option either, since it has a 2GB memory limit
and the machines are equipped with 4GB each.
Repeated attempts to compile our own kernels also failed,
even with an updated DAC960 driver we found on the net.
We therefore decided to replace the quad machines with
dual-CPU machines without that dubious DAC960 controller.
After two days of testing with three dual-CPU machines
with 2GB memory each, we decided to switch the site over
to these machines exclusively the following day.
When I woke up that day, I noticed a security
advisory from Red Hat in my inbox with a cute little
note appended to it saying "* Updated DAC960 driver". I
contacted OSDN's netop staff to talk them into another
kernel upgrade attempt on the quad machines, which
happened just yesterday.
And... who would've guessed, they're booting!
We threw away the dual-CPU plans and brought the site
back up on the quad machines after resyncing the
replicated databases. That whole mess seems to have
come to an end now. *phew*
Kudos go out to Karl and Yazz and the rest of OSDN's
netop staff for spending several days in the OSDN cage at
Exodus (days without a fleece in a wind-tunnel-like
environment!) without sleep, without food, without TV,
with only my poor self trying to keep them awake. Good job!