More on the aftermath of the fire. But first, a bit of
history of the machine. It was an older system that had
been through many generations of motherboard, CPU, and
drive upgrades. For a while now, it's had an Asus P2B (or perhaps P2B-F) slot 1 motherboard with a Celeron 766 on a slot 1 adapter, 1GiB of PC133 SDRAM, and a Maxtor 160GiB 5400 RPM drive mounted in one of the 5.25-inch drive bays with a bezel fan assembly. It's been running Red Hat 9.
A few months ago, I tried to replace the single 160 GiB drive with a 3ware 7500-4 ATA RAID controller and four Maxtor 200GiB 7200 RPM drives. I've been very happy with 3ware controllers on other systems, but I wasn't able to get this one to work in this system. I never determined why it didn't work, but I concluded that this particular motherboard, which had served me faithfully for quite a while, had finally passed its best-when-used-by date. I wasn't too concerned, as I expected to replace the motherboard and CPU in the not-too-distant future, probably with a dual Athlon motherboard such as an Asus A7M266D. The server doesn't really need a dual Athlon, but I've been very happy with the A7M266D in my desktop system at home, and it's one of the few Athlon motherboards that support ECC and 64-bit PCI.
Anyhow, as things have turned out, I'm glad that I didn't put the RAID in the box.
More recently, I assembled a new computer to use for editing the commercials out of MPEG-2 program streams recorded by my ReplayTV 4080. Unfortunately I was not able to find any MPEG-2 editing software for Linux, so this machine was intended to run Windows and the Womble editor. (Maybe there's some small chance of Womble running under Wine, but I haven't tried it yet.) The machine has an Asus A7V8X motherboard and an Athlon XP 2500+ CPU.
In order to come up with a replacement server (and more immediately, an interim machine for data recovery), I removed the disk from the video editing system, installed one of the 200GiB Maxtor drives originally intended for the RAID, and installed Red Hat 9.
I loaded up the new system, an old LCD monitor, USB keyboard and trackball, power cords, a spare Ultra ATA cable, etc. into the car and headed to my friend Steve's house. Steve had graciously agreed to help out with the recovery effort.
The old server's case was the type with an inverted U shaped metal top rather than separate left and right side panels. A keyboard had been left on top of the machine, and was melted to the point that it was unrecognizable. It was just a big lump of blackened plastic stuck to the top and dripping down the sides of the case. And all the plastics on the front of the case, the CD-ROM drive, and the bezel fan had also melted and run. Steve removed the screws and we pried the case open with a screwdriver. Some drive power cables had been in contact with the side of the case and the insulation had melted and attached itself to the side, but we pried that loose.
The inside of the computer was bad, but perhaps not as bad as I'd feared. Everything was scorched and covered with soot. But nothing inside the case had actually caught fire. Painted surfaces were bubbled. Stuff toward the bottom of the case fared better than stuff
near the top, probably partly due to heat rising, and partly due to the
top of the case having been covered with the burning keyboard.
Naturally, the hard drive was near the top of the case. The top of the
hard drive (the label side) was blackened, and the drive was covered on
all sides with soot. But there was no obvious physical damage. The top
cover didn't appear to be warped. The seals appeared to a casual glance
to be intact. There didn't appear to be any circuit board components missing
or visibly damaged, although the board may be slightly scorched.
Steve got a rag and started brushing the soot from the drive. He decided
that it would work better with a little water. I later realized that applying
any water was a really bad idea, because the soot becomes acidic when wet.
But at the time I wasn't thinking about that. It only occurred to me that
he shouldn't scrub the drive too much because it might damage seals that were
otherwise intact. So he stopped doing that.
Steve thought maybe the drive didn't look so bad. He thought that there
was a 50/50 chance that we'd be able to read it.
I unplugged the secondary ATA cable from the CD-RW and DVD-RW drives, and
plugged in the burned drive. I hooked up a drive power cable. I told Steve
that I'd cross my fingers except that I know it's bad luck to be superstitious.
I turned on the system.
The drive made normal spinup and recal sounds. There weren't any of the
screeching or thunking sounds that I would have expected had there been serious
mechanical damage. Of course, the spindle bearings may have been damaged by the intense
heat, so it's possible that it could have mechanical problems as time goes on.
Thus it was important to try to retrieve the data as soon as possible. Even
if the drive was initially fully working, we couldn't count on being able to
read all the data even once, let alone have a second chance at it.
From a shell window, I verified that /proc/ide/hdc/model had the correct
drive identification. So far so good. Next I tried "fdisk -l /dev/hdc" to
list the partition table. No go, it reported "device not found". Hmmm...
I tried to dd some blocks from the drive. "device not found" again. The
drive didn't seem willing to cough up any data from the platters no matter
what I tried, though it was still sitting there quietly spinning away.
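
Both fdisk and dd have to read the first sector of the drive before anything else can happen, so a failure there says nothing yet about the rest of the platters. A minimal sketch of that first-sector check in Python, assuming a DOS-style partition table (which ends in the bytes 0x55 0xAA); the function names are made up for illustration, and /dev/hdc is simply the device name used above:

```python
SECTOR_SIZE = 512  # size of the first sector on an ATA drive of this era


def read_first_sector(device_path):
    """Read the first 512 bytes of a device or file, much as
    `dd bs=512 count=1` would. Raises OSError if the read fails."""
    with open(device_path, "rb") as dev:
        return dev.read(SECTOR_SIZE)


def has_mbr_signature(sector):
    """Return True if the sector is a full 512 bytes and ends with the
    0x55 0xAA signature that marks a DOS-style partition table."""
    return len(sector) == SECTOR_SIZE and sector[510:512] == b"\x55\xaa"


# Hypothetical usage (needs root, and a drive that answers):
#   sector = read_first_sector("/dev/hdc")
#   print(has_mbr_signature(sector))
```

If the open or read raises an error, as it did here, the problem is in reaching the device at all rather than in what is stored on it.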
I tried rebooting the machine. The BIOS recognized the drive and reported
the correct model and capacity. Linux booted again, and I saw exactly the same
behavior. I was concerned. Were the drive electronics not working
correctly? Was I going to have to send it to a data recovery service after
all?
I piped dmesg into more to see what the kernel had reported during startup.
There were some complaints from the ide-scsi driver. Aha! During the hasty
Linux installation, Red Hat 9 had noticed the DVD-RW and CD-RW as the master
and slave devices on the secondary ATA controller, and had conveniently
put "hdc=ide-scsi hdd=ide-scsi" in the kernel command line in the GRUB
configuration. This is very desirable for CD and DVD burners, but not at
all useful for disk drives. I edited the grub.conf and rebooted.
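
The edit itself amounts to deleting the hdX=ide-scsi words from the kernel line in grub.conf. A rough Python sketch of that transformation follows; the sample kernel line in the usage note is an assumption for illustration, not a copy of the actual file, and strip_ide_scsi is a made-up helper name:

```python
import re

# Matches kernel options such as "hdc=ide-scsi" or "hdd=ide-scsi".
IDE_SCSI_OPTION = re.compile(r"^hd[a-z]=ide-scsi$")


def strip_ide_scsi(kernel_line, devices=None):
    """Return kernel_line without its hdX=ide-scsi options.

    If devices is given (for example {"hdc"}), only options for those
    devices are removed. Leading indentation is preserved; runs of
    spaces between the remaining words collapse to a single space.
    """
    stripped = kernel_line.lstrip()
    indent = kernel_line[: len(kernel_line) - len(stripped)]
    kept = []
    for word in stripped.split():
        if IDE_SCSI_OPTION.match(word):
            device = word.split("=")[0]
            if devices is None or device in devices:
                continue  # drop this option
        kept.append(word)
    return indent + " ".join(kept)
```

For a hypothetical line such as "kernel /vmlinuz-2.4.20-8 ro root=LABEL=/ hdc=ide-scsi hdd=ide-scsi", calling strip_ide_scsi(line) removes both options, while strip_ide_scsi(line, devices={"hdc"}) removes only the one for hdc. Once the burners are plugged back in, the options need to be restored.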
Success! Now fdisk displayed the partition table, which appeared correct.
I quickly created mount points for the partitions of the burned drive,
mounted the partitions read-only, cd'd to the most important partition,
and did a "tar -cvlf /old/homer2.tar .". The file name is not a Simpsons
reference, but rather a shorthand for the mount point, which was
/home/ruckus2. /old is a 190 GiB partition on the new drive.
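
In GNU tar of that era, as far as I know, the -l flag meant "stay on the local filesystem" (newer versions spell it --one-file-system and use -l for something else). The same idea can be sketched in Python with the standard tarfile module. This is an illustration under that assumption, not the command that was actually run, and archive_one_filesystem is a made-up name:

```python
import os
import tarfile


def archive_one_filesystem(source_dir, archive_path):
    """Archive source_dir into an uncompressed tar file without
    descending into directories that live on a different device.

    Returns the list of archived names, relative to source_dir.
    Unlike GNU tar, this sketch skips mount-point directories entirely
    instead of storing them as empty directories.
    """
    root_dev = os.lstat(source_dir).st_dev
    archived = []
    with tarfile.open(archive_path, "w") as tar:
        for dirpath, dirnames, filenames in os.walk(source_dir):
            # Prune directories on another device so os.walk skips them.
            dirnames[:] = [
                d for d in dirnames
                if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev
            ]
            for name in sorted(dirnames + filenames):
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, source_dir)
                # recursive=False: os.walk already visits subdirectories.
                tar.add(full, arcname=rel, recursive=False)
                archived.append(rel)
    return archived
```

Writing the archive to a different drive than the one being read, as with /old here, keeps the failing drive strictly read-only.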
Old, familiar file names started scrolling down the screen. Things were
looking up!
Steve pulled the four 256 MiB DIMMs and the CPU from the system. They're
covered with soot but show no other signs of damage. Steve pointed out that
they may well still work just fine, but since they've been stressed beyond
the maximum rated operating conditions, it would be foolish to trust them.
I might list them on eBay; I've never had a fire sale until now. Don't worry, I'm not going to misrepresent them. I doubt anyone will want to
buy them, but on eBay you never know...
The three most important partitions held around 4, 8, and 80 GiB. The
first two didn't take too long, but I expected the third to take quite a
while, and it did. In the meantime we ate dinner and played a game of
Puerto Rico, in which I trounced Steve and his wife. I got 64 victory
points, more than I've ever gotten before. If I hadn't made a mistake in
the last round, I would have gotten another 7 VPs, but 64 was quite
sufficient.
The operation completed with no errors. I moved on to the less important
partitions, such as /usr and /var, and we watched a movie. After the movie,
we returned to the garage, where the system had completed the last tar.
I unmounted the partitions and shut the system down. I disconnected the
old drive, reconnected the CD-RW and DVD-RW, closed up the case, and put
the system back in my car.
Steve put the top case back on the old machine (sans screws), and we
rebagged it. This time we taped the bags closed. I don't think there's
any reason I shouldn't just throw it away, but I'm holding off on that
for a few days.
Another friend pointed out to me afterward that the next time I start to
complain about always having bad luck, I should look back at this. On the
other hand, if there really is anything to luck aside from sheer randomness,
maybe today I used up my next five years' allotment of good luck. Who
knows? I'm just glad to have gotten all the data safely extracted.
I don't know that Maxtor drives are particularly better engineered to
withstand temperature extremes than any other brand of drive. But I'm
impressed that it did so well. I've had good luck with Maxtor drives for
many years, and only had to take advantage of their No Quibble Service(tm)
once, back in 1995, for a drive in a very badly designed Compaq
Presario where the drive got absolutely *no* airflow. All of my own
systems have been assembled with reasonable attention to keeping the drives
cool, and I've never had a problem. Maxtor will continue to get my
business for the foreseeable future. I think I'll write them a letter and
send them photos.
Speaking of which, I'm sorry that I can't make any photos available online
at the moment. I'll try to do that in the near future. Right now I'm still
trying to plan the next phase of the disaster recovery, which is getting a
new colocation plan.
Needless to say, disaster recovery would be much easier, and much more
dependable, if one had a plan in place ahead of time, and did routine backups.
I should have known better.