A kernel ghost story...
I just discovered a real nasty thing with grub, raid and RedHat 7.3. It goes like this:
I just spent the last 2 days building a server. I have RedHat 7.3 as the OS and 2 IDE drives running software RAID 1, with ext3 as the filesystem.
7.3 has been out a while and a lot of the packages are out of date, so I update all the RPMs to bring the server up to spec. This includes updating the kernel to 2.4.18-5. It shouldn't be a problem.
I check /boot, and it's all been updated to the correct kernel. I check the grub config files, and they are pointing to vmlinuz-2.4.18-5. I'm happy and I reboot.
Grub loads. It prompts with "Red Hat Linux (2.4.18-3)". I think "What the...?" I try booting it to see what happens, expecting obscure error codes, kernel panics, and the end of civilisation as we know it.
It starts booting, then it loads vmlinuz-2.4.18-3, along with the rest of the OS. Of course all the device drivers fail because the symbols are wrong. I'm amazed, astounded and astonished. I search for signs of 2.4.18-3. It's not there. Totally absent. Not a 2.4.18-3 anywhere on the server. Totally eradicated. I check two different ways just to be sure. I am totally confused, confounded and chastened. The server just semi-successfully booted a ghost. I had thought I felt a chill enter the room, and I wrote that off to the AC coming on. Maybe it hadn't. I think "Buh?"
I search the internet for answers - it knows everything - this has to have happened before. This will be fixed in two minutes. No problem.
I find lots of questions relating to this. No answers. Nothing. Not a clue. Not a hint.
This is a problem.
I wonder if maybe civilisation did end when I booted a kernel that didn't exist and could only have been a ghost. I idly wonder if Dr. Egon Spengler would have any ideas. He seems like the type that would know about a ghost kernel.
I assume grub is storing the config file elsewhere. It doesn't explain how it's loading the previous kernel though; it can't be storing that much data in the MBR. I try running grub-install. It won't run because it says "/dev/md0 does not have any BIOS drive". Of course not. I try forcing it to do something. Anything. Because I have everything on a raid device it refuses. I try everything short of trying to find a sharp stick and poking it, but it just won't work. By now I'm looking at losing 2 days of a server install because I have no way to get all my config files off the machine (no devices are working, remember?). By now, I'm beginning to drool and idly wondering if it will stain my shirt.
I decide that the system is pretty much irreparable, and elect to try desperate measures.
init 1
...
umount /dev/md0
mount -t ext3 /dev/hda1 /tmp
cd /tmp/boot
ls
Suddenly I see the old boot directory. vmlinuz-2.4.18-3 is there, along with the old grub files that it won't change. I wipe the drool off my chin, and type the following:
mkdir old
mv * old
cp -r /boot/* .
ls
All the new files are there. I double-check permissions and links, then type: sync; init 6
Grub appears on the screen. It happily prompts me to boot "Red Hat Linux (2.4.18-5)". The new kernel boots and all is right with the world again. I've exorcised the ghost of 2.4.18-3 and I can go back to breaking MySQL again.
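In hindsight, a quick check like the one below would have shown me the ghost much sooner. This is only a minimal sketch, not something I ran at the time: the function name stale_kernels is made up for illustration, and it assumes the raw disk partition is already mounted by hand somewhere (like /tmp above) so that its boot directory can be compared with the /boot the running system sees.

```shell
# Hypothetical helper: print kernel images that exist in the raw
# partition's boot directory but not in the live /boot.
# Both directory arguments are assumptions supplied by the caller.
stale_kernels() {
    live_dir=$1    # boot directory as the running system sees it, e.g. /boot
    raw_dir=$2     # boot directory on the hand-mounted partition, e.g. /tmp/boot
    for f in "$raw_dir"/vmlinuz-*; do
        # Skip the unexpanded glob (no kernels at all) and dangling links
        [ -e "$f" ] || continue
        name=$(basename "$f")
        # Report any kernel the live directory does not have
        [ -e "$live_dir/$name" ] || echo "$name"
    done
}
```

With the stale copy mounted as described, running stale_kernels /boot /tmp/boot should list any leftover kernel such as the old 2.4.18-3 image. No output means both directories hold the same kernel image names, though it says nothing about whether the grub config files match.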