25 Sep 2007 johnnyb   » (Journeyer)

Everything Going Wrong at Once
also known as
Shared memory problems on Postgres Bootup

Okay, so here's the deal. I am back in school getting a Master's in Theology, so on Tuesdays I am at the seminary. I have one class that meets there during the day, from 12:10PM to 1:15PM. So I bring all my work along, work from the seminary for the day, and go to the class at lunch.

Today, at 11:30, my crew told me that one of our servers was very slow. At 11:50, the wireless internet at the seminary went out. At noon, I decided to use the Bluetooth networking on my phone, and discovered, to my horror, that the machine was completely down. I couldn't ssh in or anything. I called the other guys in the department -- oops, making a phone call disconnects my Internet access. We figured out it was a memory overcommit problem. I called our ISP and had them reboot the box. It came up to a shell, and I could log in remotely. However, every page was erroring out because of the database.

And then...

The database just wouldn't start. I tried it again and again. The error log said:

FATAL: pre-existing shared memory block (key 5432001, ID 7667712) is still in use
HINT: If you're sure there are no old server processes still running, remove the shared memory block with the command "ipcclean", "ipcrm", or just delete the file "postmaster.pid".

I looked using ipcs, and postgres didn't have any shared memory blocks (and it shouldn't have -- we had just restarted). I tried upping shmmax. No go. I tried lowering shared memory buffer usage. No go. What was wrong with the system?

It turns out Apache was using one of PostgreSQL's shared memory blocks. ipcs just showed it as Apache's. So, I turned off Apache, and then turned on PostgreSQL. Voilà! It worked! Then Apache started just fine. The only thing I can think of is that Apache chooses its block randomly, and it just happened to hit Postgres's this time.
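For anyone who hits the same error, here is a rough sketch of the check that would have saved me some time: take the ID from the FATAL message and see who actually owns that block. It assumes a Linux box where /proc/sysvipc/shm is readable (it holds the same information as ipcs -m). The function names find_segment and report are just ones I made up for this example, and the columns are looked up by the names in the file's header line rather than by position.

```python
import pwd

# Linux-specific: the kernel's table of System V shared memory segments.
SHM_TABLE = "/proc/sysvipc/shm"


def find_segment(table_text, shmid):
    """Return the row for `shmid` as a dict of column name -> string, or None.

    `table_text` is the full text of the table: one header line naming the
    columns, followed by one whitespace-separated line per segment.
    """
    lines = table_text.strip().splitlines()
    if not lines:
        return None
    header = lines[0].split()
    for line in lines[1:]:
        row = dict(zip(header, line.split()))
        if row.get("shmid") == str(shmid):
            return row
    return None


def report(shmid, table_path=SHM_TABLE):
    """Describe who owns the segment with the given ID (hypothetical helper)."""
    with open(table_path) as table:
        row = find_segment(table.read(), shmid)
    if row is None:
        return "no shared memory segment with ID %d" % shmid
    uid = int(row["uid"])
    try:
        owner = pwd.getpwuid(uid).pw_name
    except KeyError:
        # The uid has no entry in the password database; show the number.
        owner = str(uid)
    return "segment %d: owner=%s creator_pid=%s attached=%s" % (
        shmid, owner, row["cpid"], row["nattch"])
```

Calling print(report(7667712)) with the ID from the error message should say which user owns the block. If the owner is not postgres, then the thing to stop is whatever that user is running, and removing postmaster.pid is unlikely to help.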

So, by the end of all of this, it's 12:45 (too late to go to class -- I've missed the entire reason for being at seminary during the day), and then 5 minutes later the wireless comes back on.

Whew! What an hour!
