Advogato: Blog for jum

I got a note from nriley on how to make UFS disk images, thanks! BTW, I did fill in my email address in the Advogato account form, but the field is not shown anywhere on the personal page.

Today I chased down a really weird bug. I work on server system software with lots of services, so there are lots of processes listening for incoming sessions: one for AFP file requests, one for SMB file requests, others for network print jobs, mail and so on. One of the servers is a mail server; it listens for POP, APOP and a custom protocol for our own mail client, via either ADSP or TCP. The custom protocol also has provisions for sending mail over the same authenticated session used to retrieve mail, and that is where the bug happened. As soon as an email message was sent, all listening servers would die, with the exception of the mail connection itself. So what does sending an email message have to do with terminating file service sessions and all that?

The answer is process groups. Previously our software consisted of daemon programs started individually from shell scripts, each one daemonizing and backgrounding itself. Daemonizing includes calling setsid(), which also puts each of the listening servers into its own process group. But this changed recently, in particular to solve the problem of inter-server dependencies with optional add-on servers, which was easier to solve with a custom starter program that topologically sorts the dependencies. This program daemonizes itself and expects that its children do not, so that it can monitor them via SIGCHLD and log failures.
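To make the setsid() point concrete, here is a minimal sketch in Python (my choice for illustration; the servers themselves are not written in it). The helper name child_pgid is hypothetical. It forks a child, optionally calls setsid() in it, and reports the child's pid and process group ID back through a pipe.

```python
import os

def child_pgid(call_setsid: bool) -> tuple:
    """Fork a child and return (child pid, child's process group ID).

    Hypothetical helper for illustration only.
    """
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: optionally become a session (and process group) leader.
        try:
            os.close(r)
            if call_setsid:
                os.setsid()
            os.write(w, str(os.getpgrp()).encode())
        finally:
            os._exit(0)  # never fall back into the caller's code
    # Parent: read the group ID the child reported, then reap the child.
    os.close(w)
    pgid = int(os.read(r, 32))
    os.close(r)
    os.waitpid(pid, 0)
    return pid, pgid
```

Without setsid() the child simply inherits the group of whoever forked it, which is the situation all servers ended up in under the new starter. With setsid() the child's group ID equals its own pid.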

This new scheme (which is similar in design to the AIX system resource controller or the Windows NT service controller) thus put all our servers into one process group. The mail component used a very strange interprocess communication method for new mail: the master listening process listens on the comsat (biff) service socket and calls kill(0, SIGUSR1) to notify its children when new mail is available. With a pid of 0, kill() delivers the signal to every process in the caller's process group. The children in turn stat their mailboxes to find out which one got a new message. This way each user sees a newly arrived message immediately, without the usual polling for changes. Unfortunately the default disposition of SIGUSR1 is to terminate the process, so all the servers in the same process group that were not prepared to handle the signal exited. The solution was simply for the service starter to call setsid() just after fork, before exec'ing the programs, so each of the listening servers is again in its own process group.
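The failure and the fix can be reproduced in a few lines. This is only a sketch under my own assumptions, not our actual starter: spawn_worker and run_demo are hypothetical names, the workers are idle stand-ins for the listening servers, and the whole experiment runs inside a throwaway session so the broadcast cannot reach the calling process.

```python
import os
import signal

def spawn_worker(isolate: bool, ready_w: int) -> int:
    """Fork a stand-in server that has no SIGUSR1 handler installed."""
    pid = os.fork()
    if pid == 0:
        try:
            signal.signal(signal.SIGUSR1, signal.SIG_DFL)  # default: terminate
            if isolate:
                os.setsid()          # the fix: leave the shared process group
            os.write(ready_w, b"x")  # tell the parent that setup is done
            while True:
                signal.pause()       # idle like a listening daemon
        finally:
            os._exit(1)
    return pid

def run_demo() -> str:
    """Broadcast SIGUSR1 to a process group and report what happened."""
    out_r, out_w = os.pipe()
    starter = os.fork()
    if starter == 0:
        try:
            os.close(out_r)
            os.setsid()  # private group: keeps the broadcast away from our caller
            ready_r, ready_w = os.pipe()
            shared = spawn_worker(False, ready_w)
            isolated = spawn_worker(True, ready_w)
            os.read(ready_r, 1)  # wait until both workers are set up
            os.read(ready_r, 1)
            # The sender itself must survive its own broadcast.
            signal.signal(signal.SIGUSR1, signal.SIG_IGN)
            os.kill(0, signal.SIGUSR1)  # pid 0: every process in our group
            _, status = os.waitpid(shared, 0)
            killed = os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGUSR1
            alive = os.waitpid(isolated, os.WNOHANG) == (0, 0)
            os.kill(isolated, signal.SIGKILL)  # clean up the survivor
            os.waitpid(isolated, 0)
            report = "shared=%s isolated=%s" % (
                "killed" if killed else "alive",
                "alive" if alive else "killed",
            )
            os.write(out_w, report.encode())
        finally:
            os._exit(0)
    os.close(out_w)
    report = os.read(out_r, 256).decode()
    os.close(out_r)
    os.waitpid(starter, 0)
    return report
```

Calling run_demo() on a POSIX system should show the worker that stayed in the shared group terminated by SIGUSR1, while the one that called setsid() after the fork is still running. That is the same split we saw between the file servers and the fixed setup.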

14 Aug 2001 jum » (Master)