Older blog entries for jum (starting at number 31)

The MacOS X package format is driving me crazy. Apparently, if you install the packages I built a second time, the files end up in the wrong bin directory. I made the packages non-relocatable with the destination /usr/local/helios, with subdirectories like bin, sbin, etc and so on. The strange thing is that on the second install the contents of /usr/local/helios/bin wind up in /bin, but the same thing does not happen with sbin or etc. I am at a loss to explain that one.

I have got a note from nriley on how to do UFS disk images, thanks! BTW, I did fill out my email address in the advogato account form, but this field is not listed anywhere on the personal page.

Today I chased down a really weird bug. As I am working on server system software with lots of services, I have lots of processes listening for incoming sessions: one for AFP file requests, one for SMB file requests, others for network print jobs, mail and so on. One of the servers is a mail server; it listens for POP, APOP and a custom protocol for our own mail client via either ADSP or TCP. The custom protocol also has provisions for sending mail via the same authenticated session used to retrieve mail, and that is where the bug happened. Upon sending an email message all listening servers would die, with the exception of the mail connection itself. So what does sending an email message have to do with terminating file service sessions and all that?

The answer is process groups. Previously our software consisted of daemon programs started individually from shell scripts, each one daemonizing and backgrounding itself. Daemonizing includes calling setsid(), which also puts each of the listening servers into its own process group. But this has changed recently, in particular to solve the problem of inter-server dependencies with optional add-on servers, which was easier to solve using a custom starter program that topologically sorts the dependencies. This program daemonizes itself and expects that its children do not, so that it is able to monitor them via SIGCHLD and to log failures.
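
The dependency ordering can be sketched in a few lines. This is only an illustration, not the actual starter program: apart from atalkd the service names and the dependency table are made up, and Python's standard graphlib module stands in for whatever sorting code the starter really uses.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency table: service -> services that must run first.
# Apart from atalkd the names are invented for this example.
DEPENDS_ON = {
    "atalkd": set(),
    "authsrv": set(),
    "afpsrv": {"atalkd", "authsrv"},
    "mailsrv": {"atalkd", "authsrv"},
    "printsrv": {"afpsrv"},
}


def start_order(depends_on):
    """Return the services in an order where every dependency comes first."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for name in start_order(DEPENDS_ON):
        print("would start", name)
```

A dependency cycle makes static_order() raise graphlib.CycleError, which is a convenient point to refuse to start anything.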

This new scheme (which is similar in design to the AIX system resource controller or the Windows NT service controller) thus caused all our servers to be in one process group. The mail component used a very strange interprocess communication method for new mail: the master listening process listens on the comsat (biff) service socket and calls kill(0, SIGUSR1) to notify its children when new mail is available. The children in turn stat their mailboxes to find out which one got a new message. This way each user sees a newly arrived message immediately without the usual polling for changes. Unfortunately kill() with a pid of 0 sends the signal to every process in the caller's process group, and the default disposition for SIGUSR1 is to terminate the process, so all the servers in the same process group that were not prepared to handle the signal exited. The solution was simply for the service starter to call setsid() just after fork and before exec'ing the programs, so each of the listening servers is again in its own process group.
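
The effect of the fix can be observed with a small experiment. The sketch below is not our starter code; it uses Python's subprocess module, whose start_new_session option calls setsid() in the child between fork() and exec(), and a sleeping child stands in for a real server. It deliberately does not send kill(0, SIGUSR1), it only shows which process group each child ends up in.

```python
import os
import subprocess
import sys


def spawn_listener(new_session):
    """Start a stand-in 'server' child that just sleeps.

    With new_session=True, Popen calls setsid() in the child after fork()
    and before exec(), so the child leads its own session and process group.
    """
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"],
        start_new_session=new_session,
    )


if __name__ == "__main__":
    shared = spawn_listener(new_session=False)
    own = spawn_listener(new_session=True)
    try:
        print("starter pgrp:     ", os.getpgrp())
        print("child, no setsid: ", os.getpgid(shared.pid))
        print("child, setsid:    ", os.getpgid(own.pid))
    finally:
        for proc in (shared, own):
            proc.kill()
            proc.wait()
```

The child started without setsid() reports the starter's process group, so a kill(0, ...) from any sibling would reach it. The child started with setsid() reports its own pid as its process group and is out of reach.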

After some fiddling with the MacOS X package format I got working packages and an umbrella .mpkg to install all of them in one fell swoop. I also got familiar with hdiutil to put it all into one disk image file. Currently I only have HFS format images working; I was not able to get a UFS image working like, for example, the ones from Apple. This is probably not important, but I really would like to know what I am doing wrong, as the Finder pops up when I insert one of the images I ran newfs on and offers to reformat it because the format cannot be recognized.

While experimenting with the packages I ran the GUI PackageMaker quite often to compare what I did in my shell script with what PackageMaker generates. Once I forgot to set the default install dir, which means that it defaulted to /. The package I was building also had bin, sbin, etc and var directories like the ones in /, so after installing that one for testing I blew away my MacOS X installation. Oops.

I had planned to look into packaging today, with rpm first on the list and then the MacOS X packaging tool. Unfortunately someone checked in changes to a central library and left everything unbuildable. I hate it when people check in changes without trying them first.

Found out why the Tru64 build was broken for quite a while. We recently switched the compiler -D switches to _XOPEN_SOURCE=500. From looking at sys/socket.h this looked like it implies -D_SOCKADDR_LEN, so our main machine.h file defined HAS_SALEN for this architecture. The HAS_SALEN define means that this machine uses variable length struct sockaddr's, and thus one has to do extra voodoo to walk the struct ifreq array returned by SIOCGIFCONF. As I just found out, the sys/ioctl.h file does not bother looking at _XOPEN_SOURCE and thus defined SIOCGIFCONF to the old version with fixed length structures. The sa_len member of each structure remains set in this case, and this made my code stumble across unaligned accesses. Turning on -D_SOCKADDR_LEN on the command line fixes that and all runs fine again.
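
For those who have not met the "extra voodoo": with variable length sockaddr's the records in the SIOCGIFCONF buffer are not all the same size, so the walk has to step by the length stored in each record. The sketch below shows the idea on a synthetic byte buffer rather than on a real ioctl result. The record layout (16 bytes of name, then a sockaddr starting with sa_len and sa_family), the helper names and the family numbers are assumptions for illustration, not taken from our code or from the Tru64 headers.

```python
IFNAMSIZ = 16       # size of the interface name field
SOCKADDR_MIN = 16   # size of a plain fixed-length struct sockaddr


def walk_ifreqs(buf):
    """Yield (name, family, sa_len) for each record in a SIOCGIFCONF-style buffer.

    Assumed layout per record: IFNAMSIZ bytes of name, then a sockaddr whose
    first byte is sa_len and whose second byte is sa_family.
    """
    off = 0
    while off + IFNAMSIZ + SOCKADDR_MIN <= len(buf):
        name = buf[off:off + IFNAMSIZ].split(b"\0", 1)[0].decode("ascii")
        sa_len = buf[off + IFNAMSIZ]
        family = buf[off + IFNAMSIZ + 1]
        yield name, family, sa_len
        # A record is never shorter than a fixed-size ifreq, but may be longer.
        off += IFNAMSIZ + max(sa_len, SOCKADDR_MIN)


def make_record(name, family, sa_len):
    """Build one synthetic record, for demonstration only."""
    body_len = max(sa_len, SOCKADDR_MIN)
    sockaddr = bytes([sa_len, family]) + bytes(body_len - 2)
    return name.encode("ascii").ljust(IFNAMSIZ, b"\0") + sockaddr


if __name__ == "__main__":
    buf = (make_record("hme0", 2, 16)
           + make_record("le0", 18, 20)
           + make_record("lo0", 2, 16))
    for entry in walk_ifreqs(buf):
        print(entry)
```

If the stepping rule and the actual buffer layout disagree, as they did in the broken build, the walk lands in the middle of a record and every following read is off.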

Indeed smswebde works again with gocr. It is pretty funny that one has to read numbers from a gif file to make that work.

On the development side I implemented a plug-in class for our license mechanism to allow for multiple ways to determine the unique machine ID used to license the software. Traditionally this has been the unique hostid or some ethernet address, but on some architectures we will extract this number from a custom made USB dongle. Because the machine ID mechanism now ships as a shared library instead of being linked statically into all programs, it is surely easier to crack, but it is also much easier to support new dongle types. People fully determined to crack our license mechanism have probably found other means already, so the additional insecurity is probably offset by the flexibility.

As usual the COM clone I did two years ago makes these kinds of plug-ins very easy to implement. The Microsoft COM mechanism is pretty nicely designed, and the clone I did does away with a few things I did not like. For one thing I did not use UUIDs but Java style reverse domain names for interface IDs and factory IDs; this makes the source a bit easier to read. I also did not use the concept of globally registering objects in the Registry as Microsoft did, but a per-directory cache file. Applications then specify which directories the COM subsystem should look at, and all objects in those directories are available. The only thing that can get a bit difficult with COM is properly keeping all the reference counts under control.
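
To give a feeling for the reference counting discipline, here is a toy model. It is not the COM clone itself (which is not written in Python), and every class name, method name and interface ID in it is invented for the example. It only shows the rule that each successful interface query hands out one more reference which the caller must release.

```python
class NoInterface(Exception):
    """Raised when an object does not implement the requested interface."""


class ComObject:
    """Toy reference-counted object with string interface IDs (hypothetical)."""

    # Interface IDs this class implements; reverse domain names, made up here.
    interfaces = ("com.example.IUnknown",)

    def __init__(self):
        self.refcount = 1          # the creator holds the first reference
        self.destroyed = False

    def add_ref(self):
        self.refcount += 1
        return self.refcount

    def release(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.destroyed = True  # a real implementation would free resources
        return self.refcount

    def query_interface(self, iid):
        """Hand out another reference if the interface is supported."""
        if iid not in self.interfaces:
            raise NoInterface(iid)
        self.add_ref()             # every successful query must be released
        return self


class UsbDongleId(ComObject):
    interfaces = ComObject.interfaces + ("com.example.license.IMachineId",)

    def machine_id(self):
        return "0000-demo"         # placeholder value, not a real ID scheme
```

Forgetting one release() keeps the object alive forever, and one release() too many destroys it while somebody still holds a reference. Those are the two ways the counts get out of control.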

I have contacted the smswebde.pl author and indeed OCR will be the only way to get it working. Apparently he already had some success with gocr and will post an update soon. I will not believe it until I have seen it working. :-)

Well, I have been using a script named smswebde.pl to send new mail notifications to my cell phone via sms for quite some time. This uses my web.de freemail account (which I created just for that purpose). Recently this script stopped working and today I took the time to check why. To my surprise the freemail provider now requires entering an additional numeric code into the send form to authenticate the sms send. The code itself is on the web page, but as a gif image! Well done web.de folks, that one really surprised me. Not that I have given up: I think I will have to look at the encoded hidden fields to see if I can find anything that helps me find that authentication number. I do not think that OCR on a gif file is practical in this situation.

For quite some time we have used a system named Preferences to consolidate all the configuration information we have into one place. It was initially used only by the OPI subsystem and over the years other programs migrated to Preferences as well. For the next version we are moving towards having nearly no ASCII readable configuration any more; everything is stored in the hierarchical Preferences data base (which happens to be some HFS derivative, for those who know what HFS is).

This decision will probably make the Unix old timers cringe, as one of the hallmarks of Unix is the abundance of text config files. It also smells a lot like the Windows Registry, which will not necessarily attract Unix wizards. But it really makes quite a few things easier. For example setting a single parameter is just a single command:


prefvalue -k Programs/atalkd/if -t strlist "hme0,le0"

Similarly it is easy to delete one parameter by using a command like this:

prefvalue -k Programs/atalkd/debug -d

All this can be entered on the command line without needing an editor, hunting down the proper line and editing it while not messing up the syntax. Internally the API makes it really easy to extract these parameters or iterate over the tree. The tree form also makes some more advanced usage possible: having the same tree structure rooted at various points in the tree needs just one parameter changed in the API.
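
The idea of slash-separated keys in a tree, including the trick of re-rooting a lookup, can be modelled with nested dictionaries. The sketch below is not the Preferences API; the function names and the root parameter are invented for illustration, and only the key names are borrowed from the prefvalue examples above.

```python
def _split(key):
    """Split a slash-separated key into its non-empty components."""
    return [part for part in key.split("/") if part]


def set_pref(tree, key, value):
    """Set the value at a slash-separated key, creating folders as needed."""
    *folders, leaf = _split(key)
    node = tree
    for name in folders:
        node = node.setdefault(name, {})
    node[leaf] = value


def get_pref(tree, key, root=""):
    """Look up key relative to an optional root path inside the tree."""
    node = tree
    for name in _split(root) + _split(key):
        node = node[name]
    return node


def delete_pref(tree, key):
    """Remove a single key; the parent folders stay in place."""
    *folders, leaf = _split(key)
    node = tree
    for name in folders:
        node = node[name]
    del node[leaf]


if __name__ == "__main__":
    prefs = {}
    set_pref(prefs, "Programs/atalkd/if", ["hme0", "le0"])
    set_pref(prefs, "Programs/atalkd/debug", 1)
    delete_pref(prefs, "Programs/atalkd/debug")
    print(get_pref(prefs, "atalkd/if", root="Programs"))
```

Changing only the root argument makes the same lookup code work on a copy of the structure stored elsewhere in the tree.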

One of the major complaints against binary configuration files is the inability to fix the file if something goes wrong. We added two commands named prefdump and prefrestore to help with that: prefdump dumps all the Preferences into an ASCII file and prefrestore restores it. In particular, on system start we always save one prefdump away. On the modification side we always completely rewrite the file under a new name and rename it after it is complete to avoid partial updates.
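
The rewrite-then-rename pattern looks roughly like this. It is a generic sketch of the technique and not the Preferences code; the function name and the temporary file prefix are arbitrary choices for the example.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at path with data (bytes) without exposing a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live in the same directory (same filesystem),
    # otherwise the final rename cannot be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())   # make sure the bytes are on disk first
        os.replace(tmp_path, path)   # rename over the old file in one step
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A reader opening the file at any moment sees either the complete old contents or the complete new contents, never a half-written mix.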

Today I got haunted by a rather old bug that surfaced on changing the size of a basic data type. At first it was a bit strange, as the program in question ran fine on one machine and not on the other, both of which are SPARCstations running Solaris 7 with the same patches installed.

On one machine the program died with a bus error shortly after a Mac client connected. As it turned out, the machine where it worked was running with one crucial library compiled in debug mode without optimization. The bug was that in this library the structure alignment requirements changed when off_t became 64 bits wide as a result of the large file compile environment.

The library placed these structures into a shared memory segment using its own allocator, but aligned them only to a 32 bit boundary. The C compiler without the optimizer padded all structures to at least 64 bit boundaries and thus the problem did not appear. With the optimizer turned on, not all of the structures placed into shared memory are aligned to 64 bits, and thus at some point an off_t struct member was not properly aligned for the SPARC processor.
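
The arithmetic behind this is easy to replay. The sketch below models a simple bump allocator with made-up structure sizes; it is not the library's allocator, just a demonstration of how 4-byte alignment happens to work when every size is a multiple of 8 and breaks as soon as one is not.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment (a power of two)."""
    return (offset + alignment - 1) & ~(alignment - 1)


def place(sizes, alignment):
    """Bump-allocate blocks of the given sizes; return each block's start offset."""
    offsets = []
    cursor = 0
    for size in sizes:
        cursor = align_up(cursor, alignment)
        offsets.append(cursor)
        cursor += size
    return offsets


if __name__ == "__main__":
    padded = [16, 24, 24, 16]   # made-up sizes, all padded to multiples of 8
    packed = [12, 24, 20, 16]   # made-up sizes without that padding
    for label, sizes in (("padded", padded), ("packed", packed)):
        for alignment in (4, 8):
            offsets = place(sizes, alignment)
            bad = [off for off in offsets if off % 8 != 0]
            print(label, "align", alignment, "offsets", offsets, "misaligned", bad)
```

With the padded sizes the 4-byte allocator never produces a misaligned block, which is why the debug build hid the bug. With the packed sizes only the 8-byte alignment keeps every block on a 64 bit boundary.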

As one can see again, a programming error (failing to provide the proper alignment) can go unnoticed for a long time, and a change in the environment can make it pop up in code believed to be rock solid.
