Congratulations to my wife, Marissa, whose story is appearing in the June issue of Analog--- available at bookstores any day now. (We got her contributor's copies a couple days ago.)
I'm taking a very cool class this quarter, taught by Barbara Simons and Ed Felten. It's the Computer Science Policy Research Seminar. Lots of interesting stuff--- copyright, copy control, privacy, Internet governance, etc.
I've decided on a project about naming, since it's also a big chunk of my thesis topic. I've copied the abstract below:
Project: Using the Domain Name System for Content Segregation
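World Wide Web content is referred to by Uniform Resource Locators (URLs). One portion of the URL is the server (or "host") identifier, which is looked up in the Domain Name System (DNS). DNS has a hierarchical tree structure: a single root; beneath it, the top-level domains, either generic (gTLDs such as ".com") or country-code (ccTLDs such as ".uk"); and beneath those, second-level and lower domains delegated by each TLD.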
Originally DNS had a limited set of generic top level domains, each with a specified use: ".mil" for U.S. military sites, ".com" for corporations, ".org" for non-profit organizations, and so on. However, with the increasing popularity of the Web, the meaning of these gTLDs has become less distinct. Personal sites are registered in ".com", and businesses register their trademarks in all available top-level domains. Countries with appealing ccTLDs, such as ".tv" and ".to", offer domain name registration to the world at large. ".com" by itself dwarfs the rest of the DNS tree, containing nearly all of the second-level delegations.
Recent attempts have been made to reverse this flattening trend by restricting the use of portions of the DNS tree, for a variety of reasons.
These attempts bring up a number of interesting technical and policy questions, which my project will try to address through position papers on ICANN and on the bills mentioned above.
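As a rough illustration of the hierarchy the abstract describes, the C++ sketch below splits a host name into the chain of zones a resolver walks from the top of the tree down, and guesses whether its top-level domain is a ccTLD or a gTLD. The two-letter test is an assumption that holds for the classic ASCII country codes (such as ".tv" and ".to") but is only a heuristic. The function names are hypothetical.

```cpp
#include <string>
#include <vector>

// Splits "a.b.c" into {"a", "b", "c"}; empty labels (such as the one after
// a trailing root dot) are skipped.
std::vector<std::string> split_labels(const std::string& host) {
    std::vector<std::string> labels;
    std::string current;
    for (char c : host) {
        if (c == '.') {
            if (!current.empty()) labels.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    if (!current.empty()) labels.push_back(current);
    return labels;
}

// Lists the zones from the top-level domain down to the full host name,
// e.g. "edu", "stanford.edu", "dsg.stanford.edu", "gritter.dsg.stanford.edu".
std::vector<std::string> zones_from_top(const std::string& host) {
    std::vector<std::string> labels = split_labels(host);
    std::vector<std::string> zones;
    std::string suffix;
    for (auto it = labels.rbegin(); it != labels.rend(); ++it) {
        suffix = suffix.empty() ? *it : *it + "." + suffix;
        zones.push_back(suffix);
    }
    return zones;
}

// Heuristic (an assumption, not a DNS rule): classic ASCII country-code
// TLDs are two letters; longer TLDs such as "com" or "org" are generic.
std::string tld_kind(const std::string& host) {
    std::vector<std::string> labels = split_labels(host);
    if (labels.empty()) return "none";
    return labels.back().size() == 2 ? "ccTLD" : "gTLD";
}
```

For "gritter.dsg.stanford.edu" this yields "edu", "stanford.edu", "dsg.stanford.edu", and then the full host name; each step down corresponds to one delegation in the tree.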
No programming rant today. The DSL modem mysteriously started working this afternoon, so I've been migrating the home network. DirecTV (how I hate the missing 't') doesn't care if you run servers, so we're back to running our own mail server. But this time it's Postfix, not Sendmail.
Postfix is such a lovely program. Its configuration file was meant to be read by humans. Rather than being one huge, monolithic ogre of a program, it's composed of a bunch of specialized gnomes who work together... and best of all, it's written by a nice Dutch guy. I may convert the Stanford systems I administer over to it. (This is a big, big win over the previous system, where mail destined for gritter.dsg.stanford.edu travelled through at least three machines, two protocols, and four software packages.)
Best of all, the switch away from AT&T's cable modem (and their impending sale of our cable to Comcast) means that we're free at last from the Death Star!
I got a very nice response from Nathan Myers to my previous diary entry, explaining more or less what was going on. He is the author of the libstdc++-v3 string class, which is much better written.
While I'm in the C++ vein, exception specifications suck. My manager at work decided we were going to "make the best of them" by using them to ensure that all exceptions were derived from the "std::exception" class and no weirdo strings or ADTs were being thrown. Thus, every function would declare either throw() or throw(exception).
I wasn't able to stay for the original meeting where this was discussed, but I've tried to present a three-fold argument for why this is a bad idea.
The problem with C++ exception specifications was that a decision was made to not force them to be checked at compile time. (Unlike Java, where exception types are checked by the compiler and thus don't need to be verified at run time.) This makes a certain amount of sense in that much code won't use exception specifications and thus has to be assumed to throw anything. But if the compiler doesn't (or can't) aggressively optimize out run-time checks, then you suffer a pretty significant performance loss.
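To make the run-time check concrete, here is a hand-written sketch of roughly what a compiler must generate for a function declared `throw(std::exception)` when it cannot prove the body obeys the specification. Dynamic exception specifications were deprecated in C++11 and removed in C++17, so the check is spelled out explicitly. `risky`, `risky_body`, and `on_spec_violation` are hypothetical names, and the violation hook throws `std::bad_exception` instead of terminating so that the sketch can run to completion.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Counts how often the hand-written check below fires (illustration only).
int spec_violations = 0;

// Stand-in for std::unexpected(). The C++98 default handler calls
// std::terminate(); this hypothetical version records the violation and
// throws std::bad_exception (itself a std::exception) instead.
[[noreturn]] void on_spec_violation() {
    ++spec_violations;
    throw std::bad_exception();
}

// A hypothetical function body that may throw anything at all.
void risky_body(int what) {
    if (what == 1) throw std::runtime_error("derived from std::exception");
    if (what == 2) throw std::string("a weirdo string");
}

// Roughly what the compiler has to emit for
//   void risky(int what) throw(std::exception);
// when it cannot prove risky_body throws only std::exception subclasses.
void risky(int what) {
    try {
        risky_body(what);
    } catch (const std::exception&) {
        throw;                // permitted by the specification: rethrow unchanged
    } catch (...) {
        on_spec_violation();  // anything else violates the specification
    }
}
```

Depending on how a compiler implements exception handling, the extra try region and type test may cost something on every call even when nothing is thrown; that is the overhead the microbenchmark in this entry tries to measure.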
I did a microbenchmark to try to quantify my point #2. Briefly, I found that a function call which took 7 ns before (including loop iteration overhead) took 20 ns with a throw() specification and 23 ns with a throw(exception) specification. For a programming style that uses lots of function calls, this is pretty troubling. Using a profile of an existing program, I estimated that this would increase (user-level) run-time by about 5%--- even in a program which spent half of its time calling read(), write(), memcpy(), malloc(), and one application-specific bit-fiddling routine.
Of course this isn't a substitute for actually trying it out on a real program, but who has time for that?
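The figures above work out to roughly 13 ns of extra cost per call for throw() and 16 ns for throw(exception). A timing loop along the following lines is one way to set up such a microbenchmark; it is a sketch, not the harness behind those numbers. `add_one`, `add_one_checked`, and `ns_per_call` are hypothetical, and `add_one_checked` spells out the throw() check by hand because dynamic exception specifications are gone from current C++.

```cpp
#include <chrono>
#include <exception>

// Hypothetical callee that does almost nothing, so call overhead dominates.
unsigned add_one(unsigned x) { return x + 1; }

// Hand-expanded stand-in for "unsigned add_one(unsigned) throw();": any
// exception escaping the body ends the program, which is what the C++98
// default std::unexpected() handler does.
unsigned add_one_checked(unsigned x) {
    try {
        return add_one(x);
    } catch (...) {
        std::terminate();
    }
}

volatile unsigned benchmark_sink;  // makes the loop's result observable

// Average elapsed nanoseconds per call, loop overhead included.
double ns_per_call(unsigned (*fn)(unsigned), long iterations) {
    // volatile pointer: re-read each iteration, so the call cannot be inlined
    unsigned (*volatile call)(unsigned) = fn;
    unsigned acc = 0;  // unsigned, so wraparound on long runs is well defined
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i)
        acc = call(acc);
    auto stop = std::chrono::steady_clock::now();
    benchmark_sink = acc;
    std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iterations);
}
```

Compare `ns_per_call(add_one, 10000000)` with `ns_per_call(add_one_checked, 10000000)`. A current compiler can see that `add_one` never throws and drop the handler, and table-based unwinding makes an unused try region nearly free, so don't expect this sketch to reproduce the 2001 figures.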
I thought I'd use this first diary entry for a rant, since I lost about a week of productive research work due to a bug in the C++ standard library that ships with Red Hat. (Why, might you ask, do I have time to write a diary entry after wasting all that time? I don't.)
The basic_string implementation uses a reference-counting internal representation class, called rep. Rather than use a lock, the implementor decided to use atomic operations to implement the reference counting--- a strategy I heartily approve of.
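For comparison, here is a minimal sketch of the same idea in modern C++, assuming C++11 `std::atomic` (which didn't exist when this was written). `Rep`, `create`, `release`, and `use_count` are hypothetical names loosely modeled on the class described here; the `selfish`/`clone` logic is omitted.

```cpp
#include <atomic>
#include <cstddef>
#include <cstring>

// A minimal reference-counted string representation in the spirit of
// basic_string's rep class. Names are hypothetical, not libstdc++'s.
class Rep {
public:
    static Rep* create(const char* s) { return new Rep(s); }

    // Take another reference. fetch_add is one indivisible read-modify-write,
    // so concurrent grabs on different CPUs cannot lose a count.
    Rep* grab() {
        refs_.fetch_add(1, std::memory_order_relaxed);
        return this;
    }

    // Drop a reference; whoever drops the last one frees the storage.
    void release() {
        if (refs_.fetch_sub(1, std::memory_order_acq_rel) == 1)
            delete this;
    }

    const char* data() const { return data_; }
    long use_count() const { return refs_.load(std::memory_order_relaxed); }

private:
    explicit Rep(const char* s)
        : len_(std::strlen(s)), data_(new char[len_ + 1]) {
        std::memcpy(data_, s, len_ + 1);  // copy including the terminating NUL
    }
    ~Rep() { delete[] data_; }
    Rep(const Rep&) = delete;
    Rep& operator=(const Rep&) = delete;

    std::atomic<long> refs_{1};  // the creator holds the first reference
    std::size_t len_;
    char* data_;
};
```

Relaxed ordering is enough for the increment because a thread can only grab a reference it already holds; the decrement uses acquire/release ordering so that the thread which frees the storage sees every other thread's writes to it.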
Unfortunately, the code to increase the reference count looks like this:
```cpp
charT* grab () {
    if (selfish) return clone ();
    ++ref;
    return data ();
}
```
No problem, right? ++ compiles down to a single instruction, so the code works fine under multithreading.
But not on a multiprocessor system. When you pull out your microscope and think of the CPU as a load/store machine, an increment is a load, an add, and a store--- and the other CPU can jump in at just the wrong place. The correct solution, posted as a patch to the GCC mailing lists, is to use the LOCK prefix on the add instruction, like this:
```cpp
charT* grab () {
    if (selfish) return clone ();
    // LOCK makes the load/add/store on ref a single atomic operation across CPUs
    asm ("lock; addl %0, (%1)"
         : : "a" (1), "d" (&ref)
         : "memory");
    return data ();
}
```
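The same fix can be expressed portably. This sketch (assuming C++11 `std::atomic` and `std::thread`; `count_with_threads` is a hypothetical name) has several threads bump one shared counter. A plain `++` on an ordinary `long` here would be a data race that can lose updates exactly as described above; `fetch_add` is a single indivisible read-modify-write, which GCC and Clang compile on x86 to a LOCK-prefixed add or xadd.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Each of `threads` workers increments the shared counter `per_thread` times.
// Because fetch_add is atomic, no increments are lost even when the workers
// run on different CPUs at the same time.
long count_with_threads(int threads, long per_thread) {
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&counter, per_thread] {
            for (long i = 0; i < per_thread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    }
    for (auto& w : workers) w.join();  // wait for every increment to finish
    return counter.load();
}
```

`count_with_threads(4, 100000)` returns 400000 every time. On older toolchains, link with -pthread.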
When was this patch posted to gcc-patches and gcc-bugs? July 2000. As of Red Hat 7.1 (libstdc++-2.96-85), this bug still exists. (GCC 3 has a rewritten string class which does the right thing, thankfully.)
The patch did me absolutely no good. All I had to start with were weird memory corruption errors that usually seemed to hit basic_string's nilRep member. I only knew what to search for in bug reports once I had (laboriously) traced the problem down to a race condition in the reference counting--- and at that point the answer was staring me in the face.
Frankly, I feel as if Red Hat and the GCC maintainers let me down; they had a fix available for a year, but somehow it never made anybody's to-do list--- and as a result, all my projects have been pushed back a week.