Older blog entries for Grit (starting at number 3)

I'm taking a very cool class this quarter, taught by Barbara Simons and Ed Felten. It's the Computer Science Policy Research Seminar. Lots of interesting stuff--- copyright, copy control, privacy, Internet governance, etc.

I've decided on a project about naming, since it's also a big chunk of my thesis topic. I've copied the abstract below:

Project: Using the Domain Name System for Content Segregation

World Wide Web content is referred to by Uniform Resource Locators (URLs). One portion of the URL is the server (or "host") identifier, which is looked up in the Domain Name System (DNS). DNS has a hierarchical tree structure, consisting of:

  • a single "root zone"
  • subsidiary "generic top level domains" (gTLDs, such as .com) and "country code top level domains" (ccTLDs, such as .us)
  • a vast number of "second-level domains" (such as stanford.edu) with their own policies and sub-delegations (e.g., dsg.stanford.edu)
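
To make the tree structure concrete, here is a minimal sketch that lists the zones a host name sits under, from the root down. The function name `zone_chain` and the use of "." to stand for the root zone are assumptions of this sketch, not part of any DNS library.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the chain of zones above (and including) a host name,
// starting at the root. "." stands for the root zone in this sketch.
std::vector<std::string> zone_chain(const std::string& host) {
    std::vector<std::string> chain;
    chain.push_back(".");
    // Walk right to left; every label boundary starts a longer suffix.
    for (std::string::size_type i = host.size(); i-- > 0; ) {
        if (i == 0 || host[i - 1] == '.') {
            chain.push_back(host.substr(i));
        }
    }
    return chain;
}
```

For "dsg.stanford.edu" this yields the root, then "edu", then "stanford.edu", then the full name, which mirrors the three levels in the list above.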

Originally DNS had a limited set of generic top level domains, each with a specified use: ".mil" for U.S. military sites, ".com" for corporations, ".org" for non-profit organizations, and so on. However, with the increasing popularity of the Web, the meaning of these gTLDs has become less distinct. Personal sites are registered in ".com", and businesses register their trademarks in all available top-level domains. Countries with appealing ccTLDs, such as ".tv" and ".to", offer domain name registration to the world at large. ".com" by itself dwarfs the rest of the DNS tree, containing nearly all of the second-level delegations.

Recent attempts have been made to reverse this flattening trend by restricting the use of portions of the DNS tree, for a variety of reasons.

  • ICANN, the governing body responsible for allocating gTLDs, has approved the creation of several content-specific names, such as ".aero" (air-transport industry), ".name" (individual use), and ".museum". Typically membership in some group is necessary for registering in such domains, but there are only loose constraints on what content may appear within sites bearing these names.
  • H.R. 3833 (introduced March 4, 2002) directs the administrator of the ".us" domain to create a ".kids.us" delegation. Registering within this domain would be contingent on agreeing to a set of guidelines for what content is appropriate.
  • S. 2137 (introduced April 6, 2002) directs ICANN to create a ".prn" domain, and mandates that any commercial web site which is in the business of "making available ... material that is harmful to minors" shall operate their service only under the new domain.
  • Various suggestions have been made to reserve top-level domains for non-ASCII domain names, such as domain names encoded in a Chinese character set.
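
Mechanically, segregating content by name comes down to a suffix test on the host name. The sketch below shows one way a filter might decide whether a host falls inside a restricted delegation such as "kids.us". The names `lower` and `under_zone` are hypothetical, and a real filter would also have to deal with things this ignores (trailing dots, raw IP addresses, redirects).

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Lower-case copy; DNS names compare case-insensitively.
std::string lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// True if `host` is `zone` itself or lies beneath it in the DNS tree.
// The match must fall on a label boundary, so "notkids.us" is NOT
// considered part of "kids.us".
bool under_zone(const std::string& host_in, const std::string& zone_in) {
    const std::string host = lower(host_in);
    const std::string zone = lower(zone_in);
    if (host == zone) return true;
    if (host.size() <= zone.size()) return false;
    const std::string::size_type cut = host.size() - zone.size();
    return host[cut - 1] == '.' && host.compare(cut, zone.size(), zone) == 0;
}
```

The label-boundary check is the one design point worth noting: a plain string suffix comparison would wrongly admit names that merely end in the same characters.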

These attempts bring up a number of interesting technical and policy questions, which my project will try to address through position papers on ICANN and on the bills mentioned above.

  • Is the creation of content-specific domain names actually useful? Would the goals of those advocating such domain names be better served by a different allocation policy among existing domains? Are there any technical reasons to favor the broadening of the DNS tree?
  • Is content segregation by domain name effective? Can children be shielded from inappropriate content using such mechanisms without infringing on the rights of adults?
  • Are there First Amendment issues involved in naming? (For example, would the government be able to restrict the titles that books are given?)
  • Does the U.S. government retain the right to administer the entire domain name system? Or must any proposal such as ".prn" be subject to the same process as other new gTLDs?
  • What are the implications of content segregation policies on other protocols which use DNS, such as e-mail, and on other naming technologies, such as LDAP or Freenet?

No programming rant today. The DSL modem mysteriously started working this afternoon, so I've been migrating the home network. DirecTV (how I hate the missing 't') doesn't care if you run servers, so we're back to running our own mail server. But this time it's Postfix, not Sendmail.

Postfix is such a lovely program. Its configuration file was meant to be read by humans. Rather than a huge monolithic ogre of a program, it's composed of a bunch of specialized gnomes who work together... and best of all, it's written by a nice Dutch guy. I may convert the Stanford systems I administer over to it. (This is a big, big win over the previous system, where mail destined for gritter.dsg.stanford.edu travelled through at least three machines, two protocols, and four software packages.)

And the switch away from AT&T's cable modem (and their impending sale of our cable to Comcast) means that we're free at last from the Death Star!

I got a very nice response from Nathan Myers to my previous diary entry, explaining more or less what was going on. He is the author of the libstdc++-v3 string class, which is much better written.

While I'm in the C++ vein, exception specifications suck. My manager at work decided we were going to "make the best of them" by using them to ensure that all exceptions were derived from the "std::exception" class and no weirdo strings or ADTs were being thrown. Thus, every function would declare either throw() or throw(exception).

I wasn't able to stay for the original meeting where this was discussed, but I've tried to present a three-fold argument for why this is a bad idea.

  1. Our coding conventions pretty much determine which functions can throw exceptions and which can't, so the documentation value of the specification is nil.
  2. The run-time check caused by an exception specification is too expensive for the benefit gained.
  3. There are other ways of guaranteeing the property he cares about, such as using a compiler with semantic checking modules.

The problem with C++ exception specifications is that the language does not require them to be checked at compile time. (Unlike Java, where exception types are checked by the compiler and thus don't need to be verified at run time.) This makes a certain amount of sense, in that much code won't use exception specifications and thus has to be assumed to throw anything. But if the compiler doesn't (or can't) aggressively optimize out the run-time checks, then you suffer a pretty significant performance loss.
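
To show where the run-time cost comes from, here is a rough hand-written equivalent of what a throw(exception) specification asks the compiler to arrange. This is a sketch, not what any particular compiler emits. The functions `risky` and `checked` are made up for illustration, and `std::terminate()` stands in for `std::unexpected()`, which newer C++ standards have removed along with the specifications themselves.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>

// Hypothetical function that may throw.
int risky(int x) {
    if (x < 0) throw std::invalid_argument("negative input");
    return 2 * x;
}

// Roughly the behavior requested by `int checked(int) throw(std::exception)`:
// every call runs inside a hidden try block.
int checked(int x) {
    try {
        return risky(x);
    } catch (const std::exception&) {
        throw;              // permitted by the specification: pass it along
    } catch (...) {
        std::terminate();   // stand-in for std::unexpected()
    }
}
```

Calling `checked(3)` returns 6, and `checked(-1)` lets the `std::invalid_argument` through because it derives from `std::exception`. Anything else escaping `risky` would end the program, and setting up that guard on every call is the overhead in question.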

I did a microbenchmark to try to quantify my point #2. Briefly, I found that a function call which took 7 ns before (including loop iteration overhead) took 20 ns with a throw() specification and 23 ns with a throw(exception) specification. For a programming style that uses lots of function calls, this is pretty troubling. Using a profile of an existing program, I estimated that this would increase (user-level) run-time by about 5%--- even in a program which spent half of its time calling read(), write(), memcpy(), malloc(), and one application-specific bit-fiddling routine.

Of course this isn't a substitute for actually trying it out on a real program, but who has time for that?

18 Dec 2001 (updated 18 Dec 2001 at 06:44 UTC)

I thought I'd use this first diary entry for a rant, since I lost about a week of productive research work due to a bug in the C++ standard library that ships with Red Hat. (Why, might you ask, do I have time to write a diary entry after wasting all that time? I don't.)

The basic_string implementation uses a reference-counting internal representation class, called rep. Rather than use a lock, the implementor decided to use atomic operations to implement the reference counting--- a strategy which I heartily approve.

Unfortunately, the code to increase the reference count looks like this:

    charT* grab () { if (selfish) return clone ();
       ++ref;
       return data (); }

No problem, right? ++ compiles down to a single instruction, so the code works fine under multithreading.

But not on a multiprocessor system. When you pull out your microscope and think of the CPU as a load/store machine, an increment is a load, an add, and a store--- and the other CPU can jump in at just the wrong place. The correct solution, pointed out here, is to use the LOCK prefix on the add instruction, like this:

    charT* grab () { if (selfish) return clone ();
       asm ("lock; addl %0, (%1)"
            : : "a" (1), "d" (&ref)
            : "memory");
      return data (); }
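
For comparison, the same idea can be sketched portably with std::atomic, which only arrived in C++11 and so wasn't an option for this library. The class `Rep` below is a hypothetical stand-in for the string's shared representation, not the libstdc++ code.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a shared, reference-counted representation.
class Rep {
public:
    // fetch_add is one atomic read-modify-write, so another CPU cannot
    // slip in between the load and the store.
    void grab() { ref_.fetch_add(1, std::memory_order_relaxed); }

    // Drops one reference; returns true if it was the last one.
    bool release() {
        const int before = ref_.fetch_sub(1, std::memory_order_acq_rel);
        assert(before > 0);  // releasing more often than grabbing is a bug
        return before == 1;
    }

    int count() const { return ref_.load(std::memory_order_relaxed); }

private:
    std::atomic<int> ref_{1};  // the creator holds the first reference
};
```

On x86 the increment typically compiles to a lock-prefixed instruction much like the hand-written asm, but the compiler picks the right sequence for other architectures too.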

When was this patch posted to gcc-patches and gcc-bug? July 2000. As of RedHat 7.1 (libstdc++-2.96-85), this bug still exists. (GCC 3 has a rewritten string class which does the right thing, thankfully.)

The patch did me absolutely no good. All I had to start with were weird memory corruption errors that seemed to usually hit basic_string's nilRep member. I only knew what to search for in bug reports once I had (laboriously) traced the problem down to a race condition in the reference counting--- and at that point the answer was staring me in the face.

Frankly, I feel as if RedHat and the GCC maintainers let me down; they had a fix available for a year, but somehow it never made anybody's to-do list--- and as a result, all my projects have been pushed back a week.
