Older blog entries for raph (starting at number 308)


I just sent out notes towards a 1.0 rev of the IJS spec. As a protocol, it's been pleasantly stable, and I get the impression that there are quite a few people using it without a lot of fuss and bother. This is good; I don't consider protocol design a shortcut on the road towards riches and fame.

MPEG2 telecine

We got an Apex 1100W DVD player for Christmas, which is a pretty cool toy. One of the things I've been playing with is transcoding DVD's to SVCD format. When done right, the quality is quite good, and I'm a fan of the CD-R format. It's quick, cheap, and easy to duplicate, readers are nearly universal, and it's good for archival as well.

Unfortunately, some DVD's are difficult to transcode with existing free software. I've had the best success with transcode and mpeg2enc (from the mjpegtools suite). But sometimes, particular for cartoon sources, there's jerky motion, visible interleaving artifacts, and A/V sync problems. I did a little digging and think I know what the problem is.

Most films are encoded into MPEG2 using "3:2 pulldown" telecine. Each 1/24th second film frame is converted into either 3 or 2 1/60th second video fields. By alternating between the two, the frame rates come out about right. (In fact, in the NTSC standard, the field period is exactly 1001/60000 second, which, if ignored, can lead to fun drifting of sync).

And indeed, most of the time, decoding of the MPEG2 stream will yield the original frames at (roughly) 24fps. Just scale to SVCD res, re-encode with the 3:2 pulldown flag set (-p option to mpeg2enc), and you're golden.

Only problem is, 3:2 pulldown is not exactly one flag in MPEG2 streams. In fact, it's implemented as two flags per "picture": "repeat first field" and "top field first" (described in more detail here). And, as one might expect, some DVD's use a fancier mix of these flags than vanilla 3:2 pulldown. For example, "Elmo in Grouchland" starts with 60 black frames with no pulldown, followed by the movie proper with 3:2 pulldown.

And the problem is that, as far as I can tell, there's no way to preserve the RFF and TFF field pattern from input (DVD) to output (SVCD) MPEG2 streams using free tools. In particular, mpeg2enc hardwires these flags on output to either all-0's (no pulldown) or a 3:2 pulldown pattern with a repetition period of 4 frames.

Modifying mpeg2enc to support arbitrary RFF and TFF, and transcode to provide these fields from MPEG2 input, would seem to be a very good idea. I'm sure whoever implements this will get love and appreciation from legions of users.

It's also possible that I'm missing a good way to do this with existing tools, in which case I'm sure someone will point it out.


Alan's school resumed today, and I start up with the homeschooling. I'm eager to try electrolysis of Hydrogen as a fun chemistry experiment. Since my goal is not (at present, anyway) to make a real table of elements, but to teach the basic concept, I think I'll go for the variation where the mixture of O2 and H2 gases fizz into soap bubbles, which are then be ignited for (hopefully) impressive yet safe effect. I'll let you know how it goes.

Max continues to astonish us with his language development. Over the past couple of weeks, he's pretty much mastered the alphabet song, and is just starting to be able to learn his individual letters (he knows "M", "A", and "X" pretty well now :). But he still managed to impress me with his facility at Alchemy, in which you have to match symbols and color. Some of our friends have been impressed merely that he's able to use the mouse. They should watch him clicking away at the thing like it's perfectly natural for a two-and-a-half year old. He's going to be a computer geek like his dad, I just know it.


I just got my first PPM file rendered from the new Fitz prototype. One of the nice things about this go-round is that I'm doing Python bindings for everything. I'm finding Python to be a very nice environment, especially for things like testing.

It's feeling very close to ready to put into the CVS repo. I'm also starting to feel ready to ramp up the public process. Especially with the clean Python bindings, I expect that there may be lots of involvement.

I'm definitely feeling daunted, though. There are lots and lots of things that need doing, over and above getting good rendering going. Documentation, language bindings, Ghostscript plumbing, porting, that sort of thing. Even within rendering, there are many different priorities. Should it be complete first, or very good at the common case first? My inclination is the former, but the desires of the free software community probably lie toward the latter.

In any case, I'll be reaching out to people who have expressed interest in Fitz over the next few days. I want the process to be very open, but there is one requirement: you must sign a copyright assignment form. This seems to me like a reasonable balance between commercial viability and getting cool code released as free software (Fitz is GPL plus dual commercial licensing), but hardcore purists might be dissuaded by it. Ah well.


The kids had a nice Christmas, with plenty of toys, and then a big family dinner at Grampa's.

I'm definitely struggling with time management. Part of the problem is that my sleep schedule is out of whack again. I seem to need at least 10 hours of sleep or so, which really cuts into my productive time. I have a lot of things I'd like to do, so I'm becoming determined to get a little more discipline, and use the time I have most effectively.

That means a little less time to play with proofs and other blue-sky things, but I'm fine with that. I do plan on keeping on going with this diary, though. Among other things, I find that the times that I'm most prolific here tend to coincide with times I'm most productive with work. So deliberately holding back my urge to write doesn't seem like a good idea.

I'm very happy that the days are getting longer again, at least in theory. You can't really see it yet, though. Right now, it's rainstorming pretty heavily. Even so, I know it will get brighter, and that's something.

Trust metrics for blogs

I had a nice long phone chat with John Hiler of Xanga and Microcontent News. He's just recently read my thesis draft, and is intrigued by the idea of using them in larger-scale blog services. The main application will probably be to suppress spam and abuse, and perhaps to ease support.

The prospect of having the trust metric ideas implemented for a mass audience site is exciting. I'm sure it will happen sooner or later, and this might just be the sooner.


Bruce Kushnick has written an "Unauthorized Bio of the Baby Bells", available for free download. I don't really have time to read through it right now, but I'm intrigued by his thesis that the telcos in this country are basically a scam. He comes across as a little bit of a crank, but my gut feeling is that he's substantially right.

While my experiments so far with VoIP have shown it to be not quite ready for prime time, I strongly suspect that it will become so in a couple of years or so, at which time the telco business will find itself quite disrupted. For example, imagine a cell phone that has an onboard 802.11 radio, and logs on to the Internet (instead of GSM or whatever) whenever it finds a base station within range. That would be cooool.

I'd be interested to hear what any Advogatan (or critical thinker from any corner) has to say about this book.

(via JOHO the blog via boingboing)

DNS, phone numbers, and ENUM

One of the many problems that needs to be solved before Internet phones become usable is directory services. What kind of name space do we want for IP phones? One obvious choice is phone numbers, which have the advantage of smoothing migration from POTS. Other choices include DNS-based email-like human readable addresses, or entropy-rich tokens such as hashes of public keys. As a stopgap, people might use IP addresses (these seem to be standard for NetMeeting and friends), but those have serious usability problems.

One proposal to map phone numbers to IP addresses is ENUM, which hacks the phone number namespace on top of DNS. The basic idea seems sound, but in practice it's almost certain to inherit the problems of DNS, not least of which is corrupt governance.

In particular, it sure looks like the telephone companies are going to be the primary owners of namespace chunks. This is the worst choice possible, because they have a powerful business interest to fuck up deployment of IP phones as long as possible, so as to preserve their price-gouging in the POTS business.

I want to be able to map my phone number to my Internet address. I'm skeptical that the ENUM implementation will actually be able to provide this service for any reasonable price and terms, but of course I'd love to be proved wrong. In the meantime, people interested in making IP phones happen should be thinking about other forms of namespace. Or, perhaps the right answer is an independent service that also uses the phone number namespace, but is actually trustworthy, as opposed to the telecom industry. Worth thinking about, in any case.

Periodic table

If you haven't seen the Theodore Gray's Periodic Table of elements, you should check it out. I especially liked his essay on why he did it. His philosophy obviously resonates with the hacker spirit of doing things yourself rather than leaving it to the experts.


I've been hacking a bit on Fitz, although the code is still not to the point where it does anything useful.


It's an interesting experience having very high Googlejuice. So far, there haven't been any big, mind-blowing consequences, but a bunch of smaller, pleasant ones.

One such experience just happened recently. In June, I wrote a diary entry about passwords, and mentioned somewhat offhand the Mandylion password token. Well, a few days ago I got an email from the Mandylion people, saying that they've been getting quite a bit of traffic from that (which in turn must be from Google). As a result, they offered me a free unit, and I accepted. It just came in the mail yesterday.

If I were a journalist, taking it would have been a breach of ethics. Fortunately, I'm not one. There was a big thread about this in the blogging world a couple of months ago. These certainly are interesting questions.


I get a good vibe from the OSAF people (Liam calls them the "open source air force :). Yes, Chandler is an address book, but for me the more interesting part is the planned infrastructure: a distributed version-control repository, with both server and low-admin peer-to-peer flavors. If this stuff works well, it could be useful for lots of other things.

In addition, I think their choice of wx is right on. There are many UI choices for Python, all of which have difficult tradeoffs. wx is the closest thing to a good, usable, cross-platform library, so it makes sense to fix it up to be really good.

Part of the motivation for my visit was to discuss whether Fitz should be the canvas for Chandler. It looks like the answer is probably no; Chandler is not all that graphically intensive, and there are real concerns with stuff like making sure the fonts are sychronized between the main UI toolkit and the canvas. So a better solution (at least for the short term) is to improve one of the canvases available for wx.


Dinner was quite lively. Eleven people in all showed up at Chef Chu's: David McCusker, Aaron Swartz, Bram Cohen, Andrew Lowenstern, Liam Quin, Liam's friends Jay and Tony, Andy Hertzfeld, Andy's wife Joyce McClure, and Philip Lindsay (from NZ). I don't think we solved many of the world's problems, but a good time was had by all.

Max's dream

Ever wonder what 2.5 year olds dream?

"I saw a ghost, a scary one." Was this a dream? "Yes. It was made out of wood, and it was flying. It was stupid!" How did you know? "I lighted a candle, and it went away. I didn't hit it." Good for you. "And it didn't come back!"

I'm posting at a somewhat odd time today, to make sure that people get a chance to contact me about dinner tomorrow. I want to follow up on some proof threads, but that will have to wait a day or so.

Felix Mellitus Viridisque

We have a translation! After extremely helpful info from Chris Ryland and forrest, we've settled on "Felix, Mellitus, Viridisque". Finding a really good translation for "sticky" turned out to be quite difficult, so we went for "mellitus" instead, which means literally "honeyed".

The family motto happened spontaneously, in a car trip a couple of years ago. Our previous family motto was, "it's more complicated than that," as it was a phrase we used quite frequently trying to explain to others our living arrangements - at the time, we were living with another couple. It was a noble goal, but ultimately there was just too much tension to make it work.

In a sense, the new family motto represents the opposite of our old situation. Not only was it fairly unhappy, but the woman of the other couple is compulsive about neatness. Neither Heather and I are, so as you can imagine, this was one of the major tensions. For the kids to have sticky hands is just fine if they're happy.

Google Answers

The response to my previous entry on Google answers was very satisfying. Apart from the Latin help, Telsa sent a pointer to an article describing one librarian's experience being a Google Researcher, and her subsequent termination for breaching the silence clause in the Researcher contract. I find that kind of censorship to be uncool in the extreme, if not exactly evil.

Exchanging knowledge can be a purely financial transaction, as in Google Answers, but it's better when it's done through one's social network. I find that most people genuinely want to share their knowledge (or, perhaps, there are a lot of people who don't, but I don't interact much with them). Google's insistence on anonymity of their Researchers would seem to actively frustrate the building of any such community around their service.

One illustration with the problems of Answers as commerce is the frequency of homework problems. Google's policy frowns on these, but of course it's often hard to tell for sure, and I can easily imagine how they'd be tempting for a Researcher to just answer and collect the fee.

Generally, I imagine kids with rich parents, who have learned that it's easier to throw money around to get what you want than to learn how to do it yourself, or tap into a network of people who can teach you. (Why is it that I can easily imagine kids of the Bush dynasty fitting into this pattern?)

Chris Ryland asserts that "getting the best answers from the community, without using a Google Answers-like business, only works if you're already a respected (or at least highly-read, e.g., Winer) leader in your community." I respectfully disagree. Being highly-read lets you be lazy, in that merely posting a question is likely to get you answers at least as good as Google Answers, but I believe everyone has such a network around them, and would do well to cultivate it. The main thing is to go around asking, "do you know anyone who knows (for example) Latin translation?" If you're stuck, many research librarians would be happy to help you as well.

For this sample query, it's interesting to note the importance of the network, as opposed simply to the volume of people reading my diary. Chris checked with a friend of his who's a Latin scholar, and who tutors his kids. Forrest asked an anarchist poet, who obviously was willing to answer the query.

Ever since starting to study social networks for my research on trust metrics, I'm continually blown away by their power. They're one of the most valuable resources at your disposal. I think a lot of people understand this instinctively, but for whatever reason I didn't really grasp it until I studied it academically. Perhaps it's that I grew up fairly isolated.

So, in the end, this is why Google Answers is at best a service that might be useful to some people, rather than the transcendent coolness of Google's search function. It's a substitute for building one's social network of knowledge-seeking, inferior and more expensive.


I'm visting the OSAF tomorrow, largely to talk about whether Fitz will be a good match for their needs. Afterwards, David McCusker and I will be having dinner at a Chinese restaurant somewhere on the peninsula, and we'd like to invite a few more people. Please send email if you're interested.

Lawyer-free zone

Lawrence Lessig has an essay in favor of limiting copyright terms. I agree of course, as I'm sure most Advogatans do. What struck me is his use of the phrase "lawyer-free zone" to describe the public domain, with the clear implication that this is a good thing.

I think that he's identified a clear cultural distinction between lawyers and non-lawyers. Lawyers, having been steeped in the legal approach to resolving disputes, assume that it's the best approach, and can barely consider any others. The rest of us recognize the legal approach as being hideously expensive, unreliable, often quite unfair, and with significant potential to limit innovation and flexibility.

These problems with the "lawyer zone", as opposed to the "lawyer-free zone", are the reason why the current trend to increase the reach of copyrights, patents, and trademarks is a bad thing. The problem is, of course, that most lawmakers come from the lawyer culture.

I'm impressed with Prof. Lessig for both being a damn good lawyer (his command of the legal issues as shown in his oral argument in Eldred vs. Ashcroft is downright impressive) and for being able to see and express so clearly the advantages of sometimes not doing things the lawyer way.

Free software, of course, is a shining example of an almost entirely lawyer-free zone. I'm impressed with the mechanisms we have for resolving disputes. One key, I'm now convinced, is fork rights. At heart, the right to fork a project gives people an option in case the current maintainer isn't doing a good job, for whatever reason.

A particularly clear examples is the handling of the SourceForge codebase. As the OSDir interview with Tim Purdue makes clear, VA Software did a number of things in direct opposition to the interests of the free software community, not least of which is making promises to release the code without any intention of doing so. A more subtle, but no less important, issue is the deliberate rigging of the installation process to make it difficult.

In the lawyer zone, these types of things would generally get handled as some kind of contract litigation. There'd be a lot of "discovery" to try to find a smoking gun to prove that this bad behavior was intentional, and in the end neither side would really win anything (in stark contrast to the lawyers themselves).

Fortunately, in the free software world, we have a better way. We simply fork the code, fix the problems, and move on.

I have high hopes that the Creative Commons can bring the benefits of the lawyer-free zone to writing, music, art, and other creative domains that have been increasingly dominated by the culture and processes of lawyers. We'll see!

Google's new stuff

Google absolutely rocks, but I'm not anywhere nearly as impressed with their more recent offerings.

I took froogle for a spin, but wasn't all that impressed. For example, the search "pc2100 ddr 512mb" gives you prices from $109.99 to $289.95 on the first page, while pricewatch is $96-$99 for the first page of the category. Other queries gave similarly unimpressive results.

I've written about Google Answers fairly recently. Here's how I think they (or someone else) could make a much more compelling service.

Very simply, start with the existing service but provide means of crediting user accounts for answering questions, in addition to debiting them for asking. Each user can choose to be in the red (as now), in the black (in which case Google would PayPal the money), or at zero, which is probably how most people would want to use it.

You have some nontrivial trust issues, especially as you're paying money out, but I'm pretty confident they're tractable. Google already has ratings, a dispute process, and so on.

Paying money to get an answer is interesting. It's a way for people to demonstrate that they really care about the answer, but at the same time it feels a little odd. On the other hand, if I can get an answer in return for answering 1.25 other people's questions, that would be appealing.

In any case, I find I have relatively little use for such a service, because I'm already part of a fabulous network of knowledge seekers. This diary is a significant component of that network. For example, one test query I was considering is the Latin translation of our family motto, "happy, sticky, and green" (in the sense of a child who has just eaten a pistachio ice cream). Now that I've posted the query here, the chances are excellent somebody will send an answer. Thanks in advance!

Why machine verifiable proofs?

moshez asked why I'm so interested in making machine verifiable proofs. Freek Wiedijk explains it better than I can, from a mathematician's point of view.

But my main interest is not so much existing proofs of mathematics. What I really want to do is develop correct programs. The theorems needed for programming tend to be much too tedious for mathematicians to deal with. But if the programmer has the patience to generate such a proof, at least a mechanical checker can give pretty good confidence that the theorem is true, and thus that the program is correct.

I'm doing okay, although still feeling buried in work, and not fantastically productive. Thanks to everyone who wrote with concern, and sorry for not responding to your email.

I spoke with David McCusker quite extensively yesterday. It's possible that Fitz will prove a good fit with the goals of the OSAF, which would be fun. I think I'll be visiting them soon.

In any case, it's clear that a working prototype of Fitz is on the critical path to everything else, so I'm going to try to get that done next.


For fun, I proved the ordered pair theorem for (nonnegative) natural numbers in Metamath. It's a very low-level framework, so a lot of aspects are tedious, but overall I was pleased with how it turned out. This is actually my first experience proving something completely rigorously. I recommend it to others.

I've been having a great email discussion with Norm Megill and Robert Solovay, and also with Josh Purinton. Very cool.

I haven't felt very productive recently.

My mom is in town for a week and a half, and we're doing lots of fun family things. This evening, we went to the tree lighting ceremony in Benicia.

In memoriam, Judith Stronach

A member of our Quaker meeting, Judith Stronach, committed suicide last week. It's very sad. She was intensely committed to peace. She taught regularly in the meeting's First Day (Sunday) School, and frequently got the meeting's children involved in projects to help children in faraway war-torn places such as Cambodia, Iraq, and Afghanistan. They won't have her to make these places, and the people living in them, real for them.

Others who knew her better talked of an inner darkness. It's likely that she simply felt too weighed down by the sad and frightening direction our country is taking.

Life is such a gift. I'm sure many people around me are feeling down, and some are probably suffering. I hope I'm able to reach out and show caring and tenderness when it's needed most.

Proofs, or: Set theory for dummies

I think I've worked out a simple, yet powerful, definitional framework for Metamath. I had the basic idea a few weeks ago, but ran into a technical problem with capture of unbound variables (not surprising that this is where the pitfalls lay!). I think I've solved it now.

I'm trying to do these definitions by textual expansion. In Metamath proper, the usual way to create a defintion is to introduce a new axiom. For example, here's the definition of ordered pairs, following Kuratowski:

df-op $a |- <. A , B >. = { { A } , { A , B } } $.

There's no syntactic difference between a new definition and a fundamental axiom such as ax-inf (the Axiom of Infinity). However, definitions are generally of a form which makes it clear they're not adding any axiomatic power.

Even so, I think it's important to formally distinguish axioms from definitions, especially for Web-based proof checking. If you download a random proof off the Internet, you really want to know that the definitions are "safe" in some formal sense.

I'm going to gloss over syntactic issues a bit. In Metamath, definitions are intertwingled with the grammar for parsing terms. In general, a new definition also extends the grammar. The main problem with this is that two grammar extensions, while themselves unambiguous, may lead to ambiguity when combined. Also, it's very important that the textual expansion matches the parse. In C, it's possible to violate this with amusing results:

#define square(x) (x * x)
int main() {
  printf ("The square of 1 + 1 is %d.\n", square(1 + 1));

So I'm basically going to assume that we're working with terms in a pre-parsed format (like Lisp S-expressions).

Ok, so here's how it works. A definition looks like this:

double ( A ) == ( A + A )
The first token is the thing being defined. There's only one definition for each token. If you try to define the same token twice, it's an error.

Next comes a list of variables. Following the Metamath convention, there's a previous declaration of the "type" of each variable. In set.mm, A is of type "class", which is the right one for numbers.

The == symbol means it's a definition. I chose it to avoid conflict with =, which is plain ol' equality.

Finally, there's the expansion of the definition, which is pretty much as expected. One caveat: it can only refer to definitions that have come previously; circular definitions are not allowed.

Ok, now the rule for expanding. For this particular definition, you can safely expand it anywhere it occurs, but that won't be true in general. So we restrict expansion to a single, especially safe place: between an assertion and its proof. If the final step of the proof exactly matches the expanded version of the assertion, we accept (the unexpanded version of) the assertion as proved.

For example, let's say we wanted to prove the assertion: double ( 2 ) = 4. Its expansion is ( 2 + 2 ) = 4. As it happens, this is already proved in set.mm, so the body of the proof can be just 2p2e4. Since the result matches the expansion, we accept the orignal assertion as proved.

Now for the tricky bit. The right-hand-side of a definition may contain dummy variables. An example is: V == { x | x = x }. (This is the "universal class" containing all sets). In this particular case, introducing the dummy variable in an expansion is safe, but that's only because the variable x is bound. Metamath doesn't make a formal distinction between free and bound variables, largely because it's pretty tricky to get right. Instead, it uses "distinct variables".

We add two rules for dummy variables: the new dummy variables must not appear in the original, unexpanded assertion; and, if a definition is expanded more than once within a single assertion, all the dummy variables must be the same.

Let's say you had the degenerate definition: bad == x. If you let the dummy variable x appear in the original assertion, you'd be able to prove junk such as both "A. x x = bad" and its negation. The flip side is that it's no longer trivial to prove things such as: V = { x | x = x}. The body of the proof has to establish { x | x = x } = { y | y = y}. Fortunately, cbvabv takes the main burden of such a proof.

The second rule is more subtle. If you let them sometimes be the same, sometimes different, you could arrive at a contradiction, so it clearly has to be one or the other. I think it's safe to make them all distinct (and it seems more intuitive), but making them all the same preserves truth better. Take "bad = bad" for example. This is trivially provable (using cleqid) without doing the expansion, but if all dummies needed to be distinct, the expansion would be "x = y", which is not provable. I prefer the provability of terms containing definitions to be exactly the same as their fully expanded counterparts.

Apologies for the somewhat detailed technical nature of this post. I do believe, though, that this kind of definitional framework is much better suited to proof interchange on the Web than Metamath proper.


We had a nice Thanksgiving at Heather's dad and with her sister.

The day before, Heather pointed out the spectacular colors of the sunset to Alan, and he remarked, "that looks just like a mandrill's butt," completely deadpan.


The fine folks at Azoogle have figured out how to "publish" email past my Spamassassin setup. In addition, to quote from their web page, "By securing contracts with each of these major email companies [America Online, Yahoo, Hotmail, MSN] it guarantees your emails will get to your customers without being blocked, rerouted or suppressed."

They claim that their service is opt-in only. The major mail providers probably take them at their word. A test account at Yahoo doesn't currently show any from Azoogle, but plenty from, for example, email-publisher.com (which is actually Topica). This is the stuff that gets past their laughable Spamguard(TM), mind you.

This all suggests quite strongly that digital identity alone probably won't make that big a dent into spam, contrary to Jon Udell's assertion. Some spammers might have trouble getting a digital id from whoever's issuing them, but my guess is that the likes of Azoogle will have no trouble at all.

I also got an interesting 419 mail from one "Ade Blessing". This one was full of Jesus-speak, and mentions early on that the author was a Moslem before he converted to Christianity. I hate spam, but have mixed feelings about the 419 stuff. The victims, after all, have to be pretty dishonest to fall for it, and it's a net transfer of money from a rich country to a poorer one, much of which actually subsidizes Internet access for ordinary citizens. In this particular case, the victims are most likely anti-Islam religious bigots as well.

Spam only exists because it works. Sigh.

299 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!