Older blog entries for raph (starting at number 208)

I'm a little behind on sleep, so tonight's entry will be short. Today was a nice family day.

A friend gave Alan a crystal radio set, so we put up a 100' antenna and tried it out. We were able to get a faint signal on one station, so it was a cool demo of radio waves, but could have been cooler.

I've been thinking more about trees over the weekend, particularly caching and change notification. Unfortunately, it gets complicated, and I worry that most readers don't have much context. At some point, I'll put up real infrastructure for Fitz, and the display tree will be part of the design docs. In the meantime, I like the blog form. I'll post more tomorrow.

Dave Winer has a big thread on blogs, journalism, and integrity. I'm not that moved by arguments of integrity. My feeling is that journalism is governed by Sturgeon's Law just like everything else. I fear that tech journalism is particularly affected, though. Most tech stories in the mainstream press have serious factual errors, and show a lack of understanding on the part of the writer. I don't really care why tech journalism is so bad. It's likely to have something to do with the highly centralized structure of the media business, but I haven't thought much about the exact pathways. Dave asks: "Dumb-it-down or deliberate manipulation?" I'm not sure it matters much.

Blogs are also subject to Sturgeon's Law, of course. The vast majority are not worth reading. But there's real diversity out here in blog-land, no doubt related to the fact that blogs are not owned by a tiny number of megacorps. Can you imagine what a mainstream story on tree access protocols would look like? Yet, if you're one of the few people who cares about this, you're reading my blog, and I'm probably reading yours, and we're both engaging the subject very deeply.

Dave points out interviews as particularly bad in the mainstream. He's right. The process is fundamentally broken. The ideal of objectivity, while it might be important in other contexts, is somewhat pointless in an interview. It's the interviewee's point of view you care about. Why filter and distort it through a journalist who doesn't understand the topic and is a bad writer to boot? A blog lets you say what you meant, and if people misinterpret you, you can answer them.

Btw, I'm notoriously bad at checking my telephone answering machine. It's one way of doing flow control, I suppose. But I'll check it tomorrow. Now it's time to catch up on sleep.


Yesterday was Alan's last day at school. He was very emotional about saying goodbye to his friends and teacher. But that evening, we went to sushi to celebrate his graduation from kindergarten, and ran into one of his classmates. He worries a lot about making friends, but he's actually very social, much more so than either Heather or me at his age.

Max is going through another great leap forward in language development. As I blogged recently, he's working on irregular verbs. A few days ago, he said, "I broked it. I broked it. I broke it." You could read it on his face - "not quite right. Nope. Aah, nailed it!". This evening, he said "I dropped my bottle. Put it over my legs." And, touching the scotch tape I used to repair our copy of Goodnight Moon, "something sticky." It's only been a couple of months or so since most of his utterances were single words.

He's also very advanced physically. He can now kick a soccer ball well, in the direction he wants and with some force. Also, he blew me away by announcing "circle", then folding his collapsible sunshield into precisely that shape.

Alan had a similar language burst at almost the same age (25.5mo). Actually, we have to be very careful about marvelling at Max, because it makes Alan feel jealous. We reassure him about how smart he is, and how proud we are of him, but he still expresses a lot of doubts.


Wes briefly forgot his ThinkPad's BIOS password. This kind of thing happens all the time to real people. I commented on the need for far more sophisticated rituals for guarding keys, with both social and technical aspects. It's a hard problem, and it clearly can be done in both peer-to-peer and centralized flavors. Governments and evil corporations have a lot of motivation to pursue the latter. I'd like to see more thinking on decentralized approaches.

Of course, at the heart of the problem is the fact that it's all but impossible to securely store a key on a general purpose PC. Ry4an Brase pointed to a really neat toy. This particular model is a bit limited (in particular, if you lose or break it, you're hosed), but I think more specialized hardware like this will play an important role.

Cheap parts

A few people have expressed interest in getting a dual Athlon system similar to spectre. One question that came up: is generic RAM actually any less stable or reliable than the "name brand" variant? I really have no idea. If you were going to get a gig or more, the price difference could be significant. I chose not to take the risk, but I have a feeling that it's probably mostly a marketing strategy on behalf of the "name brands." For example, I know that Apple SDRAM, at $150 for a 256M PC133 SODIMM, is no different than Crucial's at $69.99 (including shipping). The question is whether it's any more reliable than the $38 part from Pricewatch. (side question: why the hell did Apple put a SODIMM socket in the iMac?)

Again, I think this would be a killer app for a trustworthy metadata system. What if almost all generic parts were good, but there were a few suppliers that weren't. Wouldn't it be cool to actually know that? Also, if people had a good place to report stuff like drive failures, I think information about lemon products would disseminate much faster.


The machine arrived today. It seems pretty sweet. Ghostscript compile is down to 53s. Fitz rendering of the tiger is 130ms, but some of that is debug overhead. I think I'm going to like this machine.

The only serious hitch so far seems to be on-board Ethernet, which gets stuck. Popping in a Tulip PCI board fixes that.

150 dpi

The Matrox G550 can handily drive the monitor at 2048x1536, or around 150 dpi. It looks surprisingly good, I think in large part due to the quality of the Matrox card. I think I might stay with this resolution a while. Obviously, most default fonts are too tiny, but I can easily configure the ones I really care about.

testrgb speed is 30.5 Mpix/s in 24-bit. This is pretty good, but I was hoping for better. I'm not sure I have everything tuned yet. For one, this is XFree86 4.1.0, and 4.2.0 is now out. Incidentally, setting AGP to 4x makes no noticeable difference. I'm not surprised, but in theory it should. testrgb is very bandwidth-intensive, which is what AGP is all about.

802.11b audio

rkrishnan: you're right of course, that real Internet telephony is nontrivial. But in talking about D/A's, my main point was that the same basic platform could also do CD-quality audio, which would make it much more interesting to "enthusiasts", as opposed to corporate customers.

I think it's inevitable that all these kinds of products will come out over the coming months.

7 Jun 2002 (updated 7 Jun 2002 at 08:53 UTC) »

A little Mac OSX hack called Silk made the rounds of the Mac blogs this week. I'm not sure exactly what it does, but it claims to turn on Quartz font rendering in Carbon apps.

However, it's not quite the same. In all the screenshots I saw, the Silk-rendered text did not have subpixel antialiasing. Native Mac OSX apps do. Check out Chimera screenshots 1 2 3, for example.

At low resolutions, I think the choice between hinted aa, and unhinted aa (with subpixel positioning) is a matter of taste. Many people complain that the latter is blurry, but there's also a case that it's more aesthetic. Reports from the field are mixed. Doc Searls is bothered by the blurring, but many OSX proponents love unhinted rendering. See this thread for more opinions, both pro and con.

I also think that by choosing unhinted rendering as the "new, cool" look, Apple has influenced the taste of Mac OSX users greatly. OS9 (or "Classic") apps look old, even dated, although their font rendering is actually sharper. I've been interested in font rendering for a long time, and I did not anticipate the effect that marketing can have.

In any case, as resolutions go up, the win for unhinted aa becomes clearer. The irony is that not only are Apple-brand displays stuck at '90s-era resolutions, but Aqua isn't scalable, so as resolution increases, fonts get tinier too. Why they decided this is utterly beyond me, especially as the underlying Quartz technology is quite scalable, as was the Display PostScript that preceded it.

Linux UI's aren't scalable either, in the sense of having a single knob you can turn to scale everything (good luck getting everybody to agree on the knob!), but they are overly configurable, so you can fiddle with the fonts and sorta get good results.

Few Linux UI's do unhinted rendering, either, but it's not especially difficult. In particular, there's no reason why a Gecko-based browser can't match Chimera almost pixel-for-pixel. All it would take is turning off the hinting, and implementing subpixel positioning.

Future Ghostscripts will do unhinted aa, with subpixel positioning, by default. Hopefully, I'll have time to polish my patch, and maybe get it into HEAD, next week.

See also On antialiasing type by Markko Karppinen.

A little crypto

zooko posed a fun problem on his blog (and in #p2p-hackers irc). It goes like this: Alice chooses a bit. She then sends a message M1 to Bob. However, at this point Bob should not be able to figure out the value of the bit. Bob encrypts a message using a key derived from M1, and sends this message (M2) to Alice. If the bit was 1, then Alice decrypts the message. However, if the bit was 0, then Alice should be unable to decrypt the message. Finally, in the last phase, Alice reveals the bit, along with a proof that this was the same value as chosen when M1 was sent.

Zooko has a colorful motivating example, written in terms of centaurs and ogres. You might enjoy thinking about the puzzle, especially if you know some crypto. The answer will be revealed soon.

(PS to zooko: if you had a permalink, I would have happily linked it)

The trust metric

Well, restricting recentlog to certified entries would have been ineffective, in addition to the various other downsides. So much for that idea, at least for now.

Out of curiosity, I ran PageRank on the Advogato trust graph, treating all links the same (one interesting twist would be to weight blue and purple links more than green ones). The node "bytesplit" comes in at rank 2444 out of 3209. If Advogato were to be based on PageRank, I'm not sure whether it would be better to set a threshold above this or below it. Another way of looking at the graph is that there are two independent certification paths from the seeds. That's not too shabby.
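For readers who haven't seen it, here is a minimal sketch of the kind of computation involved: plain power-iteration PageRank over a certification graph, treating all links the same. It is not the code I actually ran. The account names in the toy graph are made up, and the damping factor of 0.85 and the handling of nodes with no outgoing links are the customary choices, assumed here for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping node -> list of targets.

    All links are weighted equally. Nodes with no outgoing links spread
    their rank evenly over every node (a common convention, assumed here).
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy certification graph (names are illustrative only).
certs = {
    "seed": ["alice", "bob"],
    "alice": ["bob"],
    "bob": ["alice"],
    "carol": ["alice"],
}
ranks = pagerank(certs)
ordering = sorted(ranks, key=ranks.get, reverse=True)
for position, name in enumerate(ordering, start=1):
    print(position, name, round(ranks[name], 4))
```

Sorting the nodes by score gives the "rank N out of M" position discussed above. Weighting some certification levels more heavily than others would mean replacing the equal split in `share` with a per-link weight.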

So, a reasonable conclusion is that the trust metric is not the best way to address this conflict. I never really thought it was.

Overzealous spider

Whoever's behind, your spider is broken. Requests like these are making up about 80% of all traffic right now, and it's impacting responsiveness. - - [07/Jun/2002:01:51:17 -0700] "GET /person/mbp/../jwz/../Schemer/../rachel/../Telsa/../alan/../NickElm/../alan/../rillian/ HTTP/1.0" 200 14020 "-" "Mozilla/3.0 (compatible)"


Ralph and I had been stuck for a long time on a bug that caused the symbol dict decoder to fail partway through all real streams (but of course succeed on the test stream in the jbig2 spec). We wrote the UBC people asking for help (trace data would have been enormously helpful), but were completely ignored. I'd like to give them the benefit of the doubt and believe that they were simply too busy (lord knows I suck at answering my own email) rather than having sold their soul to the corporate devil, but it's impossible to be sure.

In any case, William Rucklidge, primary editor of the spec, dropped by the mailing list recently, and was able to provide spec clarification and trace data, and we fixed the bug handily. Thanks!

Voice on 802.11b

Wes pointed out a number of interesting 802.11b voice products on the horizon. Vocera seems to have a particularly appealing product - a 46g, 10x3.5cm "badge" that has a speaker and mic built in, and can take Plantronics headsets. Also, Broadcom and Agere are working on "VoIP phone on a chip" chips. There's a good chance that low-cost phones will be available soon. Current VoIP phones are dramatically overpriced.

All this stuff seems designed for large corporations. I don't see any signs that anyone is trying to sell to actual people. Dual 16-bit, 44100Hz D/A's would be a good start (the chip is about $6).


I spent a lot more time with Valgrind today, and caught quite a few Ghostscript bugs. The tool rocks. movement expresses amazement that people are still finding out about the tool. I think it's because you don't see how good Valgrind is until you use it yourself. There are a lot of crappy proof-of-concept free software projects out there (and SourceForge hosts most of them!). It's quite rare that a project delivers this level of quality so quickly, coming out of nowhere.


Today I wore the "Welcome Mozilla" t-shirt I got on the launch of the project over four years ago. I've also pretty much switched over. Congratulations!

Petty personal fighting

chip86: I can certainly see why you're frustrated. In general, I don't like to intervene in these cases. In my experience, ignoring the unpleasantness works better than anything else.

Even so, one simple change I'm considering is to only include certified entries in recentlog. I thought about that last time around, but the problem went away before I got around to implementing the change. That change would be a little bit unfriendly to new people joining, which also makes me somewhat reluctant.

bytesplit: I don't understand why you and Christian are fighting, and I'm not sure I want to. In any case, I don't recommend Advogato as a forum to air your grievances.

I'm hesitant to post even this much, because I am not getting involved. I hope this doesn't escalate, but if it does, it will be a good testbed for the trust metric ideas.

Btw, George Monbiot has some writings that will be particularly interesting for those who like controversy.

Valgrind kicks ass

I spent some time today using Valgrind to track down uninitialized memory reads and similar problems in Ghostscript. So far, 4 out of 5 have been legitimate problems, and at least one was probably capable of real mischief. The one false positive was due to a compiler optimization, and became clear on viewing the assembler (roughly, ashort[0] == 42 && ashort[1] == 69 being collapsed into a single comparison).
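To make the false positive concrete, here is a rough model of that collapsed comparison. It is only a sketch: it assumes 16-bit shorts packed back to back in little-endian order, and the function names are made up for illustration. The real thing happened in compiled C, not Python.

```python
import struct

def separate_check(buf: bytes) -> bool:
    """What the source says: two 16-bit comparisons, short-circuited."""
    ashort0, ashort1 = struct.unpack_from("<HH", buf)
    return ashort0 == 42 and ashort1 == 69

def merged_check(buf: bytes) -> bool:
    """What the optimizer emits (roughly): one 32-bit load and compare."""
    (word,) = struct.unpack_from("<I", buf)
    # Little-endian layout assumed: ashort[0] is the low half of the word.
    return word == ((69 << 16) | 42)

match = struct.pack("<HH", 42, 69)
mismatch = struct.pack("<HH", 7, 0xBEEF)  # second short stands in for garbage
print(separate_check(match), merged_check(match))
print(separate_check(mismatch), merged_check(mismatch))
```

Both versions always agree on the answer, which is why the optimization is legal. The difference is that the merged form always examines both shorts in a single comparison, so if only ashort[0] was ever initialized, the branch now depends on bytes the source would never have tested, and the tool reports it.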

Peter Deutsch reports that his stabs using Purify and Insure++ were very frustrating, because about 90-95% of the reports were false positives.

The lack of a good free memory debugger has long been a source of disappointment to me. Valgrind has singlehandedly done a lot to renew my faith in free software.

New monitor

The new monitor for spectre appeared today. It's a ViewSonic P95f. Overall, it seems pretty good, but not sensational. I wanted a 19" to save space and power, figuring that the next monitor I buy will be a high-res LCD such as the IBM T221 (drool).

The monitor goes up to 1920x1440 (about 140 dpi), but running test patterns shows that the horizontal resolution is limited by the aperture grill, which is about 100 stripes per inch. As far as vertical resolution goes, you can actually see individual scanlines even at that resolution.

So the "real" resolution is somewhere around 1600x1200 (about 115 dpi). I wonder how shadow mask monitors such as the Hitachi CM715 stack up by comparison. That display has a dot pitch of about .22mm, which should match the 1600x1200 well. It's cheaper and draws less power, too.

I haven't finished playing with font rendering, but so far it looks like my original suspicion is true: at 140 dpi, antialiased unhinted text is nearly as contrasty as monochrome. Driving the monitor at more than its "real" resolution might be a good deal, if it means you can use unhinted aa fonts without compromise.

Internet telephones

Wes posted a link on ENUM recently. Basically, it's a mapping from telephone numbers to domain names, so you can use DNS to resolve them to IP addresses. You reverse the digits, join with '.', and put .e164.arpa at the end. So my phone number is This should resolve to something like, but of course doesn't. The phone companies prefer it this way. There's no good technological reason to have a phone any more, but as long as it's too hard for near-PhD's, the phone companies don't have much to worry about.
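The name-mangling step described above is simple enough to sketch in a few lines. This covers only the mapping from digits to a domain name, not the DNS lookup that would follow. The function name is made up, and the phone number in the example is a fictional 555 number, not mine.

```python
def enum_domain(phone_number: str, suffix: str = "e164.arpa") -> str:
    """Map a phone number to its ENUM domain name.

    Keeps only the digits, reverses them, joins them with '.',
    and appends the suffix.
    """
    digits = [c for c in phone_number if c in "0123456789"]
    if not digits:
        raise ValueError("phone number contains no digits")
    return ".".join(reversed(digits)) + "." + suffix


# Fictional number, for illustration only.
print(enum_domain("+1 555 010 0199"))
# prints: 9.9.1.0.0.1.0.5.5.5.1.e164.arpa
```

Reversing the digits puts the country code at the right-hand end of the name, which is the most significant end in DNS, so delegation can follow the numbering hierarchy.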

I tried setting up gnomemeeting with blanu (who is experimenting with streaming media), but we didn't get it to work. On my side, it was probably the laptop's sound driver, for which I've never really bothered to get the microphone to work.

Part of the problem is trying to use general purpose hardware like a PC with a sound card for a specific task. However, there's no reason why you couldn't build internet phone hardware. In fact, I think it's a great idea.

For my setup, I'd want two pieces of hardware. One would be exactly the same hardware as an Apple AirPort, but the phone port would be for voice calls, not for the modem. The AirPort is a 386 PC anyway, so I wouldn't be surprised if you could use it for this.

The other piece of hardware is simply A/D and D/A converters, an 802.11b card, and a battery. The cost of the 802.11b chipset is around $22, and cards are now retailing for $35 after rebate. There's no reason why this part couldn't be retailed for $100. Of course, what makes this product really appealing is playing Ogg files from your home media server. Before the iPod was released, some people thought that's what it would be.

I think whichever manufacturer figures this out is going to sell a hell of a lot of them. Making the (Windows) software nice and easy to use is nontrivial, but I don't care. I'd just run it off my Linux boxen anyway.

Google and the dangers of centralization

Google is amazing. When in online conversations, I routinely ask Google for the answers to questions that come up, and I almost always get the answers. Not only that, but Google is fast. In fact, it's quite competitive with DNS.

It's going to be very, very hard for anyone else to compete with Google, in part because of their PageRank algorithm (the rest of it is just kick-ass good implementation). As a result, Google is in great danger of becoming a central point of failure for the whole useful Internet. People don't seem to have started worrying about this yet, probably because they're so damn good at what they do. By contrast, VeriSign (formerly Network Solutions, formerly the InterNic) got people worried fairly early on, because they suck so hard.

But Google is not a public service. In fact, it will probably become a publicly traded corporation soon, with a fiduciary obligation to shareholders. How much money might they be able to extract from their position? Think about that when you read The Google AdWords Happening.


I note with some Schadenfreude that SourceForge no longer seems to be hosting their own downloads, using mirrors at Telia, ibiblio, and Belnet instead.

Why Schadenfreude? In large part because they don't listen when I have things to say. Other, healthier organizations and people don't seem to have this problem. Oh well.

Blog 101

This entry is a collaboration with aaronsw.

Blogs are pretty basic. At a minimum, you put posts in reverse chronological order, and provide a permalink (a URL for the post which will hopefully never change) for each one. The permalinks allow other bloggers to put links to your posts in theirs, usually with a comment or response. Links are the lifeblood of blogs, and a large part of what makes them interesting.

You don't need any special software to run a blog, although it might be convenient. Some people (such as David McCusker) just edit the HTML by hand. But using a tool can be convenient, and many offer lots of extra features. The most important extras are keeping a list of other sites you visit (a "blogroll") and exporting RSS. Blogging tools should also let you change HTML templates (so you can tinker with the look of your site) and many provide a way for readers to comment on what you write. There are a number of third-party tools to do the latter, such as YACCS and QuickTopic. Most tools support the Blogger API so that you can use a GUI tool like blogBuddy to edit your site.

Blogs are immediate, but not quite so much as chat or instant messaging. Most of the time, blogs are read within a day of being written. One way to think about this is the coherence between what the writer writes and what the reader reads.

Blogs are also firmly in the control of their writers. This means that individual blogs are free of spam, trolls, and other forms of abuse common on the Internet. If someone does try to spam or troll, it's easy to just ignore them.

Blogs are one of the most fertile areas for experimentation on the Web today. People are trying out RSS aggregators, the Google API, referrer log analyzers, and other such toys. It's also a fertile ground for research, with projects like blogdex and reptile analyzing the social networks formed by blogging communities.

Advogato diaries are basic blogs, but lack the fancier options. My guess is that it will add some over time. The focus is different than most blog servers - presentation is simple, and free software is a strong theme. The recentlog also occupies a central role, acting as a kind of "communal blog". The scale seems to work well. If more people wrote diaries regularly, the recentlog would be too much to read. On the other hand, there is enough content to convey the vitality of the community.

Social nets have both strong links (close friends and family) and weak links (casual acquaintances). Social networking theory tells us that both are important. Traditional forums such as mailing lists are pretty good at the strong links. But blogs are public and anyone can read them, so they create weak links among people in different communities that allow information to disseminate very rapidly. Sites such as the Daypop Top 40 track this flow of links.

Even though individual blogs are so simple in structure, you see emergent behavior in the network of blogs. It's common to see conversations between blog authors. In some cases, these blog conversations can take the place of email. This only works if the intended target of the message is reading your blog. Since most people can't read every blog, tools for finding blogs that link to you are quite popular. Many web servers provide "referer logs" which track what site your readers followed to your blog. Some blogging tools even make this part of the blog itself, so readers can follow the links for more information or other points of view. Similarly, backlink tools search the Web (often via Google) to find other pages linking to yours.

Blogs are becoming increasingly popular and mainstream. It's very interesting to see more communities start blogs and how they change and push the medium. Bloggers remind me of Ted Nelson's Hypercorps, the young librarians and teachers of the hypertext system he foresaw, who were "paid to sit around and make things interesting for you". Bloggers are only paid in the admiration of their peers, but that admiration is seductive. Soon after you're hooked on reading blogs, you're likely to want to start a blog yourself.


Watching Max's language evolve continues to be a delight. He's working on irregular forms now. He still says, "breaked", and when I say, "broke" back, he usually says, "broked". But in other cases, his use of irregular forms is perfect. Saturday, he picked up "three leaves" (he loves numbers and counting too). I asked him what it would be if it were just one, and he said "one leaf". Also, most of what he says now is either sentences or fairly complete fragments.

There's only another week of school remaining for Alan. We're hoping that his summer break will give him an opportunity to relax and let go of some of his anxiety. When he's in an anxiety freak-out, he is heartbreakingly eloquent at expressing it. At other times, he's happy, bright, and confident. But the amount of time spent in high anxiety seems to be going up. Does anyone have a recommendation for a child psychiatrist in the Bay Area, preferably one who is good with gifted children?


David has been blogging tension in his relationship with his wife, Lisa. However, he has now removed a lot of this discussion at her request. Announcing that he was going to before doing so seems just a bit hostile to me - it is almost like an invitation to archive (I didn't, btw).

In any case, Heather and I were very much reminded of a rough patch in our own marriage, when we briefly separated. Another parent at Quaker meeting asked how I was when we were putting our kids in the nursery, and I responded, "Heather is leaving me". His response: "who's Heather?" We still laugh about that, but it was painful at the time.

I was obviously reaching out to people I could talk to about the problems ("weak links" in the social net lingo), because most of my closer attachments were either entangled in the difficulty, or I really wanted them to stay untangled (like my advisor and colleagues at school).

If David were to call, I'd be more than happy to talk to him about trees and other stuff. My phone number is pretty easy to find (ordinarily I would send email with it, but this is part of an experiment to avoid email).

3 Jun 2002 (updated 3 Jun 2002 at 07:48 UTC) »

Review of Linked: The New Science of Networks, by Albert-László Barabási, ISBN 0738206679.

Highly recommended. Over the past few years, there has been a flowering of academic research into the nature of networks such as the link structure of the World Wide Web and social networks. This book collects some of the highlights of this research, and presents them as well-written stories. Barabási makes a strong case that network thinking will have profound implications over the coming years, not just for the World Wide Web, but for biology, social networks, and other areas of science as well.

"Scale-free" networks have a central place in the book, not surprising as Barabási is co-author of a groundbreaking paper on the subject. The original paper dealt with the Web, but subsequent research has turned up a number of other networks with similar topology. Even within the Internet, scale-free topology applies both to the network of routers and backbones as well as the link structure of the Web.

Scale-free networks are instantly recognized by their characteristic power-law degree distribution. There are a few highly-connected nodes, and many nodes with just one or two links. By contrast, the degree distribution in random networks tends to be a tightly clustered bell curve.

A simple model generates randomized scale-free networks. Start with a small seed network, possibly a single node. For each new node, add a link from that node to existing nodes, with probability proportional to the indegree of the existing node. Thus, the model captures both growth and preferential attachment. If either element is missing, the resulting network is not scale-free.
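The growth model is easy to try for yourself. The following is a minimal sketch, not code from the book. It starts from a single seed node and adds one link per new node. One assumption worth flagging: the attachment weight is indegree + 1 rather than bare indegree, since otherwise a node with no incoming links could never be chosen and the first link could never be placed. The function name and parameters are made up for illustration.

```python
import random
from collections import Counter

def grow_network(n_nodes, seed=None):
    """Grow a network by preferential attachment, one link per new node.

    Returns (edges, indegree), where edges is a list of (source, target)
    pairs and indegree[i] is the number of links pointing at node i.
    """
    rng = random.Random(seed)
    indegree = [0]  # seed network: a single node, numbered 0
    edges = []
    for new_node in range(1, n_nodes):
        # Assumption: weight is indegree + 1, so unlinked nodes stay reachable.
        weights = [d + 1 for d in indegree]
        target = rng.choices(range(new_node), weights=weights, k=1)[0]
        edges.append((new_node, target))
        indegree[target] += 1
        indegree.append(0)  # the new node starts with no incoming links
    return edges, indegree


edges, indegree = grow_network(2000, seed=1)
histogram = Counter(indegree)
for degree in sorted(histogram)[:5]:
    print(degree, histogram[degree])
print("largest indegree:", max(indegree))
```

Printing the histogram shows the shape described above: most nodes have few or no incoming links, while a handful of early nodes collect many. Replacing the weights with a constant keeps the growth but removes the preferential attachment, which makes for an instructive comparison.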

These networks have a few important properties. First, their diameter is very small. This property has been known in social network theory since the brilliant "small world" experiments of Milgram in 1967. The idea was popularized in the 1990 play by John Guare, "Six Degrees of Separation", and has since entered the popular vocabulary.

Second, such a network stays well connected even when random nodes are removed. This is an "attack resistance" property of the network, not directly related to the attack resistance of trust metrics, my own specialty (although the underlying concept of network flow plays an important role in the analysis of both).

However, when a few highly connected nodes are removed, the network fragments. Thus, scale-free networks are in this way more vulnerable than random networks.

Barabási does not address trust metrics. This is a bit surprising, because Google plays a part in the book, as a "more fit" search engine that rapidly becomes a hub even though existing search engines such as Inktomi and Altavista have already established names for themselves. Barabási misses the opportunity to explain why Google is better. Also, Guare's play deals with a con artist who is expert at playing his social network to further his own goals, but Barabási does not pursue the theme of trust (and violation of that trust) in social networks.

Even if you are familiar with scale-free network theory, the book is still a fun read, and the presentation may be helpful in talking with others. For people involved in Internet software design, and in design of p2p networks, this book is essential reading. The book nicely balances an accessible presentation with meaty intellectual content. Most people who enjoy thinking about the world will find something of interest.

Thanks to Jim McCoy for recommending this book to me.

2 Jun 2002 (updated 3 Jun 2002 at 07:42 UTC) »

Yesterday's entry brought a very good e-mail response from Keith Packard. I'm happy about the way that's going now.

There is a big difference between doing yet another fontmap for a new application, and doing something that is technically sound as a universal fontmap for (hopefully) all apps. The latter is clearly a lot more work. I wasn't sure whether Keith was serious about doing it, but it seems like he is.

File format, API, or protocol?

The fontconfig discussion brings to mind a few thoughts I've had about config in general. I'm blogging them here because I think they may be relevant to a lot of people.

When you're designing a config mechanism, one of the big questions is whether it will take the form of a file format, API, or protocol. Each has its own set of tradeoffs. The big question is what you want to pin down, and what you want to stay flexible.

If config is in a file format, there can be multiple implementations. However, when that happens, it can be very difficult to change the file format itself, because the risk of breaking things goes up with each additional implementation. You could explicitly design the file format to be extensible (something that XML makes relatively easy), but even then implementations can have limitations and bugs that need to be worked around.

Specifying an API instead lets you change the underlying file format all you like, at least in theory. The downside of the API approach is the danger of runtime incompatibility. For example, you generally won't be able to write pure Python or Perl scripts that access the config info. Instead, you'll have to write a wrapper.

Using an API instead of a file format also lets the library multiplex more than one file format. This can help portability. For example, it's not hard to imagine an API like fontconfig's being ported to Windows or Macintosh systems. The app would call FcFontList, which on Linux would read and parse the fonts.conf file, but in Carbon might be something like FMCreateFontIterator. So far, I don't think fontconfig has tried to do any of this.

It's also possible to implement config data as a protocol, for example talking through Unix domain sockets to a daemon which manages config information. This is generally a much more heavyweight solution, because of the need to keep a daemon running. There are some advantages, though. If carefully implemented, you can get low-latency change notification. In the case of fonts, this gives apps access to newly installed fonts without having to close and restart the app. It's not clear that this feature would justify the extra implementation cost. It's also worth noting that protocols give you the same kind of decoupling from the file format that API's give, but without the tight runtime dependencies.

Also note that app configuration wrt X fonts happens through the X protocol. In practice, though, almost nobody speaks the protocol directly. Instead, they all use an API call such as XListFonts.

I'm thinking about fonts now, but these distinctions are probably valid for many types of config info.

XOR metric followup

Earlier, I had said that use of a XOR metric or similar technique for finding short routes was a good indicator of the health of a p2p project. Zooko has put up a nice response arguing why Mnet doesn't yet do it. I agree, "the research isn't really done yet" is a good reason.

Trust, backlinks, Xanadu

Paul Snively suggests that he, David McCusker, and I collaborate on a Scheme implementation of a trust metric for something like Xanadu-style backlinks. It's an interesting idea. If I actually had free time, I'd be inclined to pursue the idea. For one, the recent talk about Lisp has gotten me interested in trying out the language again. It's been years since I've written any serious Lisp. Would I find it a good tool, or would I want to go back to Python as soon as possible? See On the Relationship Between Python and Lisp by Paul Prescod for an argument for the latter.

In any case, if David McCusker and I collaborate on anything soon, it will almost certainly be some kind of IronDoc/Athshe mindmeld. David has already said that he's not into trust metrics.

In any case, I certainly agree that backlinks are a good application for a trust metric. Two-way links have the problem that they're susceptible to spamming. Forward links don't have that problem, which may be one of the reasons why the low-tech, "worse" Web prevailed over the high-tech, "better" designs behind Xanadu.

Implementing backlinks within Advogato, as DV proposed, would solve the spam problem by using the existing trust metric. But this doesn't work for all those interesting backlinks from blogs outside Advogato-space.

If PageRank were available, then I think it would be another good solution. Sort backlinks by the PageRank of the target page, and do some kind of reasonable pruning. If Google won't do it, there's always my independent implementation :)
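As a sketch of what "sort and prune" might look like, assuming some source of per-page rank scores is available: the function name, the threshold, the limit, and the example URLs below are all made up, and the scores are not real PageRank values. Each backlink is scored by the rank of the page it leads to, that is, the page doing the linking.

```python
from typing import Dict, Iterable, List


def prune_backlinks(
    backlinks: Iterable[str],
    rank: Dict[str, float],
    min_rank: float = 0.01,
    limit: int = 10,
) -> List[str]:
    """Keep the highest-ranked backlinks and drop the rest.

    Pages with no known rank are treated as rank 0, so unknown
    (possibly spammy) pages fall below the threshold by default.
    """
    scored = [(rank.get(url, 0.0), url) for url in backlinks]
    kept = [(r, url) for r, url in scored if r >= min_rank]
    # Highest rank first; ties broken by URL so the order is stable.
    kept.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in kept[:limit]]
```

The pruning is what deals with spam: a page nobody trusts gets a low score and simply never shows up in the list.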


I had lunch with, then spent the afternoon with Brian Stell of Mozilla. He is on a mission to make Mozilla printing work well on Linux, especially the i18n bits. Already, he's provided us with fixes and test cases for incremental Type 11 fonts.

There are a lot of tricky issues, and the GNU/Linux community has historically done a poor job dealing with printing. It's an area where cooperation between many diverse pieces is needed, but there's nothing that's really motivating a solution except people's frustration. Brian is trying to solve the problem the Right Way for Mozilla, with the hope that others might follow.

Among other things, Brian is trying to figure out which versions of PostScript and PDF to target. For compatibility with existing printers, you want to generate lowest common denominator PostScript. But there are also advantages to generating more recent versions. For example, you can drop alpha-transparent PNGs into a PDF 1.4 file and they'll render correctly. That's not possible with any version of PostScript, or with earlier versions of PDF. On the font side, many (most?) existing printers can't handle incremental Type 11 fonts, even though they're in PostScript LanguageLevel 3 and also many Adobe interpreters before that (version 2015 and later).

A good solution would be to generate the latest stuff, and have converters that downshift it so it works with older printers. Alas, no good solution exists now. Ghostscript can rasterize without a problem, but sending huge bitmap rasters to PostScript printers is slow and generally not a good idea. pdftops can preserve the higher level structure (fonts, Beziers, etc.), but is limited in many other ways, among other reasons because it doesn't contain an imaging engine. So, at least for the time being, it seems like the best compromise is to have a codebase that generates various levels of PostScript and PDF.

A chronic problem for GNU/Linux is the lack of a mechanism for users to install fonts, and for applications to find them. At least five major application platforms need fonts: Gnome, KDE, Mozilla, OpenOffice, and Java. You also have a number of important traditional (I don't really want to say "legacy") applications that use fonts: TeX, troff, and Ghostscript among them. Plenty of other applications need fonts, including all the vector graphics editors, Gimp, and so on. I suppose I should mention X font servers too.

Most applications that need fonts have a "fontmap" file of some kind. This file is essentially an associative array from font name to pathname in the local file system where the font can be found. Actually, you want a lot more information than just that, including encoding, glyph coverage, and enough metadata at least to group the fonts into families. In some cases, you'll want language tags, in particular for CJK. Unicode has a unified CJK area, so a Japanese, a Simplified Chinese and a Traditional Chinese font can all cover the same code point, but actually represent different glyphs. If you're browsing a Web page that has correct language tagging, ideally you want the right font to show up. Unfortunately, people don't generally do language tagging. In fact, this is one area where you get more useful information out of non-Unicode charsets than from the Unicode way (a big part of the reason why CJK people hate Unicode, I think). If the document (or font) is in Shift-JIS encoding, then it's a very good bet that it's Japanese and not Chinese.
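Here's a small Python sketch of a fontmap that carries more than a pathname, plus the charset heuristic described above. The record fields, font names, and paths are hypothetical and don't correspond to any real fontmap format. The charset table just encodes the "Shift-JIS is probably Japanese" kind of guess.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class FontEntry:
    path: str                       # where the font file lives
    family: str                     # used to group fonts into families
    encoding: str                   # encoding the font is addressed in
    language: Optional[str] = None  # language tag, mainly useful for CJK


# Font name -> entry. All three fonts cover the same unified CJK code
# points but draw different glyphs for them.
FONTMAP: Dict[str, FontEntry] = {
    "ExampleMincho": FontEntry("/fonts/example-mincho.ttf", "Example Mincho", "Unicode", "ja"),
    "ExampleSong": FontEntry("/fonts/example-song.ttf", "Example Song", "Unicode", "zh-CN"),
    "ExampleMing": FontEntry("/fonts/example-ming.ttf", "Example Ming", "Unicode", "zh-TW"),
}

# Legacy charsets that strongly imply a language.
CHARSET_LANGUAGE = {"shift_jis": "ja", "euc_jp": "ja", "gb2312": "zh-CN", "big5": "zh-TW"}


def guess_language(charset: str) -> Optional[str]:
    """Guess a language tag from a charset name, or None if it says nothing."""
    return CHARSET_LANGUAGE.get(charset.strip().lower().replace("-", "_"))


def pick_font(fontmap: Dict[str, FontEntry], language: Optional[str]) -> Optional[str]:
    """Return the first font name (alphabetically) tagged with the language."""
    if language is None:
        return None
    for name in sorted(fontmap):
        if fontmap[name].language == language:
            return name
    return None
```

With a Unicode charset, `guess_language` returns nothing and the app has to fall back on language tagging or a default, which is the gap described above.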

This is why, for example, the gs-cjk team created a new fontmap format (CIDFnmap) for Ghostscript. In addition to the info in the classic Ghostscript Fontmap, the CIDFnmap contains a TTC font index (for .ttc font files which contain multiple fonts), and a mapping from the character set encoding to CIDs, for example /Adobe-CNS1 or /Adobe-GB1 for Traditional and Simplified Chinese, respectively.

To make matters even more complicated, as of 7.20, we have yet another fontmap format, the xlatmap. The goals are similar to the CIDFnmap, but with different engineering tradeoffs. One of my tasks is to figure out what to do to unify these two branches.

In any case, there are really three places where you need to access fonts, and hence fontmaps. First, the app needs to be able to choose a font and format text in that font. The latter task requires font metrics information, including kerning and ligature information for Latin fonts, and potentially very sophisticated rules for combining characters in complex scripts. These rules are sometimes embedded in the font, particularly OpenType formats, but more often not for older formats such as the Type1 family. Interestingly, you don't need the glyphs for the formatting step.
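To illustrate why formatting needs metrics but not glyphs, here's a minimal Python sketch that measures a string from advance widths and kerning pairs alone. The numbers are invented for the example (in 1/1000 em units, as in .afm files), and real layout also has to handle ligatures and complex scripts, which this ignores.

```python
from typing import Dict, Tuple


def text_width(
    text: str,
    advances: Dict[str, int],
    kerning: Dict[Tuple[str, str], int],
) -> int:
    """Width of text in 1/1000 em units, using only metrics, never outlines."""
    width = 0
    previous = None
    for ch in text:
        width += advances[ch]
        if previous is not None:
            # Kerning adjusts the gap between this character and the last one.
            width += kerning.get((previous, ch), 0)
        previous = ch
    return width


# Made-up metrics for a hypothetical font.
ADVANCES = {"A": 722, "V": 722, "T": 611, "o": 500}
KERNING = {("A", "V"): -80, ("T", "o"): -70}
```

Line breaking and justification can be built entirely on numbers like these, so an app can lay out a page without ever loading a glyph outline.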

The second place where you need the font is to display it on the screen. Historically, these fonts have lived on the X server. But the new way is for the client to manage the font. The XRender extension supports this approach well, as it supports server-side compositing of glyphs supplied by the client. Even without XRender, it makes sense to do the glyph rendering and compositing client-side, and just send it to the X server as an image. Maybe 15 years ago, the performance tradeoff would not have been acceptable, but fortunately CPU power has increased a bit since then.

The Xft library is one possible way to do font rendering, but I'm not very impressed so far. Among other things, it doesn't do subpixel positioning, so rendering quality will resemble Windows 95 rather than OS X.

The third place where you need the font is when you're printing the document. In most cases today, it's a good tradeoff for the app to embed the font in the file you're sending to the printer. That way, you don't have to worry about whether the printer has the font, or has a different, non-matching version. If you do that, then Ghostscript doesn't actually need to rely on fontmaps at all; it just gets the fonts from the file. However, a lot of people don't embed fonts, so in that case, Ghostscript has to fudge.

So how do you install a font into all these fontmaps? Currently, it's a mess. There are various hackish scripts that try to update multiple fontmaps, but nothing systematic.

One way out of the mess would be to have a standard fontmap file (or possibly API), and have all interested apps check that file. Keith Packard's fontconfig package is an attempt to design such a file format, but so far I'm not happy with it. For one, it's not telling me all the information I need to do substitution well (the main motivation in Ghostscript for the CIDFnmap and xlatmap file formats). Another matter of taste is that it's an XML format file, so we'd need to link in an XML parser just to figure out what fonts are installed. I'd really prefer not to have to do this.

I realize that the Right Thing is to provide enough feedback to KeithP so that he can upgrade the file format, and we can happily use it in Ghostscript. But right now, I don't feel up to it. The issues are complex and I barely feel I understand them myself. Also, I'm concerned that even fixing fontconfig for Ghostscript still won't solve the problems for other apps. After all, Ghostscript doesn't really need the font metrics, just the glyphs. Even thoroughly obsolete apps like troff need to get .afm files for Type1 fonts (let alone font metrics from TrueType fonts). GUI apps on GNU/Linux haven't really caught up to the sophistication of mid-'80s desktop publishing apps on the Mac. As far as I can tell, fontconfig currently has no good story for metrics or language info.

What would really impress me in a standard for fontmap files is a working patch to get TeX to use the fonts. But perhaps this is an overly ambitious goal.

In any case, I really enjoyed meeting Brian in person today, and commend him for having the courage to attack this huge problem starting from the Mozilla corner.
