23 Jan 2003 raph   » (Master)

A grab bag of responses today, plus some actual technical content.


Thanks to guerby for his response to my entry on Venezuela. The situation there is clearly very complex, and I absolutely agree that trying to become informed by reading one blog is unwise. It's cool that he's delved into the issues without being partisan to one side or the other; that seems to be rare.

Indeed, one of the great strengths of the blog format is the ability to respond; to provide even more context for readers. The newspaper I get sucks, but there's precious little I can do about it.


I agree with djm that a "don't feed the trolls" policy is probably the wisest. I usually read the recentlog with a threshold of 3, so I don't tend to even notice troll posts unless someone else points to them.

Crowd estimation

jfleck's story about estimating the Rose Parade crowd sounds quite a bit like this one. One clarification: my Market Street numbers are based on a per-person area of four feet by four feet, or 16 square feet. Based on this, I am quite confident that my figure of 80,000 is a lower bound on the total who participated in the march. Now that I have some idea how the police come up with their crowd estimates (basically, guess), I see no reason to prefer their numbers over any others.

The war

I'd like to thank Zaitcev for his thoughtful criticism of my opposition to the war. He's made me think, which is a good thing no matter what you believe.

I agree with his point that Islamic fundamentalism is a powerful and destructive force, especially when used as the justification for dictatorships. Coexistence between the Moslem sector of the world and the West is clearly going to be one of the biggest challenges in the coming decades.

But I think that even if one agrees with the fundamental premise that military action is the best way to respond, there is plenty to criticize in the US administration's war plans. For one, from everything that I see, Iraq isn't the most virulent source of Islamic fundamentalism, not even close (it doesn't even show up on this map). Second, a pre-emptive attack based on no hard evidence, or possibly lack of compliance with UN resolutions is virtually guaranteed to fuel hatred of the US in the Muslim world, not to mention strong anti-American feelings throughout the world. No need to speculate; it's starting now just based on the rhetoric of war, not (yet) thousands of people dying.

Finally, even if the warmongers are dead right, starting a war with potential consequences of this magnitude demands very careful debate and deliberation, at least in a free society.

The Onion's take would be funny if it weren't so darned close to the truth.

Free(?) fonts

Bitstream has announced that they're donating freely redistributable fonts. It's always nice to see more font choices. Now seems to be a good time to remind people, though, that the URW fonts that ship with every Linux distribution were purchased by Artifex and released under GPL license. I'm not sure whether license Bitstream chooses will be Debian-free or now, especially given that they haven't given the text of it yet.

Distributed, web-based trust metric

Inspired by discussions with Kevin Burton, I've been thinking a bit recently about using Web infrastructure to make a distributed trust metric. I think it's reasonable, if suboptimal.

The basic idea is that each node in the trust graph maps to a URL, typically a blog. From that base URL, there are two important files that get served: a list of outedges (most simply, a text file containing URL's, one per line); and a list of ratings. In the blog context specifically, this could be as simple as a 1..10 number and URL for each rating. But, while the outedges are other nodes in the trust graph, the ratings could be anything: blogs, individual postings, books, songs, movies, whatever.

So, assuming that these files are up on the Web, you make the trust metric client crawl the Web, starting from the node belonging to the client (every client is its own seed), then evaluate the trust metric on the subgraph retrieved by crawling. The results are merely an approximation to the trust metric results that you'd get by evaluating the global graph, but the more widely you crawl, the better the approximation gets.

A simple heuristic for deciding which nodes to crawl is to breadth-first search up to a certain distance, say 4 hops away (from what I can tell, this is exactly what Friendster uses to evaluate one's "personal network"). But a little thought reveals a better approach: choose sites that stand to contribute the largest confidence value to the trust metric. This biases towards nodes that might be more distant, but are highly trusted, and against successors of nodes with huge outdegree. Both seem like good moves.

What are the applications? The most obvious is to evaluate "which blogs are worth reading", which is similar to the diary rankings on Advogato. Perhaps more interesting is using it to authenticate backlinks. If B comments on something in A's blog, then A's blog engine goes out and crawls B's blog, determines that B's rating is good enough, then posts the link to B's entry on the publicly rendered version of the page.

I see some significant drawbacks to using the Web infrastructure, largely because of the lack of any really good mechanism for doing change notification. There's a particularly unpleasant tradeoff between bandwidth usage, latency, and size of the horizon. However, with enough spare bandwidth sloshing around it might just work. Further, in the Web Way of doing things, you solve those problems at the lower level. It may well be that this is a more reliable path to a good design than trying to design a monolithic P2P protocol that gets all the engineering details right.

I'm not planning on implementing this crawling/tmetric client any time soon, but would be happy to help out someone else. For one, it would be very, very easy to make Advogato export the relevant graph data.

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!