15 Jun 2002 raph   » (Master)

Advogato

I finally got around to fixing the locks so that lock contention won't cause huge delays in reading pages. Writing (updating diaries and the like) can still be affected, but this is less urgent to fix.

LotR wrote:

We need a diary-writing trust metric!

Okay. I think you're right. I might well be motivated to write a generic metadata engine and apply it to the specific application of "how interesting is diary X?". Here's roughly how it will work.

When you're logged in, you'll get a chance to enter a one-to-ten score for another user's diary page. I might put this right under the "Certify <user> as:" selection at the bottom of individual person pages, but I'm also inclined to make it more accessible, for example allowing bulk updates on a customized version of the recentlog page.

This goes into the database as generalized assertions. At first, the only assertions that will be allowed are of the form "<user>'s diary is 7 on a one-to-ten scale", but the engine doesn't care what kind of assertions are present. "Roquefort is a particularly fine cheese" is also plausible. The reason for limiting the assertion space is to avoid scaling problems, which can become quite severe as the number of assertions grows.
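One way to picture the generalized assertion store is as a set of (rater, subject, kind, value) records, with only the diary-score kind whitelisted for now. The sketch below is a hypothetical Python model, not the actual mod_virgule schema; the names `Assertion`, `AssertionStore`, and `DIARY_SCORE`, and the rule that a newer rating replaces an older one, are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical assertion kind. The engine itself is agnostic about kinds;
# the store simply refuses anything not on the whitelist for now.
DIARY_SCORE = "diary-score"
ALLOWED_KINDS = {DIARY_SCORE}


@dataclass(frozen=True)
class Assertion:
    rater: str    # user making the assertion
    subject: str  # what the assertion is about, e.g. a username
    kind: str     # assertion type, e.g. DIARY_SCORE
    value: int    # score on a one-to-ten scale


class AssertionStore:
    """Keeps the latest assertion per (rater, subject, kind)."""

    def __init__(self):
        self._data = {}

    def add(self, a: Assertion) -> None:
        if a.kind not in ALLOWED_KINDS:
            raise ValueError(f"assertion kind not allowed yet: {a.kind}")
        if not 1 <= a.value <= 10:
            raise ValueError("score must be on a one-to-ten scale")
        # Assumption: a newer rating from the same rater replaces the older one.
        self._data[(a.rater, a.subject, a.kind)] = a

    def about(self, subject: str, kind: str = DIARY_SCORE):
        """All current assertions of the given kind about one subject."""
        return [a for a in self._data.values() if a.subject == subject and a.kind == kind]
```

Usage: `store.add(Assertion("alice", "bob", DIARY_SCORE, 7))` records that alice rates bob's diary a 7; widening the assertion space later would only mean adding kinds to the whitelist.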

Then, roughly nightly, there will be a process that computes metadata scores, using the method I presented in my HOWTO. This will compute a confidence value for each pair of a user in the trust graph and an assertion. You can see where the scaling problems come from: the number of values grows as the number of users times the number of assertions. I am sure there exist techniques for storing this data more sparsely, but I'm not interested in doing that research now.
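A bit of back-of-the-envelope arithmetic makes the scaling concern concrete. Assuming one confidence value per (user, assertion) pair, and assertions of the form "<user>'s diary is k" for k from 1 to 10, the dense table grows quadratically in the number of users. The user counts below are hypothetical, and this says nothing about how the scores will actually be stored.

```python
SCALE = 10  # one-to-ten scale


def dense_table_entries(num_users: int) -> int:
    """Entries in a dense (user, assertion) confidence table.

    Assumes one assertion per (diary owner, score) pair and one
    confidence value per (user in the trust graph, assertion) pair.
    """
    num_assertions = num_users * SCALE
    return num_users * num_assertions


print(dense_table_entries(1_000))   # 10000000
print(dense_table_entries(10_000))  # 1000000000
```

Ten times the users means a hundred times the entries, which is why keeping the assertion space small matters.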

Finally, the recentlog display will be annotated with the metadata scores. I'll probably also put in a threshold option.
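A threshold option could be as simple as hiding entries whose computed score falls below a reader-chosen cutoff. This is a hypothetical sketch: `Entry`, `filter_recentlog`, and the choice to keep diaries that have no score yet are assumptions, not the planned implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    author: str
    text: str


def filter_recentlog(entries, scores, threshold: Optional[float]):
    """Return (entry, score) pairs to display, dropping those scored below threshold.

    scores maps author -> computed metadata score (1..10). Authors without a
    score are kept, so new diarists are not hidden by default (an assumption).
    A threshold of None disables filtering; scores are still returned for annotation.
    """
    shown = []
    for e in entries:
        score = scores.get(e.author)
        if threshold is not None and score is not None and score < threshold:
            continue
        shown.append((e, score))
    return shown
```

Returning the score alongside each entry lets the same function drive both the annotation and the filtering.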

bytesplit

I am trying my best to be patient with bytesplit. I realize he is a human being like all of us, but for whatever reason he seems driven by demons that cause him to antagonize people here. I sincerely hope that he is able to tame these demons and interact positively with Advogato.

At the same time, I realize this is unlikely. As such, bytesplit is providing an opportunity to look at the trust metrics and the dynamics of this site more critically. The current trust metric certainly has limitations, and is definitely not a magic bullet for making this site an interesting read and a comfortable place. That's up to us.

What the trust metric does do is automatically compute membership in the community based on peer certifications. While I personally feel that bytesplit's contributions to free software are marginal at best, ten people here feel that his level of interest is high enough to rate an Apprentice cert. And, he does show interest in learning more, and his on-topic writings are perfectly reasonable for an aspiring apprentice. Given that, I don't think the trust metric should reject bytesplit's ranking.

All this is good motivation to implement the generalized metadata as proposed above. Unlike the existing trust metric, this metadata system would directly address quality and relevance of writing. I'll be very interested to see how it goes.

Cert inflation

We definitely have cert inflation here. Part of that is because the trust metric is generous, and part is that people here generally do an inaccurate job of evaluating their peers' cert levels. This is useful information for people trying to design metadata systems: a significant fraction of the information input will simply be wrong.

I could certainly make the trust metric less generous. The easiest way to do this would be to have negative certifications as well as positive ones. But I'm not convinced that cert inflation is the most important problem in the world to solve.

Asynchrony

David McCusker called again, and we had another nice chat, this time focussing on writing programs in asynchronous style. I think it's a hard problem, and even worse for library writers, because it may not be realistic to assume that most users of your library will understand asynchronous programming very well. I told David about X as a cautionary tale. X actually has very sophisticated logic for dealing with asynchrony properly. For newcomers to X, this all seems very intimidating and complex (asynchronous grabs are a good case in point). In fact, I think there is widespread failure in the layers above X to deal with race conditions and the like correctly.

Every time you do something over the network, it's asynchronous whether you like it or not. Yet event-driven programs seem a lot more complex than their simple, synchronous cousins. David would like to recapture that simplicity in asynchronous programs. A lot of other people have tried things in this direction, without very happy results so far. I feel that CORBA is a cautionary tale in this regard. It pretends that method calls are really local, when in reality each one is decomposed into two asynchronous events (the request going out and the reply coming back), and of course all kinds of things can happen in the meantime.
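To show what "really two asynchronous events" means, here is a hypothetical Python sketch (not how any CORBA ORB is actually built) of a call that looks local but is a request message plus a reply message. The caller blocks between the two, and the server is free to do other work, or fail, in the meantime. `server` and `remote_call` are made-up names.

```python
import queue
import threading


def server(requests: queue.Queue) -> None:
    """Handles requests one at a time; each request carries its own reply queue."""
    while True:
        msg = requests.get()
        if msg is None:  # shutdown sentinel
            break
        method, arg, reply_q = msg
        if method == "square":
            reply_q.put(("ok", arg * arg))
        else:
            reply_q.put(("error", f"unknown method {method}"))


def remote_call(requests: queue.Queue, method: str, arg, timeout: float = 1.0):
    """Looks like a local call, but is two events: send a request, await a reply."""
    reply_q = queue.Queue(maxsize=1)
    requests.put((method, arg, reply_q))           # event 1: request goes out
    status, value = reply_q.get(timeout=timeout)   # event 2: reply arrives (or queue.Empty)
    if status != "ok":
        raise RuntimeError(value)
    return value


requests = queue.Queue()
worker = threading.Thread(target=server, args=(requests,), daemon=True)
worker.start()
print(remote_call(requests, "square", 7))  # 49
requests.put(None)  # stop the server
worker.join()
```

The `timeout` parameter is where the abstraction leaks: a truly local call never needs one, but a call that is secretly two network events does.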

I haven't seen any of the details of Mithril yet, but I'm fairly skeptical that it will make asynchronous programming accessible to less-skilled programmers. On the other hand, I am perfectly willing to believe that it will be a good tool for expressing asynchrony concisely, and thus useful for people who know what they're doing.

One detail we touched on but didn't really go into was whether the fundamental message sending operation on channels should be synchronous (as in CSP) or asynchronous. In CSP, if you send a message on a channel, but there is nobody ready to read on the channel, you block. The other way to do it is to append the message to a queue. Both are reasonable primitives, in that it's quite straightforward to simulate one in terms of the other. So which do you choose?
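To make "simulate one in terms of the other" concrete, here is one direction: a CSP-style rendezvous channel built on an asynchronous queue, where each message carries an acknowledgement and the sender blocks until a receiver has taken it. This is a hypothetical Python sketch (`SyncChannel` is a made-up name), not Limbo or Stackless semantics. The reverse direction, an asynchronous queue built from synchronous channels, would need an intermediate buffering process.

```python
import queue
import threading


class SyncChannel:
    """Rendezvous channel: send() returns only after a receiver has taken the message.

    Built on an unbounded asynchronous queue plus a per-message acknowledgement,
    so at most one pending message exists per blocked sender.
    """

    def __init__(self):
        self._q = queue.Queue()

    def send(self, msg) -> None:
        taken = threading.Event()
        self._q.put((msg, taken))  # asynchronous append...
        taken.wait()               # ...then block until a receiver acknowledges

    def recv(self):
        msg, taken = self._q.get()
        taken.set()  # release the blocked sender
        return msg
```

The per-message `Event` matters: with a single shared acknowledgement queue, one sender could be woken by the receipt of another sender's message.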

I mentioned that the CSP way might be easier to reason about. Another issue came to mind after our call: the queue needed in the fully asynchronous case requires unbounded resources in general. Obviously, in tiny embedded systems, this can be a real problem. On desktops, it's less clear. But if a system is operating under very high load, you probably want to worry about whether the queues will keep growing. Of course you can always implement flow control on top of async messages, but that's not really the point. In CSP, the default is not to grow unboundedly, since a sender simply blocks until a receiver is ready.
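Python's standard `queue.Queue` shows the difference directly. With the default `maxsize=0` the queue is unbounded, so a producer that outruns its consumer grows it without limit; with a positive `maxsize`, `put()` blocks (or `put_nowait()` raises `queue.Full`) once the queue is full, which is a crude form of flow control. The buffer size of 3 is arbitrary.

```python
import queue

unbounded = queue.Queue()         # maxsize=0: no limit, like a fully async channel
bounded = queue.Queue(maxsize=3)  # producer is pushed back once 3 items are pending

# A producer that outruns its consumer: nothing is ever taken off either queue.
for i in range(1000):
    unbounded.put_nowait(i)       # never refuses; memory use just keeps growing

accepted = 0
for i in range(1000):
    try:
        bounded.put_nowait(i)
        accepted += 1
    except queue.Full:
        break                     # a blocking put() would wait here instead

print(unbounded.qsize(), accepted)  # 1000 3
```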

mwh: I haven't been following Stackless Python closely, but I am aware of it. Looking briefly at the site, I see they are now implementing a concurrency and channel approach directly inspired by Limbo and CSP. That could be very cool.
