Older blog entries for raph (starting at number 174)

mod_virgule

A long patch of "raph lag" has finally ended, with all of "Gary's branch" being merged into HEAD and applied live. Did you notice?

April fools

I'm really glad I didn't try any April fools pranks with the site this year. Aside from the usual issue of having no time, the obvious jokes (getting bought out by a BigCo) are pretty stale by now.

Two were funny: PigeonRank, and Binary Lexical Octet Ad-hoc Transport. The latter was funny because it had a real point, one well worth paying attention to.

Instant outlining

Dave Winer is rolling out something called Instant Outlining. I think there's something interesting there. I don't think he's doing any interesting new science. The polling/notify protocols look fairly crude compared to some of the more elegant work on such things in distributed systems research. The XML format (OPML) is not particularly elegant, although I must say it is something of a relief to see XML being used to represent primarily text with a tree structure. Cramming all the text into <outline text=""> elements is merely a syntactic nit. The fact that it's relatively simple XML positively invites people to play with it.

What is interesting is that Dave seems to have made propagation of annotated links extremely easy, but still under manual control. Because of the manual control, the trust issues are manageable. If you read the testimonials on Dave's site, you'll comparisons to email, which of course gets its trust all wrong and suffers badly from it.

I don't think Instant Outlining gets everything right. Browsing through some OPML files, I see some dissonance: between the flow of time and the tree, between missives and replies (they don't seem to be formally linked), between freeform text and formal structure, and between freshness and permanence. But at least they're playing with these concepts for real.

I also think that tree-structured outlines are a very useful concept, and underappreciated in today's world. Note how Advogato diary entries have evolved a very outline-like style, with headings tagged by <b>. If the volume of diaries scales up, it might be very useful to formalize that structure, and present a "collapsed view" for at least some diaries, showing the headings but not the bodies.

trust

In response to some inquiries, I wrote up a Attack Resistant Trust Metric Metadata HOWTO. I posted it to p2p-hackers, which seems like the best demographic, but perhaps there are people here who might be interested as well.

thesis

More updates to the latest snapshot of my thesis. There's some good work in there, including considerably more detail on my Google analysis, a distributed network model, and more.

Not only that, but I finally went ahead and implemented the PageRank algorithm. It's pretty cool. I did some testing over the Advogato graph, and the results seemed to be very good, probably even better than Advogato's native trust metric.

This implementation is also fast: about 80ms to run the metric.

rms on patents

Richard Stallman's recent speech on software patents is very good. Most of what's been written on software patents falls either into the "patents bad. Ugh!" or the "we need them to incentivize innovation" camps. In this speech, rms goes quite a bit deeper.

Wiki lust

I want Advogato to have a Wiki. Just a simple little one, nothing super-fancy. I might just have to code one up after a bunch of the merging stuff happens.

29 Mar 2002 (updated 30 Mar 2002 at 08:32 UTC) »
phone

Based on the feedback from my last post, as well as a little digging on my own, I decided to try the Plantronics CT10. I'll let you know how I like it.

Ghostscript

The 7.20 release process is well underway. rillian put together a release candidate tarball. Feel free to try it. So far, there has been a fair amount of feedback on gs-devel, largely of build problems.

The other major user-visible thing that's happening is that we're trying to tighten up security. In particular -dSAFER now blocks reading of arbitrary files, in addition to writing. Unfortunately, this breaks gv on pdf files, because gv writes out a temporary PS file as a wrapper for the PDF.

I'm preparing a patch now for gv, but it's still tricky getting the fixes in the hands of users, largely because the last release of gv appears to be 5 years old.

Security vs convenience is a delicate tradeoff. I'm trying to do the right thing, but I still have the feeling that it will cause difficulty for users.

Trust metrics and Slashdot

I've heard from a couple of people that the Slashdot crew made the assertion at a panel at SXSW that "Advogato doesn't scale". This angers me. The are nontrivial algorithms at the core of Advogato, so it's not obvious that it does scale. I put quite a bit of work into optimizing these algorithms. Lately, I've been playing with the trust metric code itself, as well as the Advogato implementation. There's no question that the website has performance issues, but most of those are due to the simpleminded arrangement of account profiles as XML files in the filesystem. On a cold cache, crawling through all 10k of these XML files takes about a minute. Once that's done, though, the actual computation of the network flow takes somewhere around 60 milliseconds on my laptop (you have to do three of these to run the Advogato trust metric).

So it's clear that Advogato isn't anywhere near its scaling limits. If we needed to scale it a lot bigger, the obvious next step is to move to an efficient representation of the trust graph. bzip2 on graph.dot gets it down to 134k, so there's obviously a lot of room for improvement.

So I'm upset that misinformation went out to a fairly large number of people. In order to spread the word, I hereby challenge Slashdot to a public duel. In return for their analysis of the scaling limits of Advogato and of the underlying trust metric, I promise to do an analysis of why the Slashdot moderation system sucks. I will cover both mathematical analysis, pointing out its failure to be even modestly attack resistance, as well as some discussion on why this leads to lower quality results in practice.

I hope they take me up on this, as I believe such an analysis would be useful and interesting. If not, I hope they have the courtesy to retract their assertion about Advogato's scaling.

Trust metrics and LNUX

Slashdot is not the only LNUX property that has rubbed me the wrong way. The "peer rating" page on SourceForge has, since its inception, contained this text:

The SourceForge Peer Rating system is based on concepts from Advogato. The system has been re-implemented and expanded in a few ways.

The Peer Rating box shows all rating averages (and response levels) for each individual criteria. Due to the math and processing required to do otherwise, these numbers incoporate responses from both "trusted" and "non-trusted" users.

The actual system they implemented is a simple popularity poll. There's nothing wrong with this. I can certainly appreciate that they've been busy with other things, and trust research is obviously not a core focus of SourceForge. But I really wish they wouldn't pretend that it's something it's not.

I came to the realization a couple of weeks ago that this text is a big part of the reason why I feel such Schadenfreude over the woes of LNUX. Keep in mind, I think SourceForge has been a wonderful service, and I think the people who work on it are great (I know several personally). But I think the corporate structure at LNUX has hurt the goals of free software quality and intellectual inquiry. I have nothing against them trying to make a living doing a proprietary PHP candy shell over legacy free software tools, but I don't think their interests intersect much with mine - which is advancing the state of free software by doing high quality work.

Again, if they revised that text, I wouldn't be quite so happy when I see the red next to their symbol on the LWN stocks page.

Trust metrics and the future

I firmly believe that attack-resistant trust metrics based on peer certification have the potential to solve a lot of important problems. Obviously, I'm biased, because they are, after all, the focus of my thesis. It's certainly possible that, in practice, they won't work out nearly as well as my visions.

Even so, even the most critical person would have to admit that there's a possibility that trust metrics can be the basis for robust, spam-free messaging systems, authorization for ad-hoc wireless networks, not to mention the possibility of making metadata in general trustworthy, without which the real-world implementation of the idealistic "semantic web" concept will surely degenerate into yet another tool for spammers and related slimeballs.

"More research is needed." However, I'm not being paid to do this research, so my own contributions are strictly for fun. Further, centralized systems are much more likely to be profitable to corporations than democratic peer systems (cough, VeriSign, cough). Thus, the research is going to be driven by the currently tiny core of academic and free-software researchers that actually understand and appreciate the issues.

I am hopeful that interest will pick up, especially now that I realize that Google's PageRank algorithm is fundamentally an attack-resistant trust metric. I think everybody will concede that Google is waaaaay better than their competition. If attack-resistant trust metrics work so well for evaluating Web search relevance, then why not in other domains?

This is, I believe, a message that is ripe for more widespread dissemination. That's the main reason why I'm posting this entry, rather than nursing my perceived slights in private.

Call your congress-critters

When the DMCA was being proposed, I thought it was a terrible idea, but I didn't let my congresscritters know. It passed. There is some relation between these two facts.

The CBDTPA is, like the DMCA, a terrible idea. Not only does it have the potential to seriously impact the legal development of free software, there is also no Village People song to which the acronym can be sung. Thus, this time I decided I would at least do something, so I heeded this EFF alert and gave my two Senators and one Representative a call. It was a surprisingly easy and pleasant process. In all three cases, a staffer picked up the phone, and seemed genuinely eager to listen to concerns from their constituents. Not only that, they seemed at least moderately familiar with the issue, indicating that it's been getting some response.

You can make the response even bigger. Do it.

27 Mar 2002 (updated 27 Mar 2002 at 06:43 UTC) »
Advice wanted

It took me until last week to come to the realization that I spend a nontrivial fraction of my work time on the phone, in conference calls, but that my phones are pretty bad.

Thus, I have decided that I want a really good phone. Sound quality is paramount, but comfort is also significant. One friend told me I should get a phone that supports Plantronics headsets. Anybody have experience with these?

spam

Even though I have spamassassin installed, spam seems to be getting worse. This essay (or here) suggests that I'm not the only one. The essay also makes an important point: the spam itself may ultimately not be as harmful as the steps taken to try to fight it.

There is no question about it; our email infrastructure is rotting. It seems like the bad guys are being creative in how to destroy it, but there is nobody actively working on improving it.

Even if email continues to be useful for some time, it is very much evolving in flavor. I don't much like what it's evolving into.

Lions book, C

I bought the Lions book on impulse some time last week, and it arrived today. So far, I've only skimmed it, but it looks like a serious treat.

One of the fun things about the book is that the version of Unix described is written in a rather old dialect of C - for example, the assignment form of addition is written =+.

A most striking feature of the language is that types are optional. This works largely because of the nature of the machine - there's only one interesting word size other than "char". The result is a rather more concise flavor than the C of today.

The idea of optional types has fallen out of favor. Languages today tend to either prohibit them (eg Python) or require them (eg Java). I'm not at all convinced that this is a good thing.

One of the joys of C is its maturity - it's been around long enough to have its rough edges sanded off, and also to attract some reasonably decent implementations and tools (though I believe that quite a bit of useful work remains to be done). It's also reasonably stable by now. I think this is very important - it's very difficult to achieve quality if you're constantly throwing good things away and writing new things that suck. Unfortunately, most of the trendier languages are unstable to an extreme. If I write a large system in a Python/C mix now, using libraries extensively, it's almost certain that the preferred Python implementation five years hence won't be able to run it without modification, and the libraries will no doubt have been superseded as well (especially if one of the libraries happens to be Tkinter).

That said, C is not yet completely static. I do not its current evolution is in very good hands, though. There appear to be four bodies that are actively working with C: the standards committee, Microsoft, gcc, and Apple. The latter two are actually based on the same code base, but seem to be steered in different directions.

These guys are not working together very well. C99 has yet to actually be implemented. A lot of the stuff in the spec is needless complexity - the "restrict" keyword certainly comes to mind, not to mention both digraphs and trigraphs. Then you have the "extern inline" fiasco, which virtually guarantees that a potentially useful feature is rendered useless for years to come. Then, you have really good ideas like standard sized integer types, that can't easily be used because they are incompatible with pre-C99 versions.

I think that if the C99 committee had been doing a good job, they could have fixed that last problem without much difficulty. Basically, they could have developed and released (under a very unrestrictive free license) a set of header files that reliably add these new types to the most popular C implementations of the day. New C implementations would of course include them "out of the box". That way, everybody would get to use them, and they could catch on quickly.

Then you have the useful features that they missed. One of my favorites is the ability to call a varargs function dynamically (vararg functions have had the ability to be called dynamically since at least the days of the Lions book). A special case (va_list) is already implemented. I don't think the more general case would have been that difficult or complex, and it would make life much easier for people trying to do dynamic language binding.

Ah well.

mod_virgule

After discussing two instances of good things that are being poorly maintained, it is uplifting to see renewed vitality on mod_virgule.

My only question: if we implement CSS, and then it turns out that there are a lot of people who don't like it, will we have to DeCSS the site?

Btw, one idea that might be worth considering is to use a tiny bit of text to indicate cert level. This would not only distinguish certs on CSS-losing browsers, but also help with screen readers, tiny wireless devices, and other less popular web browsing technologies.

out of the box

It took me a while to figure out exactly what kgb has against the phrase "out of the box." Then it hit me - I was thinking about user experience, as in "it works well out of the box." For example, Ghostscript doesn't work all that well out of the box. It works much better if you know all the fiddly little command line things. I think we should all do more "out of the box" thinking.

schooling out of the box

Both of our kids are obviously highly gifted, which is forcing us to think about their education. A lot of people seem to have the attitude that the traditional school environment builds character, even if it's deathly boring for gifted children. Somehow, it's supposed to prepare the child for later life.

I just flat-out don't agree. Realistically, the chances of Alan becoming a corporate drone are much lower than, say, making a living writing poetry. Why not prepare him for the latter, then?

Our current thinking is to have him in school half a day, and at home half a day. Now that he's reading, I think he can take more of his education into his own hands, just like both his parents.

Python

Playing with Python continues to be great fun. Check this out:

>>> import fitzpy
>>> rp = fitzpy.read_pdf_open('/home/raph/knight.pdf')
>>> t = rp.get_trailer()
>>> t['Root']['Pages']['Kids'].topy()[0]['MediaBox'].topy(1)
[0, 0, 612, 792]

I can't imagine it getting much better than this.

Alas, speed is still a problem. It's not horrible, mind you, but even with C doing all the work of opening the file and parsing tokens, it's still quite a bit slower than a C-only implementation.

xmlrpc

The xmlrpc stuff is fun. I'm happy that gary is working on it. The idea of the community taking care of Advogato, making it thrive, thrills me.

People who are interested in mod_virgule development should look at the virgule-dev mailing list. The Badvogato crew has done some good things with the code, and it's just my slackerliness that's kept it from being integrated. I'm hoping that will change, now that Gary has so generously volunteered to take some of the load.

I'm posting this diary entry from the Python command line, using Gary Benson's xmlrpc patches. For more information, see xmlrpc.html. Looks like a lot of fun!

I spent most of the evening reading "Stupid White Men", by Michael Moore. It's a good book.

language design

I'm not sure why I've been thinking so much about language design. Paul Graham says a lot of interesting things that challenge the conventional wisdom. I guess that's the answer -- having read through his essays a couple of weeks ago, they have provoked thought.

One of the things that Paul Graham says is that libraries are really important. Most "language designers", especially academics, fail to take this into account. Graham raises the possibility that, in the future, libraries will be somewhat independent of the language. Currently, the choice of language and libraries is very closely bound together.

Of course, there are already a bunch of things that move us toward this goal. One such is CORBA, which most people know as a tool for making crappy network protocols. The goal of CORBA is noble: to allow systems to be built from components in lots of different languages with a magic tool called an ORB. In-process, the main function of an ORB is to translate between different object protocols.

However, in actual implementation, CORBA has earned a reputation for being painful, bloated, and inefficient. ORBs themselves tend to grow very complicated, for reasons that are still not entirely clear to me. I think a lot of the problems have to do with people trying to use CORBA to build crappy network protocols. That's a much harder problem, and impossible to do right within the CORBA framework, so people keep piling on more layers in the attempt.

Another very interesting thing is SWIG, which has basically the same goals as CORBA, but without the crappy network protocol part, and with a focus on dynamic languages rather than C++. Quite a few people use SWIG, apparently, but I am yet to be won over. I think my uneasiness rests with the fact that SWIG tries to make language bindings easy. I don't really care about that. What I do care about is making language bindings that are really good.

I find myself liking the Python object protocol (which is the central part of the Python/C API). It's not a tool for creating N^2 adapters between N object protocols of different languages. It's just a single object protocol. What makes it interesting is that it's a particularly good object protocol. It's not hideously complicated. It's not especially painful to write for it, although trying to do objects in C always seems to result in a lot of typing. It's reasonably lightweight - the object header overhead is typically 8 bytes. It seems to be quite powerful - just about everything you'd want to express in Python can be done in the object protocol. In fact, that's pretty much the way Python is implemented. A special delight is that it's debuggable, using standard tools like gdb. This is something that very few dynamic languages get right.

In short, it's everything an object protocol should be, except for one thing: it's pretty darned slow. For many applications, this simply doesn't matter. Either you don't care about execution speed at all, or you do but the object protocol overhead isn't a major factor, probably because you're using the library to do the "real work" with a relatively coarse grain. However, if you want to use the protocol for fine-grained object invocations, performance will suffer.

It's not at all clear to me how to fix this. Trying to optimize the thing for speed is hard, and will certainly increase complexity. For one, you start needing a static type system to be able to say things like "this is an unboxed integer". Python (and its object protocol) is very purely dynamically typed, and this is a big part of what makes it so simple. Maybe the answer is to accept that the object protocol itself is slow, and just keep making more and better libraries, so that you amortize the overhead of the object protocol over larger and larger grains of computation.

I don't know how practical it is to use the Python object protocol without the Python language. The two are pretty tightly coupled, and for good reason. But it's an interesting concept.

One of Paul Graham's quotes (from this) is that "Python is a watered-down Lisp with infix syntax and no macros". I see what he's trying to say, but also feel that he's missing something important. I think the Python object protocol is a big part of what he's missing. Lisp was never good at interfacing with other languages, in particular C, and the result is that has sucky libraries, especially for things like I/O. Python fosters high quality wrappers for C code, which basically means that the Python gets to participate actively in the world of C-language libraries. That, I think, is pretty damned cool, and more important than most people think.

Advogato

I finally got off my ass and did some very basic maintenance on Advogato. In particular, the intermittent sluggish performance should be fixed now. Also thanks to Gary Benson for a memory leak patch - I'll be applying his XML-RPC patch as soon as I've had a chance to review it, which should be Wednesday.

I'm going to put aside a tiny but steady amount of time for Advogato improvements. This means, of course, that I'll need to prioritize the things on my wishlist. DV's suggestion to make interdiary links bidirectional seems really nice - it sounds like it will add considerable richness without disrupting the existing structure.

Another change I'd like to make soonish is to render real names in most contexts. The nicknames are cute, but I think they don't scale well. Of course, this kind of change is not rocket science, but those things are important too.

I want to do some kind of "rooms" thing, to make it less intimidating to post articles. Badvogato does this already. I'm thinking rooms for news, entertainment-type things (books, movies, etc), and so on. It could work nicely with custom views.

On the rocket science front, I think one of the most interesting things to do would be to run a principal eigenvector-based trust metric over the data, in addition to the current network flows. The attack resistance would be about the same, but the result would be a real-valued ranking rather than the current boolean yes/no (hacked up to be four-valued by repeating the runs). The main advantage is that these rankings would be deterministic and stable, which would solve one of the big user complaints about Advogato's trust metric. People don't like it when their color suddenly fades for no apparent reason.

On the flip side, it would be quite fascinating to run the Advogato trust metric on the Google data. This project would, I believe, make an excellent submission to the Google programming contest. The API to the trust metric engine is actually quite easy to understand. I think this is a doable project for even with modest programming skill. I'm also quite willing to release tmetric.c under terms compatible with the contest. Hint, hint.

All of my trust metric ideas are public domain. Obviously, this is not the case for a lot of the work in this field. In particular, I wouldn't be surprised if Google felt that principal eigenvector-based trust metrics would infringe their patent. Even if so, it's likely that the research exemption would apply for Advogato itself.

In any case, it's a moot question for now, because I really don't have time to code any of this stuff up.

More on trust metrics

I had a great discussion with Roger Dingledine tonight. Among other things, we were talking about my design for an attack-resistant peer-to-peer network infrastructure. My "stamp trading" idea buys you attack resistance in the sense of being able to reject spam email, but unless there's a good algorithm for setting exchange rates, it doesn't prevent denial of service attacks. Roger has been doing a lot of thinking about reputation, and there are quite a few interesting parallels between my design and his ideas for evaluating reputation in remailer networks (see the FreeHaven papers page for links to his work).

I have some fuzzy ideas about how to set exchange rates for stamps, but so far no hard analysis. Perhaps the most exciting thing about my recent breakthrough in analyzing PageRank is that I now have two powerful tools for analyzing attack-resistant systems: network flows (which I have had for some years), and random walks (which I have only had for a couple of weeks). I am very excited that random walks may be just the tool I need to crack the stamp exchange rate nut.

Kids

Alan turned six yesterday (actually two days ago, as I'm posting this after midnight). As we expected, now that he's got the motivation to read on his own, he's making incredible progress - he can read sentences containing words like "librarian" fluently now. My guess is that he will transition from learning to read to just plain reading within another couple of months.

He's very interested in the concept of infinity, and the idea that infinity plus one is still infinity. I made the mistake of brining up the fact that there are in fact different infinities, aleph-null being countable, and the uncountable ones being bigger. He insisted that I explain this to him (including a very rough outline of the Cantor diagonalization argument). When I was done, he told me, "I didn't realize numbers could be so boring until now". But I don't think I've soured him for life :)

Max is also developing very rapidly. He's talking up a storm now, and is gaining more and more grammatical concepts. We just noticed that the singular/plural distinction is now very reliable. For the most part, he's still not doing full subject-verb-object sentences, but he still makes himself pretty well understood. A couple of weeks ago, when I got home, he greeted me with "help -- puter -- stuck". :)

5 Mar 2002 (updated 5 Mar 2002 at 07:16 UTC) »
thesis

Thanks to everyone who wrote - my last entry got a very satisfying number of responses!

While I do find Scientology fascinating, the main reason I'm interested in it now is that they have a very strong track record of using media in innovative ways to further their goals. Thus, if a metadata system claims to be "attack-resistant", then its ability to deliver both pro- and anti-Scientology link is a very good test of that claim.

Of course, it's hard to evaluate a search engine based on "scientology" results alone. For one, it's likely that the operators of the search engine will either bias the results the way they feel about Scientology, or to counteract a bias they percieve. Different search engines will approach this differently. Altavista, which is basically a pure keyword engine, reports about 48 pro links before you get to the first anti. MSN's search does quite well: 4 of the top ten links (5-8) are high-quality anti sites. Based on various innuendo I had heard, I expected Earthlink to do fairly badly. However, they just rebrand Google, so the results are identical.

My attack resistance result on PageRank doesn't say anything about the rank (more familiarily known as "googlejuice") of a particular page. Rather, it bounds the total googlejuice that can be captured by a determined attacker. I haven't figured out the implications of this yet.

I didn't get as much time as I would have liked today to write - there are always other things that come up. I'm trying my best to ignore my email and tell everyone else to bugger off, but it's not easy.

One mystery I haven't been able to resolve: why is it that anti-Scientology websites have such atrocious HTML layout?

speaking of search engines...

...I notice that they're becoming quite a bit richer in the document languages they search. It used to be just HTML, now all good search engines seem to be able to handle a dozen or two of the most popular formats. I am, of course, professionally interested in their PostScript interpretation capabilities. That link itself isn't all that interesting. What will be more so is to see how various search engines handle it.

<mischievous grin/>

Nearly Headless Nick

Nick is the recycled old laptop now functioning as an 802.11b access point in my studio. By now, I feel I would have been far better off buying an Airport, or one of the Linksys or D-Link jobbies. However, it was kind of fun to get 802.11b running. In the past, it's been kind of flaky, especially in Master/Managed mode, so I just ran it in Ad-hoc. But after an apt-get upgrade toasted the PCMCIA on the system, I upgraded the kernel and & lt; a href="http://people.ssh.com/jkm/Prism2/">prism2</ a> driver to their latest stable versions, I find that it generally works quite well.

One trick was to change the cardmgr options to "-f", so that the init scripts would wait for the cards to initialize before starting named and dhcpd. Otherwise, they'd start up without the IP address being configured, which obviously wasn't happy.

I played around with the power-saving modes, but couldn't detect an actual effect on battery life. The card (a DWL-650) seems to run pretty cool, and I expect that even at full power, it sucks down a lot less juice than a 900MHz P3, big nice LCD, and hard drive. Selecting power modes did cause interesting log messages on the AP, including what looked like a reference count mismatch.

Nick is also a handy backup nameserver (after having gotten burned a few times, I now pride myself on running the one of the best amateur DNS services around). At some point, if I can find a worthwhile P2P network that can run with minimal resources, I'll host that on there as well. New hardware is amazing, but old can be fun too.

thesis

I had a major breakthrough over the weekend. Inspired in part by the recent article about Scientology taking over all the top Google spots, I went back to my metadata chapter, which includes an analysis of the attack-resistance of Google's PageRank algorithm.

I now feel that I understand PageRank much more deeply. For one, I have a proof outline of its attack-resistance, which, as it turns out, is rather similar to Advogato's. There are substantial differences, though - one of the very nice features of PageRank is that it's deterministic and stable (ie, small changes to the graph cause small changes to the resulting rankings).

So now we have two known attack resistant trust metrics. One is based on network flow, another on principal eigenvalues - both highly classical algorithms, both reasonably tractable and scalable. This is intellectually a deeply satisfying result.

Based on my analysis, I'm able to provide reasonable answers for the following questions:

  • How did Scientology succeed in subverting the PageRank algorithm?

  • Why did registering a large number of domain names help them so much?

  • What exactly does "attack-resistance" mean in the context of Google?

  • How can PageRank be manually fine-tuned (with fairly minimal effort) to be even more attack-resistant?

  • What is the justification for the ||E(u)||_1 = 0.15 "voodoo constant" in the PageRank paper?

For the answers to these questions, you'll have to read the metadata chapter of my thesis. It's not quite written yet. A large part of the reason I'm posting this is to fish for requests to get that chapter written. So even if you find the above questions deathly boring, go ahead and send me an email feigning interest.

Python and Lisp

Someone else posted a link to Norvig's page comparing Python and Lisp. This is an excellent page, and really highlights how similar the two languages are, aside from syntax. This isn't all that surprising, as I've written some very Lisp-flavored programs in Perl in my day (here's one example from my thesis).

As I posted before, one of the turnoffs for Python for me is the really poor speed showing of the current implementation. What makes this even more galling is the fact that we've had Lisp compilers for some time now that are within striking distance of C, speed-wise. Hell, I even wrote one myself, about 18 years ago. So why is it that we still don't have an implementation of Python anywhere nearly as good, in this respect, as ancient implementations of an ancient language?

Another person who has a lot to say is Paul Graham. I linked his taste article a couple of diaries ago. I don't agree with everything he says, but it's all interesting.

One of the things I do agree with is his assertion that libraries are critically important (see section 6 of popular.html). If you believe this, then one of the best tests for the vitality of a programming language is the availability of libraries for that language. Python has one of the more interesting stories around today, in large part because it's relatively easy and clean to hook in C code. I think the fact that distutils can be used to cleanly package mixed Python and C is more important than most people give it credit for - in most other languages, mixing creates nice headaches in the build/package/distribute department.

If arc gets this in a deep way, and also gets a good implementation early on, then it will be an interesting language. I look forward to seeing how that goes.

165 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!