I had a major breakthrough over the weekend. Inspired in part by the recent article about Scientology taking over all the top Google spots, I went back to my metadata chapter, which includes an analysis of the attack-resistance of Google's PageRank algorithm.
I now feel that I understand PageRank much more deeply. For one, I have a proof outline of its attack-resistance, which, as it turns out, is rather similar to Advogato's. There are substantial differences, though - one of the very nice features of PageRank is that it's deterministic and stable (ie, small changes to the graph cause small changes to the resulting rankings).
So now we have two known attack resistant trust metrics. One is based on network flow, another on principal eigenvalues - both highly classical algorithms, both reasonably tractable and scalable. This is intellectually a deeply satisfying result.
Based on my analysis, I'm able to provide reasonable answers for the following questions:
- How did Scientology succeed in subverting the PageRank
algorithm?
- Why did registering a large number of domain names help
them so much?
- What exactly does "attack-resistance" mean in the
context of Google?
- How can PageRank be manually fine-tuned (with fairly
minimal effort) to be even more attack-resistant?
- What is the justification for the ||E(u)||_1 = 0.15
"voodoo constant" in the PageRank paper?
For the answers to these questions, you'll have to read the metadata chapter of my thesis. It's not quite written yet. A large part of the reason I'm posting this is to fish for requests to get that chapter written. So even if you find the above questions deathly boring, go ahead and send me an email feigning interest.
Python and Lisp
Someone else posted a link to Norvig's page comparing Python and Lisp. This is an excellent page, and really highlights how similar the two languages are, aside from syntax. This isn't all that surprising, as I've written some very Lisp-flavored programs in Perl in my day (here's one example from my thesis).
As I posted before, one of the turnoffs for Python for me is the really poor speed showing of the current implementation. What makes this even more galling is the fact that we've had Lisp compilers for some time now that are within striking distance of C, speed-wise. Hell, I even wrote one myself, about 18 years ago. So why is it that we still don't have an implementation of Python anywhere nearly as good, in this respect, as ancient implementations of an ancient language?
Another person who has a lot to say is Paul Graham. I linked his taste article a couple of diaries ago. I don't agree with everything he says, but it's all interesting.
One of the things I do agree with is his assertion that libraries are critically important (see section 6 of popular.html). If you believe this, then one of the best tests for the vitality of a programming language is the availability of libraries for that language. Python has one of the more interesting stories around today, in large part because it's relatively easy and clean to hook in C code. I think the fact that distutils can be used to cleanly package mixed Python and C is more important than most people give it credit for - in most other languages, mixing creates nice headaches in the build/package/distribute department.
If arc gets this in a deep way, and also gets a good implementation early on, then it will be an interesting language. I look forward to seeing how that goes.
