Older blog entries for Bram (starting at number 109)

Cayley Graphs, Permutation Puzzles, and Archimedean Solids

For many years I had an idea for an approach to solving Rubik's Cube. Just recently I found a trivial expository example of my approach. Jaap of course took my idea and ran with it and now has a gallery of quite remarkable correspondences between simple permutation puzzles and Archimedean solids. (My contribution was the cuboctahedron.)

Gambling pools with odds

Racetrack betting has a very curious feature: when placing your bet, you don't know what the actual odds are. The current odds are readily available, but they could in principle change completely before the close of betting. An interesting variant would be to allow people to place bets on a horse contingent on the odds for that horse being at least a certain value. I haven't worked out whether it would always be possible to set canonical odds for a given set of bets, but I suspect it is. This would be an interesting (and, of interest to casinos, potentially quite popular) variant on normal betting pools.
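
Here's a rough sketch in Python of how such a pool might be settled, assuming parimutuel-style odds; the naive fixpoint iteration and the bet format are my own inventions, and whether this always yields canonical odds is exactly the part I haven't worked out:

```python
# Contingent parimutuel bets: each bet only counts if the final odds on
# its horse are at least its threshold. Settle by naive iteration: start
# with every bet active, drop bets whose condition fails at the
# resulting odds, and repeat until nothing changes.

def odds(pools, horse):
    """Parimutuel payout per dollar on `horse`, ignoring the house take."""
    return sum(pools.values()) / pools[horse]

def settle(bets):
    """bets: list of (horse, amount, min_odds) tuples."""
    active = set(range(len(bets)))
    while True:
        pools = {}
        for i in active:
            horse, amount, _ = bets[i]
            pools[horse] = pools.get(horse, 0) + amount
        drop = {i for i in active if odds(pools, bets[i][0]) < bets[i][2]}
        if not drop:
            return active, pools
        # Note: dropping a bet raises that horse's odds, which could make
        # a dropped bet valid again; this naive loop never reconsiders,
        # which is why canonical odds are a real question.
        active -= drop

print(settle([('A', 100, 1.2), ('B', 300, 1.2), ('B', 100, 5.0)]))
```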

Memory Allocation

graydon has some commentary on memory allocation. Specifically, he's in favor of explicit 'might be null' status for variables, and strictly tree-structured memory allocation. These are good ideas, but there are many thorny edge cases. For example, with non-nullable variables there's the case of an object's constructor passing 'this' to something which reads a variable which hasn't been set yet, even though it's guaranteed to be set once the constructor finishes. For memory allocation, that approach usually works fine, but there are occasional data structures which require making an exception to work properly, such as a circularly linked list. No single allocation scheme can capture the full generality of practical behavior, so the options are to either make a system which handles the vast majority of cases well (which strict tree-based allocation doesn't quite do) or have a system for violating the general rule in special cases.
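
A tiny Python illustration of the constructor edge case (my own example; Python obviously doesn't enforce non-nullability, but it shows exactly the window a checker would have to reason about):

```python
# 'self' escapes the constructor before a field which is guaranteed to
# be set by the end of construction has actually been assigned.

class Registry:
    def register(self, obj):
        # Reads obj.name while obj is still mid-construction.
        print("registering", obj.name)

class Widget:
    def __init__(self, registry):
        registry.register(self)  # 'self' escapes here...
        self.name = "widget-1"   # ...but the field only gets set here

Widget(Registry())  # AttributeError: no attribute 'name' (yet)
```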

I've spent quite a bit of time thinking about how to make a type-safe language with the declarative power of C++ (and more) but with the transparency of C. A friend of mine told me that one of my ideas amounts to 'jump tables but no vtables' - if you subclass an object, rather than making the class data structure point to a superclass object, it simply inlines a copy of references to all the methods it inherits from the parent, with overridden methods substituted. This vastly reduces the complexity of method lookups at runtime, both in terms of cycles taken and difficulty of proving correctness. The tradeoff is that in theory there could be an exponential blowup in the size of class objects in memory, but in practice that won't happen. (Well, maybe it will happen for those truly obsessed with OO wankery, but they get what they deserve.)
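
A rough Python model of the idea, using dicts as stand-ins for the flattened per-class jump tables:

```python
# Each class carries a flat table of all its methods. Subclassing copies
# the parent's table and substitutes overridden entries, so a method
# lookup is a single probe with no walk up a superclass chain.

def make_class(parent_table, methods):
    table = dict(parent_table)  # inline copies of inherited methods
    table.update(methods)       # overridden methods substituted
    return table

animal = make_class({}, {
    'speak': lambda self: 'generic noise',
    'name':  lambda self: 'animal',
})
dog = make_class(animal, {
    'speak': lambda self: 'woof',  # override; 'name' arrives by copy
})

print(dog['speak'](object()))  # woof
print(dog['name'](object()))   # animal -- one flat lookup, no chain
```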

Advogato Feedback

I get very little feedback from this weblog. I know people read it because every once in a while I post an entry which generates lots of response (for example, the first one on solar thermal energy). Maybe this is because my entries are so abstruse that people aren't sure what to make of them. Or perhaps it's because I've made my email address so difficult to find. I had to do that, unfortunately, due to a deluge of not only spam but non-spam BitTorrent tech support requests. I simply couldn't keep up with them, and now make the assumption that anyone who I really want to hear from will make the effort to find my email address. I hope I don't lose too much important mail this way.

Spam Filtering

Speaking of spam filtering, I've now basically gotten my problem under control with the following filters - anything with 'dsl' or 'cable' in a Received: line but not in the From: goes to /dev/null. Anything with 'unknown' in the Received: line gets chucked. Anything with a locally tacked-on Message-Id: which isn't From: a locally hosted domain goes to /dev/null. And finally, anything which matches any of a very long list of bounce formats or bounce subject lines gets chucked.
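
For the curious, here's a loose Python reimplementation of those rules rather than the actual procmail recipes; the local domain and the bounce patterns are placeholders:

```python
# Dead-giveaway filters: 'dsl'/'cable' in Received: but not From:,
# 'unknown' in Received:, a locally tacked-on Message-Id: without a
# local From:, and known bounce subject lines.

import email, re

LOCAL_DOMAINS = ('example.org',)  # placeholder for the local domain
BOUNCE_SUBJECTS = [r'^returned mail', r'^undeliver', r'^mail delivery']

def is_junk(raw_message):
    msg = email.message_from_string(raw_message)
    received = ' '.join(msg.get_all('Received') or []).lower()
    from_ = msg.get('From', '').lower()
    for word in ('dsl', 'cable'):
        if word in received and word not in from_:
            return True
    if 'unknown' in received:
        return True
    msgid = msg.get('Message-Id', '').lower()
    if any(d in msgid for d in LOCAL_DOMAINS) and \
       not any(d in from_ for d in LOCAL_DOMAINS):
        return True
    subject = msg.get('Subject', '')
    return any(re.match(p, subject, re.I) for p in BOUNCE_SUBJECTS)
```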

I've taken this aggressive approach to spam filtering because even small amounts of spam were making me delete real mail thinking they were spam. Unfortunately people with even slightly dubious mail setups are likely to have their mail to me /dev/nulled now. And of course I never get bounces any more. I hope I don't lose too much important mail this way, but I undoubtedly lose some.

Gambling on Wall Street

The Springfield daily newspaper, the Daily Star, reported that their local baseball team was slumping, and had missed the spread five games in a row (which was true). Richard Smallbrain, CEO of the Company of America, wrote a scathing op-ed piece criticizing the paper for tainting its sports coverage with gambling. Smallbrain then also mentioned that his company is a growing part of Springfield's economy and has beaten Wall Street expectations three quarters in a row.

How to make money off of terrorism

The traditional spy novel scenario for terrorists profiting from their work is quite dubious. Usually the scenario is that the terrorist shorts the stock of a public company before attacking it. This is a bad strategy on a number of fronts. To begin with, shorting requires that you place stock or cash in reserve, thus limiting the amount you could possibly make to a small fraction of the amount you can front. Further, the potential upside of a short is capped at the amount you get up front, thus limiting how much you can make that way even more.

Those problems can be mitigated by using puts instead of shorts, but a much more fundamental problem remains. Attacks frequently fail, and even when they succeed they often do a lot less damage than expected. The rebound of a stock after it shoots down can leave it higher than it started. Plus, reliably reducing the value of a public company requires doing a whole lot of damage, and some people have moral objections to doing that.

There is a much more reliable, market-neutral way to make money off of terrorist attacks. First, go long and buy puts on a company's stock, with the strike price at the current value and expiration dates for both on the same day, as far in the future as possible. Now you will be out some money and have an interesting bet going where the more the company's stock moves, in either direction, the more money you make. (A nice benefit to this approach is that options with a strike price at the current value tend to be much more liquid than the more extreme ones.) If you were to hold onto these options until maturity, this would be a very risky bet, but you aren't going to do that - if you unload them in just a day or two, the bulk of the change in the value of your options is going to come from the volatility level of the stock, so even if the stock price is back where it started, if the volatility has gone up you'll make a substantial profit.
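
To see the volatility effect concretely, here's a toy Black-Scholes valuation of the position (modeled here as an at-the-money straddle); all the numbers are illustrative assumptions:

```python
# The value of an at-the-money straddle rises with implied volatility
# even when the stock ends up exactly where it started.

from math import log, sqrt, exp, erf

def norm_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def bs_call(s, k, t, r, vol):
    d1 = (log(s / k) + (r + vol * vol / 2) * t) / (vol * sqrt(t))
    d2 = d1 - vol * sqrt(t)
    return s * norm_cdf(d1) - k * exp(-r * t) * norm_cdf(d2)

def straddle(s, k, t, r, vol):
    put = bs_call(s, k, t, r, vol) - s + k * exp(-r * t)  # put-call parity
    return bs_call(s, k, t, r, vol) + put

# Stock at 100, strikes at 100, two years out, volatility 20% -> 35%:
print(straddle(100, 100, 2.0, 0.02, 0.20))  # ~22: cost going in
print(straddle(100, 100, 2.0, 0.02, 0.35))  # ~38: value after the spike
```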

Next, set off a smoke bomb in the company's main plant. If that isn't your style you can just give a few Ben Franklins to your friend who's about to appear on CNBC to get him to say that there are rumors that the company is about to be acquired. In a day or two everyone will probably have figured out that whatever mischief you started was no big deal, so the stock will likely go back to where it started, but in the interim volatility will have gone through the roof, so you can unload your position at a fat profit.

If you can't be bothered setting off smoke bombs, or happen to not want to do anything illegal, there's a perfectly legal technique based on this approach which makes money due to an artifact in the way that volatility is calculated. In general volatility is calculated by averaging over the last N days. You can search for stocks which just so happen to have had a lot more volatility over the last N/2 days than over the preceding N/2 days, then go long and buy puts as above, wait N/2 days, and unload your positions. Since the quiet first half of the window will have dropped out of the measurement, and the volatility over the N/2 days you waited is likely to have fallen between the two earlier values, the measured volatility will now be higher, and you can probably unload your position at a profit.
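
A toy illustration of the windowing artifact, with made-up numbers:

```python
# Measured volatility is a trailing N-day average. If the last N/2 days
# were much more volatile than the N/2 before them, then waiting N/2
# days of merely in-between volatility still raises the measurement,
# because the quiet half drops out of the window.

N = 20
old_half  = [1.0] * (N // 2)  # quiet period (daily volatility, arbitrary units)
new_half  = [3.0] * (N // 2)  # the recent jump
wait_half = [2.0] * (N // 2)  # assumed in-between level while you wait

before = sum(old_half + new_half) / N
after  = sum(new_half + wait_half) / N
print(before, after)  # 2.0 -> 2.5 with no new news at all
```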

The Recursive Universe

ANKOS follows a remarkably similar format to The Recursive Universe, a much older, and much better, book. The Recursive Universe is much more rigorous, uses Conway's Game of Life, and doesn't have all that pompous bullshit which ANKOS is notable for.

A funny tangent - another book by the same author, William Poundstone, gives the formula for KFC's secret spices: salt, pepper, and MSG.

The automaton I mentioned in my last entry is rule 54, described on pages 696 and 697 of ANKOS. The simple-looking diagrams on page 697 are deceptive - that apparent simplicity only happens on a large scale; there's a whole lot of complicated churning which goes on before things settle down.
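
If you want to watch the churning yourself, here's a minimal Python runner for rule 54 (single-cell seed, wraparound boundaries):

```python
# Standard elementary CA encoding: the neighborhood (left, center, right)
# picks a bit out of the rule number.

def step(cells, rule):
    n = len(cells)
    return [(rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2
                      + cells[(i + 1) % n])) & 1 for i in range(n)]

cells = [0] * 39 + [1] + [0] * 39
for _ in range(40):
    print(''.join('#' if c else '.' for c in cells))
    cells = step(cells, 54)
```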

Codeville

Codeville has been progressing rapidly, and is self-hosting and able to talk to remote repositories.

The Axioms of Choice and Foundation

I have a question - if the axiom of choice is such a good idea, what's wrong with the negation of the axiom of foundation? Don't they both state the existence of absurd things? Aren't they both very far removed from practical mathematics? ZFC is intuitively bothersome this way - it allows for some unreasonably powerful statements, and has axioms to undermine those statements enough to make it seem reasonable.

A lot of the pro-AC propaganda people are taught as undergrads involves things which can be straightforwardly constructed without AC, hence the seeming intuitiveness of the common statement of AC - it strongly implies a bunch of things about the sets involved which would make AC unnecessary. Things which genuinely need AC tend to be along the lines of the Banach-Tarski paradox, which is completely absurd.

Really Big Numbers

Scott Aaronson sent me mail a while ago, which promptly got lost in the disaster area known as my inbox. I just found it, so I'll comment on it now.

Who can name the biggest number? is, as I commented earlier, an interesting game. Large numbers loosely correlate with busy beaver machines which can use oracles corresponding to large infinite numbers. The problem with this is that we don't, and can't, know which large cardinal axioms are consistent, especially when we're trying to be aggressive about naming the largest possible one.

The next step, at least to my mind, is to talk about 'the largest consistent large cardinal axiom which can be expressed with less than a million symbols'. This is analogous to, but better-defined than, Aaronson's 'The biggest whole number nameable with 1,000 characters of English text', which suffers from a self-referential Russell-style paradox. Numbers named using this technique have an astounding level of inaccessibility, almost as if we said 'the largest number which can be imagined by a human mind, plus one'. We can contemplate the existence of such a number, but by definition we cannot imagine it.

I sure like picking on Wolfram

Scott also reviews ANKOS and points out a number of outright errors in the book. In particular Wolfram's comments about rule 110 being 'NP-complete' are unsubstantiated, since the construction used involves an exponential number of steps, and hence the simulation it gives isn't polynomial-time.

However, it is true that we now know that rule 110 isn't computable, once you deal with the encoding issues of what state the computation starts in and what output condition you're asking whether it eventually reaches.

That said, my earlier comment about there being computable but nontrivial rulesets analogous to Presburger arithmetic wasn't flippant. I've spent a fair amount of time playing with Wolfram's rule sets, and a definite category which he never mentions emerges. Many of the rules produce very messy-looking output on random input, but any vaguely simple input results in obvious non-repeating but still simple patterns. In ANKOS Wolfram himself talks about one of the most complex-looking of these and says that he ran two million simple inputs and they all resulted in straightforward patterns. Somehow he completely avoids even speculating that this rule is less strong than fully general computation. Maybe he's in denial.

I'd very much like to see it proven that many of Wolfram's automata are computable. Partly because any such proof would be very deep and interesting, but also because I don't like Wolfram's attitude and would very much like to see his central thesis thoroughly debunked.

Public Key Cryptography

This paper gives a public key cryptosystem very similar to the one I came up with, and they manage to prove some solid results about the difficulty of their primitive, while I merely conjecture at the difficulty of mine.

I'm quite happy to have come up with a public key cryptosystem, despite it being comically impractical. Almost everyone can come up with a symmetric cipher which they themselves can't break, but designing any public key encryption algorithm at all which can't be broken trivially is quite difficult.

How to make money at gambling

Some gambling games, such as craps, are simply you the gambler versus the house. These games aren't really 'gambling' so much as they are simply handing money to the casino. Games such as racetrack betting are fundamentally different, in that you're competing against other gamblers with the house merely taking a cut, so in principle it's possible to win money by out-betting your competitors.

The most fundamental principle of serious sports betting is to always bet on the underdog. Intuition indicates the exact opposite, since you want to pick the likely winner, but the goal of sports betting isn't to reliably pick the winners, it's to pick the competitors whose winning chances are underestimated by the other gamblers.

There are several reasons for this. Most obvious is the human instinct to pick the single most likely winner. Also brand name creates bias, because in the absence of analysis people are much more likely to bet on a competitor they've heard of.

A much more insidious phenomenon biases even statistical methods towards the favorite. Given a field of equal competitors, after a few competitions some of them will do better by dumb luck. It is therefore necessary to attribute at least some of the success among competitors with good records to good luck and some of the failure among competitors with bad records to bad luck, but this is difficult to do. Even worse, competitors which did poorly in previous competitions frequently don't even compete in later ones, so the size of the pool which the lucky few remaining were culled from, and the importance of luck in their success, is much larger than it appears.

Unfortunately betting on the longest shot every time takes a long time to win money reliably. There are so few wins that luck plays a major factor. Fortunately there is a way of betting on many long shots at once while having their chances of winning be somewhat independent. In the case of horseracing, we know that in a given race exactly one horse must win, so any horse losing makes it more likely that each other horse wins.

To win more reliably, do as follows: in any given horse race, sort the horses by ascending likelihood of winning, according to the odds. Set a high water mark such that, according to the odds, the combined chance of the horses above that point winning is 1/2. Throw out all the horses above the mark, then place money on each of the horses you haven't thrown out in such a way that your payout will be the same whichever of them wins. For example, if one horse pays out 10:1 and another pays out 5:1 you put twice as much money on the 5:1 as on the 10:1.
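
In code, with the two horses from that example:

```python
# Stake each long shot in proportion to 1/odds so the payout is
# identical whichever of them wins ('dutching'). Odds are x:1 payouts.

def dutch(odds, bankroll):
    weights = [1.0 / o for o in odds]
    scale = bankroll / sum(weights)
    return [w * scale for w in weights]

odds = [10, 5]
stakes = dutch(odds, 30.0)
print(stakes)  # [10.0, 20.0] -- twice as much on the 5:1 horse
for o, s in zip(odds, stakes):
    print('winnings if this horse comes in:', s * o)  # 100.0 either way
```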

Remember to place bets as late as possible, so that the odds drift as little as possible between the time when you place your bets and when the race actually runs. If everyone were to follow this strategy it would cause wild odds swings during the last few minutes, but fortunately this betting strategy is so incredibly lame that the chances of it becoming that popular are practically zero.

While I'm sure this approach reliably does much better than random betting, and its payout is relatively consistent over time, practical horseraces may keep enough of the pot for the house that the long term trend is negative. Unlike on Wall Street, there are very few hidden fees in sports betting, but the commissions for the house can be exorbitantly high.

Mozilla Planning

Brendan Eich, May 15, 1998

4. XXX-1998 (let's say it's 1-Oct-1998): Declare the trunk stable, tag it with a blessed CVS label that anyone can use to derive product from (particularly, but certainly not exclusively, Netscape for Communicator 5.0), and then:

5. XXX plus 1: open the tree to next-gen layout, aggressive modularity, other changes. I should emphasize that the difference between 1-Sept and this date cannot be many months, or we're doomed.

Mozilla 1.0 shipped June 5, 2002.

Brendan Eich, April 5, 2004

If we do only "the browser thing" [...] we are likely to be irrelevant in approximately five years. Not unused, but trailing-edge; not built upon as a platform, not invested in even as a hedge.

Draw your own conclusions.

GMail

Apparently GMail uses a ton of javascript and frames, and feels more like a client-side application than a web page. What pisses me off about this isn't anything Google has done, in fact quite the opposite. Massive amounts of client-side javascript was my vision for the future of web applications as a young punk back in '96. A good example is this demo, which has worked since Netscape Navigator 2.0 and the contemporaneous version of IE. Of course, we all know what has become of that vision - basically nothing, since a combination of javascript bugginess and general fear of doing real software has relegated javascript to mostly ornamental use on web pages.

I hope that my demo at least helped inspire someone working on GMail. My own youthful enthusiasm for doing javascript got squelched by bureaucratic mismanagement in the company I worked for at the time, Earthweb, later to become one of the most grotesque dot-com IPOs and flameouts of them all. I hate seeing hustlers get rich.

It's good to at least see that my original approach had some merit. Just kidding about the last eight years of web development, kids. The approach of ambitious but naive hackers which we had from the very beginning was the right one.

Puzzle Extension

Wriggle puzzles have a fascinating property - it's possible to extend a puzzle to make another puzzle which is strictly harder, because any solution to the harder puzzle also works as a solution to the easier puzzle. This applies to worm extension, addition, and fusing. Sokoban has a similar property under the addition of new walls. It's interesting to find places in the standard Sokoban levels where a wall could be added without making the puzzle impossible.

A very interesting, poorly behaved game can be derived from this property - two players alternately extend a shared puzzle until one of them challenges the other to find a solution, and the other player either comes up with a solution and wins or fails to do so and loses. Interesting tactics include making a particular region of the puzzle only solvable in a tricky way which you already know, so there is a solution but it's computationally difficult to find.

My space shuttle, My death trap

This article made me completely blow my lid.

The independent group overseeing NASA's effort to resume shuttle flights said Friday the agency has completed the first steps toward returning to space. [...]

"The progress that has been made by NASA in implementing the recommendations of the Columbia Accident Investigation Board are substantial," said former shuttle commander Richard Covey, who is leading the task group.

While I'm glad that Covey is interested in following the recommendations of that well-written and well-researched report, perhaps it would have been of some benefit for him to have read it, or at least understood it when he read it.

The report says, in summary: The shuttle is not safe, cannot be made safe, and should only be used when absolutely necessary. Unsurprisingly, the report made after the Challenger failed said essentially the same thing.

The report also goes on at length about how NASA made ludicrous unsubstantiated claims as to the shuttle's safety, even after Challenger. And then, at the end of this article, without irony, we see a NASA official continuing the exact same claims.

"There is no doubt that this vehicle will be the most safe vehicle that has ever flown," Cuzzupoli said.

Whether out of arrogance or stupidity, Cuzzupoli is acting with a callous disregard for human life. Common sense indicates that when the vehicle with the worst safety record in history is claimed to be the safest ever made, the claim is completely full of shit. In this case, common sense is right.

Unsurprisingly, the entire NASA safety advisory panel quit last year, all on the same day. Perhaps they care a bit more about human life.

Manual Spam Filters

I noticed the other day that most of the viruses I get have a Message-Id tacked on by the local mailserver. A little bit of messing with procmail and suddenly my junk mail level is under control. I'm now spending a few minutes a day creating new rules throwing out junk mail my old filters missed. I'm sticking with only blocking dead giveaways instead of using complicated scoring rules, and thus far it's working extraordinarily well.

It's too bad that bounces have become effectively useless, since there are so many of them sent in response to forged mail. The only way to fix this would be for the local mail server to record all the Message-Ids it sends out, and only accept bounces which are in response to one of those. Given the quality of current email infrastructure, I'm not holding my breath.
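
A sketch of what that fix might look like, with invented storage and header-matching conventions:

```python
# The outgoing mail server records every Message-Id it emits; a bounce
# is accepted only if it quotes one of them.

import re

sent_ids = set()  # in practice a database with expiry, not a set

def record_outgoing(message_id):
    sent_ids.add(message_id)

def accept_bounce(bounce_body):
    # Bounces conventionally quote the original message's headers.
    quoted = re.findall(r'Message-Id:\s*(<[^>]+>)', bounce_body, re.I)
    return any(mid in sent_ids for mid in quoted)

record_outgoing('<1234.abcd@example.org>')
print(accept_bounce('... Message-Id: <1234.abcd@example.org> ...'))  # True
print(accept_bounce('... Message-Id: <forged@spammer.invalid> ...')) # False
```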

It's amazing what a downer junk mail can be. I'm in much improved spirits now that I'm actually getting all my real mail instead of accidentally deleting it along with the junk.

blinky.at.or.at is sending me huge amounts of junk.

Ebay correction

The method of gaming eBay I gave earlier isn't nearly as easy as I stated. The problem is that the first outbid-by-the-minimum-amount you do might actually win, because you can't tell whether the current price is the current second place bid plus the minimum increment or the current first place bid exactly. The attack still works, but to do it you need to do multiple bids increasing by exactly the minimum increment each time, which is much more flagrant.

eBay is probably less worried about this than plain old mail fraud. One thing which helps is that auctions are frequently won by snipers, and these tricks don't work when they're bidding. I've taken to sniping everything I really care about, because it saves money.

Trust Metric Correction

The anti-spam trust metric I gave earlier can be easily defeated. A spammer certs a whole lot of fake identities, and from those fake identities certs just a few more fake identities which they spam from. This will keep the negative certs from propagating back.

Here is a technique which fixes that. For each node, we figure out what the flow of spam looks like viewing that node as the source of everything (how to do this is explained below). If more than half of all such flows label a node as a spammer, then that node is assumed to be a spammer. It's generally impossible for a spamming node to avoid being labelled, because when that node is the source it gets a very high spam rating, and when any of the nodes which certify it is viewed as the root it also gets a very high spam rating, although not quite so high.

To calculate the flow of spammerness into a single root, calculate flows between nodes along certification lines and through nodes based on the following criteria:

  • Spammerness flows backwards along certification lines, from the node certified to the one which did the certifying.
  • The inflow and outflow of each node are equal, except for nodes which have negative certs, which have excess outflow in the amount of their negative certs. The source node is another exception; it can have unlimited excess inflow.
  • The maximum flow through any node is made as small as possible. Ties between flows achieving this are broken by making the second greatest maximum flow as small as possible, then the third, and so on.

For example, if node A is the source, and A certs B and C, and B and C both cert D, and D has 1 negative cert, then the spammerness ratings for A and D are both 1 and for B and C it's both 1/2.
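
Here's a toy Python version of that example; the equal splitting below is a crude stand-in for the real minimize-the-maximum-flow calculation, which it happens to agree with on this graph:

```python
# Spammerness starts at nodes with negative certs and flows backwards
# toward the source, split equally among each node's certifiers.

certs = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D']}  # who certs whom
negative = {'D': 1.0}

def spam_flow(source, certs, negative):
    certifiers = {}  # certifiers[x] = nodes that certified x
    for giver, received in certs.items():
        for node in received:
            certifiers.setdefault(node, []).append(giver)
    nodes = set(certs) | {n for ns in certs.values() for n in ns}
    through = dict.fromkeys(nodes, 0.0)
    frontier = dict(negative)
    while frontier:
        node, amount = frontier.popitem()
        through[node] += amount
        if node == source:
            continue  # the source absorbs unlimited inflow
        ups = certifiers.get(node, [])
        for up in ups:
            frontier[up] = frontier.get(up, 0.0) + amount / len(ups)
    return through

print(spam_flow('A', certs, negative))
# {'A': 1.0, 'B': 0.5, 'C': 0.5, 'D': 1.0} -- matching the text
```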

A threshold is set above which a node is considered to be a spammer. Typically all nodes close to the source wind up looking like spammers and all actual spammer nodes look like spammers.

This technique should of course be combined with distinguishing 'confused' nodes, which are judged as being spammers but are certified by at least one non-spammer node. Confused nodes only get blocked from sending messages if they have directly spammed above the threshold. Without this, very highly connected nodes will sometimes get blocked simply because they're too highly connected, and hence wind up being near the source node for all possible sources.

Calculating this exactly might take a while, but there are decent polynomial time approximation algorithms. I'm not sure how small it's possible to get the exponent, but I'm following the policy of designing behavior first, and worrying about efficiency second.

Stamps Against Spam

Ken Birdwell, a coworker of mine, has an interesting idea about how to use stamps against spam. The basic premise of stamps is that a stamp represents some kind of resource, typically a penny, which a sender must have put up in order to get mail accepted. The question is, if the mail wasn't spam, does the penny go back to the sender, get transferred to the receiver, or simply drop into the ether? Ken's observation is that most people send and receive about the same amount of mail, so you could simply have the stamp transfer ownership, without ever turning it back into a penny. This should keep things roughly at equilibrium, and designing a protocol for it is trivial - a client sends a stamp to a server, and the server responds with either a reject or an accept with a new valid stamp.
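
A sketch of that exchange in Python, with invented message formats; my reading (an assumption) is that the server is whoever issued the stamp, and the holder of a received stamp cashes it in for a fresh one:

```python
# The holder of a received stamp presents it to the issuer, which either
# rejects it (forged or already spent) or accepts and hands back a fresh
# stamp -- ownership transfers without anything being redeemed for cash.

import os

class StampServer:
    def __init__(self):
        self.valid = set()

    def issue(self):
        stamp = os.urandom(8).hex()
        self.valid.add(stamp)
        return stamp

    def exchange(self, stamp):
        if stamp not in self.valid:
            return ('reject', None)      # forged or double-spent
        self.valid.remove(stamp)         # each stamp is single-use
        return ('accept', self.issue())  # fresh stamp for the new owner

server = StampServer()
stamp = server.issue()         # the sender bought this for a penny
# ...the stamp travels with a message; the receiver cashes it in:
print(server.exchange(stamp))  # ('accept', <fresh stamp>)
print(server.exchange(stamp))  # ('reject', None) -- already spent
```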

A few modifications to this scheme can help maintain equilibrium. First, since a lot of mail is sent to dead accounts, or at least accounts which don't use this scheme, a sender should try to collect the stamps they issued after a month or so to keep them from disappearing into the ether. Second, peers which have an excess of stamps should start probabilistically not cashing in (or cashing in and sending back) stamps they receive.

Sourceforge

I pushed out a new BitTorrent release, but it's seriously borked because of SourceForge. Thanks, guys. This is the third release in a row which has gotten garbled - the last two were merely garbled, while this one is flat-out giving 404s most of the time.

Codecon

Codecon is coming up this weekend. It's only $95 for three whole days of presentations, everybody should come!

Codeville

Speaking of which, Codeville, a project by me and my brother, is much more mature than the last time I mentioned it (check the page for details) and will be presented on Friday.

The program for CodeCon 2004 has been announced.

http://www.codecon.org/2004/program.html

CodeCon is the premier showcase of active hacker projects. It is a workshop for developers of real-world applications with working code and active development projects. All presentations will be given by one of the active developers, and accompanied by a functional demo.

Highlights of CodeCon 2004 include:

  • PGP Universal - Automatic, transparent email encryption with zero clicks
  • Osiris - A free Host Integrity Monitor designed for large scale server deployments that require auditable security
  • Tor - Second-generation Onion Routing: a TCP-based anonymizing overlay network
  • Vesta - An advanced software configuration management system that handles both versioning source files and building
  • PETmail - Permission-based anti-spam replacement for SMTP
  • FunFS - Fast User Network File System - An advanced network file system designed as a successor for NFS
  • Codeville - Distributed version control system
  • Audacity - A cross-platform multi-track audio editor

The third annual CodeCon takes place February 20 - 22, noon - 6pm, at Club NV (525 Howard Street) in San Francisco. CodeCon registration is $95; a $20 discount is available for attendees who register online prior to February 1, 2004.

http://www.codecon.org/2004/registration.html
