# Older blog entries for Bram (starting at number 18)

26 Aug 2002 (updated 26 Aug 2002 at 20:13 UTC) »
Plugging a Hole

The proof of the non-computability of the halting of turing machines presented in most undergraduate programs contains a huge gaping hole. Fortunately it's not hard to fix.

The Broken Proof

Let us say that we have a turing machine which accepts an encoding of another turing machine as input and always prints out a 0 before its last move if the input halts or a 1 before its last move if the input doesn't halt, and never spends an infinite amount of time thinking about it.

We can then construct a turinng machine which calculates what whether its input would halt, and does the opposite. If we give that machine as input to itself, then we have a contradiction.

The problem here is that there's a distinction between machines which take input and machines which don't take input. The machine which does the opposite takes input, when we fed it to itself the emulated one is given no input, hence it's operating on something else entirely, and there is no contradiction.

The Fixed Proof

Let us say that we have a turing machine which accepts an encoding of another turing machine as input and always prints out a 0 before its last move if the input halts or a 1 before its last move if the input doesn't halt, and never spends an infinite amount of time thinking about it.

We can then construct a turing machine which spits out an encoding of itself, calculates whether that machine will halt, and then does the opposite. The spitting out is based on the standard quine trick - the program contains a blueprint of itself, with a special notation where the blueprint goes saying 'blueprint goes here', it then follows the blueprint exactly and spits out a copy of the blueprint where the notation goes.

Why there is such an obvious and easy to fix error as part of standard basic CS curriculum is beyond me.

No Mix and Matching

While I'm complaining about common errors, I'd like to point out something about log(n) multipliers in runtimes. There are basically two ways of calculating the runtime of an algorithm - the number gates required to construct it using a directed circuit, and the number of operations it requires on a realistic machine with parallel memory.

If you calculate runtime in the first way, you're perfectly justified in saying that adding two numbers on the scale of n takes log(n) time, but since many algorithms undergo a quadratic blow-up in size in this model due to the lack of parallel memory, it isn't applicable to many problems.

If you calculate runtime in the second way, you have no justification for saying that adding two numbers on the scale of n takes log(n) time unless you also take into account that simple memory retrieval technically takes log(n) time, which nobody does.

When calculating runtime, don't mix and match models - if you allow for constant-time memory retrieval, allow for constant-time addition of small numbers.

24 Aug 2002 (updated 26 Aug 2002 at 17:57 UTC) »
wardv: I used to play corewars all the time, in an integrated development environment on the Commodore 64 which my dad wrote in his spare time. Notably, he was a journalist and not a programmer back then.

There was one tournament in which I entered a top contender, Powerbomb, which won almost all of its battles before the finals. Unfortunately, Only five programs were put into the final round, including all the ones which Powerbomb didn't do all that well against. Ah well, I should probably be happy just to have gotten two programs into the finals. I was only twelve at the time.

Mathematical Foundations

I think this post claims an axiomatic system has been discovered which essentially declares its own consistency but manages to avoid the cheap trick of diagonalization.

Update: I'm told I read that all wrong. Ah well.

Intermediate Automata

I believe some of these papers demonstrate that there are (very artificial) cellular automata which exhibit nontrivial behavior but aren't universal, contrary to Wolfram's thesis. I'm increasingly coming to believe that some of the simple rules, especially 18 and 22, are of intermediate degree.

Second Best

I figured out a coherent strategy for Second Best - all players agree on what the two most common words will be, and each player picks one of them at random.

Tweaking Spam

I've been thinking about the spam filtering code I gave earlier. It can be improved a lot.

For starters, multiplying the number of nonspam appearances by two is kind of a hack. It's almost exactly equivalent to subtracting .7 from each token's value. A much more robust approach is to increase the spam threshold from 2.6 to 5.

Also, max value of 4.6 (equivalent to .99 bayesian) seems like a bit much, since just one or two spam words could easily label an otherwise neutral message as spam. Reducing it to 3 (bayesian about .95) seems much more reasonable. Someone whose entry has scrolled off advogato recentlog mentioned that a single token, '2002' threw off his filtering quite a bit.

Several other subtle improvements and code cleanups are possible. I've included them all in the following code -

```from math import log
from re import findall

class spamtrap:
def __init__(self):
self.good = {}

for t in _maketokens(message):

for t in _maketokens(message):
self.good[t] = self.good.get(t, 0) + 1

def is_spam(self, message):
ss = []
for token in _maketokens(message):
ss.append(min(3, max(-3, log(self.bad.get(token, 0) + 1) -
log(self.good.get(token, 0) + 1))))
sum = 0
if len(ss) > 16:
ss.sort()
for v in ss[:8] + ss[-8:]:
sum += v
else:
for v in ss:
sum += v
return sum > 5

def _maketokens(message):
ts = {}
for t in findall("[a-zA-Z0-9'\$]+", message):
ts[t.lower()] = 1
return ts.keys()
```

Sorry again about whitespace mangling which interferes with cut'n'paste - it's advogato's doing.

Hot Potato

In the game Hot Potato, each player starts with 100 points. One unlucky player, chosen at random, is initially given the hot potato. On each turn, the player with the hot potato decides which player gets it next, and the recepient's score is decremented by one. The hot potato may not be passed to a player with a score of zero. The winner is the last player with a positive score.

I cannot fathom any coherent strategy for Hot Potato. How to approach it formally is a complete mystery to me.

Second Best

In the game Second Best, each player writes down a word and then all words are revealed. All players who wrote down the second most commonly occuring word are winners, everybody else are losers. In the case of a tie for second, everybody loses.

I can't make heads or tails of the strategy of Second Best either.

20 Aug 2002 (updated 20 Aug 2002 at 22:30 UTC) »
fxn: Using a persistent connection won't help with throughput much unless you implement pipelining.
barryb: Those numbers were derived by plugging in Raph's formula -

• log(.01) - log(1 - .01) = -4.6
• log(.99) - log(1 - .99) = 4.6
• log(.2) - log(1 - .2) = -1.4
• log(.9) - log(1 - .9) = 2.2

The formulas for probabilities were likewise converted -

log(a / (a + b)) - log(1 - (a / (a + b))) = log(a) - log(b)

Note how much simpler the formula on the right is. I think it's best to think of this heuristic as based on scores, with bayesian analysis being an intuitive motivation, rather than a rigorous basis.

ifile is a noteworthy related project.

Donations

Since there's been discussion on advogato diaries lately about which projects are worth donatiing to, I'd like to point out that I'm accepting donations for my work on BitTorrent, a project I've been working on full-time for over a year, with a mature, deployed piece of code to show for it.

Retroactive Spam

I realized that tiebreak behavior was unspecified in the spam filtering code I gave yesterday. The way I implemented it happened to work out that if there were 15 max probability spam and 15 max probability not spam, it assumed spam. A much more robust approach is, rather than summing up the fifteen scores whose absolute values are the highest, sum up the eight largest positive and eight most negative ones. I've now retroactively changed by previous diary entry to work that way. You can do that in a weblog.

17 Aug 2002 (updated 19 Aug 2002 at 02:09 UTC) »
Translating into English

Paul Graham has some interesting ideas about how to filter spam. Unfortunately he gives code samples in some weird language, so I've translated into Python and fleshed them out a bit.

This code ignores duplicates of a single token in a message, since they resulted in some rather ugly hacks. It also implements Raph's suggestion for using scores. Also, rather than summing the fifteen scores whose absolute values are highest, it sums the eight largest positive and eight most negative, which is much more robust behavior.

This code has no support for serialization and doesn't strip binary attachments, but other than that should work well.

```from math import log
from re import findall

class spamtrap:
def __init__(self):
self.good = {}
self.scores = {}

def _recompute(self, token):
g = 2 * self.good.get(token, 0)
if g + b >= 5:
self.scores[token] = min(4.6, max(-4.6, log(b + .00001) - log(g + .00001)))

for t in _maketokens(message):
self._recompute(t)

for t in _maketokens(message):
self.good.setdefault(t, 0)
self.good[t] += 1
self._recompute(t)

def is_spam(self, message):
ss = [self.scores.get(t, -1.4) for t in _maketokens(message)]
ssp = [i for i in ss if i > 0]
ssn = [i for i in ss if i < 0]
ssp.sort()
ssn.sort()
sum = 0
for v in ssp[-8:] + ssn[:8]:
sum += v
return sum > 2.2

def _maketokens(message):
ts = {}
for t in findall("[a-zA-Z0-9'\$]+", message):
ts[t.lower()] = 1
return ts.keys()
```

Sorry if there are indentation problems on cut and paste - advogato inserts <p> tags even between <pre> tags.

lukeg: Going random when you've got a statistically insurmountable lead might help if you're optimizing for match wins over total score, but in practice it's very rare for an opponent you were trouncing to suddenly snap out of it and start beating you.

Your champion would be soundly trounced by the more sophisticated strategies I described. Since even sword and shield is fairly simple to implement and plenty effecient enough to run in idel, just a bit of coding could bring your Roshambo competition from amateur to world-class.

thraxil: Roshambo and Prisoner's Dilemma have distinctly different flavors because Roshambo is a zero-sum game, in which no cooperative strategies are possible. Tit-for-tat is now getting practical application in the choking algorithms for my project BitTorrent, which are very analagous.

isenguard: That paper is interesting, but I'm very leery of empirical measurements of constraint satisfaction heuristics which don't set the actual record. Experimenting with 10 when the record is 21 is pretty bad (it's even more now). There is only one fair measure of time, and that's minutes and seconds.

Knight Moves

If we mark the minimum number of moves it takes for a knight to get from the origin to each square on an infinite chessboard, we find there are four local maxima - exactly two squares away from the origin along each of the four diagonals. Generalizing to knights which move x in one direction then y at a right angle, with x and y relatively prime (the standard knight has x = 2 and y = 1) it appears that there are always four local maxima, and they're always on the diagonals. I don't know why this is.

Roshambo

Today I'll be writing about computer strategies for the classic game Roshambo, also known as paper scissors stone.

But ... but ... There's no strategy in Roshambo!

While it is true that playing random can reliably do dead average, in any tournament everyone will be trying to win, hence everyone will have bias, and strategies which do better than average against a wide range of opponents will rise to the top.

I've spent far more time than is reasonable reading about the computer Roshambo tournaments, and will present a distilled explanation of the strategies here.

Henny

Henny exemplifies the simplest strategy which works well in practice. It picks a random move which the opponent played previously, and plays the response which beats it. If the opponent has any bias toward playing the same move over time, Henny will win. Henny is also very defensive - optimal play against it will only get an edge very slowly over time, and even that might get swallowed by noise.

Markov Chains

Another winning in practice strategy is to use markov chains - look at the last n moves your opponent made, and predict that they'll continue playing as they have most frequently in the past. Considerable variation is possible in length of chains, which brings us to...

Iocaine Powder

The competitor Iocaine Powder magically combines the best qualities of Henny and markov chaining by looking for the longest match between the moves your opponent just played and any sequence they've played in the past, and assuming they'll play the same next move again. If the opponent has a bias based on the last n moves for an n, this strategy will essentially pick a random time they played the same last n moves they just did, which is the defensive strategy used by Henny.

Ironically, this algorithm is completely deterministic.

Note that it's slightly better when there are equal length matches to go with the first matching sequence rather than the last matching sequence, to perform well when an opponent plays a string of paper, then a scissors, then a string of paper again.

It's possible to implement this technique very efficiently using trees, which strangely Iocaine Powder doesn't do (MemBot does, though).

But this strategy still has a major weakness, and Iocaine Powder has another trick up its sleeve...

Sicilian Reasoning

A braindead, fixed sequence entry named DeBruijnize nearly cleaned up in the first tournament. DeBruijnize sequences are biased against sequences they've done in the past, rather than in favor of them. A meta-strategy called sicilian reasoning beats it nicely. Compute how well you would have done to date predicting the opponents move straightforwardly, then always shifting it up one (paper->scissors, scissors->rock, rock->paper), then down one, then the same three predicting instead your own side. Play whichever one would have the highest score so far.

Sicilian reasoning cleanly defeats not only DeBruijnize, but all manner of cheap variants on it.

Sword and Shield

This is my own idea, which is untested, but seems reasonable.

Generate predictions via whatever method (pre-sicilian reasoning) for both your move and the opponent's move, then play a move based on the following table. The top row is your predicted move, the left row is the opponent's (t = stone) -

```        p  s  t
---------
|
p |  s  p  s
|
s |  t  t  s
|
t |  t  p  p
```

This algorithm plays defensively by predicting both sides simultaneously, and hence may beat any opponent which bases each specific move on a prediction for only one side. It has nine sicilian reasoning variants, found by offseting the prediction for either side one of the three possible ways.

Wimping Out

A tempting strategy is to make your bot 'wimp out' and start playing randomly if it isn't doing well. Tournaments play two programs against each other many times with no persistent information between runs to keep this strategy from being effective.

Optimizing for Score

The winning entry, Greenberg, looks at historical performance if its history buffer was various sizes, and has exponential falloff of their performance. This makes it score very well against weaker opponents, but makes it more vulnerable to better opponents. Score, match wins, and worst-case lossage are all subtly different things to optimize for.

The Future

Sadly, there appear to be no plans for future computer Roshambo tournaments. It would be interesting to see how strategies continue to evolve.

12 Aug 2002 (updated 12 Aug 2002 at 21:29 UTC) »
Formal Methods

The utility of formal methods for practical code writing is an interenting debate. As you might expect, I have some thoughts on the matter.

Code winds up much better if it is rewritten pragrmatically with experience. I think there isn't a single line of BitTorrent which hasn't been rewritten at least twice. If I had been using formal methods, the whole program would essentially have been proven once for each rewrite. Much easier is to write the whole program once using more direct methods and then audit it.

The much-vaunted slow programming of the space shuttle is mostly a triumph of beurocracy. Switching from a B&D to a write-then-audit methodology would drastically reduce costs and result in better code.

While I'm still not convinced of the practical utility of formal methods, I'm completely sold on unit tests. A unit test is a test which exercises some code without requiring the rest of the program be linked in. The approach to writing them is to make them all pass/fail, and have a script which runs all unit tests and reports failures. Whenever a unit test is failing, fixing it becomes the top priority. Anyone whe doesn't fix their unit test breakage with the excuse that getting all the tests to pass is impractical anyway should be fired, (or at least, have their commit access revoked).

While formal methods clearly slow down development, unit testing actually speeds it up by reducing debugging time, making it not only practical but easily justifiable.

Writing good unit tests is more art than science. The general idea is to try to make the unit test code simpler than the code it's testing, and also to make the test code assert statements from a fundamentally different perspective than the code itself. Formal methods, by contrast, wind up repeating the code a lot, and thus aren't good at catching specification bugs.

After two years or so of writing unit tests for most of my code, I find the thought of debugging something without them absolutely horrifying, and can't bring myself to even toy with newly written code without writing some tests first (I generally don't test bootstrap and UI code though, since those aren't straightforwardly encapsulatable).

The above approach to unit tests is taken from extreme programming. However, extreme programming people seem to think that unit == class, which is wrong, and their encapsulated APIs seem to not be nearly as hermetically sealed as mine. Units can span several classes, and if they do those classes should internally have complete access to each others's data, while the outside world should only have access to data and methods if it is explicitly granted.

The bugs I've run across in BitTorrent generally fall into the following categories, each of which occurs about an order of magnitude less often than the one before -

1. Bugs found in proofreading. My skill at this depends a lot on how tired I am.

2. Bugs found writing unit tests. This is what really causes my debug time to be so small.

3. Bugs which manage to slip past unit tests but cause a stack trace when invoked and are fixed within a few minutes of being found.

4. Bugs which involve subtle misunderstandings of third-party APIs. There are far too many of these, since basically all third-party APIs are written horrendously (wx, select(), open()), but I generally get them fixed eventually, once I understand what the actual semantics are.

5. There was one (count 'em, one) bug which managed to slip past all the testing and remain in the program for months, resulting in disastrous performance. The wrong integer was getting passed in by bootstrap code, and the generally robust nature of the underlying APIs caused the problem to be manifested as extremely poor performance rather than an outright bug. Eventually I observed some behavior which unambiguously indicated a bug, and managed to fix the problem after a few hours (and a lot of whining on my part).

Notably, this problem could just as easily have slipped past formal methods, but would have been caught very easily had I been religious about passing parameters by name rather than position.

(The specific bug, in case you're curious, was that the number of pieces to pipeline from any one peer was set at 32768 instead of 5, often resulting in the entire file being requested from one peer. Since BitTorrent usually has a policy of not requesting the same piece from more than one peer at once, there were other connections which were going unutilized since they had nothing else to request. This problem would have been even furthur masked had endgame mode been written at the time. Robust APIs can eliminate the possibility of a huge number of bugs, but the bugs which remain can sometimes be very strongly masked.)

Gollum Gollum

Someone posted an advogato diary entry (I'm not sure who, it's scrolled off recentlog) linking to a paper on exhaustive search for golomb rulers. Unfortunately, the site was down and archive.org didn't have the paper, I'd like to read it.

Conjecture Proven

Robert M. Solovay sent me a proof of my conjecture about prescient busy beaver numbers. Here it is, (slightly rearranged) -

I can prove your conjecture [and much more]. For example, there is a d such that

pbb(x) > bb(bb(x)) for all x > d.

This is stronger since bb(x) > h(x) for x large for *any* recursive function h.

All Turing machines considered in this letter have one two-way infinite tape and two symbols, 1 and "blank".

Let me state a series of results and then prove them:

Definition: K is the set of Godel numbers of TM's that started on a blank tape eventually halt. (A Godel number is a natural number encoding of a turing machine).

bbk is the Busy Beaver function for oracle TM's that use the oracle K.

It follows that for any function h which is recursive in K, we have bbk(n) >= h(n) for all but finitely many n.

In particular, bbk(n) >= bb(bb(2^n)) for all but finitely many n.

Assertion: There is a positive constant C such that:

1. bbk(Cn * log n) >= pbb(n) for all but finitely many n.

2. pbb(Cn * log n) >= bbk(n) for all but finitely many n.

This is the assertion whose proof I will sketch. But a little more work could improve this by replacing Cn * log n by just Cn.

Let's start with the proof of 2). Let M be a TM with at most n states [which uses an oracle for K] that halts in precisely bbk(n) states. We will design an ordinary TM, N, with O (n log n) states that visits state 1 a finite number of times but more than bbk(n) times.

It clearly takes O(n log n) bits to specify the table of the TM M.

N will first use O(n log n) states to write this description of M on its tape.

Comments: Using a trick of Chaitin, one could get by with just O(n) states here. It is at this point that the O(n log n) comes in. The remainder of the proof uses only O(1) states.

Officially, our TM has only two symbols and one tape. But it can simulate a k-tape n symbol tape with r states with Cr states of its own. [The C depends on k and n.] I find this remark useful in thinking through the details of programming N. I will, however, suppress these details in this sketch.

N will construct a series of approximations to the set K. The first approximation is that K is the empty set. It then simulates M using this approximation to K till one of two things happens:

(a) the simulation halts;

(b) for some n that the simulation queried the oracle about, N discovers that the original oracle gave the wrong answer.

If (a) happens, the program continues to look for an event of type (b). If the program ever finds an event of type (b), it redoes the simulation of M's action using the newly updated version of K.

Eventually, the program will find the correct computation of the M program from the true K.

We arrange to enter state 1 for any step in any of the simulated programs.

Because we eventually converge on the true computation of M the state 1 is only entered finitely many times. Because we eventually find the true computation, state 1 is entered more than bbk(n) times.

The proof of claim 1 is similar. We let now M be a TM of at most n states that enters state 1 precisely pbb(n) times.

Here is an oracle TM, N, [using an oracle for K] with O(n log n) states and that eventually halts but runs at least pbb(n) steps before halting.

N first uses O(n log n) states to write down a description of M.

It now simulates M. But before doing any step of M it asks "Will M ever again enter state 1". [It can settle this question using its oracle for the halting problem.] If N ever gets the answer NO to this question it immediately halts.

What I find interesting about this proof isn't so much that it shows that bb grows less than pbb, which seemed fairly obvious, but that it shows that pbb and bbk grow at essentially the same rate, which negated my intuition that pbb grows much more slowly than bbk. I always find unexpected results more interesting than expected ones.

9 older entries...