24 Aug 2002 Bram   » (Master)

wardv: I used to play corewars all the time, in an integrated development environment on the Commodore 64 which my dad wrote in his spare time. Notably, he was a journalist and not a programmer back then.

There was one tournament in which I entered a top contender, Powerbomb, which won almost all of its battles before the finals. Unfortunately, only five programs were put into the final round, including all the ones which Powerbomb didn't do all that well against. Ah well, I should probably be happy just to have gotten two programs into the finals. I was only twelve at the time.

Mathematical Foundations

I think this post claims an axiomatic system has been discovered which essentially declares its own consistency but manages to avoid the cheap trick of diagonalization.

Update: I'm told I read that all wrong. Ah well.

Intermediate Automata

I believe some of these papers demonstrate that there are (very artificial) cellular automata which exhibit nontrivial behavior but aren't universal, contrary to Wolfram's thesis. I'm increasingly coming to believe that some of the simple rules, especially 18 and 22, are of intermediate degree.

Second Best

I figured out a coherent strategy for Second Best - all players agree on what the two most common words will be, and each player picks one of them at random.

Tweaking Spam

I've been thinking about the spam filtering code I gave earlier. It can be improved a lot.

For starters, multiplying the number of nonspam appearances by two is kind of a hack. It's almost exactly equivalent to subtracting .7 from each token's value. A much more robust approach is to increase the spam threshold from 2.6 to 5.
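The .7 figure is just log 2, which is about .693. Here's a quick numerical check, assuming the token value is log(bad + 1) - log(good + 1) as in the code in this entry. The two function names are made up for this example.

```python
from math import log

def score_doubled(bad, good):
    # The old hack: count every nonspam appearance twice.
    return log(bad + 1) - log(2 * good + 1)

def score_shifted(bad, good):
    # The plain token value with log(2), about .69, subtracted.
    return log(bad + 1) - log(good + 1) - log(2)

# Made-up counts, only to compare the two formulas side by side.
for bad, good in [(5, 10), (50, 100), (500, 1000)]:
    print(bad, good,
          round(score_doubled(bad, good), 3),
          round(score_shifted(bad, good), 3))
```

The two agree closely once a token has shown up in a fair amount of nonspam. They differ most for tokens never seen in nonspam, where doubling zero changes nothing.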

Also, a max value of 4.6 (equivalent to .99 bayesian) seems like a bit much, since just one or two spam words could easily label an otherwise neutral message as spam. Reducing it to 3 (bayesian about .95) seems much more reasonable. Someone whose entry has scrolled off advogato recentlog mentioned that a single token, '2002', threw off his filtering quite a bit.
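The bayesian figures appear to come from reading a token's value as log odds, so a value v corresponds to a probability of 1 / (1 + e^-v). A small sketch to check the two numbers, with a helper name chosen just for this example:

```python
from math import exp, log

def to_probability(v):
    # Treat v as log odds, log(p / (1 - p)), and solve for p.
    return 1 / (1 + exp(-v))

print(round(to_probability(4.6), 3))  # the old cap
print(round(to_probability(3), 3))    # the new cap
print(round(log(.99 / .01), 2))       # going the other way from .99
```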

Several other subtle improvements and code cleanups are possible. I've included them all in the following code -

from math import log
from re import findall

class spamtrap:
    def __init__(self):
        self.good = {}
        self.bad = {}

    def add_spam(self, message):
        for t in _maketokens(message):
            self.bad[t] = self.bad.get(t, 0) + 1

    def add_nonspam(self, message):
        for t in _maketokens(message):
            self.good[t] = self.good.get(t, 0) + 1

    def is_spam(self, message):
        ss = []
        for token in _maketokens(message):
            ss.append(min(3, max(-3, log(self.bad.get(token, 0) + 1) - log(self.good.get(token, 0) + 1))))
        sum = 0
        if len(ss) > 16:
            ss.sort()
            for v in ss[:8] + ss[-8:]:
                sum += v
        else:
            for v in ss:
                sum += v
        return sum > 5

def _maketokens(message):
    ts = {}
    for t in findall("[a-zA-Z0-9'$]+", message):
        ts[t.lower()] = 1
    return ts.keys()

Sorry again about whitespace mangling which interferes with cut'n'paste - it's advogato's doing.
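To see what the clamp and the 16-score cutoff in is_spam do, here's the scoring step pulled out on its own. This is only a sketch: combine is a name invented for this example, its limit and keep parameters stand in for the hardcoded 3 and 8, and the input scores are made up rather than taken from real mail.

```python
def combine(scores, limit=3, keep=8):
    # Clamp each token score to [-limit, limit] so no single token dominates.
    # Sorting unconditionally is harmless; it doesn't change the short-message sum.
    clamped = sorted(min(limit, max(-limit, s)) for s in scores)
    # For long messages, count only the most extreme scores at each end.
    if len(clamped) > 2 * keep:
        clamped = clamped[:keep] + clamped[-keep:]
    return sum(clamped)

print(combine([5, 5, 5]))   # each score is capped at 3
print(combine([1] * 20))    # only 16 of the 20 scores are counted
print(combine([10, -10]))   # a capped spam token and a capped nonspam token cancel
```

The result would then be compared against the threshold of 5, as is_spam does.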
