MLton has been bootstrapped and now runs on NetBSD too. Unfortunately it is IA32-only. The primary work was done by me. In case you were wondering, MLton is a very fast compiler for the language Standard ML.
The next thing I will work on is a pkg for the NetBSD pkgsrc tree, so it will be easy to install. Bootstrapping an SML compiler written in SML poses a little chicken-and-egg problem, so you might have to download a binary version of the compiler in order to build it from source.
MLton does not have a proper type-inference pass with error reporting; it relies on other compilers/interpreters for this. I intend to look into it a bit to see whether it is really that hard to add. I think the inference itself is easy; the problem is getting the code acceptable on a larger scale and, hopefully, doing it the right way the first time.
MLton has a neat trick which I think more open-source projects should pick up: do some work for them and get a t-shirt for your efforts. It is a rather simple thing, but I think the idea is brilliant.
Now raph has opened the can of worms: Judy is interesting, but it also poses problems. The algorithm is complex and lacks a formal treatment. My bet is that Judy is hard to get working correctly in all corner cases. The idea of exploiting the cache for higher speed is neat, though, and parts of the code and ideas could probably benefit other projects.
I've decided to run the following algorithms against each other: GPerf, a trie structure, a PATRICIA tree (see Knuth, TAOCP vol. 3, for this one), Judy, and a DJB hash (somewhere on DJB's page). The task will be pretty narrow: given a list of C keywords, look them up in a large project and report the number of keywords and identifiers. ''Large project'' could for instance be the Linux kernel, a *BSD /usr/src, or similar. As a baseline, I will have a control program which just reads the data and throws it away.
The results of the above should show whether the measurements in the 13-year-old GPerf paper still hold. If they do, I will recommend that people use a trie instead of a perfect hash, since tries are faster and easier to understand. They use more memory, though, so they might not be adequate for problems with a large number of keywords.