So there was one link which I didn't include in the Bayesian round-up in this week's NTK, but which might be interesting to folk here:
The Bow Toolkit is a "toolkit for statistical language modeling, text retrieval, classification and clustering". It came recommended by the OpenCola folk. I didn't have enough time (or expertise, really) to have a proper look at it, but it looks pretty nice.
