I've been having lots of fun lately with Gensim, a Python framework for vector space modelling. It includes fun stuff like latent semantic analysis, latent Dirichlet allocation and other goodies. Allied with NLTK, this makes a very formidable Python-based NLP framework.
My task is sorting newsgroup posts into their correct groups, and I've achieved a reasonable level of accuracy (0.92), which isn't bad given that it depends entirely on content. Most analyses, however, show lower accuracies (0.70+), which isn't bad but isn't far enough from chance performance to be taken seriously. There are a few ways to improve this, and I'm running an enormous number of experiments to build an effective mental model of how vector space models work.
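For anyone who wants a feel for what "classifying on content alone" means in a vector space model, here is a minimal sketch. It is not my actual experiment code and it doesn't use Gensim's API: it's a plain-Python illustration, assuming TF-IDF weighting, cosine similarity and a nearest-centroid rule. The function names, the whitespace tokenizer and the four toy posts (with their group labels) are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: naive lowercase/whitespace split; a real pipeline would use NLTK.
    return text.lower().split()


def fit_idf(tokenized_docs):
    # Smoothed inverse document frequency for every term seen in training.
    n = len(tokenized_docs)
    df = Counter()
    for doc in tokenized_docs:
        df.update(set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(tokens, idf):
    # Sparse vector as a dict; terms never seen in training are dropped.
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(labelled_posts):
    # One summed TF-IDF vector per group; cosine ignores the vector's length,
    # so the sum points the same way as the mean.
    tokenized = [(tokenize(text), label) for text, label in labelled_posts]
    idf = fit_idf([tokens for tokens, _ in tokenized])
    centroids = defaultdict(lambda: defaultdict(float))
    for tokens, label in tokenized:
        for term, weight in tfidf(tokens, idf).items():
            centroids[label][term] += weight
    return idf, centroids


def classify(text, idf, centroids):
    # Pick the group whose centroid is closest in angle to the post.
    vec = tfidf(tokenize(text), idf)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy data, invented for this example only.
posts = [
    ("the shuttle launch reached orbit", "sci.space"),
    ("nasa plans a new orbit mission", "sci.space"),
    ("the goalie stopped every puck", "rec.sport.hockey"),
    ("playoff hockey game went to overtime", "rec.sport.hockey"),
]
idf, centroids = train(posts)
print(classify("a mission to orbit the moon", idf, centroids))
```

Accuracy is then just the fraction of held-out posts for which `classify` returns the right group. Techniques like latent semantic analysis swap the raw TF-IDF vectors for lower-dimensional ones before the similarity step.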
This is all the beginning of building a relevance engine, which I'm sure will be useful to some people.