Lots happening: I've been building a semantic relevance engine - something that can accurately determine the semantic similarity of 2 text documents and it's working reasonably well. Working completely untrained, I'm getting accuracies of well above 0.8 and often above 0.9. Obviously 1.0 is the ideal but even human judgements rarely get above 0.9 with the corpora I've been using for this.
The good thing is that I appear to be discovering new stuff almost every day about how documents are understood. There are some approaches I've used that I've not read about in the literature so there might be some useful stuff for the world here.
However my aim is to make a web service around this. And it's all based on open source software (Python, numpy, Scipy, Gensim etc) which is perfect. There is proprietary knowledge used, however: the corpora, how it's prepared and the architecture of the engine; but that will all come publicly out soon enough.