Been working on optimizing the RCS file parsing module (within the ViewCVS package). Having Python fork/exec with a pipe to "rlog" is still a lot faster than having Python parse the file directly, but the gap is closing. I'm now going to try memory-mapping the file and parsing tokens that way. Could be much faster.
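Here's a rough sketch of the memory-mapping idea in Python. It is not the ViewCVS code. `tokenize` and `tokenize_file` are hypothetical names, and the token rules are a simplified reading of the RCS file format: `@`-delimited strings with `@@` as an escaped `@`, `;` and `:` as punctuation, and everything else split on ASCII whitespace. Whether this actually beats piping through rlog would need measuring.

```python
import mmap
import re

# Skip whitespace, then match one of: the opening '@' of a string,
# a ';' or ':' punctuation token, or a run of other characters
# (revision numbers like 1.2.3.4, identifiers, keywords like "head").
_TOKEN_RE = re.compile(rb'\s*(?:(@)|([;:])|([^\s;:@]+))')


def tokenize(buf):
    """Yield RCS tokens (as bytes) from a bytes-like buffer, e.g. an mmap.

    Simplified, hypothetical tokenizer; not the ViewCVS implementation.
    """
    pos = 0
    while True:
        m = _TOKEN_RE.match(buf, pos)
        if m is None:
            # Only trailing whitespace (or nothing) is left.
            return
        if m.group(1):
            # '@' string: scan for the closing '@', skipping '@@' pairs,
            # which stand for a literal '@'.
            start = m.end()
            i = start
            while True:
                i = buf.find(b'@', i)
                if i == -1:
                    raise ValueError('unterminated @-string')
                if buf[i + 1:i + 2] == b'@':
                    i += 2  # escaped '@', keep scanning
                    continue
                break
            yield buf[start:i].replace(b'@@', b'@')
            pos = i + 1
        else:
            yield m.group(2) or m.group(3)
            pos = m.end()


def tokenize_file(path):
    """Memory-map an RCS ,v file and return all of its tokens as a list."""
    with open(path, 'rb') as f:
        # Note: mmap cannot map an empty file (it raises ValueError).
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Materialize the list before the mapping is closed.
            return list(tokenize(mm))
```

The regex and `find()` calls operate directly on the mapped pages, so the file is never read into one big Python string. Only the individual tokens get copied out as `bytes`.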
I want this to be really fast because it would be nice to parse the files directly rather than rely on rlog output, which loses a small amount of data. In particular, it is hard to reconstruct the actual RCS revision tree from the rlog output alone. (Hmm; "hard" rather than "impossible", maybe.)
The second reason is that this module will be used by Subversion's cvs2svn tool. To convert SourceForge's 49 gigabytes of CVS repositories, I want this to be as fast as possible :-)