14 May 2001 gstein   » (Master)

Been working on optimizing the RCS file parsing module (within the ViewCVS package). Having Python fork/exec "rlog" and read its output through a pipe is still a lot faster than having Python parse the file directly, but the gap is closing. Next I'm going to try memory-mapping the file and parsing tokens that way. Could be much faster.
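
For the curious, here is a rough sketch of what the memory-mapped approach might look like. It is not the ViewCVS code: the names `iter_tokens` and `tokenize_file` are made up for illustration, it is written against current Python 3, and the token rules are a simplified take on the RCS format (@-quoted strings with @@ as an escaped @, the delimiters ; and :, and everything else split on whitespace).

```python
import mmap
import re

# Simplified RCS-style lexer (an assumption, not the full rcsfile grammar):
#   group 1: body of an @-quoted string, where "@@" stands for a literal "@"
#   group 2: the single-character delimiters ";" and ":"
#   group 3: any other run of non-whitespace, non-delimiter bytes
_TOKEN = re.compile(rb"@([^@]*(?:@@[^@]*)*)@|([;:])|([^\s;:@]+)")


def iter_tokens(buf):
    """Yield tokens as bytes from any bytes-like buffer (bytes or mmap)."""
    for m in _TOKEN.finditer(buf):
        if m.group(1) is not None:
            # Quoted string: undo the "@@" escaping.
            yield m.group(1).replace(b"@@", b"@")
        else:
            yield m.group(2) or m.group(3)


def tokenize_file(path):
    """Memory-map a file read-only and return its tokens as a list.

    Note: mmap raises ValueError when asked to map an empty file.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The regex engine scans the mapping directly, so the file is
            # never copied into a Python bytes object as a whole.
            return list(iter_tokens(mm))
```

Something like `tokenize_file("foo.c,v")` (a hypothetical path) would hand back the flat token list for a real parser to consume. Whether this actually beats piping from rlog is exactly the thing that needs measuring.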

I want this to be really fast because I'd rather parse the file directly than rely on rlog output, which loses a small amount of data. In particular: it is hard to reconstruct the actual RCS revision tree from just the rlog output. (hmm; maybe "hard" rather than "impossible")

The second reason is that this module will be used by Subversion's cvs2svn tool. To convert SourceForge's 49 gigabytes of CVS repositories, I want this to be as fast as possible :-)
