22 May 2001 gstein

Shelved the RCS parser for a while. I cranked as much performance out of it as I could (without making the code look *really* horrible). Overall, it is somewhere between 10 and 12 times faster than when I started. For small RCS files, it is comparable to forking off rlog and parsing the result. For large files, though, rlog/parse is faster. I think the next step is to use something like mxTextTools or a custom RCS file tokenizer. The internal architecture is set up as a "token stream" plus the parser, which should make it easy to swap in different stream implementations.
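The split looks roughly like this; a minimal sketch with made-up names, not the actual code:

    # The parser only ever sees this interface; where the tokens come
    # from (plain reads, mmap, a C tokenizer, mxTextTools) is entirely
    # the stream's business.
    class TokenStream:
        def get(self):
            """Return the next RCS token, or None at end of input."""
            raise NotImplementedError

    def parse(stream):
        """Pull tokens until the stream runs dry."""
        tokens = []
        while True:
            token = stream.get()
            if token is None:
                break
            tokens.append(token)  # the real parser builds delta info here
        return tokens

Any object that honors get() can be dropped in without the parser noticing.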

I tried using mmap, but it was no faster than just reading the darned thing into memory (in 100k chunks). It is simply that the algorithm is not I/O bound, so using mmap to optimize the I/O doesn't help at all.
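To see why, compare the two I/O strategies (a sketch in modern Python; the actual code predates generators):

    import mmap

    CHUNK = 100000  # the 100k chunks mentioned above

    def chunked_reads(fname, size=CHUNK):
        """Plain buffered reads, one chunk at a time."""
        with open(fname, 'rb') as f:
            while True:
                chunk = f.read(size)
                if not chunk:
                    break
                yield chunk

    def mmapped_reads(fname, size=CHUNK):
        """Same interface, but let the OS map the file into memory."""
        with open(fname, 'rb') as f:
            m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            for pos in range(0, len(m), size):
                yield m[pos:pos + size]
            m.close()

Feed either one to the tokenizer and the timings come out the same: the cycles go to splitting tokens, not to read().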

Over the weekend, I've been working on revamping Subversion's build system. We currently use automake. It is a total dog, and some parts of it are actually a bit hard to deal with. I've tossed out automake and recursive make in favor of a single top-level makefile. The inputs to that makefile are generated by a Python script. Net result: ./configure produces a Makefile from Makefile.in, and that Makefile includes build-outputs.mk. The Python script generates build-outputs.mk when we create the distribution tarballs, so end users don't need Python just to build; this is similar to how automake uses Perl, but our outputs are portable.
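A toy version of the generator shows the mechanics (the function name and the target table are hypothetical; the real script does considerably more):

    def write_build_outputs(targets, path='build-outputs.mk'):
        """Emit the make fragment that the top-level Makefile includes.

        'targets' maps each program to its list of .c sources."""
        with open(path, 'w') as mk:
            for prog, sources in sorted(targets.items()):
                objs = ' '.join(s[:-2] + '.o' for s in sources)
                mk.write('%s: %s\n' % (prog, objs))
                mk.write('\t$(CC) $(LDFLAGS) -o $@ %s $(LIBS)\n\n' % objs)

Makefile.in then just needs an "include build-outputs.mk" line, and configure never has to touch the generated rules.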

The resulting build process is much faster. ./configure is also going to be speedy, since we only need to process one Makefile.in. In addition, automake creates a billion "sed" replacements within configure, then applies all of those to all the files; we'll be reducing the replacements to just a couple dozen. With the reduced file count, it should scream. We also drop automake's time-consuming step of producing Makefile.in from Makefile.am; my Python script executes in just 2 seconds of wall clock time, and that includes examining all the directories to find the .c files to include in the build.
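The directory scan is the obvious thing (shown here with modern Python's os.walk; in 2001 the equivalent was os.path.walk):

    import os

    def find_sources(root):
        """Walk the tree once, collecting every .c file for the build."""
        sources = []
        for dirpath, dirnames, filenames in os.walk(root):
            for fname in filenames:
                if fname.endswith('.c'):
                    sources.append(os.path.join(dirpath, fname))
        return sources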

I've got make all, install, and clean working. I still need to do distclean and extraclean, debug the "make check" target, and then do dependency generation. For the latter, the Python script will just open the files and look for #include lines. That will be much more portable than automake's reliance on gcc-specific features. Oh, and we also get rid of automake's reliance on gmake.
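The scan itself can stay dead simple; something along these lines (illustrative, not the shipped code):

    import re

    # Quoted includes are project headers; <...> system headers don't
    # belong in the dependency graph, so we skip them.
    INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"')

    def scan_includes(fname):
        """List the project headers a source file depends on.

        Reading the text directly keeps this portable to any compiler,
        unlike asking gcc -M to emit the dependencies for us."""
        deps = []
        with open(fname) as f:
            for line in f:
                match = INCLUDE_RE.match(line)
                if match:
                    deps.append(match.group(1))
        return deps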

Nice all around...
