Older blog entries for gstein (starting at number 94)

Completed and installed the new build system for Subversion. It is quite sweet. Fast, simple, and flexible. Sure beats automake for our particular use-scenario.

I could see that automake is good for FSF/GNU projects where they have a particular set of requirements, but it isn't very flexible when you have different policies. It also creates some performance issues (as I mentioned in my previous diary entry).

Shelved the RCS parser for a while. I cranked as much performance out of it as possible (without making the code look *really* horrible). Overall, it is somewhere between 10 and 12 times faster than when I started. For small RCS files, it is comparable to forking off rlog and parsing the result. For large files, though, rlog/parse is faster. I think the next step is to use something like mxTextTools or a custom RCS file tokenizer. The internal architecture is set up as a "token stream" plus the parser. That should make it easy to swap in different stream implementations.
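The stream/parser split might look something like this. This is a minimal sketch in Python; the class name and token rules are illustrative, not the actual ViewCVS code (it handles bare words, semicolons, and @-quoted strings where "@@" escapes "@", and assumes well-formed input):

```python
class TokenStream:
    """Minimal RCS-style tokenizer: a hypothetical sketch of the
    stream half of the "token stream + parser" split."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def get(self):
        """Return the next token, or None at end of input."""
        d, i, n = self.data, self.pos, len(self.data)
        while i < n and d[i] in ' \t\r\n':
            i += 1
        if i >= n:
            self.pos = i
            return None
        if d[i] == ';':
            self.pos = i + 1
            return ';'
        if d[i] == '@':
            # @-quoted string; a doubled '@@' is an escaped '@'
            i += 1
            parts = []
            while True:
                j = d.index('@', i)
                if j + 1 < n and d[j + 1] == '@':
                    parts.append(d[i:j + 1])
                    i = j + 2
                else:
                    parts.append(d[i:j])
                    self.pos = j + 1
                    return ''.join(parts)
        # bare word: runs until whitespace or ';'
        start = i
        while i < n and d[i] not in ' \t\r\n;':
            i += 1
        self.pos = i
        return d[start:i]
```

A faster stream (mxTextTools-based, say) could replace this class without the parser noticing.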

I tried using mmap, but it was no faster than just reading the darned thing into memory (in 100k chunks). It is simply that the algorithm is not I/O bound, so using mmap to optimize the I/O doesn't help at all.

Over the weekend, I've been working on revamping Subversion's build system. We currently use automake. It is a total dog, and some parts of automake are actually a bit hard to deal with. I've tossed out automake and recursive makes in favor of a single top-level makefile. The inputs to the makefile are generated by a Python script. Net result is that ./configure will produce a Makefile from Makefile.in, and that will include build-outputs.mk. build-outputs.mk is generated by the Python script when we create the distribution tarballs (so end users don't need Python just to build; this is similar to how automake uses Perl, but the outputs are portable).
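In sketch form, the generation step amounts to walking the source tree and emitting make fragments. This is a hypothetical miniature, not the actual script; the OBJECTS variable and the fragment layout are invented for illustration:

```python
import os

def build_outputs(srcdir):
    """Walk srcdir, find every .c file, and return the text of a
    build-outputs.mk fragment. A made-up miniature of the real
    generator script."""
    lines = []
    for dirpath, _dirs, filenames in os.walk(srcdir):
        sources = sorted(f for f in filenames if f.endswith('.c'))
        if sources:
            # map each foo.c to its foo.o alongside it
            objs = ' '.join(os.path.join(dirpath, f[:-2] + '.o')
                            for f in sources)
            lines.append('OBJECTS += ' + objs)
    return '\n'.join(lines) + '\n'
```

Since the output is plain make syntax, the generated file stays portable even though the generator is Python.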

The resulting build process is much faster. ./configure is also going to be speedy, since we only need to process one Makefile.in. In addition, automake creates a billion "sed" replacements within configure, then applies all of those to all the files. We'll be reducing the replacements to just a couple dozen. With the reduced file count, it should scream. We also skip automake's time-consuming step of producing Makefile.in from Makefile.am; my Python script executes in just 2 seconds of wall-clock time. That includes examining all the directories to find .c files to include in the build.

I've got make all, install, and clean working. I still need to do distclean, extraclean, debug the "make check" target, and then do dependency generation. On the latter, the Python script will just open the files and look for #includes. This will be much more portable than automake's reliance on gcc-specific features. Oh, and we also get rid of automake's reliance on gmake.
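The dependency scan can be as simple as a regex over each source file. A hedged sketch follows; the regex, function names, and output format are my own assumptions, not the actual script's behavior (note it only tracks quoted, project-local includes, not system headers):

```python
import re

# Matches local includes like:  #include "svn_fs.h"
# (angle-bracket system includes are deliberately ignored)
_INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)

def local_includes(source_text):
    """Return the quoted #include targets in one C source file."""
    return _INCLUDE_RE.findall(source_text)

def dep_line(c_path, source_text):
    """Emit one make-style dependency line for a single .c file."""
    obj = c_path[:-2] + '.o'
    return '%s: %s %s' % (obj, c_path, ' '.join(local_includes(source_text)))
```

Because this only reads the files as text, it works the same on any platform, with no reliance on gcc's -M machinery.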

Nice all around...

Been working on optimizing the RCS file parsing module (within the ViewCVS package). Having Python fork/exec with a pipe to "rlog" is still a lot faster than having Python directly parse the file. But it is getting closer. I'm now going to try memory-mapping the file and parse tokens that way. Could be much faster.
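For contrast, the fork/exec-with-a-pipe approach is roughly this. A sketch using Python's subprocess module; the function names and the revision-extracting regex are mine, not ViewCVS's actual code, and it assumes an rlog binary on the PATH:

```python
import re
import subprocess

def run_rlog(rcs_path):
    """Fork off 'rlog' with a pipe and capture its output.
    Assumes 'rlog' is on PATH; error handling elided."""
    return subprocess.run(['rlog', rcs_path], capture_output=True,
                          text=True, check=True).stdout

# Revision lines in rlog output look like "revision 1.42"; this
# regex is a simplification of what a full parser would need.
_REV_RE = re.compile(r'^revision (\d[\d.]*)', re.MULTILINE)

def revisions(rlog_output):
    """Pull the revision numbers out of rlog's text output."""
    return _REV_RE.findall(rlog_output)
```

The fork/exec cost is fixed per file, which is why this wins on big RCS files where the pure-Python parse dominates.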

I want this to be really fast because it would be nice to use manual parsing rather than rlog output, since there is a small amount of data loss. In particular: it is nearly impossible to reconstruct the actual RCS revision tree from just the rlog output. (hmm; maybe "hard" rather than "impossible")

The second reason is that this module will be used by Subversion's cvs2svn tool. To convert SourceForge's 49 gigabytes of CVS repository, I want this to be as fast as possible :-)

12 May 2001 (updated 12 May 2001 at 00:40 UTC) »

Hrm. Mid-February since my last diary entry. Zoiks!

Lessee... Subversion is at Milestone 2 now, meaning that I got all the WebDAV/DeltaV stuff in there to do full checkouts, updates, and commits over the network. (and imports of new code) Milestone 3 is defined as "self hosting" -- we'll be switching our own repository over to SVN and "eat our own dogfood", so to speak. My big tasks on that are the cvs2svn converter, and some Python bindings to one of the Subversion libraries.

Elsewhere, my edna and ViewCVS projects have moved to SourceForge. In fact, ViewCVS is now used on SourceForge itself! With SVN M2 in the can, I'm going to try and get a ViewCVS release out, then do some Apache and APR work.

Travel-wise, I've been to Seattle a couple times in the past few months, down to Los Angeles for the Python Conference, to Vegas for "boys weekend", and to NYC for a wedding.

Also, I got engaged back on Valentine's day. Anni and I have been together for nearly 13 years, so this isn't a "huge" step, but big nonetheless :-) We've got most of the stuff planned already and are in the home stretch.

On the PS2 front, I picked up Onimusha: Warlords last week. Way awesome graphics on that game! I've gotten through the whole game, I believe, except for the last boss. If he isn't the last, then I'm not sure what would be next. Quite an excellent game. Others that I would recommend are: Smuggler's Run, DOA2: Hardcore, Timesplitters, SSX, and Midnight Club Street Racing.

I think that will be it for now.

The other day, I implemented activities and working resources (and, therefore, checkout) in mod_dav_svn. As a prereq, I revamped mod_dav's CHECKOUT handling. It was stuck back in the old draft days. A number of mod_dav's method handlers for DeltaV are rather old and need to be updated w.r.t. the latest draft.

Main point: we now have some server response for some of the SVN actions. Still no backend repository to test against, which severely limits how far I can go.

Next up, I think that I'm going to work on the auxiliary info bits: OPTIONS responses and some live properties. At various points, the client will be requesting this data, and I'll need to get it generated. Much of it, I should be able to do without the FS. For example: baseline resources, activity collection sets, and the checked-in property.

It might be possible to quickly simulate a repository that I can fetch from. Not sure on that one. The other guys are working on the repository, so it is possible that it will arrive before I get completely blocked. There are simply a lot of little details that I can occupy myself with in the meantime. [heck, there is some error handling and reporting within mod_dav that needs some work (to deal with the new 403/409 response bodies)]

In Apache land, we finally reached some semblance of agreement (or apathy?) and got some brigade buffering stuff checked in. There is one bit left where I need to post an alternate patch, but it is pretty much there. Unrelated, I went through and cleaned up some really old crud; I was looking at other things (specifically, CORE_PRIVATE usage). Feh. That stuff is hardly private. I'm tempted to just toss it all, rather than continue to pretend that it is private in any way.

Been doing mostly Apache and APR work lately... not as much SVN as I'd like. Also a bunch of time with the DeltaV group and spec. It is finally getting wrapped up, but as with all things: when you give people a hard deadline, a bunch of suggestions come out of the woodwork.
[ of course, that is also to be expected. people will always want a stable spec before spending time with it. they are discouraged from spending time reviewing if they know their changes won't matter in the end. ]

Anyways... getting back to SVN now. Grabbed all the latest changes and updated mod_dav_svn to my latest changes over in Apache/mod_dav. It builds, so I'm starting to work on activities now. It doesn't seem like it will be difficult since it is essentially just recording a table which maps activity URIs to SVN_FS transaction ids.
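In data-structure terms, the activity bookkeeping is just this. Sketched in Python for brevity, though mod_dav_svn itself is C; the class and method names are invented, and a dict stands in for whatever on-disk storage the real module would use:

```python
class ActivityTable:
    """Invented sketch of the mapping described above: DeltaV
    activity URIs on one side, SVN_FS transaction ids on the
    other. Persistence is elided; a dict stands in here."""

    def __init__(self):
        self._map = {}

    def record(self, activity_uri, txn_id):
        """MKACTIVITY time: remember which txn backs this activity."""
        self._map[activity_uri] = txn_id

    def txn_for(self, activity_uri):
        """CHECKOUT/MERGE time: look up the backing txn (or None)."""
        return self._map.get(activity_uri)
```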

Tonite, I also quickly hacked together an svnadmin proggie to create SVN_FS repositories. It spits out files, but there is still a ton missing from the FS. I hope that people can use the cmdline tool to create repositories to experiment with while they finish coding the FS. It seems rather log-jammed right now.

Spent some time up in Seattle last week. Visiting XMLFund, as usual. We covered a bunch of ground on the current technology in one of the portfolio companies, where things are going, what kinds of companies XMLFund will focus on, etc. While in Seattle, I stayed with my good friend Kanchan, and got to visit a number of other friends. It is quite nice to be able to go back periodically to visit.

End of last week, this weekend, I've also been moving the list.org and webdav.org websites to new hardware at John Viega's house. Our friend, Hellmonger, who is hosting the sites right now is leaving his job at the ISP, so we kinda need to move them :-). list.org will stay at John's (it's his site anyways), but webdav.org will move onto some dedicated hardware and then colocated for excellent connectivity and uptime. The hardware and colocation are being donated; I'll post to dav-announce when it happens.

Blah blah blah... I think that is all for now.

Just got back from Chicago yesterday. We had a great Subversion meeting (myself, JimB, KarlF, BenCS). Spent a few days just grinding through SVN design, and came out with a number of resolutions. Goodness.

The meeting was also for the express purpose of learning about change sets. We had Phil McCoog from HP there to tell us all about the change set gospel. Very cool. Unfortunately, SVN can't truly do change sets right now, and a redesign to enable that is out of the question. But we can take some of the particular user-features of change sets and implement them on top of SVN.

For the most part, change sets are very good for the merge problem. If you have a number of lines of development, or branches, and then want to merge changes from the separate line back into the main (or vice-versa), then the change set design makes it quite simple. There are also some excellent benefits for defect tracking and for determining differences between revs A and B. We'll have the latter done quite well, and hooking in with defect tracking is a separate issue in my mind (although we have some excellent properties within SVN to enable external defect systems to do this quite well).

Got a PS2 finally, last Friday. Of course, that also kinda sucked. Here I get a box, then bolt out of town the next day. When's a guy to play with the thing? :-) ... of course, I spent mucho time tonite with it. The PS2 kicks some serious butt. I've been playing a lot of SSX (a snowboarding game from EA). Rocks...

Chicago this time of year was a lot better than I expected. Possibly, it was just lucky. But it was only cold while I was there. No serious wind, no rain, no (falling) snow. Plenty of snow and ice on the ground, of course. Weird, as I was just thru Chicago at the end of December.

Small world! In 1996, when my company was purchased by MSFT, I met an awesome programmer named Kanchan Mitra. Over the next few years, I also became very good friends with him (great guy!), and at some point in there I learned he went to Oberlin College. No biggy, just a little factoid in the back of my head. Kanch also bought my house when I left Washington to return to the Bay Area... that was awesome -- we signed papers, ate Thai food, and drank champagne. What house purchase/sale could be better than that? In any case, I'm hanging in Chicago talking with Ben (Collins-Sussman) and he mentions that Karl and Jim also went to Oberlin. I do a bit of computation in my head with all of their ages and think "damn... they were probably there at the same time!" The next evening, I ask Karl whether he knows Kanch. Damn straight, he knows him. Ask Jim the next day, yup. (of course, Oberlin only has a few hundred people, but I didn't know that at the time) But damn... they all know each other, and Karl/Jim both have a high regard for Kanch. Of course, Jim is upset that Kanch is at MSFT rather than contributing his talent to the OSS world, but hey... so it goes. ... Here I am working with the SVN guys for nearly nine months, and they know one of my best friends from the Seattle area.

Damn small world.

Coding-wise, I haven't got much done over the past couple weeks. The bulk of my efforts have been towards email discussions on the DeltaV design list and the (related) Subversion design. The big issues were atomic checkin and merging of activities, baselines, how to do copies into a working collection, linked version histories, etc.

Sigh. It's been a hell of a lot of email discussion. Not much time for real coding. Much of it is wrapping up, though, and the DeltaV spec is moving into last call. The mapping to Subversion is stabilizing, so the next few weeks will be the coding towards the second milestone.

On the home front, we went to Cleveland for the holidays. It was quite cool... a white Christmas. Left Cleveland and made it back to California before the big ol' blizzard smacked the east coast. Had some minor delays in Chicago, but not bad.

Cool part: got a whole bundle of Dragonball Z DVDs. Watching them right now, actually :-) Also got some books and some CDs. Fun stuff!

Gonna toodle off to Chicago on the 13th. Gonna hang with the other Subversion folks, and do some fun design work.

Had a party over the weekend. Didn't turn out too badly, but we just aren't seeing the same kind of attendance here as when we lived in Washington. I think it is my hypothetical "20 minute driving rule." If somebody has to drive more than 20 minutes to get somewhere (e.g. to your house), then they'll tend to punt. In Seattle, most people were within that, or close to it. Here in the Bay Area, it can be a full-on hour just to get back/forth to SanFran from Palo Alto. So... we don't get SF people. [and yes: the reverse is true... we don't get up to SF all that much; but then again, our SF friends don't have a lot of parties. oy]

So... back to working on SVN. Adding tidbits, refining, etc., on the client side of the connection. Getting the "checkout" behavior working better -- fetching the right bits and dropping them into the WC. I've also been spending some time getting the "commit" operation in order. That actually doesn't seem too hard. The "update" stuff is going to be a bit of a bitch, though. There is a custom WebDAV REPORT to support that.

When the FS settles down a bit, then I'll go back in there and bring mod_dav_svn up to working order.

Well, all the APR(UTIL) stuff is finally settled down, and we released Apache 2.0a9 today. It's got the new directory structure and the additional APRUTIL dependency. It's time to get back to SVN coding and let the other stuff just lie around for a while.

Had a good weekend down in Palm Springs at the wedding. Not a whole lot to report there: some drinking, some eating, some bridge playing, and lots of great time with old friends. Just can't beat it.
