Older blog entries for Xorian (starting at number 17)

It seems that Larry McVoy is up to his old strong-arm tactics again, forcing another free software developer to stop his work.

I find this highly distasteful, of questionable legality, and more than a little worrying for my own future as my employer uses BitKeeper. (Luckily for me, it's a big company and I've been careful to never use it myself.)

Of course it's not too surprising as this is probably the most effective method they have to protect themselves against competition. As tridge discovered, BitKeeper is little more than a layer on top of the ancient SCCS storage format. It's not that it isn't a nice piece of software with some good GUIs, but I seriously doubt they could patent their technology.

25 Jul 2005 (updated 26 Jul 2005 at 18:40 UTC) »
Fun with core files

Sometimes I find myself questioning my tenacious pursuit of bugs. Usually this comes while I'm in the middle of trying to understand some really tricky failure. As I work on a system with a code base that's been around for over a decade, bugs these days tend to be the "tail of the dragon" kind which can take days or weeks to make sense of.

Most recently, I worked on a crash produced by a multi-threaded server during a test that ran for a little over 3 days before failing. It happened on a platform that probably doesn't get much use outside my testing environment (Sparc Linux). The crash was inside the memory allocator (in this case the Boehm garbage collector).

Any crash inside a memory allocator suggests some sort of memory corruption. Of course I had no idea whether the application-level code was responsible, but it seemed prudent to pursue it. After a day of digging around in gdb, I had understood the data structure where the memory allocator had gotten the pointer it was trying to de-reference. It was a linked list which, luckily, also had an array pointing to the individual members. (The list is sparse, so the "next" pointer of array element N might point to an element greater than N+1.)

The interesting thing at this point was that several of the elements out of this linked list had been completely zeroed. It was easy to tell this because of the overall data structure: the next/previous links of the zeroed elements were wrong, plus each one also contained its own array index as a field and those were zero as well.

I found that the zeroed elements were all pretty near each other in memory, which suggested that maybe the same thing had zeroed all of them. I wanted to find out just how much memory had been zeroed. Unfortunately, gdb seems to lack any sort of memory searching capabilities. I tried writing a little loop in gdb's command language to do the searching for me, but boy was it slow. (Oh, for those bygone days of Waldemar Horwat's TMON debugger!)

So I opened up my core file in hexedit and started looking around in it. This required using objdump(1) to get information from the core file on mappings from file offsets to virtual addresses and doing a little conversion. Not too much trouble, but not something I usually have to resort to with core files. Of course then I found that hexedit's search function only allowed me to search for a particular hex string, not to search for a non-zero byte. But with a few lines of C code I had a program to do that searching for me. (This is about the point where I started questioning the sanity of continuing. When the debugger isn't good enough, and looking at a core file in a hex editor isn't good enough, and you start writing your own code to read through core files... well maybe that's a good thing, but it gave me pause.)

So now, a good 2-3 days into my debugging, I find that a whopping 15.2M of memory had been zeroed. Luckily for me, it started and ended within the same memory region in the core file. It would have been much harder to follow if it had spanned multiple regions. I definitely felt like I was getting somewhere. I went back to hexedit and searched for a pointer to the beginning of this zeroed region, figuring that there was a good chance there would be one somewhere. Sure enough, I found two copies of the pointer I was looking for.

Then the bad news: the pointers were in the stack of another thread. Since this crash happened on a machine running Debian sarge, which for some reason still hasn't made the jump to NPTL, my core file only contained the registers for the thread which actually crashed. I had the contents of the other stack, but no way to get gdb to display the call stack or the contents of the locals and arguments. The gdb manual claims that you can display an arbitrary stack frame by passing one or more addresses to the "frame" command, but I couldn't get this to work in either gdb 6.3 or gdb 5.2. But I was not to be deterred! There was clearly some code zeroing memory it shouldn't have been, and I was going to find it.

So it was time to figure out how to decode Sparc stack frames. I Googled around for some info on Sparc stack frames, but I didn't find much. Instead I read some of the gdb source to see what it was doing. Eventually I managed to figure out enough about the stack frame layout to make some sense of it. (The output from gdb 6.3's "info frame" on the thread it would display was actually quite helpful with this.)

Now I needed to reconstruct the stack from the other thread. Luckily, I knew that the first few stack frames of each thread would be the same thread start functions because the Boehm garbage collector wraps thread creation so that it can find the stack of each thread. This made it easy to find the first few frames. Once I had those, it was back to hexedit to search for frame pointers and work my way down into the nested call frames. At each one I extracted the saved PC and used gdb's "info line" to map that back to a source line. Pretty soon I had at least a partial picture of what was going on in this thread.

It turned out that it was in a bucketed hash table template class, allocating and initializing some new buckets. Now I was finally getting somewhere. I still needed some way to get the arguments and local variables out of these stack frames. Since gdb refused to do it for me, I needed to figure out what offset from the frame pointer the variables I was interested in were stored at. To do that I ran some other code using the same class under the debugger and stopped in the same member functions. This made it easy to determine the offsets I needed. Then I went back to the other gdb session, and examined memory at these offsets from the frame pointers I had found and noted down.

At last, after something like 4+ days of debugging, I was getting close to some answers. It turned out that the hash table code was trying to allocate an array of 0x40000000 buckets. Each bucket was a pointer, which was 4 bytes. Multiply the number of elements by the element size, and you get a number that doesn't fit in a 32-bit size_t. So it actually allocated a heap block of... zero bytes. And then went to initialize all those pointers to NULL... which meant it was trying to zero the entire address space. Ooops.

This value (0x40000000) was, in fact, the clamped maximum number of allowed buckets. So that's obviously too large. This is one of those things you run into when you take software originally developed on 64-bit systems and move it to a 32-bit platform. Also, I'm a bit disturbed by the fact that allocating a heap block of zero bytes gives you back a non-NULL pointer. However, it seems that "malloc(0)" and "new void * [0]" in C++ both return a valid pointer.

Unfortunately, this is about as far back as I could take my debugging, which is somewhat unsatisfying as I really like to find root causes. In this case it seems as though somehow a floating-point division came up with the value 2210398208 for 1 divided by 0.5, at least as far as I can tell.

If the maximum hash table buckets were clamped to a smaller value and allocating a block of zero bytes returned NULL this all would have crashed much sooner, and in a way that would have made it easier to look at what else was going on. So I guess I'm going to have to be satisfied with those two buglets.

Small update

At some prodding from a couple people familiar with the code, I went back and dug up a few more details from the call frame that was initializing the hash table that caused all the trouble. (As long as I've already gone the extra mile, what's another short sprint, right?) Unfortunately, the data structure it was supposed to be copying was right in the middle of the memory region that got zeroed, which again leaves me stuck without enough information. However, the data structure corresponds to some on-disk data. Furthermore, the code path in question is only taken when the data is unmodified from what was paged in from disk. I was able to find enough information on the stack to find the corresponding file. That file confirms that the sizeHint parameter should indeed have been 1 in this case, which unfortunately brings me back to the mysterious result of 1/0.5=2210398208.

This warning is one of the funniest things I've seen in some time (details changed to protect the guilty):

src/foo.cc:1234: warning: `DETERMINISM_MODE'
   might be used uninitialized in this function

Well, I tried suggesting that the git people take a look at Vesta, but it seems it's probably too late.

But I'm not bitter or anything.

When I read about git, I was really surprised by how similar it is to the Vesta repository.

  • Linus said "git is not an SCM. it's a filesystem designed to host an SCM". The Vesta repository is a filesystem. It has a versioning system and a builder built on top of it.

  • git stores complete copies of files, but only one copy of each file, just like Vesta.

  • git's backing store is indexed by a hash of the file contents, just like Vesta's.

Coming off BK, I would have expected Linus to go in a direction closer to Darcs or Codeville.

Of course as a developer working in the same space, it's frustrating to watch so much effort going into starting yet another new version control system. Even more so when it seems so close to what Vesta's been doing for almost a decade.

1 Mar 2005 (updated 1 Mar 2005 at 22:32 UTC) »

My Mac mini finally arrived last Thursday. What were the first things I did with it? Enable incoming ssh, install the development tools, and modify the code needed to get the Vesta repository mounting on Darwin. This only required minor changes over the FreeBSD version I wrote earlier. It's getting easier with each new OS.

Unfortunately, as I've long pledged to port to Solaris first, I may not make a lot of progress on the MacOS X port for a while.

CodeCon musings

Aside from being a lot of fun and exposing me to new work people are doing, CodeCon gave me the opportunity to have interesting conversations about Vesta and consider how it relates to other projects.

While listening to Walter Landry's presentation on ArX, I compiled a list of a few good things about Vesta which I haven't included in presentations about it before (some stolen directly from Walter's presentation as they apply to Vesta as well):

  • Disconnected hacking. You can make a local branch when not connected to the network. [I do this on my laptop when traveling, and this is in fact how I was working when visiting Microsoft Research.]
  • Strangers can make changes without the main developers giving permission. With only read access to a central repository, you can make first class branches in your own repository. [This is related to the previous point, in that it all happens locally after replicating whatever version you're basing your changes on.]
  • It doesn't use timestamps for anything. Like many modern build systems, Vesta does not use timestamps for dependencies. Unlike most modern competitors, it doesn't depend on timestamps as a versioning system either. Since it is its own filesystem it knows whether you have changed a file.
  • It doesn't intermix its data with your source files. Most versioning systems store some meta-data about what you're versioning in files/directories intermixed with your sources. Because Vesta is its own filesystem it can store that meta-data in special directory attributes. This keeps its data out of your way.

I talked with Ross Cohen (who works on Codeville) about merge algorithms. He told me stories of crazy repeated merge cases that he thought would never come up in practice, and then did. I asked him if he'd be offended if I ripped off his merge code for the pluggable merge architecture I've been designing, and he said he wouldn't. (I'm trying to avoid getting into the business of writing a new merge algorithm.)

I talked to Nick Mathewson and Roger Dingledine from the Free Haven project about securing Vesta. They suggested I write an RFC-style protocol spec if I want anyone with a security background to help. They also confirmed my concern that the mastership transfer protocol is the most problematic part, as it uses TCP connections in both directions between the peer repositories. To a lesser extent, replicating from a satellite repository back to a central one when checking in after a remote checkout has the same problem. If we could find a way to make these active connections passive, it would also help people behind firewalls.

Kevin Burton from rojo.com recommended including more information in the RSS feed of latest versions in the pub.vestasys.org repository. I've been having my feed generator trim the checkin comments, but he said the RSS client should do that. He also suggested including a diff in the feed.

Zooko and I talked briefly about merging. Specifically we talked about how

"smart" merge algorithms need more information than a common ancestor and two versions descended from it. They typically take into account more about the steps between the common ancestor and those two versions. I said that I thought it should be possible to use the trigger mechanism I've been working on to record the kind of extra information such algorithms would need. He contended that without spending time using and studying a system with smart merging, I won't know quite how to design to enable it. While I have read through the Darcs "theory of patches", I don't see myself having time to spend really getting a lot of experience with another system.

Andy Iverson (?) and I talked about Vesta's approach of storing complete copies of all versions. He brought up the example of how small a BitKeeper repository is with the entire Linux kernel history. I asked "Is having that entire history really interesting?" He contended that it is, bringing up the example of searching through the history to find when some feature/variable/function was introduced. This probably has to do with the fact that the design decision to store complete copies was made early in Vesta's design (in the late '80s), before open source really took off. Basically the argument was "disk is cheap, programmers aren't typing any faster." However, open source projects can scale to a much larger number of programmers making a much larger number of changes. The Vesta developers couldn't have predicted this effect when they made that design decision. We could do something to add compression of versions, but I'm wary of doing that for performance reasons, at least at the moment. One of Vesta's selling points is O(1) checkout/checkin/branch and access to existing versions, which we would lose with compression. Also, adding compression right now would put load on a central resource (the repository server). I have some ideas on splitting up the system in ways that would make it possible to distribute this load, but I don't expect to make progress on that in the immediate future. Lastly, the Linux kernel clearly has a higher number of contributors (and probably a higher rate of change) than most projects, so maybe this is only an issue in extreme cases. I should probably spend some time measuring the storage requirements for the history of different free software projects.

12 Feb 2005 (updated 13 Feb 2005 at 23:51 UTC) »

While at CodeCon, wmf told me about Electric Cloud.

I've been reading through their technology description, and I have to say it sounds like they've just re-implemented a small portion of Vesta. Their dependency detection mechanism sounds almost exactly like Vesta's (and for that matter ClearCase's). The only thing which sounds new to me is that they fire off many build pieces in parallel without knowing whether they're doing them in the right order. If it turns out that they did some build steps in the wrong order, they re-execute those portions. This approach sounds dumb to me. Why not just do it in the right order every time (which is what Vesta does)?

Even worse is that they call their tool "plug compatible with make". So they're working really hard trying to take an old, known broken model and beat it into something scalable without redesigning it at all. Great idea.

Now how to I tell the USPTO that I have prior art for their "patent-pending automatic conflict detection and correction technology"...

On my last day visitng Microsoft Research, I wore my O'Really "Writing Word Macro Viruses" t-shirt. I was highly amused, and nobody at MS seemed to notice.

8 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!