Lots of interesting threads to respond to. I was going to write more tonight, but instead I spent over three hours talking with David McCusker on the phone. It was well worth it, as he was able to give me some new ideas about trees in Fitz, and I explained to him my trick for representing arbitrary trees in a btree-like structure.
mwh wonders about terminology. I guess there are people who consider reference counting to be a form of garbage collection, but I tend to think of them as two different approaches to the same problem. To me, a garbage collector is something that can collect garbage even when it has cyclic references. I guess it all depends on who is master :)
By this criterion, Python 2's "cycle detector" is a real garbage collector, but there is also reference counting. When a reference count goes to zero, a structure can be freed immediately, which is not true of "pure" garbage collected runtimes. In fact, if you're careful not to create circular references, you might never need to invoke the cycle detector.
Essentially, the hack in Python is to use reference counts to determine the roots. In any garbage collected runtime, finding the roots is a challenging problem. The problem is even harder when code is in C, because a lot of the time, you'll want references stored on the stack to be considered roots. Python counts the number of references in to each object, and if this doesn't match the reference count, considers the object to be a root.
I haven't done benchmarking, but my intuition tells me that Python's approach is one of the poorest performing. For one, you have 4 bytes of overhead per object to store the reference count. For two, as mwh points out, just forgetting a reference pulls the object into the cache. For three, bumping reference counts causes memory traffic. In many cases, you'll get write traffic to main memory even when you're just doing a read-only traversal. For four, I'm sure that Python's cycle detector is slower than most mark-sweep collectors, because of all the extra accounting it has to do.
mwh guesses that reference counting is better performing than stop and copy. Actually, stop and copy tends to be one of the best performing approaches. Allocation is very cheap, in the common case just advancing a free pointer. On collection, only active objects have any cost. In most systems, the majority of objects have very short lifetimes. For these, the cost of freeing is essentially zero. Finally, stop and copy often compacts objects, increasing spatial locality, so you may need to pull fewer cache lines from main memory to access an structure. But note that Hans Boehm doesn't agree with me. See his slides on GC myths, for example, and this comparison of asymptotic complexity.
The GC FAQ looks like a pretty good reference.
mbp: Java's example is not particularly helpful for what I'm trying to do. My goal is to have a library that works well when bound to whatever runtime the client wants. The Java philosophy, by contrast, is to nail down a runtime and force all components of a system to conform to it. In many cases, that's actually a good thing. It minimizes the impedance mismatches between modules. Further, Java's runtime is actually a pretty good one. A huge amount of effort has obviously gone into tuning its performance, and I'm impressed that they've gotten garbage collection in highly multithreaded environments to work so well. Java's runtime is undoubtedly better than most hand-rolled stuff.
I'm not convinced by your claim that Java is agile with respect to garbage collection discipline. I consider the ability to count, say, only forward references in a doubly linked list to be a minimum requirement for any decent reference counted discipline. Where is that in Java? I know that embedded Java folks are having success with coding patterns that allocate very little memory, but at the cost of a more constrained use of the language. In particular, if you use a library written for general Java in such an environment, it won't work well. Conversely, general Java users probably won't much like libraries designed for embedded stuff, because they're used to a dynamic style.
I have things to say about proofs vs testing, debuggability of runtimes, and media coverage of Hatfill, but they'll all have to wait for another day.