30 Jun 2005 pfdietz   » (Master)

Hash Consing is an interesting non-standard idea that's been around for a long time, before Common Lisp. The idea is to make lisp's equal and eql be identical by storing cons cells in a hash table. When (cons x y) is computed, the two arguments are used as keys into the hash table. If a cons of these two values has already been computed, that cons cell is reused.

This idea is not consistent with the cons in Common Lisp, so it would either require a nonstandard lisp or an extension using something other than cons as the constructor (and, in which case, we may want to clone the other cons-related functions as well.) Hash cons cells must clearly be immutable, and only contain immutable data, so the idea is related to the immutable data types Baker mentioned in his ILC 2005 talk.

What's the benefit? Applications that manipulate mathematical formulas are a disproportionate share of Lisp's success stories (Macsyma/Maxima, Axiom, ACL2, various other theorem provers). It's nice if you can compare formulas for structural identity in O(1) time. I understand ACL2 does run on a hash-consed variant of gcl.

There's a variant of hash consing I'll call 'lazy hash consing'. In this scheme, eqlity of equal cons cells is established only when they are compared. Here's how it works. Let's suppose we're computing equal on two places, say x and y. If they are eql, stop. Otherwise, if they are cons cells, recursively compute equality (using hash codes stored in the cells to recognize inequality of subtrees with high probability). If x and y are equal, then we have two trees that are equal but not eq. What we do then is either (setf x y) or (setf y x). If we're using a generational collector, we choose so that the pointer that is changed is moved to point to the same generation as it pointed to before, or older (this might be as easy as comparing the two pointers, assuming newer generations are stored in progressively larger addresses on the heap). This means we don't need a write barrier.

The recursive equal calls on car/cdr of the cons cells in the trees would be done using this same algorithm, so after one sweep through two disjoint trees we've mostly collapsed the two trees into one (and many cells will likely be collectable.)

This scheme is nice in that it doesn't need a hash table, just hash values in each cons cells. Accidental collisions will be rare (assuming a good hash function), so inequality can be determined in O(1) time. It's also likely that older versions of trees will tend to have more shared references, so pointers should quickly be made to point to their final destinations.

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!