Older blog entries for pfdietz (starting at number 22)

I've spent the last 12 days cataloguing gcl's failures in ansi-tests. Having gone through all 1900+ failures, I've found there appear to be about 284 distinct bugs (plus any bugs not detected by the tests or masked in the test suite by other failures.) The biggest reductions in the failure count would come from implementing READTABLE-CASE and returning the proper value from FORMATTER functions; the deepest issues probably involve the type system (GENERIC-FUNCTION should be a subtype of FUNCTION, for example, and FUNCTION should not intersect the type CONS.)

Camm has been busy this summer with improvements to GNU Common Lisp. GCL is getting more efficient; on the ACL2 benchmark it's expanding its lead over the competing lisps, with more improvements coming (granted, ACL2 may be tuned for GCL, and vice versa). CONS cells have been shrunk from three words to two, reducing their memory usage by a third, and there's a new fixnum scheme that reduces memory accesses. There are also lots of type propagation enhancements and type-driven optimizations. There's even progress on ANSI compliance, with bug fixes coming in.

GCL has this nice feature (due to the late Prof. Schelter) where it can automatically generate type proclamations for functions by propagating types across the program as a whole. GCL generates better code when proclamations are in place, so these can really help. If you're benchmarking an application on GCL, try out this feature.
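
For illustration, here's the flavor of proclamation this machinery produces (VECTOR-SUM is a made-up example name, not GCL output):

  ;; The sort of function type proclamation GCL's whole-program type
  ;; propagation can emit; with it in effect, callers can be compiled
  ;; assuming a fixnum result.
  (proclaim '(ftype (function (simple-vector) fixnum) vector-sum))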

Heh.

Any paper ending in a limerick has to be good.

A former coworker of mine has a blog that often talks about programming language ideas. He points out that Dussud's call for a statically typed Lisp-like language could be met by ISI's STELLA.

I've written, as part of the random type prop tester, a generic function named make-random-type-containing that takes a single argument and produces a random type to which that object belongs. Right now it makes awkward use of call-next-method to link together methods for classes of increasing generality, but it's hard to make the probabilities come out right, particularly as new methods are added.

I've realized I should instead have defined this method combination:

  (defun car-is-positive-integer (x)
    (typep (car x) '(integer 1)))

  (define-method-combination randomized nil
      ((method-list car-is-positive-integer))
    (assert method-list)
    (let* ((qualifiers (mapcar #'method-qualifiers method-list))
           (weights (mapcar #'car qualifiers))
           (clauses (mapcar #'(lambda (weight method)
                                `(,weight (call-method ,method)))
                            weights method-list)))
      `(loop (catch 'fail (return (rcase ,@clauses))))))

What this does is use integer method qualifiers as weights. A method is chosen with probability proportional to its weight (via the macro rcase, not defined here). The catch allows a method to abort by throwing to fail; when that happens, another method is selected at random.
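
For illustration, methods under this combination might look like the following hypothetical sketch (the weights 2 and 1 are arbitrary, and rcase is assumed to be defined):

  ;; The integer qualifier is the method's selection weight; a method
  ;; can give up by throwing to FAIL, after which another weighted
  ;; choice is made.
  (defgeneric make-random-type-containing (val)
    (:method-combination randomized))

  (defmethod make-random-type-containing 2 ((val integer))
    `(integer ,val ,val))

  (defmethod make-random-type-containing 1 ((val t))
    (if (zerop (random 2))
        t                    ; the type T contains any value
        (throw 'fail nil)))  ; abort; another method will be chosen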

Hash consing is an interesting non-standard idea that's been around for a long time, since before Common Lisp. The idea is to make Lisp's EQUAL and EQL coincide by storing cons cells in a hash table. When (cons x y) is computed, the two arguments are used as keys into the hash table. If a cons of these two values has already been created, that cons cell is reused.

This idea is not consistent with the semantics of CONS in Common Lisp, so it would either require a nonstandard lisp or an extension using something other than CONS as the constructor (in which case we may want to clone the other cons-related functions as well). Hash cons cells must clearly be immutable, and contain only immutable data, so the idea is related to the immutable data types Baker mentioned in his ILC 2005 talk.
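
As a sketch, such a constructor might look like this (HCONS is just an illustrative name, and the global EQUAL hash table is an assumption of the sketch):

  ;; Structurally equal (x . y) pairs always come back as the same cell.
  (defvar *hcons-table* (make-hash-table :test #'equal))

  (defun hcons (x y)
    (let ((cell (cons x y)))
      (or (gethash cell *hcons-table*)
          (setf (gethash cell *hcons-table*) cell))))

Because cells are interned this way, two trees built entirely from HCONS (over EQL leaves) are EQUAL exactly when they are EQL.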

What's the benefit? Applications that manipulate mathematical formulas are a disproportionate share of Lisp's success stories (Macsyma/Maxima, Axiom, ACL2, various other theorem provers). It's nice if you can compare formulas for structural identity in O(1) time. I understand ACL2 does run on a hash-consed variant of gcl.

There's a variant of hash consing I'll call 'lazy hash consing'. In this scheme, EQUAL cons cells are made EQL only when they are compared. Here's how it works. Suppose we're computing equal on two places, say x and y. If they are eql, stop. Otherwise, if they are cons cells, recursively compute equality (using hash codes stored in the cells to recognize inequality of subtrees cheaply). If x and y turn out to be equal, we have two trees that are equal but not eql. What we do then is either (setf x y) or (setf y x). If we're using a generational collector, we choose so that the pointer being changed ends up pointing into the same generation it pointed into before, or an older one (this might be as easy as comparing the two pointers, assuming newer generations are stored at progressively higher addresses on the heap). This means we don't need a write barrier.

The recursive equal calls on car/cdr of the cons cells in the trees would be done using this same algorithm, so after one sweep through two disjoint trees we've mostly collapsed the two trees into one (and many cells will likely be collectable.)
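
Roughly, and leaving the reassignment of the top-level places to the caller, the comparison might look like this sketch, where CONS-HASH (the stored hash accessor) and OLDER-P (which of two objects lives in the older generation) are hypothetical:

  (defun lazy-equal (x y)
    (cond ((eql x y) t)
          ((and (consp x) (consp y))
           (when (and (= (cons-hash x) (cons-hash y)) ; differing stored hashes => trees differ
                      (lazy-equal (car x) (car y))
                      (lazy-equal (cdr x) (cdr y)))
             ;; Equal but not EQL: collapse the fields so the trees share
             ;; structure, pointing into the older generation so that no
             ;; write barrier is needed.
             (unless (eql (car x) (car y))
               (if (older-p (car x) (car y))
                   (setf (car y) (car x))
                   (setf (car x) (car y))))
             (unless (eql (cdr x) (cdr y))
               (if (older-p (cdr x) (cdr y))
                   (setf (cdr y) (cdr x))
                   (setf (cdr x) (cdr y))))
             t))
          (t (equal x y))))

At the top level, the caller still performs the (setf x y) or (setf y x) on its own places, as described above.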

This scheme is nice in that it doesn't need a hash table, just a hash value in each cons cell. Accidental collisions will be rare (assuming a good hash function), so inequality can almost always be determined in O(1) time. It's also likely that older versions of trees will tend to have more shared references, so pointers should quickly come to point at their final destinations.

A few thoughts on ILC 2005

Overall it was an enjoyable conference. There were some good papers, some duds, and some controversy. It was very good to see and talk to people who I knew only by name. Attendance appeared to be up from ILC 2003. ALU is in no danger of dying, with a nice surplus.

My talk went ok, but I (as always) underestimated how much time it would take, and how many questions I would get. Fortunately, I had designed the talk so the last part was optional, so I omitted it. I got positive comments, and William Clinger (who gave an interesting talk on Common Larceny, a Scheme-on-CLR implementation) liked it.

Moore's presentation on ACL2 was a big hit, and (unlike many of the other invited speakers) he actually submitted a full paper for the proceedings. I'm particularly gratified by ACL2 because its standard delivery platform is GCL (although it does run on other lisps, it gets its best performance on GCL.)

Now, about those last three presentations. A lot's already been said in other blogs, so I'll try not to repeat things I agree with.

Two presenters (Baker and McCarthy) complained about standardization. I wonder if the complaint isn't that standardization made it more difficult to get grants for new lisp implementations. Didn't the CMU CL effort have its funding dry up around the time the standard was being finalized?

Integration with CLR seems pointless if Common Lisp has to change too much. Move the language too much and one loses the community and software infrastructure behind it. One might as well start over and not call it Lisp.

The desire for a statically typed Common Lisp came up. But don't we already have this? Lisps can and do perform type inference. Many lisps propagate types inside functions, and GCL can generate function type proclamations for all functions (and this is one trick used to speed up ACL2, I believe). If the user wants to add manual type declarations, there's nothing in Lisp stopping him. Maybe it's just a matter of allowing the user to specify that the compiler should break on warnings?

I think many of the proposed extensions to CL are things that are either already done, could be done without changing the standard, or require only extensions to the standard.

Baker made a more interesting point when he complained that software development tools don't command the huge per-seat cost of chip design software packages. If chip designers found automated help useful enough to pay that much, then why didn't software developers? The reason, I think, is network effects, which strongly encourage one to use the same development environment as everyone else. It's a point in Common Lisp's favor that it has survived (and, in its small niche, may even be thriving) even while fighting against this tide.

Costly (but worthwhile) software development tools will impede the loss of software jobs to low-cost markets. From what I've seen of development there, capital budgets are constrained; only in the last decade or so has the cost of equipping a developer there fallen low enough to make a good business case for offshoring.

What does that have to do with Lisp? Can we do something to Lisp, or Lisp implementations, so that expensive development platforms make more economic sense? I have some ideas here I'll expand on some other time.

I sent the pdf of the ILC 2005 paper off last week, on the deadline day. Now I just need to register and reserve a rental car. Oh, and make the slides and practice the talk.

The gcl ansi-tests suite is close to done. I'm scanning back through the spec, filling in parts I missed. Compiler macros, various declarations. DEFINE-SETF-EXPANDER needs tests.

I've also started on a new Lisp testing project. It'll be located below the ansi-tests directory in the gcl tree, but it's not part of that test suite. I want to test that most undefined situations in the spec signal errors in safe code. The purpose is to make safe code truly safe (no buffer overflows or other potential security problems). The exceptional situations in the standard are just inadequate for this.

I've added a new feature to the RT module in gcl ansi-tests. It can now do 'extended random regression' testing. This involves repeatedly executing tests drawn randomly (with replacement) from the set of tests that normally pass. It stresses the system, looking for dependencies that the tests in their original order didn't exercise, for memory leaks, race conditions, and so on.
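
In spirit it's something like this toy loop (not the actual RT code; *PASSING-TESTS* and RUN-ONE-TEST are stand-ins for RT's own bookkeeping):

  ;; Draw tests at random, with replacement, from the known-passing set
  ;; and report any that fail when run out of their usual order.
  (defun extended-random-regression (iterations)
    (loop for name = (elt *passing-tests* (random (length *passing-tests*)))
          repeat iterations
          unless (run-one-test name)
            collect name))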

Bugs found by ERR can be very hard to track down, so it's not suitable for early debugging. On the other hand, it's been shown (see page 7) that ERR can expose defects that are very difficult to find otherwise.

The first thing I found when I ran these tests was a number of order dependencies in the test suite, particularly in package-related tests. I think they're all fixed now. SBCL has been running ERR for about ten hours on my machine with no failures or leaks. Clisp, on the other hand, showed an unexpected failure in one of the DEFCLASS tests. I'll try to track that down by doing ERR on limited subsets of the test suite.

I've been extending the random type prop tester for Common Lisp. The type arguments can now be functions, which generate types that depend on the values of the preceding arguments. This is useful for generating valid inputs to operators like AREF (the array indices have to be in range). I've also added options to specify that arguments should be replicated before use (necessary for testing destructive operators like RPLACA or NCONC).
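
A toy version of the AREF case might look like this (this is just the idea, not the tester's actual interface):

  ;; The index "type" is computed from the vector generated for the
  ;; first argument, so the random index is always in range.
  (defun random-in-integer-type (type)
    ;; Handles only types of the form (INTEGER low (high)).
    (destructuring-bind (kind low (high)) type
      (assert (eq kind 'integer))
      (+ low (random (- high low)))))

  (defun random-aref-args ()
    (let* ((vec (make-array (1+ (random 10)) :initial-element 0))
           (index-type `(integer 0 (,(length vec)))) ; depends on VEC
           (idx (random-in-integer-type index-type)))
      (list vec idx)))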

I'm adding tests based on this infrastructure to the ansi-tests directory, but they're not going to be invoked directly in a normal run of the test suite, since they take a while to run, particularly on gcl.

Tests on PARSE-INTEGER brought up the question: what is (vector (member #\0 ... #\9))? Is it a subtype of STRING? The spec seems to say yes, but this gets close to requiring that UPGRADED-ARRAY-ELEMENT-TYPE work on all types, not just those recognized by SUBTYPEP, and that's not possible in general unless U-A-E-T is an identity function (otherwise, it's undecidable). I think the CL spec needs some work here.
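
Spelling out the digit characters, the concrete query at issue is:

  (subtypep '(vector (member #\0 #\1 #\2 #\3 #\4 #\5 #\6 #\7 #\8 #\9))
            'string)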
