Older blog entries for graydon (starting at number 106)

testing tools

movement asks if free software testing tools are "wild and wooly". I would say some, but not all. I think the unit test things are as good as they need to be; unit testing is more about practise than technology anyways. there's a *Unit library for just about everything now, which is nice.

high-level supervisory and tracking stuff -- say aegis or bugzilla -- is pretty good. maybe if I'm lucky someday I'll be able to say that about monotone too. I think we all know that test cases are the core of the "science" which makes up computer science; the muck you have to slog through to make your software strong against reality.

the mid-level supervisory stuff, however, is weak. dejagnu is Not My Idea of Good (sorry rob) and most people seem content to just cobble together shell or python scripts when they're fed up with expect. testing tools which do automatic breadth or boundary calculations, which estimate and control test-run time, or which do much in the way of environment control and capture, seem disappointingly rare.

one thing which bothers me about test infrastructure is that the posix 1003.3 output format doesn't require (or doesn't often seem to be interpreted as requiring) unique identifiers for each assertion tested. as a result, people wind up measuring not the set of tests passed or failed, but the number. this is wrong in an important way: if I make some fixes and go from 438 failures to 245, it is not at all clear whether I am "193 tests better off" or whether I've fixed 438 and regressed on 245. constructing a QA system on this sort of measurement will be broken by design. it's a bit frustrating. the proper regression relationship is subset, and for that tests need identity.

option processing

I read in innumeracy that people generally mis-perceive coincidence, and that the expected frequency of coincidences (amongst all possible things that may happen to you every day) is pretty high. that may be, but it's still creepy. one day, I'm sitting on a go bus sketching out a design for a C++ option processing library. next day, it gets mentionned here. then I open up gnus and boost announces a formal review period for an option processing library. jeepers. I can feel the axons of the pan-network subcortex creeping into my head just thinking about it.

29 Apr 2003 (updated 29 Apr 2003 at 18:34 UTC) »
lexical reproduction

I have an ongoing thesis at the moment which I'm exploring in the programming language literature, which is that programming language features do well (all other things being equal) when they eliminate either distant or dynamic state and replace it with either close or lexical state. the underlying point being that we may favour language features that facilitate copying and modifying small bits of code -- fragments which work in their new context -- as a fundamental programming activity.

exceptions

so along these lines I was thinking about exception handling mechanisms the other day. exceptions are a dynamic control transfer system, or else they are an "error recovery" system, depending on who you ask. I've been browsing around LtU's recent link to crash-only software, the restart system in common lisp, and the checked-vs-unchecked debate in java.

the static checking part really got me curious, because there's a clearly dynamic bit of state that the java designers have tried to patch into the lexical structure of a program, and have mostly failed. nobody likes checked exceptions because the signature of a method needs an exception specification covering the union of all possible dynamic contexts it will find itself in, which is almost always just "Exception". so there's no point.

I think what lexical exception checking -- if you believe in it at all -- ought to do is specify which exceptions will not be thrown from a method. this would at least let the compiler tell you when a particular catch clause is dead code -- probably an error on your part -- and erase it, along with the associated try context setup. that's something, but not much; you may still fail to catch the "right" exception. but at least it works with an open set of possible thrown exceptions, which is the practical fact, most of the time.

but I wonder if this thinking could be combined with crash-only design, so that you have only one throwing-like statement called crash and a conservative type of signature which says whether your method confines crashes (those without the indicator are assumed to propagate crashes). the confining ones can be checked by a compiler; the "confines crashes" bit can be propagated based on static knowledge between caller and callee, so a method which calls only crash-confining methods (and doesn't do any dangerous pointer fiddling or crashable arithmetic) is itself crash-confining; finding the transfer point during a crash is easier; and the call setup would (I think) be a little shorter on crash handling and crash propagating code. you'd still need destructors, of course, and for diagnostic sake you'd still want to collect stack traces and error descriptors. but I think that's a separate issue: using exceptions as cheap debugging vs. using exceptions to help a program keep running in the face of errors are two different things.

unfortunately I don't think you can quite implement this scheme in java or C++ as it stands. they both treat exceptions a little wrong; java will not statically check for handling of unchecked exceptions, hence the name, and C++ ignores (in a horribly fatal way) everything undeclared, unless you declare nothing. both are a bit broken. ah well.

diffs

pfremy writes that diff's output is unintuitive. this may be the case, but his explanation of the two algorithms used is not right.

diff uses a myers LCS method, a careful dynamic programming construct. it is not the fastest known variant, but it is O(ND) time, where D is the size of the minimal edit script, and it finds that edit script. it also has a very important property that you can implement a version which eats only linear space.

python's difflib uses a "junk-sensitive" recursive greedy matcher (documented here) which is apparantly similar in spirit to the Ratcliff-Oberhelp "gestalt pattern matching" algorithm, but with more desirable running time (R-O is quite awful). it does not find minimal edit scripts, and it might -- or might not -- run at competitive speed and space with myers stuff. I can't find any clear analysis.

anyways, there is a lot of literature on string matching. it's one of the huge problems you can lose the rest of your life in. it is certainly possible to improve diff -- I don't want to put you off the cause -- but be sure to do some large tests and background reading before discounting the decisions made in implementing diff.

a couple years back I complained about the state of version control systems. well, about a year ago some friends and I started sketching out ideas for a new one. this week I released a preliminary copy of it. we'll see if it does well. it's a little different, but I think it's got potential.

meanwhile, it's spring! go outside for a walk!

concurrency abstractions

multi-threading seems awkward (and often is) because it makes very pessimistic assumptions about concurrency needs, eg:

  • any co-routine might need any/all global resources
  • any co-routine's state == an entire stack segment
  • any instruction is an equally likely preemption point

while making very optimistic assumptions about concurrency uses, eg:

  • threads always co-operate correctly on global resources they share (memory, signals, fds)
  • the sharing mechanisms (say, mutexes) are cheap
  • threads always re-schedule optimally, at zero cost

when people imply that "state machines are better than threads", they are broadly saying that if you analyze real-world concurrency needs and uses, you will more likely find that:

  • a small set of global resources need sharing
  • a small set of values makes up a co-routine's state
  • a small set of instructions are reasonable preemption points

therefore in these cases the mechanism of "full" multi-threading can be seen to be redundant distraction, providing little advantage over managing some small sets of values by hand. but moreover:

  • threads often co-operate incorrectly, due to programmer error, and worse: non-deterministically
  • lock contention and context switching is expensive, especially with increasing cache-dependence
  • a general thread scheduler doesn't always choose a very good schedule for your threads; your application may well know better

so it can be easy to see threads as worse than merely irrelevant, but in fact truly harmful to your program.

this is not to suggest that multi-threading cannot ever be put to good use. but rather that its default assumptions lead to strangely mismatched programs, which have too many unexpected failure cases associated with the optimistic assumptions, and too many ways to make simple tasks complex in order to meet the pessimistic assumptions.

there have, however, been some interesting systems developed which mix some sort of threads with stronger-than-default assumptions (either in language extensions or programming frameworks) to yield good results. see for example:

technical mutterings

I've heard a lot of people muttering about end-to-end as a design principle of the internet. And they're right: it is important for the internet's protocols to be transparent and use-agnostic. But I seldom see people doing anything about it; we treat the shameful protocols we have as untouchable, and just bolt stuff on without revisiting the design. Network effects seem to ensure that beyond a certain adoption threshold, you simply can't change poor protocols.

I am therefore happy to see two new-ish, and pleasantly simple and transparent protocols being muttered about, in areas which aren't yet completely pinned down: internet backplane protocol and factotum.

chalst

first: I had not previously read hehner's "beautifying godel" paper, but now that you mention it I must say that I disagree with you that it is the "work of an incompetent". I believe the paper illustrates the concept, and does so in a way which is reasonably terse and accessible to a computer programmer, and pleasantly free of the advanced latex, infinite sequent trees, ambiguous notations and ridiculous term encoding which plagued the "logicians presentation" I received in stephen cook's class. I seem to recall him even saying as much during lecture: that a interpretation-oriented presentation, over ascii strings, would probably be much easier to follow. what makes you think of it as incompetence?

second: I certainly do not like working in TCL, since it is so unsafe and more than a little awkward at times; all I'm arguing here is that string manipulation is effectively as powerful as sexp-splicing. I don't see any refutation of this in bawden's paper, merely a preference for the simplicity and "synergy" of the lisp combination. by all means, use a lisp. it's a good system, definitely. personally I prefer something even tighter: camlp4 statically typechecks my metaprograms too. are there any free lisps which do this? I haven't found them. seriously, I'd like such a thing, I just don't have one.

third: yes, camlp4 is a little "fixated" on "source vs. object code"; more accurately you could say that it's "fixated" on pre-run-time static verifications, since you can certainly ship source-only ocaml programs that compile at load-time. lisp's approach doesn't confuse me, it just isn't terribly safe. I happen to like static verifications, and am happy to trade runtime metaprogramming, a feature I do not often want, and freqently want to prohibit, for statically checked compile-time metaprogramming. if you could point me to a good, free, portable lisp with a native code compiler which does typechecking of its metaprograms and their results, and permits custom lexers for the metaprogram input, I'd be happy to investigate it. imho that is ocaml.

tapioca

jdybnis suggests that tests are worse than proofs; by failing to terminate they may make themselves hard to verify since their traces are infinite. I dispute this, since a test is often written within a testing framework which sets a timer and aborts the test after some time. I think tests, test-traces, and proofs are all finite formal objects, simply of varying scope.

mbp raises interesting points. first, that gcc and linux do not always fit common "specifications" of a C compiler and a POSIX kernel. this is worth complaining about, but is not a flaw in a proof of some formal properties of your distcc program, which treats the functions of kernel and compiler as axioms. as mslicker aptly quotes chuck moore: internal consistency can be proved, but compatibility can only be tested.

axiom testing is a different sort of "test" than I was talking about. surely one needs to test that the assumptions going into a proof hold in a particular environment, and those tests should indeed follow the scientific method of attempted falsification.

but I maintain that it's poor use of time to test a program in order to verify some static property you have a good formal theory of. consider gcc: running its larger testsuites takes hours-to-days. in all combinations of host * target arch, with all combinations of configuration flags, it would not finish before you die. and yet verifying the formal property that gcc is a syntactically valid C program, or that it is type-sound, or that every symbol used by a compilation unit has a definition in another, takes only a small factor more time than the i/o required to read it off disk. why is this?

it is because, unlike the (formal) property of "correct translation from C to machine code for architecture X", we have spent time and effort on some of the simpler properties (does-parse, is-typesound, uses-are-defined), and in the process developed the tools to state and verify them statically. applying gcc to other inputs ad nauseum, as a "test" that gcc is a syntactically legal C program, is a total waste of time. you check the static formal property once when the program text changes, and that's it. any more would only tell you that an assumption from the environment is wrong (disk i/o error?), not that the properties have changed.

davidw states that your competition will always outrun you if you bother to stop for proofs. this depends on how ambitious your proofs are; as I suggest above I think there are lots of properties we already prove, and many more waiting behind the scenes which we will have the opportunity to prove in the near future. will we reach a day when we can prove everything interesting about programs? certainly not. such a statement probably has no meaning. but it is instructive, from a business perspective, to consider how much time you spend running tech support, tracking bugs, debugging, issuing errata, etc. and compare it to the investment in various proof techniques. on several fronts, notably with customers who have deep pockets and long memories, I think the economics may favour "more proofs".

finally, mbp encourages me to try distcc, which is a very good suggestion; it's saved my life several times (though it only speeds up verification of formal properties, not the testsuites). fwiw, I'm your anonymous happy customer from june 4th :)

pudding

mbp speaks of the inability to prove that a program is free from bugs. this is broadly asserted, yet I feel a little twinge of annoyance every time I hear it, perhaps because it is so imprecise.

a bug is a subset of machine state space which you do not want to be reachable by implication from your program's text. a test is a single path through machine state space, either leading into a bug region (a test you want to see fail) or avoiding bug regions (a test you want to see pass).

there are far more possible tests than you can ever write or run, so you must choose "good" paths towards "important" bugs. the problem of a test writer is to determine which areas are important, and it can require deep insight into what the program is intended for.

but note: a bug, a test, and a program are all completely formal objects from this viewpoint.

a proof, on the other hand, is another formal object in a much larger "space" (a linguistic space in which machine state spaces, and some parts of your preferred maths or logics are basic terms). the sort of proof you are interested in is one which relates your program's text residing in memory at one point, to a (good or bad) region of machine state space which is implied by the program, via a logic whose rules you like.

any test can be translated into a proof in a silly logic easily: the proof is simply the trace of your processor executing your program's code on your test's input, and the logic is one in which each machine transition that happened is an axiom. but that proof is boring.

an interesting proof is one which is much smaller, when written down, than the sum of all the tests which you would need to write to fill the machine state space bounded by the proof. in this sense, I really believe Chaitin is onto something when he talks about proofs as nothing more than a form of "higher order data compression", and the value of a given formal system as the amount of compression it commonly admits over interesting data.

so, getting back to mbp's comment, certainly you can produce a very large set of (formal) bugs which nobody's compressed inside a proof yet, for any program you care to mention. but I do not think this means that all, or even a sizeable majority of those bugs will admit "no further compression" if the author puts their mind to it.

raph's suggestion that we design programs "the way we'd want to prove things about them" is, seen in this light, a suggestion that we design programs "in a way which admits a lot of compression". I think this fact is at the heart of the programmer aesthetic for "small and simple" programs. "larger" is often equivalent to "less provable", or similarly the feeling that any test you run explores too little of the state space to be helpful. you lose any confidence you have that a large system will do what you want, unless you can carve up the state space with type systems, modules, processes, etc.

as a final note: people often try to express test quality in terms of "code coverage", but this is only slightly useful, and less so as the quality of the program being tested improves. good programs parameterize much of their behavior on their input. their machine state space massively outweighs the size of their program text, so you learn very little about exercizable machine states by knowing that all the code was exercized.

refinements on a theme

if you set out to prove "predicates" about a program, there are two (common) ways to go about it.

the first way involves writing a program, and then either in the language of the program or some other "more logical" language writing down some formula about the program and trying to construct a proof. proofs performed in this order are constructed by repeatedly reducing the terms of the proof towards axioms. you are given a list of transformations on the formula which are permitted, with the understanding that rewriting a formula in some sense preserves "truth", or at least logical implication. once you arrive at a formulation which is all axioms or instances of abstract axioms, you are done your proof. sometimes you have tool support to automate reducing the tedious parts of your formulas to axioms. formal systems which are intended to work this way include ACL2, Coq, HOL, NuPRL, Isabelle, Twelf, and many other logic and meta-logic frameworks.

the second way involves writing a formula expressing the conjunction of facts you eventually want your program to satisfy, and transforming that formula into the program by successive refinements (rewrites). you are given, like in the first case, a list of the sorts of syntactic transformations which constitute a legal refinement, with the understanding that a refinement of a term "means the same thing", only the refinement is "more concrete" than the refined. when you have nothing left but terms in your abstract machine language, you have completed the refinement and you can probably run the resulting program. some times you have tool support to automate performing tedious refinements of formulas into machine code. formal systems which are intended to work this way include Z, VDM, RAISE, B Method, Refinement Calculus, and Hehner's "practical theory of programming".

typically the underlying logics of members of either such family permit their use in one approach or the other, but the tools and logics may display the preference of their creators.

formalities and formalisms

one way of viewing programs is as "active" objects, about which certian assertions may hold from time to time. in this view, people say things like "what does this program do?" the program text is viewed as suspicious, vulgar; to do "real" reasoning, you need to relate the program text to something clean like mathematics or formal logic where there are many good theorems (never mind the ambiguity and duplication in their notations).

another way of viewing programs is to say that programs do nothing; computers do things and programs are simply formal suggestions to the computer. a program in a high level language is a formal suggestion to another program (a compiler, expressed in a low-level language such as machine code) about a sequence of i/o operations which may result in the production of another program in a low level language. in all cases, however, a program is a completely static entity; a single formula with rules which may apply to describe legal transformations of it.

in this latter view, the most important thing a program suggests is a relationship between pre- and post- states of a computer. this "large" relationship is stated as a sequence of instructions, but each instruction suggests at a fundamental level nothing more than a relationship of the pre- and post- states of a computer (including i/o of course). the post-state of one instruction is the pre-state of the next. this is a simplification of modern instruction pipelines but it is often a sufficient level of detail for reasoning.

so suppose you write down an instruction using the "static" view of a program. say:

1:  x <- k * v

if instead you take the first view, you may then feel the need to "annotate" this with the "formal" property that the post-state of the computer contains a variable whose arithmetic value is the product of two others:

// annotation: { post[1][x] = pre[1][k] * pre[1][v] }
1:  x <- k * v

it may strike you as suspicious at this point that you've just written down two nearly identical expressions, but recall that in this view the program is an "active" object (subject to wild fits of unpredictability) and the annotation spec, in some more "mathematical" semantics, is the only "formal" object.

what this means, when you say that an expression is "formal", is that the formal object is the only one you have confidence about your reasoning about, and transformation and of. for example, say you later have an instruction:

// annotation: { post[5][y] = pre[5][x] * pre[5][v] }
5:  y <- x / v

you can then perform the annotation reasoning:

post[5][y] = pre[5][x] * pre[5][v]
           = post[1][x] * pre[1][v]
           = pre[1][k]

and then you can (somehow) relate this back to the formal text of the program to see that it's legal to rewrite as:

5:  y <- k

but notice that to perform this reasoning, you had to perform a "round trip" in and out of the "formal" system. after all, you cannot reason about filthy active objects like programs; they might do anything! and of course in order to faithfully do the "translation" back and forth between the "active" and "formal" parts of the program, you need a formal relationship between your "formal" notation and the program text. oops! you've just accidentally raised the program text to the level of a "formal" system anyways!

this is the convoluted thinking (unfortunately common, in "formal methods" circles) that Hehner sets out to change. he thinks that the program is a formal expression of a relationship between pre- and post- states of a computer, and if you tidy up common programming languages you will find buried inside them a simple boolean expression logic which uses the programming language as its fundamental terms. otherwise you spend all your time translating back and forth between the program text and your "real" formalism, and you get so much clutter that any reasonable programmer gives up in disgust.

lest you think this is completely implausible, have a look at ACL2. it is a fully functioning, quite usable system which performs exactly this sort of "reasoning about the program text" for a fully executable lisp language.

97 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!