Advogato: Blog for nconway

Robert: Most of the issues the Coverity tool found were actually in pg_dump and ecpg; it actually only found a single bug in the Postgres backend, which I'm somewhat embarrassed to admit I introduced during the 8.0 development cycle. Which is both impressive and suspicious -- a single bug in 250,000-odd lines of C is too good to be believed. As it turns out, there is a good reason for this: the Stanford checker was already run against the Postgres tree a few year ago (in the 7.3 timeframe, IIRC), and the bugs it identified were fixed. Since the Coverity tool is an improved, commercialized version of the original Stanford checker, it's not too surprising that the tool didn't find very many new bugs in the backend.

Another factor is that the way that memory management is done in Postgres makes static analysis a little more difficult. For example, given the code:

void foo(char *arg)
{
    char *ptr = strdup(arg);
    if (!ptr)
        return;
    
    printf("Hello, world: %s\n", ptr);
}

it is relatively easy to determine via static analysis that the strdup leaks memory when ptr goes out of scope. However, malloc and friends are rarely used directly in the backend; we use region-based memory allocation instead. In other words, each allocation is done in a "memory context"; an allocation can be manually released via pfree, or all the allocations in a context can be released at once (via MemoryContextReset and related functions). Contexts are arranged into a hierarchy -- when a context is reset, all of its child contexts are also reset. This significantly simplifies memory management and error recovery: when we're done processing a query, we can just reset the memory context associated with that query, and we can be pretty confident all associated memory will be released. If a transaction is aborted, we can longjmp back to the abort error handler and then reset the transaction's context -- this cleans up all memory allocated during the transaction.

On the other hand, this technique makes static analysis of memory leaks a little more difficult:

void foo(char *arg)
{
    char *ptr = pstrdup(arg);
    printf("Hello, world: %s\n", ptr);
}

pstrdup allocates memory in the CurrentMemoryContext; when that context is reset or deleted, the memory will be released. Some variant of the above code could be perfectly valid: if the code knows that CurrentMemoryContext will always be a short-lived memory context (i.e. one that will be reset "soon"), it needn't bother explicitely freeing all its allocations. This improves readability (by not cluttering the code with pointless frees) as well as performance (it is faster to reset an entire context at once than to release each individual allocation).

So in some sense, memory leaks are impossible: allocated memory will never be unrecoverably lost until process exit. However, you can still have problems -- this bug in 8.0 is a typical example of the kinds of memory leaks you can get with region-based allocation. This code is rewriting a table, doing essentially:

for each tuple in the old table
    evaluate any new default expressions
    recheck any applicable constraints
    insert the tuple in the new table

We need to do this when adding a column with a non-NULL default value to a table, for example. The problem was that when evaluating default expressions, we invoked various functions that performed allocations in the CurrentMemoryContext. These functions effectively assumed that CurrentMemoryContext was short-lived -- but in this case, we didn't reset the context for each iteration of the loop. Again, there is no "memory leak" here, strictly speaking -- the CurrentMemoryContext would eventually be reset. But if the loop iterated a few hundred million times on a large table, we would quickly exhaust all available memory.

So how do you detect these sorts of pseudo-leaks via static analysis? I don't really see an effective way, although it's an interesting problem.

Anyway, my point is that using metrics like "tool X found Y bugs per Z lines of code" as an indication of software quality is pretty dubious -- and it's even more dubious to try to use this metric to compare the quality of two different software products.

20 Jul 2005 nconway » (Master)