Older blog entries for jsm28 (starting at number 2)

I'm in the process of reorganising GCC's code for checking printf, scanf and strftime formats.

Originally the code used a flag to indicate whether a function was printf-like or scanf-like. Then more and more format features from X/Open, various extensions and C99 were added, strftime format support was added to help detect Y2K problems, and the number of ad hoc conditionals in the code for particular cases steadily went up. The type expected by each combination of a conversion specifier (e.g. d) and a length modifier (e.g. l) was stored in a table (in this example showing that printf %ld expects a long argument), as were the acceptable flags on each conversion. But more and more other information went into ad hoc code, which was full of bugs, especially in the details of what GCC accepted with each standard version selected and with -pedantic.
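To make the table idea concrete, here is a minimal sketch of such a lookup. It is not GCC's actual code: the type names, the table contents and the function expected_arg_type are all hypothetical, and the table is a tiny excerpt rather than the full set of valid combinations.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily simplified types: not GCC's real data structures. */
enum arg_type { ARG_INVALID, ARG_INT, ARG_LONG, ARG_UNSIGNED_LONG, ARG_DOUBLE };

struct conv_entry {
    char conv;            /* conversion specifier, e.g. 'd' */
    const char *length;   /* length modifier, "" when absent */
    enum arg_type type;   /* argument type the pair expects */
};

/* A deliberately tiny excerpt; a real table covers every valid pair. */
static const struct conv_entry printf_table[] = {
    { 'd', "",  ARG_INT },
    { 'd', "l", ARG_LONG },
    { 'u', "l", ARG_UNSIGNED_LONG },
    { 'f', "",  ARG_DOUBLE },
};

/* Look up the expected argument type; pairs missing from the excerpt
   above come back as ARG_INVALID. */
enum arg_type expected_arg_type(char conv, const char *length)
{
    size_t i;

    assert(length != NULL);
    for (i = 0; i < sizeof printf_table / sizeof printf_table[0]; i++)
        if (printf_table[i].conv == conv
            && strcmp(printf_table[i].length, length) == 0)
            return printf_table[i].type;
    return ARG_INVALID;
}
```

With data like this, the checker for %ld only has to call expected_arg_type('d', "l") and compare the result with the type of the argument actually passed; adding a new conversion means adding a row, not another conditional.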

Now that much of the code has been reorganised and rewritten, it should follow C99 correctly, except for lack of support for the j length modifier (intmax_t and uintmax_t), which needs work elsewhere in GCC and an answer to the question of whether GCC or glibc should be providing <stdint.h>. For example, with GCC 3.0 you will be able to use %zu formats for size_t values and so get rid of the ugly casts to unsigned long that portable code previously needed. This holds as long as you don't need to support older systems with pre-C99 libraries that don't understand %zu; glibc 2.1 is OK, though it lacks a few C99 format features, and glibc 2.2 will have an almost complete implementation of the whole C99 standard library. Details of the standard versions in which features appeared are now stored in data; all X/Open extensions and most glibc extensions are supported. The next stage is to move details of the flags supported by each format type into data as well; the eventual target is to allow a program to specify the format rules for its own printf-like functions. (For example, a while ago on linux-kernel it was suggested to add format extensions to format IP addresses with printk, but GCC couldn't check such extensions. FreeBSD, with much closer control over the GCC used, has such extensions and modifies GCC to add hardcoded support for them.)
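As a small illustration of the %zu point, the sketch below puts the cast idiom and the C99 idiom side by side. The function names are made up for this example, and snprintf (itself a C99 function) is used only so that the result lands in a buffer that can be inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Older portable idiom: cast to unsigned long and print with %lu. */
int format_size_with_cast(char *buf, size_t buflen, size_t n)
{
    assert(buf != NULL);
    return snprintf(buf, buflen, "%lu", (unsigned long) n);
}

/* C99 idiom: the z length modifier matches size_t directly. */
int format_size_with_z(char *buf, size_t buflen, size_t n)
{
    assert(buf != NULL);
    return snprintf(buf, buflen, "%zu", n);
}
```

Both produce the same text for values that fit in unsigned long; the second needs no cast, and a format checker that knows about z can verify that the argument really is a size_t.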

C99 adds "type-generic macros" in <tgmath.h>. They feel much more like C++ than in the spirit of C, but since they're in the standard, glibc needs to implement them. Implementing them needs compiler extensions; if you look at the <tgmath.h> that glibc 2.1 installs you'll see macros using __typeof__ and statement expressions that sort-of work and sort-of make sense. However, they get the behaviour for integer arguments completely wrong.
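For reference, the rule in C99 is that an integer argument to a type-generic macro is treated as if it were a double. The sketch below checks this through the size of the result type on a conforming <tgmath.h>; the function names are hypothetical, and because sizeof does not evaluate its operand, no maths function is actually called.

```c
#include <assert.h>
#include <stddef.h>
#include <tgmath.h>

/* Size of the result of sqrt applied to an int: should be that of double. */
size_t sqrt_result_size_for_int(void)
{
    return sizeof(sqrt(2));
}

/* Size of the result of sqrt applied to a float: should be that of float. */
size_t sqrt_result_size_for_float(void)
{
    return sizeof(sqrt(2.0f));
}

/* Run-time check of the integer rule, assuming a conforming header. */
void check_integer_arguments_act_as_double(void)
{
    assert(sqrt_result_size_for_int() == sizeof(double));
    assert(sqrt_result_size_for_float() == sizeof(float));
}
```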

The other day I studied the matter and decided that a fully correct implementation was impossible with the existing GCC features and proposed some new compiler builtins to allow for a clean implementation. After a few mailing list messages on how things might be achieved with other extensions such as __builtin_classify_type, I saw how it could in fact be implemented. The macros I produced work, but are more obscure than even glibc's usual standard. Now Ulrich Drepper has been crazy enough to include them in glibc.

2000-08-01  Ulrich Drepper  <drepper@redhat.com>
            Joseph S. Myers  <jsm28@cam.ac.uk>

* math/tgmath.h: Make standard compliant. Don't ask how.

/* This is ugly but unless gcc gets appropriate builtins we have to do something like this. Don't ask how it works. */

/* 1 if 'type' is a floating type, 0 if 'type' is an integer type.
   Allows for _Bool.  Expands to an integer constant expression.  */
#define __floating_type(type) (((type) 0.25) && ((type) 0.25 - 1))

/* The tgmath real type for T, where E is 0 if T is an integer type and
   1 for a floating type.  */
#define __tgmath_real_type_sub(T, E) \
  __typeof__(*(0 ? (__typeof__(0 ? (double *)0 : (void *)(E)))0 \
                 : (__typeof__(0 ? (T *)0 : (void *)(!(E))))0))

/* The tgmath real type of EXPR.  */
#define __tgmath_real_type(expr) \
  __tgmath_real_type_sub(__typeof__(expr), __floating_type(__typeof__(expr)))

If __tgmath_real_type_sub makes sense to anyone without referring to some version or draft of the standard, I'd be surprised; the rules used for the type of conditional expressions are arcane. If you try including this header in C++ code, or try calling any of the macros with complex integer types (a GCC extension), you'll get what you deserve. If you try nesting calls to the type-generic macros, it should work - provided your machine can cope with the code expansion involved, similar to the problem of a harmless five nested calls to strcpy expanding to 200 Mbyte of text after preprocessing. If you actually want to use such an obscure <tgmath.h>, you trust too much in the compiler and in magic.
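For the curious, the arcane rule is this: in a conditional expression whose operands are pointers, if one operand is a null pointer constant the result has the type of the other operand, but if one is merely a pointer to void the result is a pointer to void. The sketch below shows that rule in isolation and then applies a copy of the macro with a literal 0 or 1 for E. REAL_TYPE_SUB and POINTER_KIND are names invented for this example, and _Generic and static_assert are C11 features used here only to inspect the resulting types.

```c
#include <assert.h>

/* Copy of the macro above under a non-reserved, hypothetical name. */
#define REAL_TYPE_SUB(T, E) \
  __typeof__(*(0 ? (__typeof__(0 ? (double *)0 : (void *)(E)))0 \
                 : (__typeof__(0 ? (T *)0 : (void *)(!(E))))0))

/* Report which pointer type an expression has: 1 for double *, 2 for void *. */
#define POINTER_KIND(expr) \
  _Generic((expr), double *: 1, void *: 2, default: 0)

/* (void *)0 is a null pointer constant, so the result is double *. */
int kind_with_null_constant(void)
{
    return POINTER_KIND(0 ? (double *)0 : (void *)0);
}

/* (void *)1 is only a pointer to void, so the result is void *. */
int kind_with_plain_void_pointer(void)
{
    return POINTER_KIND(0 ? (double *)0 : (void *)1);
}

/* Integer type (E is 0): the selected real type is double. */
static_assert(sizeof(REAL_TYPE_SUB(int, 0)) == sizeof(double),
              "int should map to double");

/* Floating type (E is 1): the selected real type is T itself. */
static_assert(sizeof(REAL_TYPE_SUB(float, 1)) == sizeof(float),
              "float should map to float");

/* The same two cases, checked by type rather than by size. */
int int_maps_to_double(void)
{
    REAL_TYPE_SUB(int, 0) x = 0;
    return _Generic(x, double: 1, default: 0);
}

int float_maps_to_float(void)
{
    REAL_TYPE_SUB(float, 1) x = 0;
    return _Generic(x, float: 1, default: 0);
}
```

So E decides which of the two inner conditionals collapses to void *, and the outer conditional then picks whichever pointer type survived.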

You are in a maze of twisty macros, headers, compiler extensions and expensive standards, all different.

My copy of the new C standard, ISO/IEC 9899:1999, finally arrived today, five months after I ordered it. This should (once I've spent a day or two carefully reading the whole standard) help in documenting the true status of C99 implementation in GCC and in creating test cases for C99 features and implementing or fixing some of them in GCC; up to now I've been working from a PDF of the FDIS plus the editor progress report on later changes. Done so far: many test cases, general clean up of C99-related documentation and parts of web pages, many fixes to exactly what features are allowed in which -std mode and to the -pedantic behaviour, many fixes to the printf format checking, the C99 names for long long limits in <limits.h>. Many other C99 features have already been implemented by other people.

I consider that the standard is worth buying and reading for every serious C programmer (though there will always be some people who make confident pronouncements about C on mailing lists and Usenet that bear no relation to its contents). Drafts and books other than the standard are best avoided for reference use. The Defect Reports are also useful to give an idea of problem areas in the standard. The PDF version announced on 18 July may suffice if you don't feel you want the "real thing" as printed and bound in Switzerland. The old standard in four parts is useful as well if you can obtain it.

The five month delay arose thus: I had book token prizes from Trinity College and wanted to buy the printed C99 standard using them; this meant going via a bookshop; ISO cause problems for bookshops by requiring payment in advance and not giving discounts. The bookshops I tried were unwilling or incompetent, even though authorised to mark up the price as necessary to cover their profit margins and costs: Heffers simply refused to handle ISO publications; Waterstone's took the order in February, but took four months after receiving a quote to get payment to ISO, during which time they more than once told me that payment had gone to ISO when it hadn't, failed to respond to emails or to email me when they had said they would, and generally didn't make any apparent progress until I got the manager of the branch involved (this is a criticism of their overall systems that failed to handle the order effectively, not of the individual staff involved).
