I have begun work on an implementation of the W3C DOM Level
2 Core and Events. The implementation language is C. There
are existing DOM implementations in C, not least DOMC and
GDOME. These packages don't implement my style of C API, so
I decided I needed my own. I absolutely insist on
consistent return values from functions, naming conventions,
etc.
So far I have some relatively good ideas for the nuts and
bolts. It should be easy to use a shared string table to
reduce memory usage on long documents, and I have planned
ahead of time how to implement live NodeList objects (which
neither existing implementation provides).
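To make the shared string table idea concrete, here is a minimal sketch of string interning: every distinct string is stored once, and repeated tag or attribute names all point at the same copy. The names strtab, strtab_intern, and TABLE_SIZE are hypothetical, and plain char strings are used for brevity rather than 16-bit DOMString units. This is an illustration under those assumptions, not the implementation itself.

```c
#include <stdlib.h>
#include <string.h>

#define TABLE_SIZE 256  /* hypothetical bucket count */

/* One stored string, chained with others that hash to the same bucket. */
typedef struct strtab_entry {
    struct strtab_entry *next;
    char *text;
} strtab_entry;

/* The table itself; zero-initialize it before first use. */
typedef struct {
    strtab_entry *buckets[TABLE_SIZE];
} strtab;

/* Simple multiplicative string hash (djb2 style). */
static unsigned long strtab_hash(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Return the shared copy of s, adding it on first use.
 * Returns NULL if memory allocation fails. */
const char *strtab_intern(strtab *t, const char *s)
{
    unsigned long i = strtab_hash(s) % TABLE_SIZE;
    strtab_entry *e;

    /* Reuse an existing copy if this string was seen before. */
    for (e = t->buckets[i]; e != NULL; e = e->next)
        if (strcmp(e->text, s) == 0)
            return e->text;

    /* Otherwise store a new copy at the head of the bucket chain. */
    e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->text = malloc(strlen(s) + 1);
    if (e->text == NULL) {
        free(e);
        return NULL;
    }
    strcpy(e->text, s);
    e->next = t->buckets[i];
    t->buckets[i] = e;
    return e->text;
}
```

Because equal strings share one pointer, a long document with thousands of identical element names pays for each name only once, and name comparison can become a pointer comparison.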
The major stumbling block so far has been deciding what C
datatype to use for DOMString, which is a string of unsigned
16-bit values. wchar_t is a joke: just like all other C
datatypes (save char), you never have any idea of its actual
width. The C library functions for dealing with wchar_t are
horrible. I have dodged this problem by skipping wchar_t
altogether and just using auto* tools to detect and define a
16-bit-wide system datatype.
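As a rough illustration of what "detect and define a 16-bit-wide datatype" can look like, the sketch below does the detection with the preprocessor and limits.h instead of a configure script, so that it stands on its own. The names dom_char and dom_strlen are hypothetical, and the autotools check itself is not shown here. The assumption is simply that either unsigned short or unsigned int is exactly 16 bits wide on the target.

```c
#include <limits.h>
#include <stddef.h>

/* Pick an unsigned type that is exactly 16 bits wide, or refuse to build. */
#if USHRT_MAX == 0xFFFF
typedef unsigned short dom_char;
#elif UINT_MAX == 0xFFFF
typedef unsigned int dom_char;
#else
#error "no 16-bit unsigned integer type found"
#endif

/* Compile-time check: the array size is -1 (an error) if the width is wrong. */
typedef char dom_char_is_16_bits[(sizeof(dom_char) * CHAR_BIT == 16) ? 1 : -1];

/* Length in 16-bit units of a zero-terminated dom_char string. */
size_t dom_strlen(const dom_char *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}
```

With a type like this pinned down, the string helpers have to be written by hand, as dom_strlen is here, since none of the wchar_t library functions apply.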
As usual, C's definition is a hindrance. There ought to
only be signed and unsigned 8, 16, 32, and 64-bit integer
datatypes in C, and they ought to be explicitly defined that way.