I have begun work on an implementation of the W3C DOM Level 2 Core and Events. The implementation language is C. There are existing DOM implementations in C, not least DOMC and GDOME. These packages don't implement my style of C API, so I decided I needed my own. I absolutely insist on consistent return values from functions, naming conventions, etc.
So far I have some relatively good ideas for the nuts and bolts. It should be easy to use a shared string table to reduce memory usage on long documents, and I have already planned how to implement live NodeList objects (which neither existing implementation provides).
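As a rough illustration of the shared string table idea, here is a minimal sketch of string interning. Everything in it is an assumption made for the example: the names `strtab`, `strtab_intern`, and `strtab_free`, the fixed bucket count, and the use of plain `char` strings rather than 16-bit DOM characters. It only shows the general technique, where equal strings such as repeated element names are stored once and shared by pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define STRTAB_BUCKETS 64  /* arbitrary size chosen for the sketch */

struct strtab_entry {
    struct strtab_entry *next;  /* next entry in the same bucket */
    char *text;                 /* the table's own copy of the string */
};

struct strtab {
    struct strtab_entry *buckets[STRTAB_BUCKETS];
};

/* Simple multiplicative string hash (djb2 style). */
static unsigned long strtab_hash(const char *s)
{
    unsigned long h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Return the table's single shared copy of s, adding it on first use.
   Returns NULL if memory runs out. */
const char *strtab_intern(struct strtab *t, const char *s)
{
    struct strtab_entry *e;
    unsigned long b;
    size_t len;

    assert(t != NULL && s != NULL);
    b = strtab_hash(s) % STRTAB_BUCKETS;

    /* Already stored? Hand back the existing copy. */
    for (e = t->buckets[b]; e != NULL; e = e->next)
        if (strcmp(e->text, s) == 0)
            return e->text;

    e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    len = strlen(s) + 1;
    e->text = malloc(len);
    if (e->text == NULL) {
        free(e);
        return NULL;
    }
    memcpy(e->text, s, len);
    e->next = t->buckets[b];
    t->buckets[b] = e;
    return e->text;
}

/* Release every stored string and leave the table empty. */
void strtab_free(struct strtab *t)
{
    size_t i;
    for (i = 0; i < STRTAB_BUCKETS; i++) {
        struct strtab_entry *e = t->buckets[i];
        while (e != NULL) {
            struct strtab_entry *next = e->next;
            free(e->text);
            free(e);
            e = next;
        }
        t->buckets[i] = NULL;
    }
}
```

With a table like this, a document containing thousands of `div` elements would store the name once, and comparing two interned names reduces to a pointer comparison.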
The major stumbling block so far has been deciding what C datatype to use for DOMString, which is a string of unsigned 16-bit values. wchar_t is a joke: just like all other C datatypes (save char), you never have any idea of its actual width, which varies from platform to platform. The C library functions for dealing with wchar_t are horrible. I have dodged this problem by skipping wchar_t altogether and just using the auto* tools to detect and define a 16-bit-wide system datatype.
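To make the idea concrete, here is a hedged sketch of how a 16-bit character type might be pinned down. It is not the configure-based detection itself: as a stand-in, it picks the type from `<limits.h>` at compile time. The names `dom_char16` and `dom_strlen` are hypothetical, and the zero-terminated string layout is an assumption for the example only.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Stand-in for a configure-time result: choose whichever standard
   unsigned type has a maximum value of exactly 0xFFFF. */
#if USHRT_MAX == 0xFFFF
typedef unsigned short dom_char16;   /* hypothetical name */
#elif UINT_MAX == 0xFFFF
typedef unsigned int dom_char16;
#else
#error "no 16-bit unsigned integer type found"
#endif

/* Compile-time check: the array size is negative, and compilation
   fails, if the chosen type does not occupy exactly 16 bits. */
typedef char dom_char16_is_16_bits[(sizeof(dom_char16) * CHAR_BIT == 16) ? 1 : -1];

/* Length in 16-bit units of a zero-terminated string of dom_char16
   (zero termination is an assumption of this sketch). */
size_t dom_strlen(const dom_char16 *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}
```

The point of the compile-time check is that a wrong guess about widths stops the build instead of silently corrupting strings at run time.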
As usual, C's definition is a hindrance. There ought to be only signed and unsigned 8, 16, 32, and 64-bit integer datatypes in C, and they ought to be explicitly defined that way.