23 Aug 2004 apenwarr   » (Master)

IDL Compilers

Perhaps foolishly, I told pphaneuf that I might help him with his XPLC IDL compiler if he would send me some "before and after" examples of what he wanted it to look like. He sent me some pretty crappy examples written by the XPCOM people (WOW! Bad documentation!), but I got motivated anyway, probably due to my still-sucky-but-upcoming manifesto, and like magic... we now have an IDL compiler.

Before, I thought that IDL ("Interface Definition Language") was a general term for languages that define interfaces, and an IDL compiler was a general term for something that turns your general language into a real language (like C++, perl, etc). But it turns out, for those who didn't know, that IDL is itself a well-defined language (by the CORBA people), and it's actually both straightforward and well done. The rest of CORBA turned into a mess, but IDL is a nice little gem hidden inside. Plus, there's a libIDL that even turns it into a parse tree for me.

Anyway, once I got the parser working (using the crazy magic of WvCont to convert the ugly state machine into a simple procedural program, heh heh), producing C++ output was pretty easy - IDL is based on C++ anyway, it seems. Then it wasn't too hard to write C bindings by making very, very dangerous evil assumptions about the format of the C++ vtable (so shoot me). Result: now, like magic, I can access my C++ classes transparently from C with zero overhead compared to C++. It's neat. (The overhead in C++ is also minimal; just the cost of a virtual function call. XPLC doesn't get in the way of plain C++ programming here.)
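
For anyone who wants the C-from-C++ effect without betting on vtable layout, here is a minimal sketch of the boring, portable alternative: `extern "C"` shim functions over an opaque pointer. To be clear, this is not the vtable trick described above, and it costs one extra function call per method. `IGreeter`, `Greeter`, and the `greeter_*` names are all made up for illustration; none of them come from XPLC.

```cpp
#include <new>

// Hypothetical interface, standing in for what an IDL compiler might emit.
struct IGreeter {
    virtual int greet() = 0;        // returns the new greeting count
    virtual int count() const = 0;  // returns the current greeting count
    virtual ~IGreeter() = default;
};

namespace {
// Hypothetical implementation, hidden from C callers entirely.
struct Greeter : IGreeter {
    int n = 0;
    int greet() override { return ++n; }
    int count() const override { return n; }
};
}  // namespace

// What a C caller would see (IGreeter is just an opaque struct there):
//   typedef struct IGreeter IGreeter;
//   IGreeter *greeter_new(void);
//   int greeter_greet(IGreeter *g);
//   int greeter_count(const IGreeter *g);
//   void greeter_free(IGreeter *g);
extern "C" {

// Returns NULL on allocation failure rather than throwing into C code.
IGreeter* greeter_new(void) { return new (std::nothrow) Greeter(); }

// Each shim is one ordinary call plus one virtual call; -1 means bad handle.
int greeter_greet(IGreeter* g) { return g ? g->greet() : -1; }
int greeter_count(const IGreeter* g) { return g ? g->count() : -1; }

// Deleting through the interface is safe because the destructor is virtual.
void greeter_free(IGreeter* g) { delete g; }

}  // extern "C"
```

The tradeoff is the extra hop per call; in exchange, nothing here depends on how a particular compiler lays out its vtables.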

But that's all normal stuff that people have been doing (badly) for years. I could write a whole rant about how ridiculously bad COM and XPCOM are for no apparent reason, but I won't.

Instead, I will reveal to you the amazing secret I discovered about IDL: it thinks like I do. That's right, while in UniConf I made the simplifying assumption that "everything is a string" and it made everything easy, IDL allows us to make almost the same assumption. Everything is a string or an object. And, as we know, monikers are strings that map into objects.
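
To make "monikers are strings that map into objects" concrete, here is a rough sketch of a moniker table. It assumes a `prefix:rest` moniker format and a factory-per-prefix design; `MonikerTable`, `Echo`, and `demo_table` are hypothetical names for illustration, not XPLC's actual API.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical opaque object base; a real component model has its own.
struct Object {
    virtual ~Object() = default;
    virtual std::string describe() const = 0;
};

// A factory turns the part of the moniker after the prefix into an object.
using Factory = std::function<std::unique_ptr<Object>(const std::string&)>;

class MonikerTable {
public:
    void add(const std::string& prefix, Factory f) {
        factories_[prefix] = std::move(f);
    }

    // Resolve "prefix:rest". Returns null when the prefix is unknown.
    std::unique_ptr<Object> resolve(const std::string& moniker) const {
        const auto colon = moniker.find(':');
        const std::string prefix = moniker.substr(0, colon);
        const std::string rest =
            colon == std::string::npos ? "" : moniker.substr(colon + 1);
        const auto it = factories_.find(prefix);
        if (it == factories_.end()) return nullptr;
        return it->second(rest);
    }

private:
    std::map<std::string, Factory> factories_;
};

// Hypothetical example object: remembers the string it was created from.
struct Echo : Object {
    std::string text;
    explicit Echo(std::string t) : text(std::move(t)) {}
    std::string describe() const override { return "echo(" + text + ")"; }
};

// Builds a table with a single "echo" prefix registered.
inline MonikerTable demo_table() {
    MonikerTable table;
    table.add("echo", [](const std::string& rest) {
        return std::unique_ptr<Object>(new Echo(rest));
    });
    return table;
}
```

With this, `demo_table().resolve("echo:hello")` hands back an object, while an unregistered prefix hands back null, so the caller only ever deals in strings and opaque objects.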

So I added a feature to my IDL compiler to write wrapper functions for your interfaces. The wrappers *only* pass around "strings" and "objects" (actually, they pass around union objects that are both). You can look up a function by name, then match against its signature to see if the objects you have are the ones it wants, just in case there's more than one function with that name (i.e., the same name in different types of interface).
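
Here is a small sketch of what that lookup-then-match step could look like. It is a guess at the shape, not the code my compiler generates: `Value`, `Wrapper`, `Registry`, `matches`, `invoke`, `Counter`, and the `"ICounter"` interface name are all hypothetical, and an empty string in a signature is my own convention for "a plain string goes here".

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical opaque object: all we can ask is which interfaces it supports.
struct Object {
    virtual ~Object() = default;
    virtual bool supports(const std::string& iface) const = 0;
};

// A value carries a string, an object pointer, or both (obj may be null).
struct Value {
    std::string str;
    Object* obj = nullptr;
    Value() = default;
    Value(std::string s, Object* o = nullptr) : str(std::move(s)), obj(o) {}
};

// One wrapper: required interface per parameter ("" means a plain string).
struct Wrapper {
    std::vector<std::string> params;
    std::function<Value(const std::vector<Value>&)> call;
};

// Several interfaces may export the same function name, hence a multimap.
using Registry = std::multimap<std::string, Wrapper>;

// True when every argument satisfies what the wrapper's signature asks for.
inline bool matches(const Wrapper& w, const std::vector<Value>& args) {
    if (w.params.size() != args.size()) return false;
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (w.params[i].empty()) continue;  // any Value can act as a string
        if (!args[i].obj || !args[i].obj->supports(w.params[i])) return false;
    }
    return true;
}

// Look up by name, then call the first wrapper whose signature fits.
// Returns false (leaving `out` untouched) when nothing matched.
inline bool invoke(const Registry& reg, const std::string& name,
                   const std::vector<Value>& args, Value& out) {
    const auto range = reg.equal_range(name);
    for (auto it = range.first; it != range.second; ++it) {
        if (matches(it->second, args)) {
            out = it->second.call(args);
            return true;
        }
    }
    return false;
}

// Hypothetical component with one method exposed as "increment".
struct Counter : Object {
    int n = 0;
    bool supports(const std::string& iface) const override {
        return iface == "ICounter";
    }
};

inline Registry demo_registry() {
    Registry reg;
    reg.emplace("increment", Wrapper{
        {"ICounter"},  // args[0] is "this" and must support ICounter
        [](const std::vector<Value>& a) {
            assert(a.size() == 1 && a[0].obj);  // guaranteed by matches()
            Counter* c = static_cast<Counter*>(a[0].obj);
            return Value(std::to_string(++c->n));
        }});
    return reg;
}
```

Calling `invoke(reg, "increment", {Value("", &counter)}, out)` succeeds and puts the new count in `out.str` as a string. Passing a bare string where the object should be simply fails to match, which is the kind of failure a scripting-language binding can report cleanly.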

The magic: objects have interfaces, and functions take a "this" pointer that's always an object; but when it comes down to it, that's all objects do. You don't poke around inside them. You don't "serialize" them. They're opaque. You simply try to run functions on them and check the return values to see if they work.

CORBA (and others) screwed up by assuming I wanted to "do stuff" with objects, like ship them around transparently from computer to computer. Big mistake: real life says that I ship strings around from computer to computer, not objects, and I have nothing against making that transparent. And hey, if my objects *want* to have functions to convert themselves to strings and back, that's fine with me - but it's not transparent. It's for them to worry about.

Anyway, it turns out that this model is pretty trivial to get working, and functions that take and return strings and/or pointers to objects happen to be really well supported by SWIG, so I can now access my C++ objects from tcl and (probably) perl and a bunch of other languages, essentially for free. (I only need one plugin per language, not one per language+interface, and SWIG will write most of the plugins for me!)

What amazes me is that it's always been done so badly before. XPCOM claims to do this, but after all this time, I have no idea how to load XPCOM components into tcl or run a command-line javascript interpreter. Why? It should be easy. In XPLC, it'll be easy. (By the way, the XPLC runtime is still only about 26k. Now that we have scripting, there will be an extra cost of a few kbytes for each language binding we want. For reference, the tcl bindings are about 10k.)

I really want to write a "sh" binding to demonstrate how powerful this model is. After all, the shell is great at dealing with strings, and since I don't ship my objects back and forth, I can just have them all live in another process that my shell can talk to. Nobody else's component model has *shell* plugins.

Anybody who wants to follow along can grab my very-alpha-quality code from open.nit.ca's anonymous CVS (project 'xplcidl').
