Older blog entries for apenwarr (starting at number 41)

Conspiracy Theory Updates

So I'm emerging from my ExchangeIt project acceleration marathon, which of course involved plenty of Windows programming. I won't go into the gory details of that, because the details are not that important. (The answer to my earlier question about _read(), by the way, is to special-case every kind of stdin, such as named pipes, console objects, and files. Like I said - gory.)
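For the curious, here's roughly what the "special-case every kind of stdin" step looks like. This is a POSIX sketch using fstat(), not the actual Win32 code from WvStreams. On Win32 the rough equivalent is GetFileType(GetStdHandle(STD_INPUT_HANDLE)), which reports FILE_TYPE_CHAR for a console, FILE_TYPE_PIPE for a pipe and FILE_TYPE_DISK for a file. The stdin_kind() helper is a hypothetical name.

```cpp
#include <sys/stat.h>
#include <unistd.h>   // STDIN_FILENO
#include <string>

// Hypothetical helper: classify whatever is behind `fd` so the caller can
// pick a different reading strategy for each kind of object.
std::string stdin_kind(int fd = STDIN_FILENO)
{
    struct stat st;
    if (fstat(fd, &st) != 0)
        return "error";
    if (S_ISFIFO(st.st_mode))
        return "pipe";      // anonymous or named pipe
    if (S_ISCHR(st.st_mode))
        return "console";   // character device, e.g. a terminal
    if (S_ISREG(st.st_mode))
        return "file";      // a plain file redirected into stdin
    if (S_ISSOCK(st.st_mode))
        return "socket";
    return "other";
}
```

The caller then switches on the result; the gory part is writing a separate reader for each case.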

My conspiracy theory comes in three parts:

1. Architectural Philosophy

I think my next NonDirectionalFridays presentation will be about what I call "Architectural Philosophy" - how your way of looking at things influences the way you design software. In particular, it's interesting to compare Windows and Unix architecture (since I recently became intimately familiar with the former), where they came from, and the results.

The short version is that Windows was built, from day one, on pragmatism. There is no sense of art, no simplifying assumption that binds it all together. On the other hand, if you want to do something, you can be sure it's possible (assuming you put in enough effort) - and better still, Microsoft will jump through hoops until the end of time to make sure that your code keeps on working forever. That's the only reason DOS programs still run in Windows XP.

Unix was built, at the core, on a sense of elegance. A handful of simplifying assumptions in its design make many things easy to do. It is also, not quite by coincidence, very pragmatic - my old Unix programs can still work today, too (if I can compile from source), and this has little to do with art. It has to do with keeping the API relatively constant from day one. Just like Windows did. The only difference is that the Unix API made more sense on day one.

But the Unix philosophy, nowadays, is more about the elegance than the pragmatism. WvStreams is my personal example of this. The API is nice, and it lets you write simple programs, but I also change it relatively often; my simplifying assumptions weren't always right.

GTK, on the other hand, is really based on the Windows philosophy. The ability to write code simply and expressively doesn't figure into GTK at all; but the desire to make sure a program you wrote last year still runs ten years from now is hardcoded into the design.

What's interesting is that you usually have to choose between pragmatism and elegance; Unix is a rare exception in which, by sheer genius of the designers, you could have something both useful and expressive at the same time. But the original Unix design didn't include a GUI, and there are no design geniuses for the GUI like there were for the kernel. That, or their designs were never popular.

Lacking the ability to have elegance and pragmatism at once, you have to choose one or the other. Customers will choose the working solution, thank you very much. That's why GTK exists and is popular; it's also why GTK is no better than Windows.

2. Microsoft is a Gluer

I've written before about the role of Gluers in coding, and how their time has come. Microsoft is actually the biggest gluer of all; by forcing a pragmatic architecture down our throats, they force Windows applications to (more or less) interoperate. I wish the pile of crap this produces would be a bit more elegant sometimes, but at least it works. That's why customers buy it.

Unix people (like me) keep trying to build beautiful designs, hoping that if the design is sufficiently beautiful, it will defeat Windows and Windows applications. They point to the elegant design of Unix as an example of how this might be possible. But this won't happen. Your system must be pragmatic first, and pretty design is a bonus. People have missed the fact that Unix is not only elegant: it's also highly functional.

3. Microsoft Needs Competition, But It's Okay if They Win

One thing I'm becoming more and more sure of as time goes on is that Microsoft is one of the better monopolists I could have to deal with. I mean, the phone companies? Kill them all. Cable companies? Garbage. Oil? Well.

Microsoft, for all their faults, still has this misguided notion that the way to maintain their monopoly is to produce better products than everybody else. (For some definition of "better". Let's not go into that.)

But here's the catch: they stop at "better than everybody else." And if someone produces a slightly better product but drops dead too soon, Microsoft won't bother to steal their ideas. So you have to compete with Microsoft until they steal your ideas. Then your ideas become mainstream, and once Microsoft supports them, they'll never really die. So you win, even if you lost. (Example: IE is still the best web browser, darn it. And yes, I've tried Firefox and seen Opera.)

Conclusion

If in doubt, bet on Windows. But if you want something to show up in Windows, build it in Linux first, then make sure you don't die until they clone your stuff. It doesn't have to be pretty - it just has to work.

Of course, all of this is probably little help to you if you're actually trying to win the war...

12 Oct 2004 (updated 12 Oct 2004 at 06:03 UTC)
Soul Sucking

This weekend I've been hacking the win32 port of WvStreams to work more smoothly. It already works, but the unit testing framework didn't compile, so I had my suspicions that it didn't all work. Sure enough, I fixed the few minor bugs in the unit testing framework, and next thing you know - failing unit tests everywhere.

So first of all, yay for unit tests. Automated magic bug pinpointers, they are. A busy weekend of hacking got the failure count down to zero where it should be, and sure enough - all the WvStreams demo apps work way better now.

Anyway, since I would hardly want to resort to using Visual C++, and since the unit tests are most convenient to run with GNU Make, I thought: let's see how much of this I can do on my Linux system.

Ha. So I now know all about mingw32 (a wonderful system). I also know more than I'd like to know about msvcrt.dll's fd-to-handle mapping (a terrible system). And it all runs correctly under WINE (a truly horrendous pile of puke).

Oh well, if nothing else, I guess I can extend my UniConf demo. Now I can run a UniConfDaemon under wine, exporting the Windows registry over TCP so I can connect to it from a Linux box, replicate it into my Maildir, point gconfd at that, and then configure KDE through gconf. If all that put together doesn't crash, I guess something doesn't suck.

Related Question

If you're doing _read(0,buf,sizeof(buf)) in a thread in win32, does anybody know how, from another thread, to get _read() to back out gently and return? I wouldn't even mind closing fd#0 in order to do this, but stunningly, close() blocks forever because _read() is blocked while holding that fd's critical section. Argh. I'm sure there has to be a way to do this.
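For what it's worth, on Unix the usual way to dodge this is to never block in read() at all: wait with poll() on both the input fd and a private "wakeup" pipe, and have the other thread write a byte to the pipe. This is a POSIX sketch, not a Win32 answer (poll() doesn't work on console handles there), and CancellableReader, read_some() and cancel() are hypothetical names.

```cpp
#include <poll.h>
#include <unistd.h>
#include <cerrno>
#include <cstddef>

class CancellableReader {
public:
    explicit CancellableReader(int fd) : fd_(fd) {
        if (pipe(wake_) != 0)
            wake_[0] = wake_[1] = -1;   // poll() ignores negative fds
    }
    ~CancellableReader() {
        if (wake_[0] >= 0) close(wake_[0]);
        if (wake_[1] >= 0) close(wake_[1]);
    }
    CancellableReader(const CancellableReader &) = delete;
    CancellableReader &operator=(const CancellableReader &) = delete;

    // Blocks until fd_ is readable or cancel() has been called.
    // Returns the read() result, or -1 with errno == ECANCELED.
    ssize_t read_some(char *buf, size_t len) {
        struct pollfd fds[2] = {{fd_, POLLIN, 0}, {wake_[0], POLLIN, 0}};
        for (;;) {
            if (poll(fds, 2, -1) < 0) {
                if (errno == EINTR)
                    continue;           // a signal arrived; just wait again
                return -1;
            }
            if (fds[1].revents & POLLIN) {
                errno = ECANCELED;      // cancel() was called
                return -1;
            }
            if (fds[0].revents)         // data, EOF or an error:
                return read(fd_, buf, len);  // let read() report it
        }
    }

    // Safe to call from any thread. The byte is never drained, so once
    // cancelled, every later read_some() also returns immediately.
    void cancel() {
        char c = 1;
        ssize_t n = write(wake_[1], &c, 1);
        (void)n;
    }

private:
    int fd_;
    int wake_[2];
};
```

Nobody ever sits inside read() holding a lock, so there is nothing for close() to wait on.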

I used to think the Worse is Better guy was just making an idle example. But good old EINTR - holy cow. How I miss you!
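Here's a sketch of what I miss, in POSIX terms: install a do-nothing signal handler *without* SA_RESTART, then pthread_kill() the thread that's stuck in read(). The read() comes back with -1 and errno == EINTR, and the thread can clean up and exit. The function names here are hypothetical, and this is a demo of the Unix behaviour, not something that works on Win32.

```cpp
#include <pthread.h>
#include <signal.h>
#include <unistd.h>
#include <atomic>
#include <cerrno>
#include <cstring>

static void on_wake(int) {}   // does nothing; exists only to interrupt read()

static std::atomic<bool> g_done{false};
static ssize_t g_result;
static int g_errno;

static void *reader(void *arg)
{
    int fd = *static_cast<int *>(arg);
    char buf[64];
    g_result = read(fd, buf, sizeof(buf));  // blocks: nobody writes to the pipe
    g_errno = errno;
    g_done = true;
    return nullptr;
}

// Starts a thread blocked in read(), then interrupts it with a signal.
// Returns true if read() came back with -1 and errno == EINTR.
bool interrupt_blocked_read()
{
    int p[2];
    if (pipe(p) != 0)
        return false;

    struct sigaction sa, old;
    std::memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_wake;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                 // deliberately *no* SA_RESTART
    sigaction(SIGUSR1, &sa, &old);

    g_done = false;
    bool ok = false;
    pthread_t tid;
    if (pthread_create(&tid, nullptr, reader, &p[0]) == 0) {
        // A signal that lands before read() starts is simply absorbed by
        // the handler, so keep poking the thread until read() returns.
        while (!g_done) {
            pthread_kill(tid, SIGUSR1);
            usleep(10000);           // 10 ms
        }
        pthread_join(tid, nullptr);
        ok = (g_result == -1 && g_errno == EINTR);
    }
    sigaction(SIGUSR1, &old, nullptr);
    close(p[0]);
    close(p[1]);
    return ok;
}
```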

Irony

Tomorrow, I'm giving a presentation on how to write/debug code more quickly. And I spent the weekend working around Windows quirks. Sigh.

Monopolize this!

If we end up extending XPLC's IDL compiler to support Java and .NET, I'm going to call the result .NIT (dot Net Integration Technologies). Just try and stop me.

Language Reliability, Round 2

Several people accused my last post of being a thinly-veiled troll. Well... yes. But I was also serious.

Thanks to davidw, who reminded me that Erlang does in fact both exist and run high-performance, high-reliability systems. I haven't looked at it very closely, though; I suspect it may have other overriding disadvantages, possibly including the most common one: incompatibility with pre-existing code. "100% Pure Java" my bum.

To the accusations that I was unclear in my requirements for "reliability" and "high-performance", I guess I can clarify a bit. My company makes highly reliable backup software. Imagine your job was to manage the mission-critical backups for a 1000-person company, and in case of any failure, you had to be able to get back up and running again within 24 hours. First of all, can a program written in one of these scripting languages deal with the massive quantities of data involved - hundreds of gigabytes, millions of files? Would your backup even finish every night? Can you trust your backups to be reliable?

This is an honest question, and when I ask myself, the answer is no. I've just never seen a program written in one of these languages that I would trust with my mission-critical data. So I ask you the same: have you seen programs in these non-C/C++ languages that are actually this fast and reliable? I haven't. I would love to hear some examples.

Scripting Languages and Reliability

Since I've said this to several people at work in the last few days and nobody has had the heart to disagree with me, I will now open myself up to public flaming and see if that helps.

My assertion: it is totally impossible to make really reliable, high-performance programs (like oh, say, idb backup) in a language other than C/C++ (and maybe Ada and C#, but I'll reserve judgement for now). This includes, but is certainly not limited to: python, perl, shell, java, tcl, and php. Especially python. And *really* especially PHP.

My evidence: nobody has ever written a reliable, high-performance program in *any* of those languages.

Thank you. Have a nice day.

Management Architecture

To avoid the worried feeling I get about a recent directive from top management to the effect of "Do everything possible to trick Avery out of writing actual code, because he's most useful doing other things," I've been trying to make my own job seem more attractive. (That's the actual quote, by the way. There is no directive to me, only to other people. I love this company :))

So here's my latest attempt: in parallel with the similarities I noticed between programming and People Hacking, I've noticed a similarity between code architecture and company management structure. My latest theory - and the coders at NITI will be familiar with the results of this by now - is that a manager, like an object, should do just one thing, and do it well. If it takes you more than a sentence to clearly explain what your manager does, then his job is not clearly defined, and sure enough: he'll do it badly.

So, after the OO fashion of the moment, we've been trying to rearrange our management structure on a functional-block basis instead of using a pure inheritance hierarchy. For example, the new Pusher job (no link yet, sorry) is "responsible for when a release comes out." The now-simplified Visionary job (well, that link is for EvilVisionary, but you get the idea) is "responsible for what goes into a release." Best of all, you instantiate a new Pusher for each release, but they all communicate with the singleton Visionary object, er, person, who can move features around between releases.

Consider what the pusher does when your release is late:

  // Cut 5 days' worth of features if we can...
  if (visionary.drop_features(release, 5 /* days */))
     return true; // on time again!

  // ...otherwise, the release slips.
  release.duedate += 5; /* days slipped */
  return false;

I think pphaneuf will agree with me when I say that there needs to be one object at the top that ties it all together by setting up the initial objects, but it should probably be about ten lines long and do almost nothing.
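Taking the analogy a bit too literally, here's a sketch of the two "classes." Everything in it (Release, Feature, Visionary, Pusher, handle_slip(), the cost-in-days per feature) is hypothetical, invented just to make the joke compile: one singleton Visionary moves features between releases, and there's one Pusher per release.

```cpp
#include <string>
#include <vector>

struct Feature {
    std::string name;
    int cost_days;          // schedule time this feature would eat
};

struct Release {
    std::string name;
    int duedate;            // a day number, to keep the arithmetic simple
    std::vector<Feature> features;
};

// Singleton: one Visionary decides *what* goes into every release.
class Visionary {
public:
    static Visionary &instance() { static Visionary v; return v; }

    // Move features from `r` into `later` until at least `days` of work is
    // gone. Returns false if there wasn't enough to cut (anything already
    // moved stays moved).
    bool drop_features(Release &r, Release &later, int days) {
        int saved = 0;
        while (saved < days && !r.features.empty()) {
            saved += r.features.back().cost_days;
            later.features.push_back(r.features.back());
            r.features.pop_back();
        }
        return saved >= days;
    }

private:
    Visionary() = default;
};

// One Pusher per release: decides *when* that release comes out.
class Pusher {
public:
    Pusher(Release &r, Release &next) : release_(r), next_(next) {}

    // Called when the release is `days` late.
    bool handle_slip(int days) {
        if (Visionary::instance().drop_features(release_, next_, days))
            return true;            // on time again!
        release_.duedate += days;   // otherwise the date slips
        return false;
    }

private:
    Release &release_;
    Release &next_;
};
```

The "ten lines at the top" object would then just build the releases and hand each one a Pusher; it doesn't need to know anything else.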

Okay, I'll stop now.

IDL Compilers

Perhaps foolishly, I told pphaneuf that I might help him with his XPLC IDL compiler if he would send me some "before and after" examples of what he wanted it to look like. He sent me some pretty crappy examples written by the XPCOM people (WOW! Bad documentation!), but I got motivated anyway, probably due to my still-sucky-but-upcoming manifesto, and like magic... we now have an IDL compiler.

Before, I thought that IDL ("Interface Definition Language") was a general term for languages that define interfaces, and an IDL compiler was a general term for something that turns your general language into a real language (like C++, perl, etc). But it turns out, for those who didn't know, that IDL is itself a well-defined language (defined by the CORBA people), and it's actually both straightforward and well done. The rest of CORBA turned into a mess, but IDL is a nice little gem hidden inside. Plus, there's a libIDL that even turns it into a parse tree for me.

Anyway, once I got the parser working (using the crazy magic of WvCont to convert the ugly state machine into a simple procedural program, heh heh), producing C++ output was pretty easy - IDL is based on C++ anyway, it seems. Then it wasn't too hard to write C bindings by making very, very dangerous evil assumptions about the format of the C++ vtable (so shoot me). Result: now, like magic, I can access my C++ classes transparently from C with zero overhead compared to C++. It's neat. (The overhead in C++ is also minimal; just the cost of a virtual function call. XPLC doesn't get in the way of plain C++ programming here.)
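For comparison, here's the boring, safe way to give C access to a C++ interface: one extern "C" wrapper ("thunk") per method. This is a *different* technique from the vtable trick above. It costs one extra function call per method, but assumes nothing about the compiler's vtable layout. ICounter, Counter and the ICounter_* functions are hypothetical examples, not XPLC's generated code.

```cpp
// The C++ interface, roughly what an IDL compiler might emit.
struct ICounter {
    virtual ~ICounter() {}
    virtual int add(int n) = 0;        // returns the new total
    virtual int value() const = 0;
};

// A concrete implementation.
class Counter : public ICounter {
public:
    int add(int n) override { total_ += n; return total_; }
    int value() const override { return total_; }

private:
    int total_ = 0;
};

// What C code would see in a generated header:
//     typedef struct ICounter ICounter;
//     ICounter *Counter_new(void);
//     int  ICounter_add(ICounter *self, int n);
//     int  ICounter_value(const ICounter *self);
//     void ICounter_destroy(ICounter *self);
extern "C" {
ICounter *Counter_new(void) { return new Counter; }
int ICounter_add(ICounter *self, int n) { return self->add(n); }
int ICounter_value(const ICounter *self) { return self->value(); }
void ICounter_destroy(ICounter *self) { delete self; }
}
```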

But that's all normal stuff that people have been doing (badly) for years. I could write a whole rant about how ridiculously bad COM and XPCOM are for no apparent reason, but I won't.

Instead, I will reveal to you the amazing secret I discovered about IDL: it thinks like I do. That's right, while in UniConf I made the simplifying assumption that "everything is a string" and it made everything easy, IDL allows us to make almost the same assumption. Everything is a string or an object. And, as we know, monikers are strings that map into objects.
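A minimal sketch of "monikers are strings that map into objects," assuming a "prefix:rest" syntax: a registry splits off the prefix and hands the rest of the string to a factory registered for it. Object, MonikerRegistry, EchoObject and the "echo" prefix are all hypothetical; XPLC's real moniker service may work differently.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Object {                      // opaque base: callers only hold pointers
    virtual ~Object() {}
};

using ObjectPtr = std::shared_ptr<Object>;
using Factory = std::function<ObjectPtr(const std::string &rest)>;

class MonikerRegistry {
public:
    void add(const std::string &prefix, Factory f) {
        factories_[prefix] = std::move(f);
    }

    // Returns nullptr if the moniker has no prefix or the prefix is unknown.
    // Only the first ':' is special, so "a:b:c" gives factory "a" the
    // string "b:c" (which it could resolve again, recursively).
    ObjectPtr resolve(const std::string &moniker) const {
        auto colon = moniker.find(':');
        if (colon == std::string::npos)
            return nullptr;
        auto it = factories_.find(moniker.substr(0, colon));
        if (it == factories_.end())
            return nullptr;
        return it->second(moniker.substr(colon + 1));
    }

private:
    std::map<std::string, Factory> factories_;
};

// Example object type: just remembers the string it was created from.
struct EchoObject : Object {
    explicit EchoObject(std::string s) : text(std::move(s)) {}
    std::string text;
};
```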

So I added a feature to my IDL compiler to write wrapper functions for your interfaces. The wrappers *only* pass around "strings" and "objects" (actually, they pass around union objects that are both). You can look up a function by name, then match against its signature to see if the objects you have are the ones it wants, just in case there's more than one function with that name (ie. in different types of interface).

The magic: objects have interfaces, and functions take a "this" pointer that's always an object; but when it comes down to it, that's all objects do. You don't poke around inside them. You don't "serialize" them. They're opaque. You simply try to run functions on them and check the return values to see if they work.
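Here's a rough sketch of that calling convention, under my own assumptions: every argument is a Value holding either a string or an opaque object pointer, functions are looked up by name, and then the first overload whose signature accepts the arguments gets run (the interface check is a dynamic_cast). Value, Kind, Param, is_a(), Registry, File and Socket are all hypothetical, not what the IDL compiler actually generates.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Object { virtual ~Object() {} };     // opaque: only pointers are passed

enum class Kind { Str, Obj };

// A "string or object" value; only one half is meaningful at a time.
struct Value {
    Kind kind;
    std::string str;
    std::shared_ptr<Object> obj;

    static Value of(std::string s) { return {Kind::Str, std::move(s), nullptr}; }
    static Value of(std::shared_ptr<Object> o) { return {Kind::Obj, std::string(), std::move(o)}; }
};

// One parameter in a signature. For objects, `implements` checks the interface.
struct Param {
    Kind kind;
    bool (*implements)(const Object *);
};

template <class I>
bool is_a(const Object *o) { return dynamic_cast<const I *>(o) != nullptr; }

using Fn = std::function<Value(const std::vector<Value> &)>;

class Registry {
public:
    void add(const std::string &name, std::vector<Param> sig, Fn fn) {
        table_[name].push_back({std::move(sig), std::move(fn)});
    }

    // Looks up `name`, then runs the first overload whose signature accepts
    // `args`. Returns false if none does; the caller just checks the result.
    bool call(const std::string &name, const std::vector<Value> &args, Value &out) const {
        auto it = table_.find(name);
        if (it == table_.end())
            return false;
        for (const Entry &e : it->second) {
            if (accepts(e.sig, args)) {
                out = e.fn(args);
                return true;
            }
        }
        return false;
    }

private:
    struct Entry {
        std::vector<Param> sig;    // sig[0] is the "this" object
        Fn fn;
    };

    static bool accepts(const std::vector<Param> &sig, const std::vector<Value> &args) {
        if (sig.size() != args.size())
            return false;
        for (size_t i = 0; i < sig.size(); ++i) {
            if (sig[i].kind != args[i].kind)
                return false;
            if (sig[i].kind == Kind::Obj && sig[i].implements &&
                !sig[i].implements(args[i].obj.get()))
                return false;
        }
        return true;
    }

    std::map<std::string, std::vector<Entry>> table_;
};

// Two example interfaces that could both offer a function called "name".
struct File : Object { std::string path; };
struct Socket : Object { int port = 0; };
```

Note that nothing ever looks inside an object except the functions registered for its interface.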

CORBA (and others) screwed up by assuming I wanted to "do stuff" with objects, like ship them around transparently from computer to computer. Big mistake: real life says that I ship strings around from computer to computer, not objects, and I have nothing against making that transparent. And hey, if my objects *want* to have functions to convert themselves to strings and back, that's fine with me - but it's not transparent. It's for them to worry about.

Anyway, it turns out that this model is pretty trivial to get working, and functions that take and return strings and/or pointers to objects happen to be really well supported by SWIG, so I can now access my C++ objects from tcl and (probably) perl and a bunch of other languages, essentially for free. (I only need one plugin per language, not one per language+interface, and SWIG will write most of the plugins for me!)

What amazes me is that it's always been done so badly before. XPCOM claims to do this, but after all this time, I have no idea how to load XPCOM components into tcl or run a command-line javascript interpreter. Why? It should be easy. In XPLC, it'll be easy. (By the way, the XPLC runtime is still only about 26k. Now that we have scripting, there will be an extra cost of a few kbytes for each language binding we want. For reference, the tcl bindings are about 10k.)

I really want to write a "sh" binding to demonstrate how powerful this model is. After all, the shell is great at dealing with strings, and since I don't ship my objects back and forth, I can just have them all live in another process that my shell can talk to. Nobody else's component model has *shell* plugins.

Anybody who wants to follow along can grab my very-alpha-quality code from open.nit.ca's anonymous CVS (project 'xplcidl').

Manifesto Writing

At Ozzy's suggestion, watched Pirates of Silicon Valley tonight for inspiration. Slightly depressed that I am neither a slimy poker-faced business supergeek negotiator (Gates) nor a crazy slave-driving hippie artist (Jobs), and will therefore probably never amount to much. On the up side, probably nobody will make a movie featuring my pathetic roller-skating antics or illegitimate children. It could be worse.

Embed me some HTML, Please

I was sitting today, thinking how unfortunate it is that C/C++ don't have any string quoting operators other than double-quote. I mean, designers of all sorts of languages since then (perl, in the especially extreme case) have figured out that having to escape every double quote you want to print is just no fun at all.

Then I realized I was wrong.

#include <stdio.h>
#define qq(s) #s

int main() { printf(qq(blah blah\n "quoted" stuff (and some parens) with newlines)); return 0; }

Try it. It works.
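A couple of caveats, sketched in C++. qq(s) takes exactly one macro argument, so an unparenthesized comma splits it and the expansion fails; unbalanced parentheses break it too, and an apostrophe (as in "don't") can confuse it. A variadic macro (C99 and C++11) stringizes all its arguments, commas included. And C++11 raw string literals make the whole trick unnecessary. The name qqv is hypothetical.

```cpp
#include <string>

#define qq(s) #s
#define qqv(...) #__VA_ARGS__

// qq(one, two) would not compile: too many arguments for the macro.
const std::string with_commas = qqv(one, two, "three");

// Stringizing squeezes each run of whitespace into a single space.
const std::string squeezed = qq(lots     of    space);

// C++11 raw string literal: no escapes at all between R"( and )".
const std::string raw = R"(blah blah "quoted" stuff (and some parens))";
```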

(Super)freeswan Super Sucks

...but the world is a better place thanks to ipsec_tunnel and a patch that makes isakmpd work with it.

IPsec is a grotesque horror story which barely works and is too complicated to be provably secure. That's bad enough, but when crazy bad-quality programmers get into the picture, you get total nastiness - that is, Freeswan.

On the other hand, ipsec_tunnel is downright straightforward, but only because it skips the key negotiation stuff, expecting you to do it yourself. And isakmpd is actually pretty wonky, but only because it has to do a truly startlingly huge amount of complicated negotiation just to make things work out. I'm pretty sure they made it as configurable as they did just so you have to suffer a little bit, just like they did. But other than the big long boilerplate config file (containing words like "QM-ESP-3DES-SHA-PFS-SUITE"), the programmers are pretty certifiably Not Insane.

And after spending more hours today fighting with bugs in freeswan's pluto daemon, I could definitely use some of that.

Meanwhile, we're thinking of taking my age-old Tunnel Vision and making it use IPsec (ie. ipsec_tunnel, in this case) as the packet-transfer layer. That could have the major advantages of throwing out stupid horrible IKE, plus it would let you auto-negotiate routes like Tunnel Vision always does and IPsec never did. The ESP (tunnel) part of the IPsec standard isn't so bad; it's the key negotiation that sucks, so why not let SSL do it for me? Of course, it wouldn't really be IPsec-compatible then.

The other choice is to keep IKE and add a layer on top of that. The advantage there is that you can gracefully fall back to plain IPsec if the other guy doesn't have Tunnel Vision. But that solution makes me feel guilty, because then I'm just making a bad thing even worse. Oh well...
