Older blog entries for apenwarr (starting at number 565)

bup backups are 2.7 times smaller than rsnapshot

Zoran has been doing some interesting tests of bup space efficiency and import performance vs. good old rsnapshot.

Lately, he's been working on an rsnapshot-to-bup importer tool. According to his most recent message, after importing the incremental backups from two servers, the bup repo was 4.6GB vs. the original rsnapshot disk usage of 12.6GB. That's 2.7 times smaller.

YMMV. Results may not be typical. Always consult a physician.

And Also...

As of last night, you can also restore your bup backups. Yeah, I know. It's that awesome.

Syndicated 2010-09-08 20:30:56 from apenwarr - Business is Programming

6 Sep 2010 (updated 17 Oct 2010 at 03:06 UTC) »

Crystal ball

    I now have had my foggy crystal ball for quite a long time. Its predictions are invariably gloomy and usually correct, but I am quite used to that and they won't keep me from giving you a few suggestions, even if it is merely an exercise in futility whose only effect is to make you feel guilty.

    -- E.W. Dijkstra

Syndicated 2010-09-06 22:11:10 (Updated 2010-10-17 03:06:05) from apenwarr - Business is Programming

The sad evolution of wikis

I set up my first wiki in 2001, in preparation for hiring our very first software developer employee. Until then, software development at NITI had been handled entirely by me and the other technical co-founder, dcoombs. And since we knew everything already, documentation was unnecessary. When it came to hire a new person, documentation suddenly became very necessary. I had to do a brain dump, and a wiki seemed like a great solution. The original NitWiki was born.

The first NitWiki ran on zwiki. Then we migrated to WikkiTikkiTavi, which we hacked up resulting in what we called HackyTavi, which we then cleaned up and renamed to GracefulTavi. (Note: the original WikkiTikkiTavi is very nice and rather graceful. It was just HackyTavi that wasn't graceful. No offense intended.)

GracefulTavi performed wonderfully; at NITI's peak, we had more than 30 internal developers, plus all our technical support staff, using it daily. All our technical documentation was in there, and it was easy to read, easy to find, easy to link to, and (perhaps most importantly) easy to write. There was also tons of non-technical stuff; all sorts of company culture, office games, product plans, newsletters, and everything else got dumped into the wiki. The wiki was the default place to dump stuff, and it was all full-text-indexable and made it easy for anyone to provide feedback and fix typos.

It's the most fun I've ever had doing documentation. Even more fun than GourD, which was fun too in its own way. But that's another story.

GracefulTavi had a long and happy life; it was still in use in late 2006 when I left, and I suspect it's still alive at IBM (which acquired NITI) even today, at least as read-only reference material.

As great as it was, however, I don't think a NitWiki would be easy to make successful today. At least, I've seen people try, and it doesn't really work out. One of the problems is Wikipedia Mentality.

Wikipedia changed a few fundamental ideas about wikis that had been taken for granted:

  • Wikipedia articles are supposed to look finished and professional.
  • Thus the concept of "ruining" a Wikipedia article actually exists, where it's much fuzzier in normal wikis (modulo spammers of course).
  • WikiWords are replaced with full-English phrases as hyperlinks.
  • Articles are separated from Discussion pages.
  • A mandatory "neutral point of view" (NPOV) replaces freedom and discussion.
  • Some articles/topics are considered "not appropriate" for Wikipedia and get deleted.
These differences weren't driven home to me until a recent discussion on the git mailing list about what sorts of pages are appropriate or not and who should or should not have admin access to those pages.

"Oh," I thought to myself. "So that's why wikis aren't wikis anymore."

The Wikipedia changes make sense; absolutely. They're great changes. Without them, Wikipedia almost certainly would have failed. But if you make those changes, you don't end up with a NitWiki, you end up with a Wikipedia.

Wikipedia-style wikis are written primarily for outsiders. There's a community, but it doesn't come through in the articles; if you want the community, you have to go elsewhere, to the User or Discussion pages. And of course, the User and Discussion pages aren't part of the product; they're the necessary junk produced as a side effect of producing the product. In Wikipedia, the community is overhead.

Same with the Git wiki. It's treated as a product - a set of documentation. Anything that's not "an information source about git" is unwelcome. Which is fine, I guess, and git already has an active mailing list where the community hangs out (as you can see from the vigorous discussion inside that thread I linked to), so there's no reason to have a community in the wiki. Having two community hangouts might actually be detrimental.

But I miss having a community wiki. The very original wiki, at c2.com Wiki, is such a wiki, and it's a beautiful thing. I was never a member of that wiki, but it's a fascinating place to lurk nonetheless. And our interns at NITI used to call NitWiki the "electric babysitter."

The c2 wikiers also apparently have their own version of this rant that I'm writing and you're currently reading. Theirs is called WikiIsNotWikipedia.

So yeah, a "real community wiki" is nothing like Wikipedia. That's not so hard. We can just start a new un-wikipedia-like wiki and just do it the "old way." It wasn't very hard to teach people to use a wiki who had never used a wiki and who were used to writing Microsoft Word documents for everything. Just stay away from MediaWiki (which strongly encourages the Wikipedia way of doing things) and you'll be fine.

Unfortunately, outside of wikis, the world itself has also been busy evolving, and in a way I don't know how to deal with. The new problem is: teams are the wrong size now. Thirty dedicated, full-time programmers was an ideal number for NitWiki. But who has 30 dedicated, full-time programmers now? I mean *really* dedicated? You can be a very successful Internet startup with way less than 30 developers. Maybe you need only two. Maybe those two aren't even working full time on the one project. And two people don't need a wiki.

That lack of dedication - lack of single-minded attention of one team to one goal - is a serious problem in building a project-oriented wiki. A lot of developers at NITI *lived* in the wiki; that was where they went when they were bored (instead of to a news site). It was where they went to ask a question, or answer a question, or design a new module, or plan a party. And as the wiki expanded, more and more WikiWords that you invented - planning to fill it in next - turned out to already exist, and to already have the content you meant to write. The WikiWord effect is kind of magical that way. It's really fun and rewarding when it happens.

But when there are too many wikis - more than one is too many - it stops happening. Instead, you get the opposite effect, and it's frustrating: you know you wrote a WikiWord page that describes a particular concept... but that page isn't in the current wiki, it's in some other wiki. So you use the WikiWord, but it comes up as missing. Argh. That defeats the whole point. You might as well be writing emails or Word documents or Wikipedia articles.

We even had this problem at NITI, and it was a bad one. In fact, we had two wikis: the internal-only NitWiki, and the public now-defunct OpenNit (open.nit.ca) wiki. In theory, we were supposed to put all our public-facing stuff in OpenNit, and all the private stuff in NitWiki. But it didn't work because you never knew quite what would be "private" until it happened. What if you're discussing something in public, but you want to add a related comment about some private project we're working on that we haven't announced yet? It kills the discussion. Or more likely, it results in all discussions just being done on the internal-only wiki, which is mostly what happened.

Internally, we had some really great documentation. But the documentation of our open source stuff was mostly garbage, because it was mostly on the internal-only site and nobody internally visited the public one.

We never solved this problem. I wish we had; our NitWiki was fun enough that I would have liked to share it with more people. Of course, people would have been more afraid to post if they had to do it in front of everyone on the Internet, so making it public never would have worked.

How do you create a vibrant community, but allow for private topics and discussion, but allow for public topics and discussion, and allow me to work for more than one company at a time with multiple private discussions, and have my WikiWords always end up pointing where they're supposed to?

I have no idea.

Dear Internet: please invent the solution to that for me. Thanks.

P.S. You don't get any bonus points for saying, "The answer is Google; just blog about it, don't worry about hyperlinks, and let Google full-text index the rest." That answer is funny, but it doesn't work for my private stuff.

Syndicated 2010-09-01 23:25:35 from apenwarr - Business is Programming

26 Aug 2010 (updated 26 Aug 2010 at 20:09 UTC) »

The strange case of virtual machines on telephones

    "Look, theory says that a JIT can run as fast as, or maybe faster than, a statically compiled language. It might be slow right now, but it'll be much better when we get a real/better JIT. Plus, the new version is already a lot faster, and I'm looking forward to the next version, which they promise will have huge speed improvements."

    -- Every Java user since 1996

If you've been saying the above about your Android phone (or Blackberry), then you, too, have become part of the decade-and-a-half-long train wreck of computer science that is Java.

I'm often mystified at the rejection of reality displayed by the proponents of Java-like virtual machines. It seems a simple statement of fact: even after 14 years, Java is still much slower than native code, and you can see it clearly just by looking at any app for 10 seconds. And yet the excuses above keep coming. 14 years.

But then I think, I know how this delusion works. I've been guilty of it myself. At my first company, I pushed to have all our data interchange sent through an API that I designed - UniConf - which was unfortunately slower in almost all cases than not using it. The idea was that if only all our code could be 100% pure UniConf then we'd suddenly be able to realize tons of wonderful advantages.

But despite herculean efforts, the advantages never materialized. What materialized was a lot of slowness, a lot of excessive memory usage, and a lot of weird bugs that forced us to backtrack through seven layers of overly-generalized code to diagnose.

Luckily for me, lack of resources prevented my own madness from spreading too far. I'm much better now.1

But what would it be like if the madness had been successful? What if I had been responsible for a system that spread to millions of users worldwide, which in nearly every case made things visibly and obviously worse? What would that do to my psyche? I think it would be unbearable.

Which brings us to Java-like VMs on cell phones. I have a lot of sympathy here, because:

Java used to be a good idea. Really.

Java on cell phones has not always been obviously a bad idea. To see why, you have to understand a bit about how these systems evolved.

First of all, we have little visibility into Java's original reason for being. We know what people said, but we don't know if they said that for marketing or retroactive justification. What we do know is that the original sales push behind Java was applets for your web browser. Rich, client-side web applications.

Client-side web applications have exactly one super difficult critical requirement: security. You're downloading random apps from the Internet automatically and you want to run them automatically, and some of these apps will definitely be written by evil people and try to screw you, so you need a defense mechanism. Moreover, most people doing this will be doing it on Windows, which at the time meant Windows 95, which had no actual security whatsoever. Any native code could do anything it wanted. This situation persisted, mostly, up to and including Windows XP. (NT-based kernels have security, but the average person just ran everything as an administrator, negating literally all of it.)

So the typical user's operating system provided no strict memory protection or any other security features. This is where Java made perfect sense: if you can provably enforce security at the application layer, you can make a virtual machine that actually includes these missing security features, thus making it safe to run random applications on the Internet, and propelling us into the Internet Age. Sweet.

Java happened to fail at that, mostly due to slowness and crappiness and licensing, but the idea was sound, and it was a valiant and worthwhile effort that deserves our respect even if it didn't work out. Flash and Javascript won out in the end because they were somewhat better in some ways, but they both use VMs (whether interpreted or JITed), and rightly so.2

Unfortunately, nowadays the vast majority of Java apps never use any of Java's security features; they run as apps with full user rights, either on the client or on the server. So that advantage of the VM is gone... and the Java VM has no other advantages.3 But people, having been fooled once, kept going on the path they were already on.

Now ironically, the real problem was not natively compiled languages, but Windows (or to be generous to Microsoft, "the operating systems at the time"). Anybody who has studied computer science knows that modern processors capable of virtual memory were designed around the idea of keeping untrusted apps under control. Once upon a time, people used to actually share time on Unix machines. Lots of people on a few machines. And they were largely prevented from stomping on each other. The exceptions were security holes - fixable mistakes - and VMs have those too.

It is really not that hard to lock an application into a protected environment when your processor includes security features. Just google for chroot, BSD jail, AppArmor, SELinux. Yes, some of them are a little complex, but security is complex; nobody ever claimed Java's security architecture was simple.
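
For flavour, here's roughly what that looks like from userspace - a minimal python sketch using plain OS facilities (the sandbox path, uid, and binary name are made up for illustration; it needs root, and the target binary plus its libraries have to already exist inside the new root):

    import os

    def run_sandboxed(binary, new_root="/var/sandbox", uid=65534):
        # Confine a child process using nothing but ordinary OS features:
        # chroot() for the filesystem, setuid() to drop privileges.
        pid = os.fork()
        if pid == 0:
            os.chroot(new_root)        # filesystem jail
            os.chdir("/")
            os.setuid(uid)             # no way back to root after this
            os.execv(binary, [binary])
        os.waitpid(pid, 0)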

Of course, if I had said that five years ago, you might not have believed me; you might have said those systems weren't secure enough, that Java was somehow more secure in ways you couldn't quantify, that application-level VM security is just somehow better, I mean, look at the virus situation on Windows. And I wouldn't be able to argue with you, because that's not even a logical argument, but it sounds vaguely convincing. And so the world went.

Then Apple came along and made the iPhone and its App Store and all the apps are native and the thing is still secure and apps can't stomp all over the system. Again, modulo security holes - fixable mistakes - which VMs don't eliminate. Here everybody was, going along with the above illogical argument in favour of VM security because they couldn't argue with it, and Apple just ignored them and showed it was all wrong. You can make native code secure. Of course you can. People did it in the 1980's. What on earth were we thinking?

But I'm getting ahead of the story a bit. Now I've told you why Android's use of a Java-like VM was demonstrably wrong (Apple demonstrated it) from the beginning, but first I wanted to tell you why Blackberries use Java, and lots of old cell phones used Java, and that wasn't obviously wrong.

The reason, of course, is that when Java was first applied to mobile phones, mobile phones didn't have processors capable of protected memory. Those processors were really low powered; security was impossible. Before Java, you could write custom native apps for a Blackberry... as long as you gave your source code to RIM to have them review it. Because native code could do anything, and there was physically no way to stop it once it got onto the device. Other phone manufacturers didn't even bother.

At the time, the first inexpensive embedded processors supporting protected memory were years in the future. If you could have a way to safely load third-party apps onto your phone... well, wow. You'd rule the world. You wouldn't just have a phone, you'd have a platform. This was not silliness, not at all. A Java VM was the first serious possibility of making a mobile phone into a serious, flexible, reconfigurable application platform.

It didn't work out very well, mostly because of Java's slowness and crappiness and licensing and (in the case of Java ME) horrendous lack of standardization. But GMail and Google Maps worked on my Blackberry, and millions of enterprise Blackberries are deployed running thousands of custom legacy enterprise apps you've never heard of that will make transitioning big established companies from Blackberry to iPhone virtually impossible for many years. In this case, pure thickheaded brute force did manage to win the day.

So okay, for the same reason that Java VMs started out as a good idea on Windows - namely, the platform itself lacked any security features - Java VMs made sense on phones. At first.

But embedded processors don't have those limitations anymore. They're serious processors now, with protected memory and everything. Most importantly, these processors were available and being used from the first day the first Google Phone was released. You no longer need a VM for security... but that means the VM doesn't provide any advantage at all.3

The fact that an Android phone has tolerable performance is, again, a triumph of pure thickheaded brute force. If you throw enough geniuses at a difficult technical problem, you might eventually solve that problem, even if the problem was stupid, and in this case, they mostly did.

But every step of the way, they're going to have this giant anchor of UniConf - er, Dalvik - tied around their neck, and Apple won't, and Apple's native apps will always run faster. It's going to be frustrating.

Maybe the speed won't matter. Maybe computers will get so fast that you just won't care anymore.

Java users have been saying that, too, since 1996.

Footnotes

1 I hope

2 Writing native desktop or server applications (ie. ones without crazy strict security requirements) using a Flash ("Adobe Air") or Javascript VM is kind of dumb for the same reasons set out in this article. There is one redeeming attribute of those systems, however: they already exist. If you have to have a VM for security on the web, then it makes sense to copy the runtime verbatim to the desktop/server, just because it's easier. Removing the VM would be possible and very nice, but it's just an optimization. Keeping the VM is easier, not harder, and thus is justifiable. (This doesn't really apply to Java since it never actually got popular for web apps.)

3 To pre-emptively refute a few common claims: "Write once run anywhere" doesn't actually work because the compiler was never the main problem; differences in OS semantics are the main problem, and you have to solve those equally for your apps in any language, even Java. Garbage collection can be and is frequently done in natively compiled languages. Introspection can be done in natively compiled languages. Digital signing of shared libraries can be implemented by any native shared library loader. Cross-language integration can be and is done all the time in native languages; in fact, VMs make this much harder, not easier, since now you have to rewrite all your languages. Sensible threading primitives (which some would say Java lacks anyway) can be implemented in any sensible language, natively compiled or not. Profile-driven optimization can be done in compiled languages. Support for multiple hardware architectures is just a recompile away - just ask any Mac developer. Provable memory protection (including prevention of all attempted null pointer dereferences) is doable and has been done in statically compiled languages. And before anyone asks, no, C/C++ does not do all these things; you need a good language. My point is that the good language needn't run in a VM; the VM is a red herring, a distraction.

Syndicated 2010-08-26 19:05:07 (Updated 2010-08-26 20:09:21) from apenwarr - Business is Programming

Why Northwestern Ontario is... in Ontario

    In another dispute, over which province should own what is now the northwest of Ontario, the [British] Judicial Committee sided with Ontario against Manitoba and Ottawa. The ruling still makes no sense. The people of the region still treat Winnipeg as their natural capital. Why? Because it is their geographical capital. Toronto and the rest of Ontario belong to a distant, different world.

    -- John Ralston Saul, A Fair Country (p.162)

That was in the early 1900's, apparently.

I always wondered how the screwy shape of Ontario had come about; it figures. The actual Canadian federal government (Ottawa) thought it would make sense to lump us in with Manitoba, but somehow the British overseers thought otherwise.

If you're from Northwestern Ontario, here's a fun game you can play with your friends from Southern Ontario. First, take a paper map of Ontario. (I know, paper maps? What are those?) They're generally printed with Southern Ontario on one side and Northern Ontario on the other. In Southern Ontario, find two towns that are about an hour apart, and point them out on the map.

Now flip over the map and find two towns that look about the same distance apart, and ask your friend to estimate how far apart they are. See if they remember to check the map scale - most people don't realize that the Northern Ontario side is drawn much smaller, because the land is absolutely huge by comparison and has a much lower population density.

Now imagine you're working for the Ontario government - down in Toronto - and you still haven't realized this.

Syndicated 2010-08-10 14:00:31 from apenwarr - Business is Programming

Three bad things: threads, garbage collection, and nondeterministic destructors

These three programming environment "features" all have one characteristic in common that makes them bad: non-repeatability. If you run the same program more than once, and it uses any of those three things, then chances are it won't run identically every time. Of course, if your program is written correctly, it ought to produce the same effective results every time, but the steps to produce those results might be different, and the output itself might be different, if effectively identical.

For example, in a threaded map/reduce operation, the output of each parallelized map() will reach the reduce() phase at different times. Supposedly, the output of reduce() should be the same regardless of the ordering, but that doesn't mean reduce() is performing the same calculations along the way.

Imagine you're running a map/reduce on an original Pentium processor with the infamous FDIV bug, and your reduce() includes a division operation. Depending on the order of inputs, you might or might not trigger the bug, and your result might be different, and you'd be left wondering why. That's the problem with non-repeatability. Even without the FDIV bug, maybe your code is buggy, or maybe you're just introducing rounding errors or int overflows; the ordering can change the result, and debugging it is hard.
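
You don't even need exotic hardware bugs to see this; plain old floating point will do. A quick illustration (python, but the effect is the same anywhere):

    import random

    # Float addition isn't associative, so summing the same numbers in a
    # different order can give a slightly different answer - exactly the
    # kind of wobble that thread scheduling hands you for free.
    nums = [random.uniform(-1, 1) for _ in range(100000)]
    print(sum(nums) - sum(reversed(nums)))   # usually tiny, but often not 0.0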

A more common problem is well-known to anyone using threads; if you don't put your locks in the right places, then your program won't be "correct", even if it seems to act correctly 99.999% of the time. Sooner or later, one of those race conditions will strike, and you'll get the wrong answer. And heaven help you if you're the poor sap who has to debug it then, because non-reproducible bugs are the very worst kind of bugs.

Garbage collection and non-determinism

But everyone knows that locking is the main problem with threads. What about the others?

Non-reproducible results from garbage collection go hand-in-hand with non-deterministic destructors. Just having a garbage collector thread run at random times can cause your program's GC delays to move around a bit, but those delays aren't too important. (Except in real-time systems, where GC is usually a pretty awful idea. But most of us aren't writing those.)

But non-deterministic destructors are much worse. What's non-deterministic destruction? It's when the destructor (or finalizer) of an object is guaranteed to run at some point - but you don't know what point. Of course, the point when it runs is generally the point when the GC decides to collect the garbage.

And that's when the non-repeatability starts to really become a problem. A destructor can do anything - it can poke at other objects, add or remove things from lists or freelists, send messages on sockets, close database connections. Anything at all, happening at a random time.

Smart people will tell you that, of course, a destructor can do anything, but because you don't know when it'll run, you should do as little in the destructor as possible. In fact, most objects don't even need destructors if you have garbage collection! Those people are right - mostly. Except "as little as possible" is still too much, and as soon as you have anything at all in your destructor, it starts spreading like a cancer.

In the .net world, you can see this problem being hacked around every time you see a "using" statement. Because destructors in .net are non-deterministic, some kinds of objects need to be "disposed" by hand - back to manual memory management. The most common example seems to be database handles, because some rather lame kinds of databases slurp huge amounts of RAM per handle, and your web app will grind to a halt if it produces too many queries in too short a time without explicitly freeing them up.

But no problem, right? You can just get into the habit of using using() everywhere. Well, sort of. Unfortunately, objects tend to get included into other objects (either using inheritance or just by including member objects). What if one of those member objects should be dispose()d explicitly when your container object is destroyed? Well, the containing object now needs to implement its own dispose() that calls its member objects' dispose(). But not all of them; only the members that actually have a dispose(). Which breaks encapsulation, actually, because if someone adds a dispose() to one of those member objects later, you'll have to go through all your containing objects and get them to call it. And if you have a List, the List won't know to call dispose() when you remove an object from the list. How could it?
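
Here's the shape of the problem, transliterated into python just to keep it short (the class names are invented; in .net it's the same thing spelled IDisposable):

    class Connection:
        def close(self):
            print("connection actually closed")

    class Worker:
        def __init__(self):
            self.conn = Connection()

        def close(self):            # exists only to forward close()
            self.conn.close()

    class Service:
        def __init__(self):
            self.worker = Worker()

        def close(self):            # ...and so on, all the way up the stack
            self.worker.close()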

So then some people decide that, as a policy, just to be safe, every kind of object in their app should start off with a dispose(), and you should always call it, just in case you need to actually use it for something later.

Ha ha! And now you're back to manually destroying all your objects - dispose() doesn't get called automatically by the garbage collector - just in case. Only it's worse, because writing dispose() involves more boilerplate re-entrancy junk than a destructor would have!

In some sense, this is still "better" than a completely GC-less language, because at least if you forget to call dispose(), the symptoms are... harder to see. Which means you can ignore them, right? It'll come up as, say, database queries randomly failing (because you've used all your handles), but only under high load, and only when a certain kind of operation (the kind with the mis-implemented dispose()) is being used. But that's okay, you can just retry the query, and it'll probably work the next time, because the GC runs pretty frequently when you're under high load. Oh, and no valgrind-like tool can save you, because the objects are all being collected. Eventually. They're not technically leaks! Oh, thank goodness for that garbage collector, technically saving me from leaks.

(In case you're keeping score: Java has neither dispose() nor using(), nor deterministic destructors. You just have to fake it all up yourself.)

The same weirdness can happen with any kind of non-memory object handle, not just databases, of course. GC proponents like to tell you how "almost everything you allocate is just memory" as if that allows you to ignore the problem 99% of the time. But that's useless. Every program talks to the outside world eventually, so you inevitably end up with a lot of objects referencing a lot of real-life handles, and you make mistakes. One server program I wrote in C# had a strange bug where it wouldn't close connections for a random amount of time after they had finished - less delay under high load, more time under low load. Why? Because the socket wasn't explicitly disposed when the last asynchronous object was done with it. I had a bunch of objects floating around holding a reference to the socket, and all of them were triggering based on other things happening in an event loop; there was no "primary owner," and so there was no way to use the using() syntax, or to implicitly know when to call dispose(). We just wanted to dispose() when the last runner object was finally destroyed.

A solution (well, sort of)

Which brings us to the semantics people actually want for their objects: refcounting and deterministic destructors.

Even in C#, the solution to my dispose() problem was to implement a refcount. For each object that took a reference to my socket, I incremented the refcount by one; when one of those objects was destroyed (or rather, dispose()d, or it *again* wouldn't be deterministic), it reduced the socket's refcount by one. And when the socket refcount went to zero, it dispose()d the socket immediately.
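
Sketched in python instead of C# (the names are invented, but the logic is the same): each helper object calls acquire() when it starts using the socket and release() when it's done, and the last release() closes it on the spot.

    class SharedSocket:
        # Hand-rolled refcounting around a socket that has no single owner.
        def __init__(self, sock):
            self.sock = sock
            self.refs = 0

        def acquire(self):
            self.refs += 1
            return self.sock

        def release(self):
            self.refs -= 1
            if self.refs == 0:
                self.sock.close()   # deterministic: closed right here, right now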

Of course, implementing refcounts in C# is much more painful than in C++ (where it's pretty common), because... there are no deterministic destructors. Ha ha! In C++, you can create a "refcounted smart pointer" object, which essentially has the behaviour of incrementing a refcount when it's created, and decrementing the refcount when it's destroyed, and deleting the pointed-to object when the refcount goes to zero. If you pass these smart pointers around on the stack or use them to store stuff in objects, smart pointers are created and destroyed automatically, and sooner or later your "real" objects are destroyed automatically as they should be - right away, as soon as the last user is done with them.

It's very elegant and a bit recursive; we use deterministic destructors to implement smart pointers so that we can have deterministic destruction. But without deterministic destructors, the smart pointer trick is out the window; your smart pointers would be dereferencing objects at random times when the GC cleans up the smart pointer! You're reduced to updating refcounts by hand, as if you were a lowly C programmer. Gross.

Now, let's say you're coding in perl, python, ruby, or (of all things!) Visual Basic. In those languages, all your objects have refcounts, and all the refcounts are updated automatically. If I write this in python:

    import os

    if x:
        open("filename", "w").write(str(os.getpid()))

Then you can create a file, write to it, and close it, all in one line. The closing is implicit, of course; it happens right away as soon as there are no more references to the file. Because, obviously, nothing else would make sense.

One of the reasons I'm writing this article at all is that people are underestimating just how valuable this behaviour is. Even the python developers themselves have declared deterministic destruction to be an "implementation detail" of the original python implementation, which is not guaranteed to be carried into other python implementations or even maintained in future versions of plain python. Of course, they had to say that, since people have ported python to run on the Java and .net virtual machines, which... lack deterministic destructors. So the above code will already act weird on IronPython, for example.

Instead, the python people, in recent versions, have introduced the "with" statement, which is really just using() with slightly fancier semantics. And that's very sad. Down that path lies the insanity that is .net, with every single object eventually needing to be manually "disposable," just in case.
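
For reference, the with version looks like this. It works, and it closes the file deterministically on any python implementation - but only because the caller remembered to ask:

    with open("filename", "w") as f:
        f.write("hello")
    # f is guaranteed closed here, even on IronPython - but the burden
    # of saying so has moved back onto every single caller.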

In python, sockets close, database queries get freed, and files get closed, all when you need them to. Plus, your program chews through less memory, because if you create temporary objects in a tight loop, they'll get freed right away instead of sometime later.

And now to get back to where we started. With refcounting, your objects are always destroyed in the same sequence, even if you add lines of code or initialize new objects in the middle.
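
Under CPython's refcounting, for example, this toy program prints its "destroyed" lines at the same points, in the same order, every single run:

    class Noisy:
        def __init__(self, name):
            self.name = name

        def __del__(self):
            print("destroyed " + self.name)

    a = Noisy("a")
    b = Noisy("b")
    a = None   # "destroyed a" prints right here, every run
    b = None   # then "destroyed b" - same order, every time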

When refcounts go wrong

So then why does anyone bother with non-deterministic garbage collection?

Well, first of all, refcounting runs extra code every time you assign a reference, which can theoretically slow down your program. "Real" GC only runs through all your pointers occasionally, which might be less work overall than checking every time you assign a reference. A GC might even be smart enough to aim for idle periods in your program - for example, finish an entire request, then GC the temporary objects while waiting for the next request to come in - which is theoretically very fast. The reality doesn't always match the theory, but it does sometimes and I don't have any benchmarks to show you, so let's leave it at that: a pure GC can run faster than a refcounted one. Sometimes. Maybe.
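
You can actually watch that bookkeeping happen in CPython (the exact numbers are implementation details, so treat them as illustrative):

    import sys

    x = object()
    print(sys.getrefcount(x))   # e.g. 2: x itself, plus the argument to getrefcount
    y = x                       # assigning a reference runs bookkeeping code...
    print(sys.getrefcount(x))   # ...and the count goes up, e.g. 3
    del y
    print(sys.getrefcount(x))   # ...and back down, e.g. 2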

But much more importantly, we come back to those pesky threads. If you have threads, then your refcounts have to be synchronized between threads; there are lots of ways to do that, but all of them involve having your processor cores communicate with each other, just in case they're sharing the same memory, every time you assign an object reference. And if refcounting was maybe, sometimes, moderately slower than GC before, well, when you add in thread synchronization, it gets much slower. So slow that reasonable people don't even think about trying it.

That's why Java and .net don't use refcounting, of course; they were heavily designed around threads. Microsoft COM uses refcounting, even though it was designed with threads in mind, but you don't notice because COM is horrendously slow anyhow and refcounting is the least of your worries. And presumably, since the python developers are still thinking about the day when python will maybe work well with threads, that's why they don't want to promise they'll never switch away from refcounting.

(Python's refcounting isn't a bottleneck right now, even with threads, because python doesn't do thread synchronization around its individual refcounts. It just has a single global interpreter lock that prevents any other threads from being in the interpreter at all, which is much faster in the common (not heavily threaded) case. That's why removing the GIL is so famously hard - because every way they've tried to do it, it makes single-threaded programs slower.)

So what does all this mean?

Well, it means we're headed for trouble. Intel and other chip manufacturers are insistent that we need to update our programs - and thus our programming languages - to do more stuff with threads. Watch out, they say, because soon, there'll be no way to make your program fast unless you use multiple cores simultaneously. And that means threads, which means garbage collection instead of refcounting, which means non-deterministic destructors.

And that sucks! Because let's face it, the reason programmers are switching to languages like python and ruby is that speed doesn't matter as much as a programmer's sanity. Threads, garbage collection, and non-deterministic destructors are bad for a programmer's sanity.

It's a step backwards. Intel is desperately selling us something we didn't actually need.

Last time Intel told us we had to switch programming paradigms and make our programs more complicated in order to move into the future, they were trying to sell us the Itanium. That foolishness gave AMD time to invent a sensible 64-bit processor that didn't require you to solve a bunch of previously-unsolved computer science problems just to make your programs faster. Intel is repeating the same trick here; they want us to make our programs harder to write, so that their processors don't have to be harder to design. I sympathize, but I'm not falling for it. Threads aren't the only way; certainly not threads as they've been done until now.

So far, the vast majority of the programs I write have been single-threaded, refcounted (or manually memory managed), and deterministic. I think I'll continue to do it that way, thanks.

And you'd better believe most other people will too.

Syndicated 2010-08-09 03:20:30 from apenwarr - Business is Programming

Preparations for a Tradeshow

Me: We wouldn't be so *presumptuous* as to assume we could just make a product without talking to actual users in the market first. What do you think we would do, just show up and *tell* you what you want?

Co-worker: You can't actually say that with a straight face, can you?

Syndicated 2010-08-07 23:58:53 from apenwarr - Business is Programming

A bit more language meditation

After the last couple of bits on C++, I thought I would offer something a little more melodramatic: human languages, as experienced in Montreal. Back in 1887.


Old Montreal, 1887, via the McCord Museum's Flickr Feed

Notice anything funny about this picture?

I'll give you a hint: Montreal is a primarily French-speaking city. Looking at the 2006 census, 13% of the population spoke English as a first language, compared to 54% with French as a first language.1

...

...

...Yet every single sign in that photo is in English! If you looked at the same street today, you would see every single sign is in French.2

Why? Because of Bill 101 from 1977, informally known as the Quebec "language law." Among other things, that law says any public sign has to be primarily in French; no other language can be "more prominent" than French.

The mere existence of the language law is, itself, fascinating. It's a flagrant violation of the free speech rights guaranteed by the Canadian Charter of Rights and Freedoms.

But Canadians always hedge their bets, so in addition to free speech, the Charter of Rights and Freedoms also has a section called the Notwithstanding Clause.3 That clause basically says the government can enact any laws they want that violate your rights, as long as they comply with a few basic rules, such as refreshing said laws every few years. (Unlike other Canadian laws, rights-violating ones expire automatically.)

As you might imagine, laws that very literally violate human rights can cause a bit of a fuss. This one certainly does - and it has continued to do so since it was first brought in. The need to refresh it every few years guarantees that it gets back into the news every few years, which is both healthy and stressful.

I'm a native English speaker myself, so this law comes down to racism against me. But you know what? I think it's a good law. 54% of Montrealers speak French as a first language; almost all the rest (even me) can speak at least basic French when necessary.

As the story goes, the reason the rule was needed in the first place was this: while almost every French speaker had learned basic English - after all, the people bordering Quebec in every direction are largely anglophone, so there are lots of chances to learn - the much smaller English population didn't bother to learn French. Because if all the French people are willing to speak English anyway, why bother? And if you're making a sign - even if you're a French person making a sign - are you going to make one that 54% of people can understand, or one that 99% of people can understand? That's right. If you're a wise French business owner serving primarily French customers, you'll make your sign in... English.

Those lazy English people have a point. It really is a lot of work to learn French, just so you can speak French in this tiny little enclave of non-English on a whole continent of English. I find it completely believable that English people are so lazy; my own crappy French skills are testament to that. And quite simply, Quebec's French speaking majority called us on it. They demanded justice: they demanded the right to be served in the language of the majority.

And in order to give people that right - a right not guaranteed by the Charter of Rights and Freedoms - we had to violate another right, namely, the right not to talk to people in French. If it weren't for the magic of the Notwithstanding Clause, the government would be enforcing civil rights... but the wrong ones.

By the way, if you're American, and you come to Montreal and people pretend they can't speak English, you're absolutely right: they are pretending. Because you just rudely jabbed them with hundreds of years of cultural history.4 (Note: it's not a pretense in other parts of Quebec, where people often speak exclusively French. And obviously some people in Montreal really don't speak English, but it's fewer than it seems.)

So what does all this have to do with programming?

Well, about that C or C++ or Java. Do you really use it because it's better? Or because that's the one thing everyone can understand, even though it's not actually the best choice for most people? If someone made a law forcing everyone to write stuff in a particular language - say, Objective C - in order to prevent the oppression that is, say, Flash - is that a violation of your freedoms or is someone out there actually protecting you?

Okay. It's a stretch.

Footnotes

1 I tried to look at the 1861 census (okay, it's a few years off, but whatever), but the statistics defeated me. They weren't surveying people's mother tongue at the time, though they did survey people's birthplaces. At least 48% of the population at the time was of "Canadian - French origin" birth, with about 26% from Britain/Ireland/United States. However, that doesn't account for an additional 25% of "Canadian - Not of French Origin." How much of that is anglophone? I don't know. Perhaps there were more anglophones than francophones in Montreal back in 1861? I don't know. How did it change by 1887? I don't know. This is the sort of information I would like to retrieve from Wolfram Alpha if only it weren't a useless piece of junk.

2 You would also notice that Montreal's winter road conditions are about the same as ever.

3 There's also the "Limitations Clause," which says the government can violate your rights, but only if they're consistent and there's a good reason. And don't do it any more than necessary ("minimal impairment"). From the outside, it's hard to believe weird stuff like this works, but the emphasis on using power responsibly instead of blindly following the letter of the law is what Canada is all about.

4 Tip: I warn my American friends who visit Montreal that one simple change in behaviour will make your experience vastly more enjoyable. When you start a conversation, any conversation, just saying hi in a store or ordering in a restaurant - do your utmost to start it in French. You know, Bonjour, parlez-vous anglais, mispronouncing stuff off the French side of the menu, whatever. It doesn't matter if you suck at French. You probably won't get more than 5 words out before the person switches to flawless English. Why? Because you acknowledged that they have rights. Imagine if some people from France flew to New York, walked into a restaurant, and refused to speak anything but French. Would you think that was cute? Acceptable? Remotely reasonable? Of course not. You'd think they were idiots. But if you know some French, and they came in and tried their best at English, but had a terrible accent and awful grammar, you'd switch to French as a favour to them. Because they're not being idiots, and you're a nice person. Etiquette really is that easy.

Syndicated 2010-07-24 07:34:01 from apenwarr - Business is Programming

22 Jul 2010 (updated 22 Jul 2010 at 02:04 UTC) »

How to design a replacement for C++

My last article on the ugliness that is C++ didn't actually receive this complaint, but it should have: I offered a lot of criticism, but no constructive criticism.

I feel a little guilty about it, so let me try to resolve that here with some actual, constructive advice to language designers, for anyone who cares to listen. (Maybe nobody cares to listen, and in fact this will be much less interesting than the blind ranting of my last article. Too bad. Stop reading now if you're bored.)

The first thing you need to know about C/C++ is that they're only barely worth fixing anyway.

C has too few features, and C++ has far too many awful ones. Reasonable people might disagree on which features C is missing and which C++ should lose. But most people would agree at least that C could be usefully extended, and C++ could be usefully simplified (and maybe have a few cleanups, like my earlier suggestions of operator[]=, a sensible method pointer, and sensible standard strings).

We also know that neither change will happen. The C people, having seen what happens when you extend your language willy-nilly (ie. C++) are deathly afraid of it and will never ever change again. The C++ people are well set on their path (ie. ultimate salvation is right around the corner if we can just add a little more crack to our templates) and will never let it go.

But anyway, that doesn't really matter. C and C++ both get the job done in their respective niches. And those niches are shrinking dramatically. Once upon a time, you'd surely write all your apps in C or C++; nowadays, almost everything is better off written in a language with more built-in stuff. My personal tool of choice nowadays (when appropriate; I'll get to that in a minute) is python for most stuff, with C modules added on for the parts that have to be fast. It works excellently, as judged by my favourite metrics of fewer lines of code, increased readability, and maximum performance.

You might prefer ruby or C# or something instead of python. That's fine, although python seems to be the winner so far when it comes to a super-easy and efficient C extension system. (C#, including mono, makes me especially angry because C extensions often run slower than native C#. There's a massive and stupid overhead required to escape from the runtime down into native space and it often outweighs the speed gained from C. Duh. In python the overhead of calling into a C module is essentially zero.)

To a large extent, the reason you can get away with using "higher level" languages like python or ruby or C# is that computers have gotten faster and have a lot more memory than they used to. You need the faster computer to run an interpreted language, and you need more memory because you have garbage collection instead of manual memory management. But we've got the horsepower now. Might as well use it.

That means C and C++ are on the decline and they're just going to get smaller. Good. The world will be a better place for it.

But there will always be programs that have to be written in a language like C and C++. That includes kernels, drivers, highly performance-sensitive code like game engines, virtual machines, some kinds of networking code, and so on. And for me in particular, it also includes new plugins to existing C-based legacy systems, including Microsoft Office.

These programs are never going to go away. So deciding that they will, forever, have to suffer with the limitations of either C or C++ is kind of disappointing. And yet there is still no language - not even the hint of a beginning of a language - that can seriously claim to replace them. Here are the key "features" you will absolutely need to avoid if you want any chance at replacing C.

Things you absolutely must not do if you want to replace C

  1. Do not remove the ability to directly call into (and be called by) C and ASM without any wrapper/translation layers. When I want to call printf() from C or C++, I #include stdio.h and move on with my life. No other language makes it that easy. None. Zero. Do not be those other languages.

  2. Do not remove the cpp preprocessor. Look, I realize you are morally opposed to preprocessors. Well fuck you. If you take it out, I can't #include stdio.h, and I can't implement awesome assert-like macros. End of discussion.

  3. Avoid garbage collection. Garbage collection is fine as a concept, but you will never, ever, be able to write a good kernel if you try to use garbage collection. This is non-negotiable. Also, plugins to existing C programs won't fly with garbage collection, because you won't be able to usefully mark-and-sweep through the majority of non-garbage-collected memory, and you can't safely pass gc'd vs. non-gc'd memory back and forth between C and your language. Maybe your language can have optional garbage collection, but optional has to mean globally disabled across the entire executable.

  4. Avoid mandatory "system" threads. If you're writing a kernel, you're the guy implementing the threading system, so if your language requires threads, you're instantly dead in the water. Garbage collection often uses a separate mark-and-sweep thread, which is another reason gc just isn't an option. But it's even more insidious than that: what happens when you fork() a program that has threads? Do you even know? If the threads were created by the runtime, will it be sane even 1% of the time? You can't invent Unix if you can't fork().

  5. Avoid a mandatory standard library. People can - and do - compile entire C programs without using any standard library functions at all. Think about a kernel, for example. Even memory allocation is undefined until the kernel defines it. Most modern languages are integrated with their standard library - ie. some syntax secretly calls into functions - and this destroys their suitability for some jobs that C can do.

  6. Avoid dynamic typing. Dynamic typing always requires some sort of dictionary lookups and is, at best, slightly slower than static typing. To replace C in the cases where it refuses to die, you can't have a language that's almost as fast as C. It has to be as fast as C. Period. Google Go has some great innovations here with its static duck typing. Objective C is okay here because the dynamic typing is optional.

  7. Avoid support for exception handling. It's just too complicated, and moreover, C people just hate exceptions so they will hate you, too. And since C doesn't know about exceptions, you will make a mess when C calls you, then you throw (but don't catch) an exception. Just leave it out.

  8. Do not make it harder to do things in your language than they would be in C. Maybe this isn't even worth mentioning. But the upper bound on the lines of code it takes to do something should be the equivalent in C. Making your language backward-compatible with C is one way (not the only way) to achieve this.

All this sounds terrible, right? Why even bother if you can't have these obvious features? But actually, there are a bunch of things you can add and make things much, much better than C without making your language unacceptable in C's niche.

Things you can add to your language to make it better than C without ruining your chances to replace it

  1. Deterministic constructors/destructors (RAII). This is, quite probably, my favourite feature of C++ and the primary thing that makes me hate going back to C. (The lack of it is also what makes me hate almost every other high-level language. Python, thankfully, has this, although they claim that it's an implementation detail that could go away at any time. And IronPython can't do it. Bastards.) Deterministic constructors and destructors make smart pointers and automatic refcounting possible (and delightful!) and let you write things in one line of C++ that would take 10 lines of C. No exaggeration. And it compiles down to the same thing that C would, so there's no runtime cost.

  2. Closures and anonymous functions. In fact, Apple has already added these in an incompatible variant of C. Maybe you like them, maybe you don't, maybe you think they're God's gift to programming and any language without them is an infidel. But adding them would be harmless, anyway. (Update 2010/07/21: I mean harmless in that it wouldn't bloat the compiled code; it compiles down to the same ASM as the equivalent verbose C code, and if you don't use it, you don't pay for it.)

  3. Implicit user-defined typecasts. These are a tricky feature of C++ and some C people hate them because they hide stuff they think should be explicit. But you need this if you want to implement non-gross smart pointers and user-defined string objects.

  4. Operator overloading. You have to be seriously tasteful about this one. If you don't think you can handle the pressure, leave it out. But in the name of God, at least make operator== do something sane by default.

  5. Automatic vtable generation. It doesn't have to be full-on OOP, and you don't need multiple inheritance and any of that stuff. But a huge number of lines in C programs are taken up declaring things that are basically vtables. Make it better. Google Go has some great ideas here. This one feature is probably the only good thing about Objective C.

  6. Some sort of generics so you can make type-safe containers. Note, I'm not saying templates here. C++ has made templates a dirty word; you want to copy precisely none of their template stuff. But C# (up to, but not including, C# 4.0) has some very nice (and highly optimizable in native code) generics ideas that you can steal. Also note: I'm not saying generics are necessary in a language that replaces C. C doesn't have them and it survives. Most attempts at a C replacement leave this out of version 1 and add it to version 2, and that's perfectly okay.

  7. One-time declaration/definition of functions. In C or C++, you have to declare your stuff in a header file, then define it in an implementation file. Your header file then gets compiled over and over again by everyone who uses your functions. (In C++ it gets even worse: your templates have to be defined in the header, so compiling every file ends up compiling half of your bloody program.) This is awful, and is the primary reason compiling C and C++ is slow. The problem has also been completely solved since the 1990's. Check out Turbo Pascal sometime. C# and Java, for all their flaws, have also thoroughly solved this. (Update 2010/07/21: Just because you absolutely must not remove the preprocessor doesn't mean you have to use it for declaring functions. The preprocessor is valuable for macros, not for function declarations.)

  8. Standardized string handling. Actually I don't think this is very important; much more important is the ability to keep letting people define their own string types. As I mentioned in my previous article, I disagree with the conventional wisdom that allowing user-defined string types was a major mistake of C++. Strings are often the slowest part of your program. Making them possible to optimize or replace is a good idea; adding some sugar to construct compile-time string literals directly in a user-defined data type would be even better. However, even so, having a decent default string type couldn't possibly make things worse (as long as you can ignore it when it gets in the way, ie. in a kernel).

  9. Implicit pass-by-reference. I'm totally addicted to the way python passes objects by reference, and only by reference. (Pedants would say it actually "passes references by value." I know the difference. I don't care.) This is probably hard to pull off without garbage collection support, but if you can do it, you'll be my hero. At the very least, let us use reference syntax wherever we might normally use pointer syntax, because requiring us to manually dereference pointers all the time was a mistake. And once you've done that, maybe remove pointer syntax altogether, because it's kind of redundant in C++ to have both. (The only exception is that in C++, you can't reassign what a reference points to. But that's only because they're idiots. Just let me do that, and pointer syntax is entirely obsolete.) There's a tiny python demo of what I mean right after this list.

  10. Typesafe varargs. C++ totally failed at this, with utterly awful results (ie. lots and lots of templates that define every version of a function with 1 to n parameters). C varargs are great, but they're not typesafe, and while that's fine sometimes, it's less fine other times. A simple varargs syntax that coerces all the arguments into a particular type (presumably using your implicit user-defined typecasts from above) would be easy and highly useful (see the sketch after the list).

  11. Lots of other things. This is not a complete list of features you should add to your language. Go crazy! Language design is an act of creativity, and most language features will not make your language unacceptable as a C replacement. Just don't break any of the "must not do" rules up above.
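
The sketches promised in the items above follow. They are minimal illustrations in present-day C++, with invented names, meant only to pin down what each item is asking for - not designs for the hypothetical replacement language.

Sketch for item 3: an implicit conversion operator is what lets a smart pointer be handed straight to an API that expects a raw pointer, with no wrapper noise at the call site. CountedPtr and Widget are made-up names, and the refcounting is deliberately bare-bones.

    // A minimal refcounting smart pointer. The conversion operator lets a
    // CountedPtr<T> go anywhere a plain T* is expected, which is what makes
    // it pleasant to drop into existing C-style APIs.
    #include <cstdio>

    template <class T>
    class CountedPtr {
        T *p;
        int *refs;
        CountedPtr &operator=(const CountedPtr &);   // assignment omitted to keep this short
    public:
        explicit CountedPtr(T *ptr) : p(ptr), refs(new int(1)) {}
        CountedPtr(const CountedPtr &o) : p(o.p), refs(o.refs) { ++*refs; }
        ~CountedPtr() { if (!--*refs) { delete p; delete refs; } }
        operator T*() const { return p; }            // the implicit user-defined typecast
        T *operator->() const { return p; }
    };

    struct Widget { void poke() { std::printf("poke\n"); } };

    void legacy_api(Widget *w) { w->poke(); }        // an old C-style interface

    int main() {
        CountedPtr<Widget> w(new Widget);
        legacy_api(w);                               // converts implicitly to Widget*
        return 0;
    }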
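
Sketch for item 4: roughly what "something sane by default" would save you. Today C++ generates no operator== at all for a plain value type, so the memberwise comparison below has to be written by hand; Point is just an example type.

    #include <cassert>

    struct Point {
        int x, y;
    };

    // The boilerplate a sane default would generate for you:
    inline bool operator==(const Point &a, const Point &b) {
        return a.x == b.x && a.y == b.y;
    }
    inline bool operator!=(const Point &a, const Point &b) { return !(a == b); }

    int main() {
        Point a = {1, 2}, b = {1, 2};
        assert(a == b);    // without the overloads above, this line doesn't compile
        return 0;
    }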
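
Sketch for item 5: the hand-rolled vtable that fills real C codebases - a struct of function pointers plus one shared instance of it per "class". The shape is exactly what C programmers write by hand (spelled here in C++ so it matches the other sketches); an automatic-vtable feature would generate all of it from an interface declaration. stream_ops and file_stream are invented names.

    #include <cstdio>

    struct stream_ops {                         // the vtable, written out by hand
        int  (*read)(void *self, char *buf, int len);
        void (*close)(void *self);
    };

    struct file_stream {
        const struct stream_ops *ops;           // every object carries its vtable pointer
        int fd;
    };

    static int file_read(void *self, char *buf, int len) {
        (void)self; (void)buf;                  // a real version would do actual I/O
        return len;
    }
    static void file_close(void *self) {
        std::printf("closed fd %d\n", ((struct file_stream *)self)->fd);
    }

    static const struct stream_ops file_stream_ops = { file_read, file_close };

    int main() {
        struct file_stream f = { &file_stream_ops, 3 };
        char buf[16];
        f.ops->read(&f, buf, (int)sizeof(buf)); // manual "virtual" dispatch
        f.ops->close(&f);
        return 0;
    }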
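
Sketch for item 6: what a type-safe container buys you over C's void*-based ones. C++ happens to spell it with a template - exactly the machinery item 6 says not to copy - but the payoff being asked for is the same: putting the wrong type into a container is a compile error instead of silent corruption.

    #include <string>
    #include <vector>

    int main() {
        std::vector<int> counts;
        counts.push_back(42);            // fine
        // counts.push_back("oops");     // refuses to compile; caught before it ships

        std::vector<std::string> names;  // a different instantiation, checked just as strictly
        names.push_back("alice");
        return 0;
    }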
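
Sketch for item 7: the declare-then-define duplication, with the "files" shown inline and the names invented. Every translation unit that uses parse_config() re-reads and recompiles config.h; a language with real modules keeps only the single definition.

    // --- config.h ---------------------------------------------------
    #ifndef CONFIG_H
    #define CONFIG_H
    int parse_config(const char *path);   // say it once in the header...
    #endif

    // --- config.cpp -------------------------------------------------
    // #include "config.h"
    int parse_config(const char *path)    // ...then say it again, with a body
    {
        (void)path;                       // pretend we parsed something
        return 0;
    }

    // --- main.cpp ---------------------------------------------------
    // #include "config.h"                // and every caller recompiles the header
    int main() { return parse_config("/etc/example.conf"); }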
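
Sketch for item 9: the same helper written with pointer syntax and with reference syntax, plus the one thing C++ references refuse to do (be re-seated to a different object). Counter is an invented example type.

    #include <cstdio>

    struct Counter { int hits; };

    void bump_ptr(Counter *c) { c->hits++; }   // caller must write bump_ptr(&a)
    void bump_ref(Counter &c) { c.hits++; }    // caller just writes bump_ref(a)

    int main() {
        Counter a = {0}, b = {0};
        bump_ptr(&a);
        bump_ref(a);

        Counter &r = a;    // r refers to a...
        // r = b;          // ...but this would copy b's fields into a, not re-point r at b
        std::printf("a.hits=%d b.hits=%d\n", a.hits, b.hits);
        return 0;
    }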
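
Sketch for item 10: an approximation of coerce-everything-to-one-type varargs. It leans on C++11's initializer_list, which postdates this post, and the Value type is invented; the point is only that every argument passes through an ordinary, type-checked constructor instead of a va_list.

    #include <cstdio>
    #include <initializer_list>
    #include <string>

    struct Value {
        std::string text;
        Value(int n)         { char b[32]; std::snprintf(b, sizeof(b), "%d", n); text = b; }
        Value(double d)      { char b[32]; std::snprintf(b, sizeof(b), "%g", d); text = b; }
        Value(const char *s) : text(s) {}
    };

    void log_all(std::initializer_list<Value> args) {   // "varargs", but every arg is a Value
        for (const Value &v : args)
            std::printf("%s ", v.text.c_str());
        std::printf("\n");
    }

    int main() {
        log_all({1, "two", 3.5});   // each argument converted to Value through a normal constructor
        return 0;
    }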

Current C and C++ alternatives and why they aren't popular

Apple/NeXT have been single-handedly pushing Objective C since, I don't know, maybe the 1980s or at least the '90s. It makes none of the "must not do" errors (since its dynamically-typed objects are optional). I personally suspect the reasons for its slow adoption are simple: a) Objective C isn't enough better than C, and adds nothing if you don't use its dynamic typing; and b) the syntax is infernally ugly. Think about this: for all we know, the Linux kernel is actually written using every feature in the kernel-compatible subset of Objective C - because that subset is just plain C. Basically it neither wins nor loses. Mu.

The D language started out as a good idea, but its designers went crazy in version 2. Also, it requires garbage collection, so it's instantly disqualified.

Google Go has tons of great stuff inside and meets almost all of the above requirements. Unfortunately it is also garbage collected, so it's instantly disqualified. (This one hurts me to the bottom of my soul, because the other stuff looks so great. But I'm not disqualifying it out of personal bias; I'm disqualifying it because it just won't do the job as long as it requires garbage collection.)

C# is a rather nice language overall and, in fact, has very little in it that prevents it from being natively compiled. (Mono actually has a way to compile it natively nowadays, called their "AOT" (ahead of time) compiler.) However, it requires a big huge gunky runtime and garbage collection and at least one system thread and it parses XML at startup time - strace it and see! - so no luck. (I left XML out of the "must not do" list because I thought it was obvious. Don't make me regret it.)

Java actually fails at every single point in this article. Okay, not really. But they did manage to botch most of it in rather spectacular ways.

Any others that I've missed?

Note that C++ meets all the above requirements. That's why it was able to replace C for so many things. The main reason C++ doesn't replace C for a bunch of other things is that it's just too crazy and it encourages you, as the developer, to also be crazy. See my previous rant for all about that.

P.S. No, I am not planning to make my own C replacement language. When python isn't appropriate, I will continue using and complaining about C++, while desperately attempting to use it tastefully, if that is even possible. I will, however, switch to your language if it meets all my requirements. So you'll have at least one user.

Update 2010/07/21: Wow, this hit the front page of news.ycombinator.com in less than 10 minutes. Thanks, guys. But I see there is some confusion about where I stand on C vs. C++ specifically, and why C++ is not the answer if my question is how to replace C. Good question!

The problem with C is that it works but is missing stuff; the problem with C++ is that it tried to add stuff, but the result is hideous. That's a totally subjective evaluation of C++ (see my previous rant for some concrete examples) but it's one that a lot of people seem to agree with. The goal here is to identify the "necessary but not sufficient" rules for creating a C replacement that has a chance of winning. You may hate C++, but it met those criteria, and so it became massively popular, hideous or not. I just want more options; please make me a language that is necessary, sufficient, and not hideous.

Syndicated 2010-07-22 00:36:40 (Updated 2010-07-22 02:04:26) from apenwarr - Business is Programming

556 older entries...
