22 Jul 2010 apenwarr   » (Master)

How to design a replacement for C++

My last article on the ugliness that is C++ didn't actually receive this complaint, but it should have: I offered a lot of criticism, but no constructive criticism.

I feel a little guilty about it, so let me try to resolve that here with some actual, constructive advice to language designers, for anyone who cares to listen. (Maybe nobody cares to listen, and in fact this will be much less interesting than the blind ranting of my last article. Too bad. Stop reading now if you're bored.)

The first thing you need to know about C/C++ is that they're only barely worth fixing anyway.

C has too few features, and C++ has far too many awful ones. Reasonable people might disagree on which features C is missing and which C++ should lose. But most people would agree at least that C could be usefully extended, and C++ could be usefully simplified (and maybe have a few cleanups, like my earlier suggestions of operator[]=, a sensible method pointer, and sensible standard strings).

We also know that neither change will happen. The C people, having seen what happens when you extend your language willy-nilly (ie. C++) are deathly afraid of it and will never ever change again. The C++ people are well set on their path (ie. ultimate salvation is right around the corner if we can just add a little more crack to our templates) and will never let it go.

But anyway, that doesn't really matter. C and C++ both get the job done in their respective niches. And those niches are shrinking dramatically. Once upon a time, you'd surely write all your apps in C or C++; nowadays, almost everything is better off written in a language with more built-in stuff. My personal tool of choice nowadays (when appropriate; I'll get to that in a minute) is python for most stuff, with C modules added on for the parts that have to be fast. It works excellently, as judged by my favourite metrics of fewer lines of code, increased readability, and maximum performance.

You might prefer ruby or C# or something intead of python. That's fine, although python seems to be the winner so far when it comes to a super-easy and efficient C extension system. (C#, including mono, makes me especially angry because C extensions often run slower than native C#. There's a massive and stupid overhead required to escape from the runtime down into native space and it often outweighs the speed gained from C. Duh. In python the overhead of calling into a C module is essentially zero.)

To a large extent, the reason you can get away with using "higher level" languages like python or ruby or C# is that computers have gotten faster and have a lot more memory than they used to. You need the faster computer to run an interpreted language, and you need more memory because you have garbage collection instead of manual memory management. But we've got the horsepower now. Might as well use it.

That means C and C++ are on the decline and they're just going to get smaller. Good. The world will be a better place for it.

But there will always be programs that have to be written in a language like C and C++. That includes kernels, drivers, highly performance-sensitive code like game engines, virtual machines, some kinds of networking code, and so on. And for me in particular, it also includes new plugins to existing C-based legacy systems, including Microsoft Office.

These programs are never going to go away. So deciding that they will, forever, have to suffer with the limitations of either C or C++ is kind of disappointing. And yet there is still no language - not even the hint of a beginning of a language - that can seriously claim to replace them. Here are the key "features" you will absolutely need to avoid if you want any chance at replacing C.

Things you absolutely must not do if you want to replace C

  1. Do not remove the ability to directly call into (and be called by) C and ASM without any wrapper/translation layers. When I want to call printf() from C or C++, I #include stdio.h and move on with my life. No other language makes it that easy. None. Zero. Do not be those other languages.

  2. Do not remove the cpp preprocessor. Look, I realize you are morally opposed to preprocessors. Well fuck you. If you take it out, I can't #include stdio.h, and I can't implement awesome assert-like macros. End of discussion.

  3. Avoid garbage collection. Garbage collection is fine as a concept, but you will never, ever, be able to write a good kernel if you try to use garbage collection. This is non-negotiable. Also, plugins to existing C programs won't fly with garbage collection, because you won't be able to usefully mark-and-sweep through the majority of non-garbage-collected memory, and you can't safely pass gc'd vs. non-gc'd memory back and forth between C and your language. Maybe your language can have optional garbage collection, but optional has to mean globally disabled across the entire executable.

  4. Avoid mandatory "system" threads. If you're writing a kernel, you're the guy implementing the threading system, so if your language requires threads, you're instantly dead in the water. Garbage collection often uses a separate mark-and-sweep thread, which is another reason gc just isn't an option. But it's even more insidious than that: what happens when you fork() a program that has threads? Do you even know? If the threads were created by the runtime, will it be sane even 1% of the time? You can't invent Unix if you can't fork().

  5. Avoid a mandatory standard library. People can - and do - compile entire C programs without using any standard library functions at all. Think about a kernel, for example. Even memory allocation is undefined until the kernel defines it. Most modern languages are integrated with their standard library - ie. some syntax secretly calls into functions - and this destroys their suitability for some jobs that C can do.

  6. Avoid dynamic typing. Dynamic typing always requires some sort of dictionary lookups and is, at best, slightly slower than static typing. To replace C in the cases where it refuses to die, you can't have a language that's almost as fast as C. It has to be as fast as C. Period. Google Go has some great innovations here with its static duck typing. Objective C is okay here because the dynamic typing is optional.

  7. Avoid support for exception handling. It's just too complicated, and moreover, C people just hate exceptions so they will hate you, too. And since C doesn't know about exceptions, you will make a mess when C calls you, then you throw (but don't catch) an exception. Just leave it out.

  8. Do not make it harder to do things in your language than they would be in C. Maybe this isn't even worth mentioning. But the upper bound on the lines of code it takes to do something should be the equivalent in C. Making your language backward-compatible with C is one way (not the only way) to achieve this.

All this sounds terrible, right? Why even bother if you can't have these obvious features? But actually, there are a bunch of things you can add and make things much, much better than C without making your language unacceptable in C's niche.

Things you can add to your language to make it better than C without ruining your chances to replace it

  1. Deterministic constructors/destructors (RAII). This is, quite probably, my favourite feature of C++ and the primary thing that makes me hate going back to C. (The lack of it is also what makes me hate almost every other high-level language. Python, thankfully, has this, although they claim that it's an implementation detail that could go away at any time. And IronPython can't do it. Bastards.) Deterministic constructors and destructors make smart pointers and automatic refcounting possible (and delightful!) and let you write things in one line of C++ that would take 10 lines of C. No exaggeration. And it compiles down to the same thing that C would, so there's no runtime cost.

  2. Closures and anonymous functions. In fact, Apple has already added these in an incompatible variant of C. Maybe you like them, maybe you don't, maybe you think they're God's gift to programming and any language without them is an infidel. But adding them would be harmless, anyway. (Update 2010/07/21: I mean harmless in that it wouldn't bloat the compiled code; it compiles down to the same ASM as the equivalent verbose C code, and if you don't use it, you don't pay for it.)

  3. Implicit user-defined typecasts. These are a tricky feature of C++ and some C people hate them because they hide stuff they think should be explicit. But you need this if you want to implement non-gross smart pointers and user-defined string objects.

  4. Operator overloading. You have to be seriously tasteful about this one. If you don't think you can handle the pressure, leave it out. But in the name of God, at least make operator== do something sane by default.

  5. Automatic vtable generation. It doesn't have to be full-on OOP, and you don't need multiple inheritence and any of that stuff. But a huge number of lines in C programs are taken up declaring things that are basically vtables. Make it better. Google Go has some great ideas here. This one feature is probably the only good thing about Objective C.

  6. Some sort of generics so you can make type-safe containers. Note, I'm not saying templates here. C++ has made templates a dirty word; you want to copy precisely none of their template stuff. But C# (up to, but not including, C# 4.0) has some very nice (and highly optimizable in native code) generics ideas that you can steal. Also note: I'm not saying generics are necessary in a language that replaces C. C doesn't have them and it survives. Most attempts at a C replacement leave this out of version 1 and add it to version 2, and that's perfectly okay.

  7. One-time declaration/definition of functions. In C or C++, you have to declare your stuff in a header file, then define it in an implementation file. Your header file then gets compiled over and over again by everyone who uses your functions. (In C++ it gets even worse: your templates have to be defined in the header, so compiling every file ends up compiling half of your bloody program.) This is awful, and is the primary reason compiling C and C++ is slow. The problem has also been completely solved since the 1990's. Check out Turbo Pascal sometime. C# and Java, for all their flaws, have also thoroughly solved this. (Update 2010/07/21: Just because you absolutely must not remove the preprocessor doesn't mean you have to use it for declaring functions. The preprocessor is valuable for macros, not for function declarations.)

  8. Standardized string handling. Actually I don't think this is very important; much more important is the ability to keep letting people define their own string types. As I mentioned in my previous article, I disagree with the conventional wisdom that allowing user-defined string types was a major mistake of C++. Strings are often the slowest part of your program. Making them possible to optimize or replace is a good idea; adding some sugar to construct compile-time string literals directly in a user-defined data type would be even better. However, even so, having a decent default string type couldn't possibly make things worse (as long as you can ignore it when it gets in the way, ie. in a kernel).

  9. Implicit pass-by-reference. I'm totally addicted to the way python passes objects by reference, and only by reference. (Pedants would say it actually "passes references by value." I know the difference. I don't care.) This is probably hard to pull off without garbage collection support, but if you can do it, you'll be my hero. At the very least, let us use reference syntax wherever we might normally use pointer syntax, because requiring us to manually dereference pointers all the time was a mistake. And once you've done that, maybe remove pointer syntax altogether, because it's kind of redundant in C++ to have both. (The only exception is that in C++, you can't reassign what a reference points to. But that's only because they're idiots. Just let me do that, and pointer syntax is entirely obsolete.)

  10. Typesafe varargs. C++ totally failed at this, with utterly awful results (ie. lots and lots of templates that define every version of a function with 1 to n parameters). C varargs are great, but they're not typesafe, and while that's great sometimes, it's less great other times. A simple varargs syntax that coerces all the arguments into a particular type (presumably using your implicit user-defined typecasts from above) would be easy and highly useful.

  11. Lots of other things. This is not a complete list of features you should add to your language. Go crazy! Language design is an act of creativity, and most language features will not make your language unacceptable as a C replacement. Just don't break any of the "must not do" rules up above.

Current C and C++ alternatives and why they aren't popular

Apple/NeXT have been single-handledly pushing Objective C since, I don't know, maybe the 1980's or at least the 90's. It makes none of the "must not do" errors (since its dynamically-typed objects are optional). I personally suspect the reasons for its slow adoption are simple: a) Objective C isn't enough better than C and adds nothing if you don't use its dynamic typing; and b) the syntax is infernally ugly. Think about this: for all we know, the Linux kernel is actually written using every feature in the kernel-compatible subset of Objective C. Basically it neither wins nor loses. Mu.

The D language started out as a good idea, but they went crazy in version 2. Also, they require garbage collection, so they're instantly disqualified.

Google Go has tons of great stuff inside and meets almost all of the above requirements. Unfortunately it is also garbage collected, so it's instantly disqualified. (This one hurts me to the bottom of my soul, because the other stuff looks so great. But I'm not disqualifying it because I'm subjective, I'm disqualifying it because it just won't do the job as long as it requires garbage collection.)

C# is a rather nice language overall and, in fact, has very little in it that prevents it from being natively compiled. (Mono actually has a way to compile it natively nowadays, called their "AOT" (ahead of time) compiler.) However, it requires a big huge gunky runtime and garbage collection and at least one system thread and it parses XML at startup time - strace it and see! - so no luck. (I left XML out of the "must not do" list because I thought it was obvious. Don't make me regret it.)

Java actually fails at every single point in this article. Okay, not really. But they did manage to botch most of it in rather spectacular ways.

Any others that I've missed?

Note that C++ meets all the above requirements. That's why it was able to replace C for so many things. The main reason C++ doesn't replace C for a bunch of other things is that it's just too crazy and it encourages you, as the developer, to also be crazy. See my previous rant for all about that.

P.S. No, I am not planning to make my own C replacement language. When python isn't appropriate, I will continue using and complaining about C++, while desperately attempting to use it tastefully, if that is even possible. I will, however, switch to your language if it meets all my requirements. So you'll have at least one user.

Update 2010/07/21: Wow, this hit the front page of news.ycombinator.com in less than 10 minutes. Thanks, guys. But I see there is some confusion about where I stand on C vs. C++ specifically, and why C++ is not the answer if my question is how to replace C. Good question!

The problem with C is that it works but is missing stuff; the problem with C++ is that it tried to add stuff, but the result is hideous. That's a totally subjective evaluation of C++ (see my previous rant for some concrete examples) but it's one that a lot of people seem to agree with. The goal here is to identify the "necessary but not sufficient" rules for creating a C replacement that has a chance of winning. You may hate C++, but it met those criteria, and so it became massively popular, hideous or not. I just want more options; please make me a language that is necessary, sufficient, and not hideous.

Syndicated 2010-07-22 00:36:40 (Updated 2010-07-22 02:04:26) from apenwarr - Business is Programming

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!