Older blog entries for apenwarr (starting at number 557)

22 Jul 2010 (updated 22 Jul 2010 at 02:04 UTC) »

How to design a replacement for C++

My last article on the ugliness that is C++ didn't actually receive this complaint, but it should have: I offered a lot of criticism, but no constructive criticism.

I feel a little guilty about it, so let me try to resolve that here with some actual, constructive advice to language designers, for anyone who cares to listen. (Maybe nobody cares to listen, and in fact this will be much less interesting than the blind ranting of my last article. Too bad. Stop reading now if you're bored.)

The first thing you need to know about C/C++ is that they're only barely worth fixing anyway.

C has too few features, and C++ has far too many awful ones. Reasonable people might disagree on which features C is missing and which C++ should lose. But most people would agree at least that C could be usefully extended, and C++ could be usefully simplified (and maybe have a few cleanups, like my earlier suggestions of operator[]=, a sensible method pointer, and sensible standard strings).

We also know that neither change will happen. The C people, having seen what happens when you extend your language willy-nilly (ie. C++) are deathly afraid of it and will never ever change again. The C++ people are well set on their path (ie. ultimate salvation is right around the corner if we can just add a little more crack to our templates) and will never let it go.

But anyway, that doesn't really matter. C and C++ both get the job done in their respective niches. And those niches are shrinking dramatically. Once upon a time, you'd surely write all your apps in C or C++; nowadays, almost everything is better off written in a language with more built-in stuff. My personal tool of choice nowadays (when appropriate; I'll get to that in a minute) is python for most stuff, with C modules added on for the parts that have to be fast. It works excellently, as judged by my favourite metrics of fewer lines of code, increased readability, and maximum performance.

You might prefer ruby or C# or something intead of python. That's fine, although python seems to be the winner so far when it comes to a super-easy and efficient C extension system. (C#, including mono, makes me especially angry because C extensions often run slower than native C#. There's a massive and stupid overhead required to escape from the runtime down into native space and it often outweighs the speed gained from C. Duh. In python the overhead of calling into a C module is essentially zero.)

To a large extent, the reason you can get away with using "higher level" languages like python or ruby or C# is that computers have gotten faster and have a lot more memory than they used to. You need the faster computer to run an interpreted language, and you need more memory because you have garbage collection instead of manual memory management. But we've got the horsepower now. Might as well use it.

That means C and C++ are on the decline and they're just going to get smaller. Good. The world will be a better place for it.

But there will always be programs that have to be written in a language like C and C++. That includes kernels, drivers, highly performance-sensitive code like game engines, virtual machines, some kinds of networking code, and so on. And for me in particular, it also includes new plugins to existing C-based legacy systems, including Microsoft Office.

These programs are never going to go away. So deciding that they will, forever, have to suffer with the limitations of either C or C++ is kind of disappointing. And yet there is still no language - not even the hint of a beginning of a language - that can seriously claim to replace them. Here are the key "features" you will absolutely need to avoid if you want any chance at replacing C.

Things you absolutely must not do if you want to replace C

  1. Do not remove the ability to directly call into (and be called by) C and ASM without any wrapper/translation layers. When I want to call printf() from C or C++, I #include stdio.h and move on with my life. No other language makes it that easy. None. Zero. Do not be those other languages.

  2. Do not remove the cpp preprocessor. Look, I realize you are morally opposed to preprocessors. Well fuck you. If you take it out, I can't #include stdio.h, and I can't implement awesome assert-like macros. End of discussion.

  3. Avoid garbage collection. Garbage collection is fine as a concept, but you will never, ever, be able to write a good kernel if you try to use garbage collection. This is non-negotiable. Also, plugins to existing C programs won't fly with garbage collection, because you won't be able to usefully mark-and-sweep through the majority of non-garbage-collected memory, and you can't safely pass gc'd vs. non-gc'd memory back and forth between C and your language. Maybe your language can have optional garbage collection, but optional has to mean globally disabled across the entire executable.

  4. Avoid mandatory "system" threads. If you're writing a kernel, you're the guy implementing the threading system, so if your language requires threads, you're instantly dead in the water. Garbage collection often uses a separate mark-and-sweep thread, which is another reason gc just isn't an option. But it's even more insidious than that: what happens when you fork() a program that has threads? Do you even know? If the threads were created by the runtime, will it be sane even 1% of the time? You can't invent Unix if you can't fork().

  5. Avoid a mandatory standard library. People can - and do - compile entire C programs without using any standard library functions at all. Think about a kernel, for example. Even memory allocation is undefined until the kernel defines it. Most modern languages are integrated with their standard library - ie. some syntax secretly calls into functions - and this destroys their suitability for some jobs that C can do.

  6. Avoid dynamic typing. Dynamic typing always requires some sort of dictionary lookups and is, at best, slightly slower than static typing. To replace C in the cases where it refuses to die, you can't have a language that's almost as fast as C. It has to be as fast as C. Period. Google Go has some great innovations here with its static duck typing. Objective C is okay here because the dynamic typing is optional.

  7. Avoid support for exception handling. It's just too complicated, and moreover, C people just hate exceptions so they will hate you, too. And since C doesn't know about exceptions, you will make a mess when C calls you, then you throw (but don't catch) an exception. Just leave it out.

  8. Do not make it harder to do things in your language than they would be in C. Maybe this isn't even worth mentioning. But the upper bound on the lines of code it takes to do something should be the equivalent in C. Making your language backward-compatible with C is one way (not the only way) to achieve this.

All this sounds terrible, right? Why even bother if you can't have these obvious features? But actually, there are a bunch of things you can add and make things much, much better than C without making your language unacceptable in C's niche.

Things you can add to your language to make it better than C without ruining your chances to replace it

  1. Deterministic constructors/destructors (RAII). This is, quite probably, my favourite feature of C++ and the primary thing that makes me hate going back to C. (The lack of it is also what makes me hate almost every other high-level language. Python, thankfully, has this, although they claim that it's an implementation detail that could go away at any time. And IronPython can't do it. Bastards.) Deterministic constructors and destructors make smart pointers and automatic refcounting possible (and delightful!) and let you write things in one line of C++ that would take 10 lines of C. No exaggeration. And it compiles down to the same thing that C would, so there's no runtime cost.

  2. Closures and anonymous functions. In fact, Apple has already added these in an incompatible variant of C. Maybe you like them, maybe you don't, maybe you think they're God's gift to programming and any language without them is an infidel. But adding them would be harmless, anyway. (Update 2010/07/21: I mean harmless in that it wouldn't bloat the compiled code; it compiles down to the same ASM as the equivalent verbose C code, and if you don't use it, you don't pay for it.)

  3. Implicit user-defined typecasts. These are a tricky feature of C++ and some C people hate them because they hide stuff they think should be explicit. But you need this if you want to implement non-gross smart pointers and user-defined string objects.

  4. Operator overloading. You have to be seriously tasteful about this one. If you don't think you can handle the pressure, leave it out. But in the name of God, at least make operator== do something sane by default.

  5. Automatic vtable generation. It doesn't have to be full-on OOP, and you don't need multiple inheritence and any of that stuff. But a huge number of lines in C programs are taken up declaring things that are basically vtables. Make it better. Google Go has some great ideas here. This one feature is probably the only good thing about Objective C.

  6. Some sort of generics so you can make type-safe containers. Note, I'm not saying templates here. C++ has made templates a dirty word; you want to copy precisely none of their template stuff. But C# (up to, but not including, C# 4.0) has some very nice (and highly optimizable in native code) generics ideas that you can steal. Also note: I'm not saying generics are necessary in a language that replaces C. C doesn't have them and it survives. Most attempts at a C replacement leave this out of version 1 and add it to version 2, and that's perfectly okay.

  7. One-time declaration/definition of functions. In C or C++, you have to declare your stuff in a header file, then define it in an implementation file. Your header file then gets compiled over and over again by everyone who uses your functions. (In C++ it gets even worse: your templates have to be defined in the header, so compiling every file ends up compiling half of your bloody program.) This is awful, and is the primary reason compiling C and C++ is slow. The problem has also been completely solved since the 1990's. Check out Turbo Pascal sometime. C# and Java, for all their flaws, have also thoroughly solved this. (Update 2010/07/21: Just because you absolutely must not remove the preprocessor doesn't mean you have to use it for declaring functions. The preprocessor is valuable for macros, not for function declarations.)

  8. Standardized string handling. Actually I don't think this is very important; much more important is the ability to keep letting people define their own string types. As I mentioned in my previous article, I disagree with the conventional wisdom that allowing user-defined string types was a major mistake of C++. Strings are often the slowest part of your program. Making them possible to optimize or replace is a good idea; adding some sugar to construct compile-time string literals directly in a user-defined data type would be even better. However, even so, having a decent default string type couldn't possibly make things worse (as long as you can ignore it when it gets in the way, ie. in a kernel).

  9. Implicit pass-by-reference. I'm totally addicted to the way python passes objects by reference, and only by reference. (Pedants would say it actually "passes references by value." I know the difference. I don't care.) This is probably hard to pull off without garbage collection support, but if you can do it, you'll be my hero. At the very least, let us use reference syntax wherever we might normally use pointer syntax, because requiring us to manually dereference pointers all the time was a mistake. And once you've done that, maybe remove pointer syntax altogether, because it's kind of redundant in C++ to have both. (The only exception is that in C++, you can't reassign what a reference points to. But that's only because they're idiots. Just let me do that, and pointer syntax is entirely obsolete.)

  10. Typesafe varargs. C++ totally failed at this, with utterly awful results (ie. lots and lots of templates that define every version of a function with 1 to n parameters). C varargs are great, but they're not typesafe, and while that's great sometimes, it's less great other times. A simple varargs syntax that coerces all the arguments into a particular type (presumably using your implicit user-defined typecasts from above) would be easy and highly useful.

  11. Lots of other things. This is not a complete list of features you should add to your language. Go crazy! Language design is an act of creativity, and most language features will not make your language unacceptable as a C replacement. Just don't break any of the "must not do" rules up above.

Current C and C++ alternatives and why they aren't popular

Apple/NeXT have been single-handledly pushing Objective C since, I don't know, maybe the 1980's or at least the 90's. It makes none of the "must not do" errors (since its dynamically-typed objects are optional). I personally suspect the reasons for its slow adoption are simple: a) Objective C isn't enough better than C and adds nothing if you don't use its dynamic typing; and b) the syntax is infernally ugly. Think about this: for all we know, the Linux kernel is actually written using every feature in the kernel-compatible subset of Objective C. Basically it neither wins nor loses. Mu.

The D language started out as a good idea, but they went crazy in version 2. Also, they require garbage collection, so they're instantly disqualified.

Google Go has tons of great stuff inside and meets almost all of the above requirements. Unfortunately it is also garbage collected, so it's instantly disqualified. (This one hurts me to the bottom of my soul, because the other stuff looks so great. But I'm not disqualifying it because I'm subjective, I'm disqualifying it because it just won't do the job as long as it requires garbage collection.)

C# is a rather nice language overall and, in fact, has very little in it that prevents it from being natively compiled. (Mono actually has a way to compile it natively nowadays, called their "AOT" (ahead of time) compiler.) However, it requires a big huge gunky runtime and garbage collection and at least one system thread and it parses XML at startup time - strace it and see! - so no luck. (I left XML out of the "must not do" list because I thought it was obvious. Don't make me regret it.)

Java actually fails at every single point in this article. Okay, not really. But they did manage to botch most of it in rather spectacular ways.

Any others that I've missed?

Note that C++ meets all the above requirements. That's why it was able to replace C for so many things. The main reason C++ doesn't replace C for a bunch of other things is that it's just too crazy and it encourages you, as the developer, to also be crazy. See my previous rant for all about that.

P.S. No, I am not planning to make my own C replacement language. When python isn't appropriate, I will continue using and complaining about C++, while desperately attempting to use it tastefully, if that is even possible. I will, however, switch to your language if it meets all my requirements. So you'll have at least one user.

Update 2010/07/21: Wow, this hit the front page of news.ycombinator.com in less than 10 minutes. Thanks, guys. But I see there is some confusion about where I stand on C vs. C++ specifically, and why C++ is not the answer if my question is how to replace C. Good question!

The problem with C is that it works but is missing stuff; the problem with C++ is that it tried to add stuff, but the result is hideous. That's a totally subjective evaluation of C++ (see my previous rant for some concrete examples) but it's one that a lot of people seem to agree with. The goal here is to identify the "necessary but not sufficient" rules for creating a C replacement that has a chance of winning. You may hate C++, but it met those criteria, and so it became massively popular, hideous or not. I just want more options; please make me a language that is necessary, sufficient, and not hideous.

Syndicated 2010-07-22 00:36:40 (Updated 2010-07-22 02:04:26) from apenwarr - Business is Programming

How to design a replacement for C++

My last article on the ugliness that is C++ didn't actually receive this complaint, but it should have: I offered a lot of criticism, but no constructive criticism.

I feel a little guilty about it, so let me try to resolve that here with some actual, constructive advice to language designers, for anyone who cares to listen. (Maybe nobody cares to listen, and in fact this will be much less interesting than the blind ranting of my last article. Too bad. Stop reading now if you're bored.)

The first thing you need to know about C/C++ is that they're only barely worth fixing anyway.

C has too few features, and C++ has far too many awful ones. Reasonable people might disagree on which features C is missing and which C++ should lose. But most people would agree at least that C could be usefully extended, and C++ could be usefully simplified (and maybe have a few cleanups, like my earlier suggestions of operator[]=, a sensible method pointer, and sensible standard strings).

We also know that neither change will happen. The C people, having seen what happens when you extend your language willy-nilly (ie. C++) are deathly afraid of it and will never ever change again. The C++ people are well set on their path (ie. ultimate salvation is right around the corner if we can just add a little more crack to our templates) and will never let it go.

But anyway, that doesn't really matter. C and C++ both get the job done in their respective niches. And those niches are shrinking dramatically. Once upon a time, you'd surely write all your apps in C or C++; nowadays, almost everything is better off written in a language with more built-in stuff. My personal tool of choice nowadays (when appropriate; I'll get to that in a minute) is python for most stuff, with C modules added on for the parts that have to be fast. It works excellently, as judged by my favourite metrics of fewer lines of code, increased readability, and maximum performance.

You might prefer ruby or C# or something intead of python. That's fine, although python seems to be the winner so far when it comes to a super-easy and efficient C extension system. (C#, including mono, makes me especially angry because C extensions often run slower than native C#. There's a massive and stupid overhead required to escape from the runtime down into native space and it often outweighs the speed gained from C. Duh. In python the overhead of calling into a C module is essentially zero.)

To a large extent, the reason you can get away with using "higher level" languages like python or ruby or C# is that computers have gotten faster and have a lot more memory than they used to. You need the faster computer to run an interpreted language, and you need more memory because you have garbage collection instead of manual memory management. But we've got the horsepower now. Might as well use it.

That means C and C++ are on the decline and they're just going to get smaller. Good. The world will be a better place for it.

But there will always be programs that have to be written in a language like C and C++. That includes kernels, drivers, highly performance-sensitive code like game engines, virtual machines, some kinds of networking code, and so on. And for me in particular, it also includes new plugins to existing C-based legacy systems, including Microsoft Office.

These programs are never going to go away. So deciding that they will, forever, have to suffer with the limitations of either C or C++ is kind of disappointing. And yet there is still no language - not even the hint of a beginning of a language - that can seriously claim to replace them. Here are the key "features" you will absolutely need to avoid if you want any chance at replacing C.

Things you absolutely must not do if you want to replace C

  1. Do not remove the ability to directly call into (and be called by) C and ASM without any wrapper/translation layers. When I want to call printf() from C or C++, I #include stdio.h and move on with my life. No other language makes it that easy. None. Zero. Do not be those other languages.

  2. Do not remove the cpp preprocessor. Look, I realize you are morally opposed to preprocessors. Well fuck you. If you take it out, no C programmer will switch to your language. End of discussion.

  3. Avoid garbage collection. Garbage collection is fine as a concept, but you will never, ever, be able to write a good kernel if you try to use garbage collection. This is non-negotiable. Also, plugins to existing C programs won't fly with garbage collection, because you won't be able to usefully mark-and-sweep through the majority of non-garbage-collected memory, and you can't safely pass gc'd vs. non-gc'd memory back and forth between C and your language. Maybe your language can have optional garbage collection, but optional has to mean globally disabled across the entire executable.

  4. Avoid mandatory "system" threads. If you're writing a kernel, you're the guy implementing the threading system, so if your language requires threads, you're instantly dead in the water. Garbage collection often uses a separate mark-and-sweep thread, which is another reason gc just isn't an option. But it's even more insidious than that: what happens when you fork() a program that has threads? Do you even know? If the threads were created by the runtime, will it be sane even 1% of the time? You can't invent Unix if you can't fork().

  5. Avoid a mandatory standard library. People can - and do - compile entire C programs without using any standard library functions at all. Think about a kernel, for example. Even memory allocation is undefined until the kernel defines it. Most modern languages are integrated with their standard library - ie. some syntax secretly calls into functions - and this destroys their suitability for some jobs that C can do.

  6. Avoid dynamic typing. Dynamic typing always requires some sort of dictionary lookups and is, at best, slightly slower than static typing. To replace C in the cases where it refuses to die, you can't have a language that's almost as fast as C. It has to be as fast as C. Period. Google Go has some great innovations here with its static duck typing. Objective C is okay here because the dynamic typing is optional.

  7. Avoid support for exception handling. It's just too complicated, and moreover, C people just hate exceptions so they will hate you, too. And since C doesn't know about exceptions, you will make a mess when C calls you, then you throw (but don't catch) an exception. Just leave it out.

All this sounds terrible, right? Why even bother if you can't have these obvious features? But actually, there are a bunch of things you can add and make things much, much better than C without making your language unacceptable in C's niche.

Things you can add to your language to make it better than C without ruining your chances to replace it

  1. Deterministic constructors/destructors (RAII). This is, quite probably, my favourite feature of C++ and the primary thing that makes me hate going back to C. (The lack of it is also what makes me hate almost every other high-level language. Python, thankfully, has this, although they claim that it's an implementation detail that could go away at any time. And IronPython can't do it. Bastards.) Deterministic constructors and destructors make smart pointers and automatic refcounting possible (and delightful!) and let you write things in one line of C++ that would take 10 lines of C. No exaggeration. And it compiles down to the same thing that C would, so there's no runtime cost.

  2. Closures and anonymous functions. In fact, Apple has already added these in an incompatible variant of C. Maybe you like them, maybe you don't, maybe you think they're God's gift to programming and any language without them is an infidel. But adding them would be harmless, anyway.

  3. Implicit user-defined typecasts. These are a tricky feature of C++ and some C people hate them because they hide stuff they think should be explicit. But you need this if you want to implement non-gross smart pointers and user-defined string objects.

  4. Operator overloading. You have to be seriously tasteful about this one. If you don't think you can handle the pressure, leave it out. But in the name of God, at least make operator== do something sane by default.

  5. Automatic vtable generation. It doesn't have to be full-on OOP, and you don't need multiple inheritence and any of that stuff. But a huge number of lines in C programs are taken up declaring things that are basically vtables. Make it better. Google Go has some great ideas here. This one feature is probably the only good thing about Objective C.

  6. Some sort of generics so you can make type-safe containers. Note, I'm not saying templates here. C++ has made templates a dirty word; you want to copy precisely none of their template stuff. But C# (up to, but not including, C# 4.0) has some very nice (and highly optimizable in native code) generics ideas that you can steal. Also note: I'm not saying generics are necessary in a language that replaces C. C doesn't have them and it survives. Most attempts at a C replacement leave this out of version 1 and add it to version 2, and that's perfectly okay.

  7. One-time declaration/definition of functions. In C or C++, you have to declare your stuff in a header file, then define it in an implementation file. Your header file then gets compiled over and over again by everyone who uses your functions. (In C++ it gets even worse: your templates have to be defined in the header, so compiling every file ends up compiling half of your bloody program.) This is awful, and is the primary reason compiling C and C++ is slow. The problem has also been completely solved since the 1990's. Check out Turbo Pascal sometime. C# and Java, for all their flaws, have also thoroughly solved this. Just because you absolutely must not remove the preprocessor doesn't mean you have to use it for declaring functions.

  8. Standardized string handling. Actually I don't think this is very important; much more important is the ability to keep letting people define their own string types. As I mentioned in my previous article, I disagree with the conventional wisdom that allowing user-defined string types was a major mistake of C++. Strings are often the slowest part of your program. Making them possible to optimize or replace is a good idea; adding some sugar to construct compile-time string literals directly in a user-defined data type would be even better. However, even so, having a decent default string type couldn't possibly make things worse (as long as you can ignore it when it gets in the way, ie. in a kernel).

Current C and C++ alternatives and why they aren't popular

Apple/NeXT have been single-handledly pushing Objective C since, I don't know, maybe the 1980's or at least the 90's. It makes none of the "must not do" errors (since its dynamically-typed objects are optional). I personally suspect the reasons for its slow adoption are simple: a) Objective C isn't enough better than C; and b) the syntax is infernally ugly.

The D language started out as a good idea, but they went crazy in version 2. Also, they have garbage collection, so they're instantly disqualified.

Google Go has tons of great stuff inside and meets almost all of the above requirements. Unfortunately it is also garbage collected, so it's instantly disqualified.

C# is a rather nice language overall and, in fact, has very little in it that prevents it from being natively compiled. (Mono actually has a way to compile it natively nowadays, called their "AOT" (ahead of time) compiler.) However, it requires a big huge runtime and garbage collection, so no luck.

Objective C avoids the "must not do" requirements, except for dynamic typing and a standard library, but those are optional. Unfortunately its syntax is hideous and it adds almost nothing over C if you don't use the dynamic typing, so whether your kernel is written in C or Objective C is really a "mu" sort of question.

Any others that I've missed?

Note that C++ meets all the above requirements. That's why it was able to replace C for so many things. The main reason C++ doesn't replace C for a bunch of other things is that it's just too crazy and it encourages you, as the developer, to also be crazy. See my previous rant for all about that.

P.S. No, I am not planning to make my own C replacement language. When python isn't appropriate, I will continue using and complaining about C++, while desperately attempting to use it tastefully, if that is even possible.

Syndicated 2010-07-22 00:03:12 from apenwarr - Business is Programming

20 Jul 2010 (updated 21 Sep 2010 at 19:06 UTC) »

You can't make C++ not ugly, but you can't not try

...everything that's wrong with C++ comes down to that.

Background: I've been programming in C++ since about 1993; that's 17 years now. As late as 2009, I chose C++ to write our Windows client for EQL Data. If I were to make that decision today, I would still choose C++, because, quite simply, nothing else would work. (Okay, C would work, but it would be at least 5x as much effort. So no thanks. And for making plugins to legacy Windows apps, there's just nothing else out there.)

So, okay, I know a fair bit about C++. I've managed 30-person development teams building huge stuff in C++. Successfully. I have some context here.

And my context is: even if there's nothing better for the job, the truth is that C++ is incredibly ugly and misdesigned. C++ is a trap: they tell you that you can do anything you want in C++. Anything! C++ isn't a language, they say, it's a language construction kit! Build the language of your dreams in C++! And it'll be portable and scalable and fast and standardized!

And this is so close to true that even after using it for 17 years, I still almost believe it. I used to actually believe it. But see, some recent experience with the "amazing innovations" in other programming languages have convinced me otherwise.

First of all, if you haven't done much C++, you need to realize: most of the stuff in there is utter putrid boneheaded crap. This includes the RTTI and exceptions stuff; C++'s versions of those were enough to convince a whole generation of programmers that introspection and exceptions were outright evil and should be avoided. But as it turns out, that's only true in C++.

If you've heard anything about C++, you've probably heard that there's no standard string class and everybody rolls their own. That's not actually one of the bad things, in my opinion. As a person who's done a lot of coding in C++, I've actually come to understand that there really are good reasons to use different string objects at different times. In python, my current language of choice whenever it's appropriate, there's only one string class, and it's mostly okay, but every now and then you really want to just replace one character in the string with one other character (an O(1) operation) but you can't, so you instead construct a new string (an O(n) operation) and your program is vastly slower and less scalable. (Happily, python makes it pretty easy to create string-like objects, particularly using C extensions, so if you really need it, you can make it still go fast. But effectively thats just exactly creating your own string class, like so many people do in C++. See? Not always evil.)

The problem isn't interchangeable string classes. The problem is that the default C++ string class is so utterly godawfully stupid. No garbage collection? Check. No refcounting? Check. Need to allocate/free heap space just to pass a string constant to a function? Check. No support for null strings? Check. Horrendous mess of templates that makes tracing in a debugger utterly painful? Check. Horrendous mess of templates that makes non-ultramodern compilers unable to optimize them so that, for years, your toy homemade string class was 5x faster? Check. Totally unclear what character type it uses (actually you can use whatever you want at different times)? Check. Totally missing a sprintf-like formatter so you have to use something, anything, oh god please save me from iostreams just to produce a dynamic string? Check. Can't append to a string without allocating a whole new one? Check. Using the "+" append operator produces more temporary objects than you can count? Check. Using the "+" operator with two string constants gives a weird compiler error about adding pointers? Check.

In contrast, let's take, say, python's strings: refcounted, passed by reference, nullable, compatible with string constants, no templates, trivially easy debugging, always the same character type (although they changed it in python 3, sigh), include a sprintf-like operator, the + append operator works fine and multiple appended constants can be optimized at compile time (python's interpreter compiles to a metalanguage and can do basic optimizations like this). They even have an optimized non-constant append operator in newer versions of python that's more efficient than making a whole new copy every time.

How many of these string features required us to use an interpreted language? Precisely zero. An imaginary, fictional version of C++ could have had a string class with all these features and been just as fast and efficient. And I bet a lot fewer people would have written their own if that had been the case. There's actually no excuse for the crap that is C++ std::strings; they aren't better. They're just, somehow, the standard.

Another C++ problem that's close to my heart is function pointers. Not even lambdas or anonymous functions - let's not get all fancy, here. Just plain pointers to existing named functions. C++, being a superset of C, has function pointers, of course. And while the syntax for them has always been a little funny, they actually work fine and don't make you want to kill people too often. (Everywhere C function pointers are used they should always have a void *userdata parameter, and when people don't do that (like in qsort()), then you do want to kill things... but that's not C's fault, and sensible programmers can avoid that mistake.) So ok. C++ has function pointers.

But here's the thing: they utterly failed to extend this concept to include pointers to methods of an object.

Okay, that's not really true. In fact, it's a little-known fact that C++ - the language, not the insane libraries or templates - has built-in support for function pointers that call member functions.

The bad news is, this feature is so horrendously ill-conceived that absolutely nobody uses it for anything. Seriously. Nobody. I tried my best. The feature really is actually useless. The article I linked to tried desperately to make them look like maybe they have a purpose, but no. They just don't. (You can see the main problem in the linked article under the section "Member Function Pointers Are Not Just Simple Addresses." You might think, oh, of course not. They're a "this" pointer plus an address, right? Ha ha! Ha ha ha ha!! No they're not! They don't have a this pointer! You still have to provide your own this pointer when you call it! But it does store all kinds of crazy other stuff instead so it can do call-time vtable lookups on multiply-inherited objects! Ha ha!)

Utterly useless. But the bad thing isn't so much that it's useless - although maybe someone should have noticed that and killed the feature before it somehow passed the standards committee. The bad thing is that there is an obvious way to do it that wouldn't have been useless: just make a member function pointer be a struct { obj, funcaddress }. Everybody knows that calling a member function obj.f(x,y,z) in C++ is actually done by calling f(obj,x,y,z). There would be nothing to it. Since you know 'this' at the time you create the function pointer, you can resolve the funcaddress from the name 'f' at that point - the same way you would when making any method call, including vtables, multiple inheritance, and everything - and the code receiving the pointer would always just run it as (*funcaddress)(obj, ...). So easy. Nothing to it. So very much terrible C++ code would never have been written if this feature existed.

But it doesn't. There are alternatives, of course - numerous ones, and all terrible, and all incompatible, because the language designers simply failed utterly to do their job. The boost (now TR1) one has the cutest syntax, but God help you if you make a typo using it, because you'll get pages of template gibberish.

Stop and think about that for a second. Template gibberish. For a simple function pointer! Every language not designed by idiots in the last 20 years, including Turbo Pascal, has some kind of function pointers. ASM has function pointers. C has function pointers. This isn't hard. It has nothing to do with making fancy type-independent efficient data structures, for which templates/generics are actually justified. It has to do with a trivial operation that's a basic part of every compiled language: pushing some parameters on the stack and jumping to an address.

While I'm here, no, strings are not "generic" data structures either. The fact that std::string is a template is also incredibly insulting.

Okay, one more example of C++ terribleness. This one is actually a tricky one, so I can almost forgive the C++ guys for not thinking up the "right" solution. But it came up again for me the other day, so I'll rant about it too: dictionary item assignment.

What happens when you have, say, a std::map of std::string and you do m[5] = "chicken"? Moreover, what happens if there is no m[5] and you do std::string x = m[5]?

Answer: m[5] "autovivifies" a new, empty string and stores it in location 5. Then it returns a reference to that location, which in the first example, you reassign using std::string::operator=. In the second example, the autovivified string is copied to x - and left happily floating around, empty, in m[5].

Ha ha! In what universe are these semantics reasonable? In what rational set of rules does the right-hand-side of an assignment statement get modified by default? Maybe I'm crazy - no, that's not it - but when I write m[5] and there's no m[5], I think there are only two things that are okay to happen. Either m[5] returns NULL (a passive indicator that there is no m[5], like you'd expect from C) or m[5] throws an exception (an aggressive indicator that there is no m[5], like you'd see in python).

Ah, you say. But look! If that happened, then the first statement - the one assigning to m[5] - wouldn't work! It would crash because you end up assigning to NULL!

Yes. Yes it would. In C++ it would, because the people who designed C++ are idiots.

But in python, it works perfectly (even for user-defined types). How? Simple. Python's parser has a little hack in it - which I'm sure must hurt the python people a lot, so much do they hate hacks - that makes m[5]= parse differently than just plain m[5].

The python parser converts o[x]=y directly into o.__setitem__(x,y). Whereas o[x] without a trailing equal sign converts directly into o.__getitem__(x). It's very sad that the parser has to do such utterly different things with two identical-looking uses of the square bracket operator. But the result is you get what you expect: __getitem__ throws an exception if there's no m[5]. __setitem__ doesn't. __setitem__ puts stuff into your object; it doesn't waste time pulling stuff out of your object (unless that's a necessary internal detail for your data structure implementation).

But even that isn't the worst thing. Here's what's worse: C++'s crazy autovivification stuff makes it slower, because you have to construct an object just so you can throw it away and reassign it. Ha ha! The crazy language where supposedly performance is all-important actually assigns to maps slower than python can! All in the name of having language purity, so we don't have to have stupid parser hacks to make [] behave two different ways!

...

"...Well," said the C++ people. "Well. We can't have that."

So here's what they invented. Instead of inventing a sensible new []= operator, they went even more crazy. They redefined things such that, if your optimizer is sufficiently smart, it can make all the extra crap go away.

There's something in C++ called the "return value optimization." Normally, if you do something like "MyObj x = f()", and f returns a MyObj, then what would need to happen is that 'x' gets constructed using the default constructor, then f() constructs a new object and returns it, and then we call x.operator= to copy the object from f()'s return value, then we destroy f()'s return value.

As you might imagine, when implementing the [] setter on a map, this would be kind of inefficient.

But because the C++ people so desperately wanted this sort of thing to be fast, they allowed the compiler to optimize out the creation of x and the copy operation; instead, they just tell f() to construct its return value right into x. If you think about it hard enough, you can see that, assuming the stars all align perfectly, m[5] = "foo" can benefit from this operation. Probably only if m.operator[] is inlined, but of course it is - it's a template! Everything in a template is inlined! Ha ha!

So actually C++ maps are as fast as python maps, assuming your compiler writers are amazingly great, and a) implement the (optional) return-value optimization; b) inline the right stuff; and c) don't screw up their overcomplicated optimizer so that it makes your code randomly not work in other places.

Okay, cool, right? Isn't this a triumph of engineering - an amazingly world class optimizer plus an amazingly supercomplex specification that allows just the right combination of craziness to get what you want?

NO!

No it is not!

It is an absolute failure of engineering! Do you want to know what real engineering is? It's this:

map_set(m, 5, "foo");
char *x = map_get(m, 5);

That plain C code runs exactly as fast as the above hyperoptimized ultracomplex C++. *And* it returns NULL when m[5] doesn't exist, which C++ fails to do.

In the heat of the moment, it's easy to lose sight of just how much of C++ is absolutely senseless wankery.

And this, my friends, is the problem.

As with any bureaucracy, the focus slowly shifts from finding a simple, elegant way to solve your problem to just goddamn winning this one battle with the system so that you can get the bloody thing working at all. It would have been easy, at any time, for the C++ committee to have just added a new operator[]=. It would have been totally backward-compatible: any object without an operator[]= would keep working just like it always has.

But they couldn't do that. Doing that would be admitting defeat.

They could have made up a new syntax for sensible member function pointers, any time they wanted. Again, no concern about backwards compatibility - if you don't use it, it doesn't affect you.

They could have written a sensible string class. In fact, people did. Lots of people! But for some reason, they standardized on the non-sensible one. Now C++ users are forever cursed: either you use std::string, and pay endlessly for its suck, or you use your own string class, and be one of those people who constantly gets criticized for designing their own string class.

It is possible to write C++ that's not crap - in theory. This is because it's possible to write C that's not crap, and C programs will compile as C++. Then, you can add a sprinkle of the non-sucky parts of C++ - deterministic construction/destruction (RAII) is one of them - and you'll have a program that's undoubtedly better, more readable, and easier to debug than it would have been in pure C.

But you can't stop there. You should, but you can't. Nobody can. It would be superhuman. Because you'll see something that should be a little clearer, a little easier. Maybe it's string concatenation, maybe it's member function pointers, maybe it's operator[]. But you'll see it, and you'll start trying to solve it. And 1000 lines of code later, you'll have made your life - and the lives of everyone who has to maintain your programs - much worse.

For me it was function pointers. Over the years in wvstreams, I tried doing them so many different ways - using C-style function pointers with wrapper functions, using inheritance and virtual functions, using the insane C++ member function pointers, using templates and the insane C++ member function pointers. Finally, nowadays, function pointers in WvStreams use boost's new functor stuff, which has been standardized by TR1. And every single time I use one, I have to look up the syntax.

For my own library that I've spent the last 12 years building. I have to look up the syntax to declare a callback.

I should have just stuck with plain C function pointers.

Let this be a warning to you.

Syndicated 2010-07-20 03:56:30 (Updated 2010-09-21 19:06:01) from apenwarr - Business is Programming

22 May 2010 (updated 23 May 2010 at 20:08 UTC) »

A Programmer's Code of Ethics

  1. My programs encode the rules of modern society. I will take full responsibility for the programs I write.

  2. I will not write a program that intentionally fails to operate.

  3. I will not write a program that refuses to do tomorrow what it was able to do yesterday.

  4. I will not create a single point of failure, whether technical or political.

  5. I will not encode foolish rules just because someone paid me to do it.

  6. I will not give people what they want if what they want is not good enough.

  7. I will not stop people from taking my program's ideas and making them better.

  8. I will write programs to help each person produce their best, not to help the masses produce mediocrity.

  9. I will correct those who believe my program's failure is anyone's fault but mine.

  10. I will write programs to benefit even the people who don't deserve it.

 

Condensed "New Testament" Version

    Don't write for others a program you wouldn't want written for you.

 

Commentary

Do I always follow all the above rules perfectly? Certainly not. In fact, I think I've broken every single one of them.

But thinking over all those situations and knowing what I know now, I'm pretty sure that in every case, it would have been better if I'd done the right thing. The exceptions don't feel like the right move; they just feel dirty.

That's how I know I'm on the right track.

Update 2010/05/22: Based on a suggestion from Chris Frey, slightly rephrased point #3.

Syndicated 2010-05-20 22:16:49 (Updated 2010-05-23 20:08:41) from apenwarr - Business is Programming

At last, the circle is complete

My twitter search RSS feed (yes, I have one, so shoot me) for "apenwarr" returned a hit today in which the only usage of the word "apenwarr" was an URL hidden behind a bit.ly link.

Oh yes. That means twitter search is now decoding bit.ly URLs as part of the indexing process, but of course it *still* serves you the original stupid bit.ly links.

Thank you, oh great technology gods, for inventing new uses for excess CPU that I never could have imagined.

In other news, SEOs can now increase the keyword relevance of their twitter links; just have bit.ly resolve to something like http://whatever.com/stuff?magic-keywords=fuzzy-wuzzy-chickens-multiplied-by-gargantuan-apple-google-flash-ipad-porn-naked

Syndicated 2010-05-19 16:33:18 from apenwarr - Business is Programming

18 May 2010 (updated 18 May 2010 at 23:10 UTC) »

Tell me what surprised you: iPad Edition

If someone is about to tell you a long story about a trip they were on, you should make just one request: "Tell me what surprised you." That simple query changes the whole nature of the conversation.

For example, we all know the basic stuff about Paris. It has French people. The food is good. It's pretty. But what surprised you about Paris? Now there's something we can talk about.1

So. Yes, I got an iPad. And I'll do you a favour: I'll tell you my surprise.

What surprised me was iBooks.

No, no, iBooks looks and works exactly like in the pictures and ads. It really is just like that, for better and for worse. That's not the surprise.

The surprise was that it wasn't installed by default.

Think about that. I had to go to the app store, painfully convince it I was a U.S. resident, search for "iBooks" ("books" is definitely not good enough), and download it, all just to get started.

Meanwhile, I downloaded a bunch of other apps. Some of them had ads. Many of those ads were for the Amazon Kindle app, which is also in the app store, and also free, and doesn't require me to be American. And I could click on any of those ads and get straight to app store. Two more taps, and I'm done.

There weren't any ads for the iBooks app. Anywhere. Thus it was harder to find out about iBooks, and as hard or harder to download it, than the Kindle app.

I've been in the computer world for a long time. I've observed Microsoft and how they do things. Heck, I've observed Apple and how they do things. And one thing I've seen for sure: bundling and cross-selling work. If this were Microsoft, they wouldn't have hesitated for a second to give iBooks a boost by including it with the OS.

But Apple deliberately left it out. iBooks has to compete with Kindle in the very same app store, with no free publicity (other than being a "featured" app in some iPad ads and PR).

I can imagine the iBooks team being told that this is it, yes, you can do your bookstore however you want, but we're not going to make it any easier on you. You have to be the best bookstore in the world all by yourself, not just because you tagged along with something that was already great without you.

Now that is surprising.

For the record, iBooks is doing pretty well so far: it absolutely beats the snot out of Kindle for the iPhone/iPad in pretty much every way (except book prices, which are much higher than Amazon's).2

Also interesting to consider is why they allowed this competition with books, but not with music, movies, and phone calls. Have they had a change of heart? A secret contractual obligation? Does Steve Jobs really just not care about books, as he previously claimed?

You might also ask why their Pages, Numbers, and Presentations (or whatever it's called) apps aren't bundled or cross-sold either; anybody making a word processor is on equal footing with Apple's iWork team. And there's no Weather, Stocks, Voice Memos, Clock, or Calculator app included on an iPad either, even though they were all on the iPhone. The iPad has less bundled stuff than ever before - the diametric opposite of what Microsoft has done in any version of Windows, ever.

The rest of the iPad? It's pretty much as expected. I'll spare you.

Footnotes

1 What surprised me about Paris was that, at their fruit stands, every fruit is arranged with absolute care and precision. Compare to a typical grocery store in Canada, where fruit is typically dumped into a bin so you can sort through it yourself. When I think about how much more time it must take to do it the hard way, yes, it surprises me. How can they afford to do that? It's magic. (I also had other related observations at the time.)

2 I won't bother describing the Kindle app's failings in detail. To get you started, I have just two words for you: page numbers. Compare them in Kindle vs. iBooks. Someone at Amazon needs to be shot.

Update 2010/05/18: Hmm, jordanlev wrote to tell me that on his iPad, it popped up a message right away asking whether he wanted to download iBooks. So maybe they're not playing all that nice after all.

Syndicated 2010-05-17 21:05:41 (Updated 2010-05-18 23:10:45) from apenwarr - Business is Programming

Mailing lists are cheap...

...but I still didn't think I'd bother with one for sshuttle, which was just intended to be a weekend toy project. Seems people are actually using it, though, and it's picked up quite a few github followers already. (62 followers isn't that much, but the thing is only 10 days old.)

So okay, here you go: the sshuttle mailing list.

By the way, it seems to not be common knowledge that you can subscribe to googlegroups mailing lists without having a Google account or ever using their web interface. The secret is to send an email to "groupname+subscribe@googlegroups.com", where groupname is the name of the group, in this case, sshuttle. Note that plus sign. It's not a minus sign.

Syndicated 2010-05-11 20:06:03 from apenwarr - Business is Programming

sshuttle 0.30: automatic route and hostname discovery

My fancypants new sshuttle transproxy VPN could already work even if all you had was a ssh session to the other side. And it avoided the TCP-over-TCP trap. And sure, I even made it upload itself to the other end automatically so you wouldn't have to. And it apparently works on MacOS clients now, except Snow Leopard which is not-so-shockingly buggy, and maybe even on FreeBSD. And it manages latency, even under heavy use, so performance doesn't start sucking when you transfer a big file.

So in all those ways, it was already much better than the old Tunnel Vision, which among other things, you had to install by hand on both ends of the connection, and after that the performance was a bit random.

But Tunnel Vision still had a few tricks that sshuttle missed. The first one is automatic route guessing. When TV connected to the other end, the server would tell the client what subnets it was able to reach, and then the client would automatically set up routes for those subnets to go through the tunnel. Neat, right? But with sshuttle, you had to tell the client what to route by hand. No more:

     sshuttle -N -r username@servername

The new -N option enables automatic network determination. You can still add additional subnets (like 0/0 for people who want to route "everything") if you want.

Another fun feature of Tunnel Vision was automatic hostname mapping. You know what sucks about connecting to a remote VPN? You probably don't, so I'll tell you. What sucks is DNS. Your local DNS server doesn't know anything about the hostnames on the other end, and of course they're private so they're not in the public DNS either. So when you try "ssh internalserver", and "internalserver" is some server on the remote internal network, you get an error.

This one is a lot trickier to solve. After all, there's no good way to get a list of hostnames for you to replicate. And once you do, there's also no good way to add them to the local DNS. But does that stop us? Certainly not. It merely confuses us.

     shuttle -H -r username@servername

The new -H option tells the remote sshuttle instance to start prodding around wherever it can (currently, that means at least the local /etc/hosts file, samba nameservers and browse masters, and a bit of DNS) to try to find good hostnames and their matching IP addresses. As it finds them, it beams them back to your client, which adds them temporarily to your local /etc/hosts file. Gross? Oh boy, is it ever! But it works. More or less.

It would be kind of neat to have it get browse lists from things like mdns (aka "zeroconf" aka "bonjour") but I have no idea how to do that.

The old Tunnel Vision sort of had this feature, but it didn't have sshuttle's amazing Name Prodding Technology(tm). You had to configure the names yourself. As it happened, our proprietary Nitix servers had some very scary code to automatically track local hostnames and configure Tunnel Vision appropriately, so the name mapping worked pretty well there. And Nitix servers were usually acting as your DNS, so they could set that up nicely too. Sadly, Nitix's old name prodding is mostly obsolete due to the way modern networks are run (mdns, domain controllers, names-by-dhcp, switched ethernet, and so on). But life marches on. And we all still want the same things.

Anyway, anybody who knows how to get a good list of hostname/ip pairs out of mdns, ideally in a portable fashion, send me an email :) You might also want to look at hostwatch.py and see if you can think of any other interesting sources to scan for names.

Syndicated 2010-05-08 19:56:28 from apenwarr - Business is Programming

Gunless

Movie recommendation. There's not much to say about it that the reviews haven't already. But if you're looking for something with a little more intelligence than a Hollywood Extravaganza, but not so intelligent that it bores me to tears like most independent films, this one works.

    The idea that the Wild West of the United States didn't have any law is completely bogus. There was law. They were settled by laws like we were. And the idea that we had nothing but law and had no weaponry is also ludicrous. Of course we did.

    -- Paul Gross on the Film's Historical Accuracy

So it's also useful if you want to learn totally wrong but funny things about the Canadian "wild west."

Syndicated 2010-05-08 19:25:40 from apenwarr - Business is Programming

5 May 2010 (updated 5 May 2010 at 05:04 UTC) »

Uploading yourself for fun and profit (plus: sshuttle 0.20 "almost" works on MacOS)

After trying out the initial version of sshuttle that I produced this weekend, a few people asked me whether it would be possible to make it work without installing a sshuttle server on the server machine. Can it work with *just* an sshd, they wondered?

Good question.

Some people pointed out ssh's -D option (dynamic port forwarding using a SOCKS proxy). If we just used that (ie. the sshuttle client transproxies stuff into ssh's SOCKS server), then there wouldn't need to be a server side for sshuttle, and that would solve the problem. But sadly, sshd's latency management is pretty thoroughly awful - among other things, it sets its SO_SNDBUF way too high - so if you have a few connections going at once, performance takes a dump. sshuttle has some clever stuff to make sure that doesn't happen even if you've got giant ISOs downloading over your VPN link. I'd like to keep that.

So then I said to myself, hey, self, what if we just uploaded our source code to the remote server and executed it automatically? It works for viruses (technically worms), after all.

You know it's never a good thing when I start talking to myself. And yet the result is surprisingly simple and elegant. Here's a simplified version of what the "stage 1 reassembler" looks like:

   ssh hostname python -c '
   	import sys;
   	exec compile(sys.stdin.read(%d), "assembler.py", "exec")'

Where "%d" is substituted with the length of assembler.py. Assembler.py, by the way, is the "stage 2 reassembler," which looks like this:

    import sys, zlib

z = zlib.decompressobj() mainmod = sys.modules[__name__] while 1: name = sys.stdin.readline().strip() if name: nbytes = int(sys.stdin.readline()) if verbosity >= 2: sys.stderr.write('remote assembling %r (%d bytes)\n' % (name, nbytes)) content = z.decompress(sys.stdin.read(nbytes)) exec compile(content, name, "exec") # FIXME: this crushes everything into a single module namespace, # then makes each of the module names point at this one. Gross. assert(name.endswith('.py')) modname = name[:-3] mainmod.__dict__[modname] = mainmod else: break main()

Yeah, that's right, I gzipped it.

You know the best part? When the server throws an exception, it still gives the right filenames and line numbers in the backtrace, because we assemble each file separately.

If anybody knows the right python incantation to make it import each of the modules as a separate actual module object (rather than just dumping it all into the global namespace, as the comment indicates) please send a patch.

Grab the latest version of sshuttle VPN on GitHub.

"Almost" works on MacOS

Responding to popular request, I thought I would try to get the sshuttle client working on MacOS. (The sshuttle server already works - or at least it should work - on just about any platform with an sshd.)

MacOS, being based on BSD, uses the same ipfw stuff as FreeBSD seems to use. So it "should" be just a matter of having it auto-detect whether the current system uses iptables or ipfw, then run the right commands, right?

Well, almost. I did all that stuff, and I've *almost* got the rules working, but I just can't make it work right. I'm using MacOS X Snow Leopard on my laptop. I checked it in and pushed it anyway in case anybody wants to take a look; the final fix is probably a one liner.

For more information on my conundrum, see my (as yet unanswered) question on ServerFault. If you can contribute an answer, you'll forever be my hero. Even if you don't know anything about ipfw, if you could run through the steps on your version of MacOS or BSD and tell me what happens, it could help narrow things down.

Enjoy!

Syndicated 2010-05-05 04:05:00 (Updated 2010-05-05 05:04:29) from apenwarr - Business is Programming

548 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!