20 Jul 2010 apenwarr   » (Master)

You can't make C++ not ugly, but you can't not try

...everything that's wrong with C++ comes down to that.

Background: I've been programming in C++ since about 1993; that's 17 years now. As late as 2009, I chose C++ to write our Windows client for EQL Data. If I were to make that decision today, I would still choose C++, because, quite simply, nothing else would work. (Okay, C would work, but it would be at least 5x as much effort. So no thanks. And for making plugins to legacy Windows apps, there's just nothing else out there.)

So, okay, I know a fair bit about C++. I've managed 30-person development teams building huge stuff in C++. Successfully. I have some context here.

And my context is: even if there's nothing better for the job, the truth is that C++ is incredibly ugly and misdesigned. C++ is a trap: they tell you that you can do anything you want in C++. Anything! C++ isn't a language, they say, it's a language construction kit! Build the language of your dreams in C++! And it'll be portable and scalable and fast and standardized!

And this is so close to true that even after using it for 17 years, I still almost believe it. I used to actually believe it. But see, some recent experience with the "amazing innovations" in other programming languages have convinced me otherwise.

First of all, if you haven't done much C++, you need to realize: most of the stuff in there is utter putrid boneheaded crap. This includes the RTTI and exceptions stuff; C++'s versions of those were enough to convince a whole generation of programmers that introspection and exceptions were outright evil and should be avoided. But as it turns out, that's only true in C++.

If you've heard anything about C++, you've probably heard that there's no standard string class and everybody rolls their own. That's not actually one of the bad things, in my opinion. As a person who's done a lot of coding in C++, I've actually come to understand that there really are good reasons to use different string objects at different times. In python, my current language of choice whenever it's appropriate, there's only one string class, and it's mostly okay, but every now and then you really want to just replace one character in the string with one other character (an O(1) operation) but you can't, so you instead construct a new string (an O(n) operation) and your program is vastly slower and less scalable. (Happily, python makes it pretty easy to create string-like objects, particularly using C extensions, so if you really need it, you can make it still go fast. But effectively thats just exactly creating your own string class, like so many people do in C++. See? Not always evil.)

The problem isn't interchangeable string classes. The problem is that the default C++ string class is so utterly godawfully stupid. No garbage collection? Check. No refcounting? Check. Need to allocate/free heap space just to pass a string constant to a function? Check. No support for null strings? Check. Horrendous mess of templates that makes tracing in a debugger utterly painful? Check. Horrendous mess of templates that makes non-ultramodern compilers unable to optimize them so that, for years, your toy homemade string class was 5x faster? Check. Totally unclear what character type it uses (actually you can use whatever you want at different times)? Check. Totally missing a sprintf-like formatter so you have to use something, anything, oh god please save me from iostreams just to produce a dynamic string? Check. Can't append to a string without allocating a whole new one? Check. Using the "+" append operator produces more temporary objects than you can count? Check. Using the "+" operator with two string constants gives a weird compiler error about adding pointers? Check.

In contrast, let's take, say, python's strings: refcounted, passed by reference, nullable, compatible with string constants, no templates, trivially easy debugging, always the same character type (although they changed it in python 3, sigh), include a sprintf-like operator, the + append operator works fine and multiple appended constants can be optimized at compile time (python's interpreter compiles to a metalanguage and can do basic optimizations like this). They even have an optimized non-constant append operator in newer versions of python that's more efficient than making a whole new copy every time.

How many of these string features required us to use an interpreted language? Precisely zero. An imaginary, fictional version of C++ could have had a string class with all these features and been just as fast and efficient. And I bet a lot fewer people would have written their own if that had been the case. There's actually no excuse for the crap that is C++ std::strings; they aren't better. They're just, somehow, the standard.

Another C++ problem that's close to my heart is function pointers. Not even lambdas or anonymous functions - let's not get all fancy, here. Just plain pointers to existing named functions. C++, being a superset of C, has function pointers, of course. And while the syntax for them has always been a little funny, they actually work fine and don't make you want to kill people too often. (Everywhere C function pointers are used they should always have a void *userdata parameter, and when people don't do that (like in qsort()), then you do want to kill things... but that's not C's fault, and sensible programmers can avoid that mistake.) So ok. C++ has function pointers.

But here's the thing: they utterly failed to extend this concept to include pointers to methods of an object.

Okay, that's not really true. In fact, it's a little-known fact that C++ - the language, not the insane libraries or templates - has built-in support for function pointers that call member functions.

The bad news is, this feature is so horrendously ill-conceived that absolutely nobody uses it for anything. Seriously. Nobody. I tried my best. The feature really is actually useless. The article I linked to tried desperately to make them look like maybe they have a purpose, but no. They just don't. (You can see the main problem in the linked article under the section "Member Function Pointers Are Not Just Simple Addresses." You might think, oh, of course not. They're a "this" pointer plus an address, right? Ha ha! Ha ha ha ha!! No they're not! They don't have a this pointer! You still have to provide your own this pointer when you call it! But it does store all kinds of crazy other stuff instead so it can do call-time vtable lookups on multiply-inherited objects! Ha ha!)

Utterly useless. But the bad thing isn't so much that it's useless - although maybe someone should have noticed that and killed the feature before it somehow passed the standards committee. The bad thing is that there is an obvious way to do it that wouldn't have been useless: just make a member function pointer be a struct { obj, funcaddress }. Everybody knows that calling a member function obj.f(x,y,z) in C++ is actually done by calling f(obj,x,y,z). There would be nothing to it. Since you know 'this' at the time you create the function pointer, you can resolve the funcaddress from the name 'f' at that point - the same way you would when making any method call, including vtables, multiple inheritance, and everything - and the code receiving the pointer would always just run it as (*funcaddress)(obj, ...). So easy. Nothing to it. So very much terrible C++ code would never have been written if this feature existed.

But it doesn't. There are alternatives, of course - numerous ones, and all terrible, and all incompatible, because the language designers simply failed utterly to do their job. The boost (now TR1) one has the cutest syntax, but God help you if you make a typo using it, because you'll get pages of template gibberish.

Stop and think about that for a second. Template gibberish. For a simple function pointer! Every language not designed by idiots in the last 20 years, including Turbo Pascal, has some kind of function pointers. ASM has function pointers. C has function pointers. This isn't hard. It has nothing to do with making fancy type-independent efficient data structures, for which templates/generics are actually justified. It has to do with a trivial operation that's a basic part of every compiled language: pushing some parameters on the stack and jumping to an address.

While I'm here, no, strings are not "generic" data structures either. The fact that std::string is a template is also incredibly insulting.

Okay, one more example of C++ terribleness. This one is actually a tricky one, so I can almost forgive the C++ guys for not thinking up the "right" solution. But it came up again for me the other day, so I'll rant about it too: dictionary item assignment.

What happens when you have, say, a std::map of std::string and you do m[5] = "chicken"? Moreover, what happens if there is no m[5] and you do std::string x = m[5]?

Answer: m[5] "autovivifies" a new, empty string and stores it in location 5. Then it returns a reference to that location, which in the first example, you reassign using std::string::operator=. In the second example, the autovivified string is copied to x - and left happily floating around, empty, in m[5].

Ha ha! In what universe are these semantics reasonable? In what rational set of rules does the right-hand-side of an assignment statement get modified by default? Maybe I'm crazy - no, that's not it - but when I write m[5] and there's no m[5], I think there are only two things that are okay to happen. Either m[5] returns NULL (a passive indicator that there is no m[5], like you'd expect from C) or m[5] throws an exception (an aggressive indicator that there is no m[5], like you'd see in python).

Ah, you say. But look! If that happened, then the first statement - the one assigning to m[5] - wouldn't work! It would crash because you end up assigning to NULL!

Yes. Yes it would. In C++ it would, because the people who designed C++ are idiots.

But in python, it works perfectly (even for user-defined types). How? Simple. Python's parser has a little hack in it - which I'm sure must hurt the python people a lot, so much do they hate hacks - that makes m[5]= parse differently than just plain m[5].

The python parser converts o[x]=y directly into o.__setitem__(x,y). Whereas o[x] without a trailing equal sign converts directly into o.__getitem__(x). It's very sad that the parser has to do such utterly different things with two identical-looking uses of the square bracket operator. But the result is you get what you expect: __getitem__ throws an exception if there's no m[5]. __setitem__ doesn't. __setitem__ puts stuff into your object; it doesn't waste time pulling stuff out of your object (unless that's a necessary internal detail for your data structure implementation).

But even that isn't the worst thing. Here's what's worse: C++'s crazy autovivification stuff makes it slower, because you have to construct an object just so you can throw it away and reassign it. Ha ha! The crazy language where supposedly performance is all-important actually assigns to maps slower than python can! All in the name of having language purity, so we don't have to have stupid parser hacks to make [] behave two different ways!

...

"...Well," said the C++ people. "Well. We can't have that."

So here's what they invented. Instead of inventing a sensible new []= operator, they went even more crazy. They redefined things such that, if your optimizer is sufficiently smart, it can make all the extra crap go away.

There's something in C++ called the "return value optimization." Normally, if you do something like "MyObj x = f()", and f returns a MyObj, then what would need to happen is that 'x' gets constructed using the default constructor, then f() constructs a new object and returns it, and then we call x.operator= to copy the object from f()'s return value, then we destroy f()'s return value.

As you might imagine, when implementing the [] setter on a map, this would be kind of inefficient.

But because the C++ people so desperately wanted this sort of thing to be fast, they allowed the compiler to optimize out the creation of x and the copy operation; instead, they just tell f() to construct its return value right into x. If you think about it hard enough, you can see that, assuming the stars all align perfectly, m[5] = "foo" can benefit from this operation. Probably only if m.operator[] is inlined, but of course it is - it's a template! Everything in a template is inlined! Ha ha!

So actually C++ maps are as fast as python maps, assuming your compiler writers are amazingly great, and a) implement the (optional) return-value optimization; b) inline the right stuff; and c) don't screw up their overcomplicated optimizer so that it makes your code randomly not work in other places.

Okay, cool, right? Isn't this a triumph of engineering - an amazingly world class optimizer plus an amazingly supercomplex specification that allows just the right combination of craziness to get what you want?

NO!

No it is not!

It is an absolute failure of engineering! Do you want to know what real engineering is? It's this:

map_set(m, 5, "foo");
char *x = map_get(m, 5);

That plain C code runs exactly as fast as the above hyperoptimized ultracomplex C++. *And* it returns NULL when m[5] doesn't exist, which C++ fails to do.

In the heat of the moment, it's easy to lose sight of just how much of C++ is absolutely senseless wankery.

And this, my friends, is the problem.

As with any bureaucracy, the focus slowly shifts from finding a simple, elegant way to solve your problem to just goddamn winning this one battle with the system so that you can get the bloody thing working at all. It would have been easy, at any time, for the C++ committee to have just added a new operator[]=. It would have been totally backward-compatible: any object without an operator[]= would keep working just like it always has.

But they couldn't do that. Doing that would be admitting defeat.

They could have made up a new syntax for sensible member function pointers, any time they wanted. Again, no concern about backwards compatibility - if you don't use it, it doesn't affect you.

They could have written a sensible string class. In fact, people did. Lots of people! But for some reason, they standardized on the non-sensible one. Now C++ users are forever cursed: either you use std::string, and pay endlessly for its suck, or you use your own string class, and be one of those people who constantly gets criticized for designing their own string class.

It is possible to write C++ that's not crap - in theory. This is because it's possible to write C that's not crap, and C programs will compile as C++. Then, you can add a sprinkle of the non-sucky parts of C++ - deterministic construction/destruction (RAII) is one of them - and you'll have a program that's undoubtedly better, more readable, and easier to debug than it would have been in pure C.

But you can't stop there. You should, but you can't. Nobody can. It would be superhuman. Because you'll see something that should be a little clearer, a little easier. Maybe it's string concatenation, maybe it's member function pointers, maybe it's operator[]. But you'll see it, and you'll start trying to solve it. And 1000 lines of code later, you'll have made your life - and the lives of everyone who has to maintain your programs - much worse.

For me it was function pointers. Over the years in wvstreams, I tried doing them so many different ways - using C-style function pointers with wrapper functions, using inheritance and virtual functions, using the insane C++ member function pointers, using templates and the insane C++ member function pointers. Finally, nowadays, function pointers in WvStreams use boost's new functor stuff, which has been standardized by TR1. And every single time I use one, I have to look up the syntax.

For my own library that I've spent the last 12 years building. I have to look up the syntax to declare a callback.

I should have just stuck with plain C function pointers.

Let this be a warning to you.

Syndicated 2010-07-20 03:56:30 (Updated 2010-09-21 19:06:01) from apenwarr - Business is Programming

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!