Questions of language popularity are always topical, and our own davidw has written an interesting, if frustratingly incomplete article on the topic. So...
Some other resources and discussions to bear in mind:
With respect to free software, C vs C++ is the most relevant language rivalry, and it points to the two most frustrating holes in David's survey: the first is that C++ is not a separate language entry; the second is that David is concerned primarily with overall popularity, and does not try to isolate popularity within the free software community. So the first question is: is there a convincing demonstration anywhere that one or the other language is more popular?
The second question is a more personal matter of curiosity: I do not understand why, outside of a few niches such as OS programming where there are long-established C-centric traditions, anyone programming a large project would prefer C, with its flatly inadequate support, over C++, with its rich set of high-level program structuring facilities. Why does C seem to do so well?
C, as opposed to C++, is a simple language and is easy to understand and debug. C++ frameworks tend to have steep learning curves that require considerable investment in time before one can do useful work with them. C++ code also tends to be somewhat more difficult to debug (my opinion). I would assume that free software authors want to maximize the number of possible contributors who might be able to supply patches. C++ is also notoriously difficult to keep portable due to compiler differences.
Not that there are no good C++ projects out there, KDE comes to mind quickly. But this also has the size of a 500 pound gorilla and the framework is not easy to learn either.
I had thought of this, but there is no obligation with C++ to use the whole of the language. Just to take a simple example, C has completely broken string handling and awkward facilities for IO, problems that are more or less completely solved in C++ in a very accessible manner. To truly master C++ is a considerable achievement, but one can choose a fairly small subset of the C++ facilities and reap tremendous rewards in terms of language expressiveness and code readability.
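As a rough illustration of the "small subset" point about strings and IO, here is a minimal sketch that uses only std::string and a string stream. The function name greeting and its message format are made up for this example and come from nowhere in the thread.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper, for illustration only.
// std::string and std::ostringstream grow as needed, so there is no
// fixed-size buffer to overflow and no format string to get wrong.
std::string greeting(const std::string& name, int messages) {
    assert(messages >= 0);  // precondition of this sketch
    std::ostringstream out;
    out << "Hello, " << name << "! You have " << messages << " messages.";
    return out.str();
}
```

A C version of the same helper would typically need a caller-supplied buffer, a size argument and a check on the return value of snprintf.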
Of course another aspect of KISS might apply: code maintainers don't want the hassle of policing that only the subset of C++ they choose to use is in fact followed.
Lastly, one of the points, if it was not clear, of the article was to open up the question of which is the more widely used language in free software, C or C++. It doesn't seem to be easy to see which is the more used.
"I had thought of this, but there is no obligation with C++ to use the whole of the language."
This is very close to completely false. C++ does so much behind-the-scenes work in what is not always the most intuitive way, that if you don't know the whole of the language you are likely to get bitten in a number of ways. Casting is the worst on this, as, while in C, casting is not likely to cause many problems, casting in C++ can create whole new objects and cause all sorts of memory management problems. Assignment doesn't always do what you think it will do, due to copy constructors. If you do not understand the whole of the language, then pretty much any action that you do will be greatly under-informed.
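The remark that a cast can create a whole new object can be made concrete with a small sketch. The Buffer class and the two counting functions are hypothetical and exist only to make the hidden construction visible.

```cpp
#include <cassert>

// Hypothetical class, for illustration only.
struct Buffer {
    static int made_from_int;  // how many times the converting constructor ran
    int size;
    // Not marked 'explicit', so an int converts to a Buffer implicitly.
    Buffer(int n) : size(n) {
        assert(n >= 0);
        ++made_from_int;
    }
};
int Buffer::made_from_int = 0;

// Takes a Buffer by value.
int size_of(Buffer b) { return b.size; }

// Passing a plain int compiles, and quietly constructs a Buffer.
int objects_created_by_call() {
    int before = Buffer::made_from_int;
    size_of(42);
    return Buffer::made_from_int - before;
}

// A C-style cast to a class type runs the same constructor.
int objects_created_by_cast() {
    int before = Buffer::made_from_int;
    Buffer b = (Buffer)42;
    (void)b;  // silence the unused-variable warning
    return Buffer::made_from_int - before;
}
```

Declaring the constructor explicit would make the call size_of(42) a compile error, which is one way to opt out of this behaviour.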
Exceptions can really kill most memory management techniques.
Overloading functions is nasty, especially when you have two functions with similar signatures. For example, if I have a function that can either take an int or a double, but for different reasons, I might get very wrong results if I accidentally pass the result of a computation that I forgot to cast back into an int. Optional arguments and named arguments are good, but overloaded function arguments are bad.
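A minimal sketch of the int-versus-double trap described above. The describe overloads and their "count" and "measurement" meanings are invented for the example.

```cpp
#include <cassert>
#include <string>

// Hypothetical overloads: the int version treats its argument as a count,
// the double version treats it as a measurement.
std::string describe(int)    { return "count"; }
std::string describe(double) { return "measurement"; }

// int / int stays an int, so the int overload is selected.
std::string describe_average(int total, int n) {
    assert(n != 0);
    return describe(total / n);
}

// int * double is a double, so the double overload is selected,
// even though the caller started from an int.
std::string describe_scaled(int total) {
    return describe(total * 0.5);
}
```

Nothing at the call site shows which function runs; the choice depends entirely on the static type of the argument expression.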
In addition, C++ is tough to link to. Even with the ABI standard, who all is following the standard? It's a pretty complicated standard, so it's only really useful within C++. On top of that, it is unclear what the semantics of libraries opened w/ dlopen or similar is -- like what happens to type information? Anything you dlopen pretty much has to be written in C or something with a reliable C ABI interface, in which case you have to dumb down your coding style to be roughly C-compatible anyway. This means you get all of the headache and none of the advantages of C++.
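For what it is worth, the usual workaround is to keep C++ inside the library and expose only a thin extern "C" boundary. The sketch below assumes a made-up function, count_vowels, standing in for whatever a plugin would export. It shows only the linkage, not an actual dlopen call.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

namespace {

// Internal C++ code is free to use std::string and the standard algorithms.
bool is_vowel(char c) {
    return std::string("aeiouAEIOU").find(c) != std::string::npos;
}

int count_vowels_impl(const std::string& text) {
    return static_cast<int>(std::count_if(text.begin(), text.end(), is_vowel));
}

}  // namespace

// Hypothetical exported entry point. extern "C" gives it an unmangled name
// and C calling conventions, so a C program (or dlsym) can find it.
// Only C-compatible types may appear in the signature.
extern "C" int count_vowels(const char* text) {
    return text ? count_vowels_impl(text) : 0;
}
```

The boundary itself is indeed "roughly C", as the comment above says; only the code behind it can use the rest of the language.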
Personally, I think C++ is pretty useless without a garbage collector attached. If I want to do something low-level and predictable, I'd use C. If I want to do something high-level, I'd use Scheme. If I need to do something mid-level, I'd use Perl. I don't see any place where C++ shines ahead of those in the areas they are used. I really haven't found a class of problems that really shout "C++" to me. It may be better than other tools, but usually when you are using C++ on a project it just means you would have been better off using something else.
Yes, that's exactly what I did in my latest nut project, which does lots of string processing using the C++ STL. In fact, the manually-written C++ code is right at the top level, while the low-level logic is machine-generated.
I'm willing to believe that C++ isn't as popular as C, mainly because of these reasons:
Casting is always a failure point, that's why C++ introduced the more explicit static_cast<...>() etc. so it's more obvious you're doing something questionable. If you ignore the hint and just use C-style casts (or sledgehammer casts, as I call them) that's not the language's fault.
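A small sketch of the difference between the "sledgehammer" cast and the named casts. All four function names are hypothetical.

```cpp
#include <cassert>

// For a plain numeric conversion both spellings do the same thing.
int truncate_c_style(double d) { return (int)d; }
int truncate_named(double d)   { return static_cast<int>(d); }

// The C-style cast also accepts conversions that static_cast rejects.
// Here it silently drops const; with named casts this has to be spelled
// const_cast, which is easy to search for in a code review.
char* drop_const_c_style(const char* s) { return (char*)s; }
char* drop_const_named(const char* s)   { return const_cast<char*>(s); }

// The following line would not compile, which is the point of the named cast:
// char* drop_const_wrong(const char* s) { return static_cast<char*>(s); }
```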
You don't need to know about templates to make your program much simpler and safer by using the STL. IIRC Francis Glassborow's book You Can Do It makes extensive use of STL template classes and functions without once mentioning how to write your own templates or even how to declare them - he just says something like "vector is an incomplete type, you need to say vector<int> to use it" and gets on with it.
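To illustrate using the STL without writing any templates yourself, here is a short sketch. The helper names sorted_copy and largest are made up, and the only template syntax involved is writing vector<int>.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Returns a sorted copy; the vector manages its own memory.
std::vector<int> sorted_copy(std::vector<int> values) {
    std::sort(values.begin(), values.end());
    return values;
}

// Sums the elements with a standard algorithm instead of a hand-written loop.
int sum(const std::vector<int>& values) {
    return std::accumulate(values.begin(), values.end(), 0);
}

// Largest element of a non-empty vector.
int largest(const std::vector<int>& values) {
    assert(!values.empty());
    return *std::max_element(values.begin(), values.end());
}
```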
I'd agree you need to know more than just C to use C++, if you write C in C++ you'll get bitten, but you don't need to know the entire C++ language. If you only know C then read a recent introduction to C++ such as Accelerated C++ to learn how to use C++ safely and simply, in a considerably different style to programming in C.
Minor rant about writing C in C++ over, I'd like to mention something more directly relevant to the article.
Eric Raymond, in The Art of UNIX Programming, compares the popularity of various languages in Open Source programming, and separates C and C++. I don't know how accurate his figures are, as the book was in development for 5 years, and it shows in places (some info's a bit out of date), but you can see it online.
C's popularity is understandable, purely based on the fact that popular systems are C based and the default interface is C. These systems naturally give preference to C, both socially and technically. If C is not suitable for general purpose programming, then why build a whole system around the language? The designers of Unix and other systems obviously thought C was a suitable language. In a fairly recent interview, Kernighan states C is still his favorite language, and he also admits "one of the reasons that C++ succeeded was precisely that it was compatible with C".
Many languages do not have the flaws of C, why then in particular is the popularity of C++ important? It should also not be forgotten that popularity is primarily a social phenomenon.
I added C++, Delphi and Fortran, and swapped out ".net" for C#.
Hopefully it's less "frustratingly incomplete" now:-)
I didn't think the article would receive nearly as much attention as it has. It is pretty obvious that a "for fun" article like that has to rely on data sources that have some defects. However, I still stand by it, I think it shows interesting and maybe even useful information.
For Linux, the "default interface" for system calls consists of loading the syscall number and parameters into registers, and then executing int $0x80, which looks nothing like C calling conventions -- in fact, it's more similar to the kind of syscall interface that exists on MS-DOS, where no language is predominant. To get a complete C interface on Linux you need another 1.5Mb of goop (read: glibc).
Also, OS coding and general-purpose coding are very different beasts. For the latter, you are free to throw all the nitty-gritties of memory management, process creation, etc. to a bunch of lower-level libraries. But an OS often runs in a very different environment from a normal hosted program, and the libraries suddenly become quite useless, so you need to do a lot of gruntwork yourself anyway and C++'s extra features probably don't help much.
I know Linux provides a syscall interface, in the case of Linux I don't think it is incorrect to say the default or common interface is C, that this is the encouraged form of interfacing the system and also the windowing system (Xlib). For example, looking up the manual page for socketcall will turn up this text:
socketcall is a common kernel entry point for the socket system calls. call determines which socket function to invoke. args points to a block containing the actual arguments, which are passed through to the appropriate call.
User programs should call the appropriate functions by their usual names. Only standard library implementors and kernel hackers need to know about socketcall.
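The two routes being argued about can be put side by side in a short sketch. It assumes a Linux system with glibc, and uses getpid rather than a socket call only because it needs no setup. The function names are invented. Note that syscall() is itself a libc function, so even the "raw" route here passes through the C library; it just takes the system call number instead of a usual name.

```cpp
#include <cassert>
#include <sys/syscall.h>
#include <unistd.h>

// The "usual name": an ordinary C function provided by libc.
long pid_via_usual_name() {
    return static_cast<long>(getpid());
}

// The same request identified by its system call number.
// SYS_getpid comes from <sys/syscall.h>; libc's syscall() loads the
// number and arguments in the architecture-specific way.
long pid_via_syscall_number() {
    return syscall(SYS_getpid);
}

// Both routes should report the same process id.
void check_routes_agree() {
    assert(pid_via_usual_name() == pid_via_syscall_number());
}
```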
'int 0x80' doesn't compile on my machine, which uses the PPC instruction set.
Since Linux is supposed to mean multiple architectures, one ought to write in C, which is reasonably portable if the proper care is taken.
"It should also not be forgotten that popularity is primarily a social phenomenon."
However, you seem to say that as though it's a bad thing. davidw makes it quite clear why he considers popularity worth considering:
"If everyone is using a language and contributes a little bit back here and there (libraries, documentation, help on mailing lists), it's certainly more valuable than an equivalent language with none of this participation."
The social environment in which a language exists is important for many reasons. Equally applicable is the environment in which an open source project exists, and historically the environment in which UNIX was invented/nurtured. Why dismiss such social phenomena?
Not really a general popularity measure, but perhaps interesting nonetheless, are the results for the ICFP programming language competition: the list is naturally heavily tilted towards functional languages, but C++ was this year's most popular single language, among over 300 submissions; other popular non-functional languages were Java, Python and C; and perl did well in the lightning submissions category.
davidw: Thanks for adding the other languages: they make the document much more valuable. You may be doing this for fun, and drawing upon imperfect sources, but your treatment oozes professionalism and is a very worthwhile contribution. Two things that would improve the survey would be: analysis of the Sourceforge statistics, and some attempt to look at how language popularity varies by project size.
But it's true that for some reason, C is seen as the über-portable language across the programming world. This is quite strange to me, because I don't recall seeing C's "portability" being heavily touted by its creators. (Contrast this to Java: everywhere you turn, you hear the "Write Once, Run Anywhere" mantra.)
I'm not sure how you derive that interpretation, at least not in the context of the manual page. The manual page for socketcall documents socketcall as a C function:
SYNOPSIS
    int socketcall(int call, unsigned long *args);
The manual page advises not to use this function for user programs, instead "User programs should call the appropriate functions by their usual names." i.e.
SEE ALSO
    accept(2), bind(2), connect(2), getpeername(2), getsockname(2), getsockopt(2), listen(2), recv(2), recvfrom(2), send(2), sendto(2), setsockopt(2), shutdown(2), socket(2), socketpair(2)
Most of which provide a simple C wrapper around socketcall.
Well mslicker, socketcall() isn't available as a C function from any of the standard header files. The closest is sys_socketcall() in and that's only usable from inside the kernel.
And, by adding a few choice words like "i.e.", you can probably `prove' anything you want about Linux, including that Linux is x86-biased, US-biased, rich-biased, etc.... or even that Linux is anti-Semitic. Goes to show how much value I attach to such a method of `interpretation'.
Finally, redi says it well.
You can't disprove the fact that "the manual page for socketcall documents socketcall as a C function" by saying "socketcall() isn't available as a C function from any of the standard header files". Nor does your statement disprove that C is the encouraged method of interfacing the system as a whole.
If you think my use of "i.e" is a non-sequitur, what does "appropriate functions by their usual names" refer to? If the "appropriate functions by their usual names" are not the C functions provided by the C runtime listed in the "SEE ALSO" section of the socketcall manual page, what are they? Where are the writers of the "Linux Programmer's Manual" pointing us to?
"johnnyb, how does overloading affect you if you choose not to overload functions?"
It doesn't, unless you are using a library of functions that uses it. Of course, even if you know how to use it, my point was that it is way too easy to get bitten and have no idea why when the compiler chooses a different way to cast your parameters than you think it will, and just because of casts runs an entirely different set of code.
The C++ language is a mess, and if one were to do high-level programming, I have no idea why one would choose C++. The only programming paradigm that C++ supports better than other languages is Alexandrescu's Policy Classes -- even then you might be better off with Scheme, I just haven't figured out how to make macros do policy classes. What's really amusing is that C++ forces you to write recursive macro programs, while Scheme's macro facility does not. Again see Alexandrescu for more details.
In fact, I had wholly given up on C++ as being both absurd and useless until I read Modern C++ Design. Now I see it as having a little theoretical value while we search to find better ways of doing what Alexandrescu describes, because C++ certainly isn't worth it.
It also amazes me that garbage collection is not a default part of more C++ compilers. I don't see, given how much work C++ does to try to avoid memory leaks and other allocation problems, why it is not mandatory that C++ compilers at least ship with an optional garbage collector. It's not like Boehm is that hard to link with. I'd even go so far as to say maybe C++ needs to have garbage collection be the default.
Anyway, Scheme is awesome. The deeper I get into it the deeper I want to go. In how many other languages can you do ambiguous assignments, and decide _later_ what you wanted that previous assignment to be? Ahhh, the joy of continuations. For example, with a small library, I can do the following:
(let ((a (amb 1 2 3 4 5))   ; Assign a an ambiguous value between 1 and 5
      (b (amb 4 5 6 7 8))   ; Assign b an ambiguous value between 4 and 8
      (c #f))               ; c starts uninitialized
  (set! c (+ a b))          ; c is set to a + b
  (amb-assert (> a b))      ; Make an assertion about a and b
  c)                        ; return c
This snippet of code will retroactively assign a the value of 5 and b the value of 4 based on the assertion given. Since a=5,b=4 is the only value that satisfies the assertion, the return value of the above construction will be 9. This is a really nice feature for logic programming. The code for the "amb" functions is only about a page long.
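For readers who do not know Scheme, the search that amb performs here can be imitated with plain nested loops. This is a deliberately simplified sketch: it uses exhaustive search rather than continuations, so it shows what result the snippet arrives at, not how amb is implemented. Both function names are invented.

```cpp
#include <cassert>
#include <vector>

// Tries every candidate pair in order and returns a + b for the first pair
// that satisfies the assertion a > b, or -1 if no pair does.
int first_sum_satisfying_assertion() {
    const std::vector<int> a_choices = {1, 2, 3, 4, 5};
    const std::vector<int> b_choices = {4, 5, 6, 7, 8};
    for (int a : a_choices) {
        for (int b : b_choices) {
            int c = a + b;        // tentative assignment, as in (set! c (+ a b))
            if (a > b) return c;  // assertion holds: keep this choice
            // otherwise move on to the next candidate ("backtrack")
        }
    }
    return -1;
}

// Counts how many pairs satisfy the assertion, to confirm the solution is unique.
int count_satisfying_pairs() {
    int count = 0;
    for (int a = 1; a <= 5; ++a)
        for (int b = 4; b <= 8; ++b)
            if (a > b) ++count;
    return count;
}
```

The Scheme version differs in that the assertion can appear anywhere after the ambiguous bindings, without the code being wrapped in explicit loops.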
Because it compiles to fast code! Remember that your program doesn't run in an alternate universe where time and space doesn't matter (read: the Boss Zone), it runs on a machine which exists here and now, and which does have time and space limitations.
Theoretically, you can apply powerful optimization techniques to any language -- even Prolog -- to make them competitive with C++, but let's face it, implementations of such techniques don't exist yet. (Last I heard, the people behind the Self language were able to get Self running at half the speed of C++. Yes, only half.) And even if they do exist, they'll likely take up more memory at run time than an equivalent C++ program.
Besides being fast, C++ also provides a palette of rather high-level facilities. They may not be the most elegant in the world, but they're there, and -- most importantly -- they're efficient.
By the way, researchers who code computationally-intensive algorithms in Java deserve to be rounded up, whacked in the head, and shot.
Let's not get into yet another argument about the best language, and consider the question of why C and C++ compete in the same space, rather than why C++-doesn't-deserve-to-be-where-it-is-because-$language-beats-it-hands-down. As measured by davidw and ESR, C++ is a lot more popular than many higher-level, "better" languages. The issue is not why is that situation ludicrous, but why is that situation the one we're in.
Maybe a problem with much C++ code is a consequence of the language's popularity - the more popular the language, the more likely it is for non-experts to bang out shaky designs full of "clever" features. As ncm has observed, [Inlines] are the third most-misused C++ feature (after inheritance and overloading). This is maybe a fault of the language's popularity, not of the language itself. If another language were more popular we might complain about all the badly-written $language code that just doesn't get it right. Although I strongly agree that "there is no obligation with C++ to use the whole of the language", no one seems to teach that, so the inexperienced rush in and try to use every feature available. chalst has a good point that C's popularity might be a precaution against misuse of C++'s higher-level features.
"C++ doesn't kill projects, bad programmers do" may well be true, but many countries still choose to ban guns. I wonder what ESR would say to that ...
"Besides being fast, C++ also provides a palette of rather high-level facilities. They may not be the most elegant in the world, but they're there, and -- most importantly -- they're efficient."
Efficient in what? Making errors?
"Popularity is not the same thing as community. You don't need to beat C in a popularity contest to have a sustainable community."
I disagree with that statement: popularity is very close to community, as they are derived from a single source: populi. To me, if a community is behind a language, then by itself it is considered popular.
I can't remember who first told me this, but it's totally true. "There are two types of languages. There's the kind people complain about, and the kind they don't use."
People complain about C++. Like crazy. But it gets the job done, no matter how painful it was. For almost any other language, not every job is even possible, let alone painless. For example, try to write a fast program in Java or python.
For those of you who think it's impossible to have both a good programming environment and speed, I present:
It's not impossible to have a better language than C++ and still have good or even better performance. All of these languages (except maybe Sisal, which I don't know much about) have better, higher-level facilities than C++. And, as you can see here, they can be competitive or even faster. And there is still quite a bit of room left for optimization.
The problem is that the companies who develop and promote tools really do a piss-poor job. They give us languages that are 20 years outdated when they arrive, and then load us up with tools and documentation to try to make up for their deficiencies, and we thank them for it. What we really need to do is learn these kinds of languages, and demand that our tool vendors ship these languages or languages like them.
Another data point on Scheme-vs-C++ is that Scheme allows quite a bit of compile-time programming -- allowing you to do a lot of computation that would otherwise have had to be done at program startup in the C++ program.
Link should be http://www.bagley.org/~doug/shootout/craps.shtml.
...but in order to get the competitive performance, you pretty much have to forgo the high-level features which make it attractive in the first place. For example, the "word frequency" program in C++ is ~4 screenfuls, while the corresponding MLton program is ~16 screenfuls. Where's the "high-level" advantage again?
In addition, doing complex precomputations at compile time isn't an alien concept in C/C++ circles. The lex and yacc programs are prime examples of this: they precompute state machine tables instead of forcing your application to compute them at run time. The precomputation doesn't even need to be done in C or C++: for my nut project, I used a 3rd-party Prolog program to generate C++ tables. (The only "advantage" of Scheme is that it does these precomputations in the framework of the language itself, while for C or C++ you need a separate mechanism.)
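As an aside on doing the precomputation "in the framework of the language itself": compilers that support C++14, which postdates this discussion, can build simple tables at compile time with constexpr. The sketch below assumes such a compiler. The character-class table is a toy stand-in for the far larger tables that lex or yacc generate, and every name in it is hypothetical.

```cpp
#include <cassert>

// A tiny lookup table, loosely in the spirit of a lexer's character classes.
struct CharClassTable {
    bool is_digit[256];
};

// Evaluated by the compiler; no work is left for program startup.
constexpr CharClassTable make_table() {
    CharClassTable t{};  // all entries start out false
    for (int c = '0'; c <= '9'; ++c) {
        t.is_digit[c] = true;
    }
    return t;
}

constexpr CharClassTable kTable = make_table();

// These checks run at compile time, which shows the table already exists then.
static_assert(kTable.is_digit['7'], "digits are marked");
static_assert(!kTable.is_digit['x'], "letters are not marked");

// Run-time lookup is a single array access.
bool is_digit(unsigned char c) {
    return kTable.is_digit[c];
}
```

For anything as large as a parser's state tables, a separate generator of the kind described above is still the more common approach.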
In the "word frequency" program, the SML program must implement a Hash table, quicksort, and for some reason insertion sort. The OCaml (in the same family as SML) implementation comes right behind C++ in that benchmark, and is only 29 lines in comparison with 79 lines for C++
isn't this entire thread what usenet is for?
Trying to bring this discussion back around to language popularity...
tk: I give MLton credit for letting you write efficient code, and letting you have high-level language facilities. If it takes 4x as many lines of code to do it efficiently in MLton as in C++ (I haven't confirmed this; I'll just believe you) then that's pretty sucky, because C++ certainly isn't awe-inspiring in its brevity. But at least you can.
This is also why people use C++. Most people hate C++; most people (although the sets don't overlap completely) also use C++. This is because C++ can do whatever you want. It can be fast (as fast as C; I don't know why the shootout gives it a lower score), reliable, or reasonably expressive. You can't really have them all at once in C++, but no other language gives you that either.
People keep coming back to functional languages like scheme in arguments like this. I really like the looks of scheme; you could say it already has all the features of all the other languages - other than syntax - and it (and lisp) had them before everyone else. But I tried all the scheme interpreters and compilers in Debian, and they all used (much) more RAM and ran more slowly than the equivalent C++ program, and most of the interpreters were crash-prone. (A crash-prone interpreter? Give me a break!) The existence of bad interpreters is probably not the fault of scheme, but it is probably one reason (along with weird/missing syntax) that scheme isn't very popular.
Because it so clearly is, perhaps. The Linux Standard Base is defined in terms of shared libraries and the ABIs they support; the only machine-readable form of most of the corresponding APIs are C headers, so you at least need enough of a C-compatible language to parse function prototypes, structs, unions, #defines and inline functions.
How many non-C languages are available on Linux that talk to the kernel directly instead of calling functions in libc and similar libraries? I don't know of many.
(Whether, in practice, the libc interfaces are more or less stable than calling the kernel directly would be is a contentious issue. I've certainly been burnt by minor (even patchlevel, once) changes in libc breaking my apps which called not-quite-supported stuff)
I don't suppose that anyone would argue that if one were to design a high-level language with a completely free hand, one would think one had done a good job by coming up with C++. However C++ does represent a certain kind of language design achievement, once one bears a few factors in mind:
There was a nice message, Re: Compatibility questions, posted to the comp.lang.scheme list last year about why C++ lacks GC.
The typical C environment is not runtimeless at all; it's just that the runtime environment is (unsurprisingly) a close match for the unix runtime environment, so you tend not to notice it's there. What happens when you only have 64Mb of free RAM, and you write to an array 96Mb in size? The runtime transparently shuffles some pages to disk to free up others so that your program can continue. It happens that this runtime support, on a mainstream unix box, at least, is provided once for all processes in the kernel, instead of per-process, but it's still there and it's still throwing occasional "that operation took much longer than it ought to have done" spanners in your execution model.
If you'd said that "a good fit for some popular runtime environment" is a major factor in C/C++'s success, I'd have agreed - though as C and unix have grown up together it's not too surprising that they're well-suited for each other - but to claim it has no special runtime requirements is like a human claiming it has no special respiratory requirements.
dan: The typical C environment is not runtimeless at all; it's just that the runtime environment is (unsurprisingly) a close match for the unix runtime environment.
This is a good point, especially the bit about "that operation took much longer than it ought to have done" spanners; note I hedged by saying of C++ it "as far as possible" respects this principle. I think, though, that the UNIX model can be seen as clinging very close to the contours of the underlying mini/PC architecture, much closer than one might a priori imagine is consistent with the tolerable usability it achieves. The UNIX runtime environment is rather more fundamental than some popular runtime environment might suggest.
None of this is to say that there might not be better or more fundamental models than the C/C++/UNIX model; rather, the point of my post was to point out that the current trends in programming languages research don't seem to be particularly helpful at revealing them, and presuming GC is particularly vision impairing in this respect.