Towards a New Paradigm for C++ Program Development

Posted 11 Oct 2003 at 23:41 UTC by dej

Despite all of the advances that have been made to C++, the language standard still retains a limitation inherent in C's roots: a source code store in the form of interdependent text files. It's high time this state of affairs was improved.

Back in the early 70s when C was first invented, the Unix system was most adept at storing and working with data in text files. A C program was logically divided into "translation units" that were independently compiled and linked to form the final executable. A pre-processor permitted inclusion of code to ensure that all translation units worked from a common set of data definitions. Under this scheme, source files are dependent on the header files, so a program called "make" was created to manage these dependencies. For better or worse, this system is still in use today. It's almost as old as I am.

With today's large C++ programs, the limitations of this system are evident. Basically, the file is too large a unit of granularity for any dependency management system. This problem manifests itself in a number of ways:

  • C++ requires that the private implementation details of a class be declared alongside its public interface. If a change is made to a private definition, every source file that depends on the class definition is recompiled, often unnecessarily. The "pimpl idiom" was created to work around this (a sketch follows this list), but one should not have to alter one's programming style to work around limitations of the language and dependency system.
  • Certain changes to the public interface definitions unnecessarily cause recompilation. For example, adding a public method should not cause dependent objects to be rebuilt - the interface has not changed in an incompatible manner. There is the small matter of the layout of functions in the vtable, but an interactive development system should be able to work around this.
  • C++ programs must explicitly include definitions for any classes that are required to compile. In contrast, other languages have facilities that move some of this effort to the compiler.
  • In an effort to reduce unnecessary dependencies, it is desirable that source files include only those definitions that are necessary in order to compile. However, source files are frequently copied as templates for other source files, and without an IDE, the manual effort required to ensure that the include sets are minimal is both considerable and unnecessary.
  • Many C++ systems do not support pre-compiled headers. Such systems spend a lot of time analyzing header code. As an example, my own source files typically run no more than 1,000 lines each, yet the pre-processed source tops out at over 30,000 lines, all of which must be analyzed by GCC. Systems that do support pre-compiled headers do so with limitations, mainly required to preserve the semantics of file inclusion at the preprocessor level.
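
For readers who haven't met it, here is a minimal sketch of the pimpl idiom (Widget and WidgetImpl are illustrative names):

// widget.h -- the public interface; private details hide behind a pointer
class WidgetImpl;            // a forward declaration is all clients ever see

class Widget {
public:
    Widget();
    ~Widget();
    void draw();
private:
    WidgetImpl* impl;
};

// widget.cpp -- private details live here. They can change freely without
// touching widget.h, so code that depends on widget.h never recompiles.
class WidgetImpl {
public:
    int cached_width;
};

Widget::Widget() : impl(new WidgetImpl) {}
Widget::~Widget() { delete impl; }
void Widget::draw() { /* uses impl->cached_width ... */ }
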
Other languages are much easier to work with. VHDL and Java both offer the notion of "packages". Once the compiler is instructed to "use" or "import" the package, all definitions in the package become available. The packages are separately compiled; there is no need for the compiler to analyze the package source code. VHDL still suffers from the overly large granularity of "make", however. Model Technology's "vmake" command is almost useless since the resulting Makefiles border on the unmaintainable.

The Eclipse IDE appears to solve this problem for Java. It is capable of determining inter-class dependencies at the data object or method level, and it performs the bare minimum work required to bring a project up to date. This is particularly easy to do for Java for the following reasons:

  • The Java class file format is standard and well-documented; it is easy to determine what methods and instance variables are defined by a class.
  • Java uses late-binding for almost everything. A constant or field reference is implemented as a reference to an object in the constant pool, where the reference may be easily identified as a dependency. In contrast, constant and field references in C++ are typically "inlined" into individual machine instructions, so a dependency on a class is not easy to detect.
Can we do better? Can we come up with a C++ development environment that:
  • Lets one add private methods without recompiling anything that depends only on the public interface.
  • Lets one add a public method without recompiling the world.
  • Automatically determines what classes a given definition (method or type) depends on, and imports them into scope, except in those situations where ambiguity may result.
  • Interoperates with conventional C++ environments through import and export.
I envision a system built up around a bytecode interpreter and database, where each C++ definition, be it a typedef, method definition or template definition, has its own entry in the database. Each entry contains the source code for the definition, as well as the compiled bytecode representation and symbol table. Since we're dealing with bytecode, dependency information is easily available, as it is for Java.
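
As a rough illustration, each entry in such a database might look something like this (a purely hypothetical schema; all the field names here are invented):

#include <string>
#include <vector>

typedef unsigned long DefId;            // database key for one definition

// One row per C++ definition -- hypothetical fields, for illustration only
struct DefinitionEntry {
    DefId id;
    std::string qualified_name;         // e.g. "MyClass::doWork(int)"
    std::string source_text;            // the definition as the user typed it
    std::vector<unsigned char> bytecode;    // compiled bytecode form
    std::vector<std::string> symbols;   // names this definition exports
    std::vector<DefId> dependencies;    // entries this definition refers to
};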

Editing in this environment is not done on files. Rather, the class browser is an integral part of the system. You select a class from the browser, and then select definitions from the class, which brings the source code up in a window. When you're done editing, the system will compile what you changed, and if that compile goes cleanly, it will recompile the minimal set of dependents. If you've ever used HP Basic on a 286 c. 1988 (remember those days? Those machines were FAST!) you'll recognize this paradigm somewhat.

Eventually one will want to export the code so that it can be built using a conventional toolchain. This is easily done; all the system does is export the database as a collection of flat files (ahem, translation units), each with a comment header, includes for the minimal set of dependencies, and then the definitions. Each class gets a .cpp and a .h file.
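
To make that concrete, the generated pair for a class Foo might look like this (hypothetical output; Foo and bar.h are invented names):

// foo.h -- generated from the database; do not edit by hand
#ifndef FOO_H
#define FOO_H

#include "bar.h"    // minimal dependency set, computed from the bytecode

class Foo {
public:
    void frob(const Bar& b);
};

#endif

// foo.cpp -- generated from the database
#include "foo.h"

void Foo::frob(const Bar& b) {
    /* definition body, straight from the database entry */
}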

Import is similar - you read in the translation units and each definition gets its own entry in the database, as well as the source that gave rise to it. Comments at the top of files, and comments between definitions are likely lost in the translation.

As far as I can tell, this system doesn't exist yet. It sounds like just the ticket for MS Visual C++ .NET to implement, but if their latest IDE works like this, Microsoft's web pages certainly don't mention it.

Can this system be developed as open source? I write language translators for fun and profit, but C++ is still far beyond my abilities. In any event, I certainly don't have time to develop this thing. It is at least as complex as GNU CC, GDB and Kaffe OpenVM put together, and that's an awful lot of complex code.

Does this thing exist?


Yet another language, posted 12 Oct 2003 at 05:29 UTC by braden » (Journeyer)

So what you really want is a new language that gets translated into C++. There's nothing inherently wrong with that; but it will present some problems.

  • C++'s class access modifiers simply can't provide the semantics you want. So you could just make all data public on the C++ side and rely on your new language to enforce access semantics. Of course, someone using your code from C++ wouldn't be subject to the restrictions imposed by your language.
  • The “import” notion would inevitably, I think, require that libraries and headers conform to some naming convention so that they could be properly matched and appropriately coupled by the processor. Your new language can seamlessly support this for the C++ libraries that it creates; however, importing arbitrary C++ libraries may be intractable.
  • Your new language will almost inevitably be more restrictive than C++—at least until it has been under development for some time; and probably even then. That may be a Good Thing for the application domain you have in mind; but it will limit the language's appeal relative to C++.

The upshot of this is that your new language isn't very attractive to a C++ developer unless a fair number of other developers are already using it to create libraries. That's not a novel problem for a nascent language to be faced with—and obviously some overcome it.

But in this case, I'm skeptical that enough value could be added. It sounds to me like a good chunk of what you want could be accomplished with Java by compiling it to machine code and aggressively optimizing away runtime casts where it's safe to do so.

Inheritance By Prototype, posted 12 Oct 2003 at 06:43 UTC by nymia » (Master)

C++ mainly provides class-based inheritance, which has been set in stone from the beginning. As such, features like the ones mentioned above are well beyond what C++ was designed for. Though there may be exceptions where runtime types can be reallocated along with their mated symbols, the general C++ population is mainly tuned to the class-based approach, so I doubt this could be implemented in C++. Although the article didn't mention specific design-time or run-time behavior, my guess is that these features are design-time based, which definitely makes sense.

What the author is really after is something close to a language that provides inheritance by prototype, coupled with a database backend for storing definitions and constants.

Definitely a very good idea. Thanks for writing the article.

This is an interesting idea - but don't forget some speedups are possible now, posted 12 Oct 2003 at 15:56 UTC by Stevey » (Master)

GCC appears to support precompiled headers, which is something that I didn't realise until I went searching for it.

One of the biggest gains in speed I've seen when rebuilding Linux kernels and large applications like Mozilla comes from caching compiler output.

There are several projects which do this, such as ccache and compiler cache. (There's also the networked builds you can set up with distcc, which rock if you have the resources for them.)

But to address the meat of your post: there are some techniques which can be used now in C++ for speeding things up and hiding changes from classes.

Late binding is a perfect example of this. As you mention for Java code, there's nothing stopping you from writing C++ which uses introspection and dynamic loading of loosely coupled objects - as in COM, if I dare mention it.
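
A minimal sketch of that style on a Unix-like system, assuming the shared object defines a create_plugin factory with C linkage (all the names here are invented for illustration):

#include <cstdio>
#include <dlfcn.h>      // POSIX dynamic loading; link with -ldl on Linux

// The abstract interface is the only thing both sides share.
struct Plugin {
    virtual void run() = 0;
    virtual ~Plugin() {}
};

int main() {
    // Load the implementation at runtime; its layout is never visible here.
    void* handle = dlopen("./libplugin.so", RTLD_NOW);
    if (!handle) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }

    typedef Plugin* (*factory_fn)();
    factory_fn create = (factory_fn) dlsym(handle, "create_plugin");
    if (!create) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }

    Plugin* p = create();
    p->run();           // late-bound through the vtable, COM-style
    delete p;
    dlclose(handle);
    return 0;
}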

I've gone over a lot of large projects and analysed build dependencies, reordering them and changing inheritance hierarchies to minimize coupling and the attendant recompiles. It's not the sexiest work, and it would be nice if tools could be written with Source navigator or similar to do the job more automatically.

A refactoring tool for C++ which could reorder code to minimize recompiles should be possible...

precompiled header support not yet released, posted 13 Oct 2003 at 18:01 UTC by jbuck » (Master)

Precompiled header support in GCC is planned for the 3.4 release, which probably won't be out until the end of the year. However, there is working code today in CVS. If you're interested, you can try playing with snapshots or CVS versions, but it's not production-level just yet.

Not a new language, posted 13 Oct 2003 at 19:16 UTC by dej » (Journeyer)

braden, I am not describing a new language. This is good old C++, albeit in a database.

Exporting from the database to a collection of translation units acceptable to a standard-conforming compiler can be achieved without loss of information.

Importing translation units into the database is not necessarily possible. If two translation units attempt to define a name in the same (possibly unnamed or macro) namespace, there will be a collision. That is legal in translation-unit-based code, where one may include one header or another but never both; it does not map cleanly onto the new environment.
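
A contrived example of such a collision (the names are invented):

// a.h
static int scale(int x) { return x * 2; }

// b.h
static int scale(int x) { return x * 3; }

// tu1.cpp includes a.h; tu2.cpp includes b.h. Each translation unit
// compiles cleanly because no single unit ever sees both definitions.
// In a database keyed by qualified name, the two scale() entries collide.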

I am not too familiar with precompiled headers, but I understand that some implementations have restrictions on their use which diminish their usefulness. In particular, on some systems, precompiled headers replace a common prefix of a set of necessary includes. This helps, but tends to result in wasteful inclusion when the code base is ported to a compiler that does not support precompiled headers.

One additional C++ feature would help, posted 14 Oct 2003 at 22:29 UTC by jbuck » (Master)

It would be easier to use C++ as you describe if there were some facility beyond "#include": specifically, a module import. The concept would be that #import foo.h would add all definitions from foo.h to the compiler symbol table, but foo.h itself would be parsed as if it were the very first code encountered. Every other aspect of the language could remain the same, and if careful style rules are followed, code developed under such a scheme could be portable C++ (make sure all headers have include guards and are "self-contained"; severely restrict preprocessor tricks; then just replace #import with #include). Since we lack such a rule, most implementations of precompiled headers for C++ require that the precompiled header come first (and typically this is a header that defines everything, which sucks if the same code is used by a more traditional compiler).
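
A sketch of the style rules in question: every header guarded and self-contained, so that replacing #import with #include (or vice versa) changes nothing (shape.h is an invented example):

// shape.h -- self-contained: it includes everything its declarations need,
// rather than relying on whatever its includer happened to include first
#ifndef SHAPE_H
#define SHAPE_H

#include <string>

class Shape {
public:
    std::string name() const;
};

#endif // SHAPE_H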

Apparently the language standard is already designed in such a way that an implementer could use a module-import approach to implement the standard library (e.g. just directly add certain symbols to the symbol table when the user writes #include <vector>, instead of parsing code).

If not a new language, then you're stuck with C++'s object model, posted 15 Oct 2003 at 04:15 UTC by braden » (Journeyer)

If you aren't thinking in terms of a new language that would be compiled/transformed into C++, then I think you cannot achieve what you want from class access modifiers. C++'s class access modifiers just won't support what you describe.

However, the dependency tracking you want should be doable with a database (or similar collection of metadata). The development environment would need a full-up C++ parser, for starters. The possibilities for assisting with refactoring are definitely interesting.

Your ideas would violate many C++ benefits, posted 15 Oct 2003 at 04:21 UTC by johnnyb » (Journeyer)

I don't like C++. However, I can see that the ideas you are wanting to put into C++ would remove the remaining benefits it _does_ have and you'd simply wind up with something that wasn't any more capable than Java, but had all of the nastiness that comes with C++.

You see, one of the reasons for this interconnected, tangled mess is that with C++, the compiler can perform MANY optimizations that otherwise just wouldn't be possible - things like inline functions, which are specially compiled when possible. For example, if I had this (forgive my syntax):

/* in the header file */
class foo {
public:
    inline void dosomething(int i) {
        for (int b = 1; b < i; b++) {
            /* do stuff here */
        }
    }
};

/* in another file */
foo a;
a.dosomething(3);

With this, the compiler can generate a special, ultrafast version of dosomething(i) which is optimized for the constant 3. In fact, you wouldn't even need the variable; you could simply unroll the loop, remove the function call, and inline the result. This is simply not possible with Java and other languages which use "import-based" definitions. There are many other implications, but they all center around this same concept - C++ allows the compiler to do optimizations which need as much code available to them as possible. If you didn't want the speed anyway, you probably should have been using a different language all along.

Inlining/Java, posted 15 Oct 2003 at 14:12 UTC by elanthis » (Journeyer)

First, C++ has many advantages over Java besides speed. For one, meta-programming. For two, it's capable of doing more than just OO programming, which is good, because many projects don't fit an OO model at all.

Back to speed, though: inlining is just as possible with an "import" system as otherwise. The trick is that the compiled object files would store both machine-compiled code and extra "ready to inline" metadata; i.e., parse trees with some source-level optimization done. When you import a module and use a method/function, the compiler can figure out whether the inlined version is usable, and if so, combine it.

The difference is, unlike Java, this will still require object files to be linked into a real final executable. Java compiles a module once, and that output is exactly what is used during runtime - there is no intermediary.

Still doesn't do it, posted 15 Oct 2003 at 14:34 UTC by johnnyb » (Journeyer)

"The trick is, the compiled object files would store both machine-compiled code, and extra "ready to inline" metadata;"

This is just precompiled headers, then. You don't get the advantages of a Java-style module system, because if the implementation changes from release 0.1 to 0.2, you have to recompile against the new version.

Can work, posted 15 Oct 2003 at 14:45 UTC by elanthis » (Journeyer)

You'd only need to recompile if the inlined functions change. Which is just a fact of life - if you don't want to need recompiles, don't use inline functions.

I find that most inline functions I write are very small and simple, and don't ever really change. If the build system can detect that the functions/methods haven't changed, it could skip rebuilding dependent modules.

Which probably brings up the real pain in C/C++ development - most build tools are "dumb," and work from timestamps rather than actual dependencies (ABI).

This can't be done with an include-style solution, since the includes would have to be parsed (a good chunk of the work of a recompile) to detect whether the ABI has changed; using modules, the ABI fingerprints can be reused for each dependent module.
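
One way to picture such a fingerprint (an illustrative sketch, not a description of any existing tool): hash a module's exported declarations and rebuild dependents only when the hash changes.

#include <cstddef>
#include <string>
#include <vector>

// Fold a module's exported declarations into one number. If it matches
// the value recorded at the last build, dependent modules need no rebuild.
unsigned long abi_fingerprint(const std::vector<std::string>& decls) {
    unsigned long h = 5381;                    // djb2-style string hash
    for (std::size_t i = 0; i < decls.size(); ++i)
        for (std::size_t j = 0; j < decls[i].size(); ++j)
            h = h * 33 + (unsigned char) decls[i][j];
    return h;
}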

Re: Caching, posted 15 Oct 2003 at 18:07 UTC by Xorian » (Master)

Stevey wrote:

One of the biggest gains in speed I've seen when I've been rebuilding Linux kernels and large applications like Mozilla comes from caching compiler output.

Caching of build results is one of the features of Vesta.

IBM Visual Age, posted 15 Oct 2003 at 22:49 UTC by pphaneuf » (Journeyer)

Didn't Visual Age for C++ work exactly like what you describe?

IBM Visual Age, posted 16 Oct 2003 at 16:14 UTC by dej » (Journeyer)

Well, if it did, I can't find any mention of these features on IBM's web site. They describe the product's standards conformance, but do not describe the IDE at all.
