Towards a New Paradigm for C++ Program Development

Posted 11 Oct 2003 at 23:41 UTC by dej

Despite all of the advances that have been made to C++, the language standard still retains a limitation inherent in C's roots: a source code store in the form of interdependent text files. It's high time this state of affairs was improved.

Back in the early 70s when C was first invented, the Unix system was most adept at storing and working with data in text files. A C program was logically divided into "translation units" that were independently compiled and linked to form the final executable. A pre-processor permitted inclusion of code to ensure that all translation units worked from a common set of data definitions. Under this scheme, source files are dependent on the header files, so a program called "make" was created to manage these dependencies. For better or worse, this system is still in use today. It's almost as old as I am.

With today's large C++ programs, the limitations of this system are evident. Basically, the file is too large a unit of granularity for any dependency management system. This problem manifests itself in a number of ways:

  • C++ requires that the private implementation details of a class be declared alongside its public interface. If a change is made to a private definition, every source file that depends on the class definition is recompiled, often unnecessarily. The "pimpl idiom" was created to work around this (a sketch follows this list), but one should not have to alter one's programming style to work around limitations of the language and dependency system.
  • Certain changes to the public interface definitions unnecessarily cause recompilation. For example, adding a public method should not cause dependent objects to be rebuilt - the interface has not changed in an incompatible manner. There is the small matter of the layout of functions in the vtable, but an interactive development system should be able to work around this.
  • C++ programs must explicitly include definitions for any classes that are required to compile. In contrast, other languages have facilities that move some of this effort to the compiler.
  • In an effort to reduce unnecessary dependencies, it is desirable that source files include only those definitions that are necessary in order to compile. However, source files are frequently copied as templates for other source files, and without an IDE, the manual effort required to ensure that the include sets are minimal is both considerable and unnecessary.
  • Many C++ systems do not support pre-compiled headers. Such systems spend a lot of time analyzing header code. As an example, my own source files typically run no more than 1,000 lines each, yet the pre-processed source tops out at over 30,000 lines, all of which must be analyzed by GCC. Systems that do support pre-compiled headers do so with limitations, mainly required to preserve the semantics of file inclusion at the preprocessor level.
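
For readers who haven't met it, here is a minimal sketch of the pimpl idiom (Widget and WidgetImpl are illustrative names):

// widget.h -- the public interface; private details hide behind a pointer
class WidgetImpl;            // a forward declaration is all clients ever see

class Widget {
public:
    Widget();
    ~Widget();
    void draw();
private:
    WidgetImpl* impl;
};

// widget.cpp -- private details live here. They can change freely without
// touching widget.h, so code that depends on widget.h never recompiles.
class WidgetImpl {
public:
    int cached_width;
};

Widget::Widget() : impl(new WidgetImpl) {}
Widget::~Widget() { delete impl; }
void Widget::draw() { /* uses impl->cached_width ... */ }
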
Other languages are much easier to work with. VHDL and Java both offer the notion of "packages". Once the compiler is instructed to "use" or "import" the package, all definitions in the package become available. The packages are separately compiled; there is no need for the compiler to analyze the package source code. VHDL still suffers from the overly large granularity of "make", however. Model Technology's "vmake" command is almost useless since the resulting Makefiles border on the unmaintainable.

The Eclipse IDE appears to solve this problem for Java. It is capable of determining inter-class dependencies at the data object or method level, and it performs the bare minimum work required to bring a project up to date. This is particularly easy to do for Java for the following reasons:

  • The Java class file format is standard and well-documented; it is easy to determine what methods and instance variables are defined by a class.
  • Java uses late-binding for almost everything. A constant or field reference is implemented as a reference to an object in the constant pool, where the reference may be easily identified as a dependency. In contrast, constant and field references in C++ are typically "inlined" into individual machine instructions, so a dependency on a class is not easy to detect.
Can we do better? Can we come up with a C++ development environment that:
  • Lets one add private methods without recompiling anything that depends only on the public interface.
  • Lets one add a public method without recompiling the world.
  • Automatically determines what classes a given definition (method or type) depends on, and imports them into scope, except in those situations where ambiguity may result.
  • Interoperates with conventional C++ environments through import and export.
I envision a system built up around a bytecode interpreter and database, where each C++ definition, be it a typedef, method definition or template definition, has its own entry in the database. Each entry contains the source code for the definition, as well as the compiled bytecode representation and symbol table. Since we're dealing with bytecode, dependency information is easily available, as it is for Java.
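
As a rough illustration, each entry in such a database might look something like this (a purely hypothetical schema; all the field names here are invented):

#include <string>
#include <vector>

typedef unsigned long DefId;            // database key for one definition

// One row per C++ definition -- hypothetical fields, for illustration only
struct DefinitionEntry {
    DefId id;
    std::string qualified_name;         // e.g. "MyClass::doWork(int)"
    std::string source_text;            // the definition as the user typed it
    std::vector<unsigned char> bytecode;    // compiled bytecode form
    std::vector<std::string> symbols;   // names this definition exports
    std::vector<DefId> dependencies;    // entries this definition refers to
};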

Editing in this environment is not done on files. Rather, the class browser is an integral part of the system. You select a class from the browser, and then select definitions from the class, which brings the source code up in a window. When you're done editing, the system will compile what you changed, and if that compile goes cleanly, it will recompile the minimal set of dependents. If you've ever used HP Basic on a 286 c. 1988 (remember those days? Those machines were FAST!) you'll recognize this paradigm somewhat.

Eventually one will want to export the code so that it can be built using a conventional toolchain. This is easily done; all the system does is export the database as a collection of flat files (ahem, translation units), each with a comment header, includes for the minimal set of dependencies, and then the definitions. Each class gets a .cpp and a .h file.
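
To make that concrete, the generated pair for a class Foo might look like this (hypothetical output; Foo and bar.h are invented names):

// foo.h -- generated from the database; do not edit by hand
#ifndef FOO_H
#define FOO_H

#include "bar.h"    // minimal dependency set, computed from the bytecode

class Foo {
public:
    void frob(const Bar& b);
};

#endif

// foo.cpp -- generated from the database
#include "foo.h"

void Foo::frob(const Bar& b) {
    /* definition body, straight from the database entry */
}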

Import is similar - you read in the translation units and each definition gets its own entry in the database, as well as the source that gave rise to it. Comments at the top of files, and comments between definitions are likely lost in the translation.

As far as I can tell, this system doesn't exist yet. It sounds like just the ticket for MS Visual C++ .NET to implement, but if their latest IDE works like this, Microsoft's web pages certainly don't mention it.

Can this system be developed as open source? I write language translators for fun and profit, but C++ is still far beyond my abilities. In any event, I certainly don't have time to develop this thing. It is at least as complex as GNU CC, GDB and Kaffe OpenVM put together, and that's an awful lot of complex code.

Does this thing exist?


Yet another language, posted 12 Oct 2003 at 05:29 UTC by braden » (Journeyer)

So what you really want is a new language that gets translated into C++. There's nothing inherently wrong with that; but it will present some problems.

  • C++'s class access modifiers simply can't provide the semantics you want. So you could just make all data public on the C++ side and rely on your new language to enforce access semantics. Of course, someone using your code from C++ wouldn't be subject to the restrictions imposed by your language.
  • The “import” notion would inevitably, I think, require that libraries and headers conform to some naming convention so that they could be properly matched and appropriately coupled by the processor. Your new language can seamlessly support this for the C++ libraries that it creates; however, importing arbitrary C++ libraries may be intractable.
  • Your new language will almost inevitably be more restrictive than C++—at least until it has been under development for some time; and probably even then. That may be a Good Thing for the application domain you have in mind; but it will limit the language's appeal relative to C++.

The upshot of this is that your new language isn't very attractive to a C++ developer unless a fair number of other developers are already using it to create libraries. That's not a novel problem for a nascent language to be faced with—and obviously some overcome it.

But in this case, I'm skeptical that enough value could be added. It sounds to me like a good chunk of what you want could be accomplished with Java by compiling it to machine code and aggressively optimizing away runtime casts where it's safe to do so.

Inheritance By Prototype, posted 12 Oct 2003 at 06:43 UTC by nymia » (Master)

C++ mainly provides class-based inheritance, which has been set in stone from the beginning. As such, features like the ones mentioned above are well beyond what C++ was designed for. Though there may be exceptions where runtime types can be reallocated along with their mated symbols, the general C++ population is mainly tuned to the class-based approach, so I doubt this could be implemented in C++. Although the article didn't mention specific design-time or run-time behavior, my guess is that these features are design-time based, which definitely makes sense.

What the author is really after is something close to a language that provides inheritance by prototype, coupled with a database backend for storing definitions and constants.

Definitely a very good idea. Thanks for writing the article.

This is an interesting idea - but don't forget some speedups are possible now, posted 12 Oct 2003 at 15:56 UTC by Stevey » (Master)

GCC appears to support precompiled headers, which is something that I didn't realise until I went searching for it.

One of the biggest gains in speed I've seen when rebuilding Linux kernels and large applications like Mozilla comes from caching compiler output.

There are several projects which do this, such as ccache and compiler cache. (There's also the networked builds you can set up with distcc, which rock if you have the resources for them.)

But to address the meat of your post: there are some techniques which can be used now in C++ for speeding things up and hiding changes from classes.

Late binding is a perfect example of this. As you mention for Java code, there's nothing stopping you from writing C++ which uses introspection and dynamic loading of loosely coupled objects - as in COM, if I dare mention it.
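
A minimal sketch of that style on a Unix-like system, assuming the shared object defines a create_plugin factory with C linkage (all the names here are invented for illustration):

#include <cstdio>
#include <dlfcn.h>      // POSIX dynamic loading; link with -ldl on Linux

// The abstract interface is the only thing both sides share.
struct Plugin {
    virtual void run() = 0;
    virtual ~Plugin() {}
};

int main() {
    // Load the implementation at runtime; its layout is never visible here.
    void* handle = dlopen("./libplugin.so", RTLD_NOW);
    if (!handle) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }

    typedef Plugin* (*factory_fn)();
    factory_fn create = (factory_fn) dlsym(handle, "create_plugin");
    if (!create) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }

    Plugin* p = create();
    p->run();           // late-bound through the vtable, COM-style
    delete p;
    dlclose(handle);
    return 0;
}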

I've gone over a lot of large projects and analysed build dependencies, reordering them and changing inheritance hierarchies to minimize coupling and the attendant recompiles. It's not the sexiest work, and it would be nice if tools could be written with Source navigator or similar to do the job more automatically.

A refactoring tool for C++ which could reorder code to minimize recompiles should be possible...

precompiled header support not yet released, posted 13 Oct 2003 at 18:01 UTC by jbuck » (Master)

Precompiled header support in GCC is planned for the 3.4 release, which probably won't be out until the end of the year. However, there is working code today in CVS. If you're interested, you can try playing with snapshots or CVS versions, but it's not production-level just yet.

Not a new language, posted 13 Oct 2003 at 19:16 UTC by dej » (Journeyer)

braden, I am not describing a new language. This is good old C++, albeit in a database.

Exporting from the database to a collection of translation units acceptable to a standard-conforming compiler can be achieved without loss of information.

Importing translation units into the database is not necessarily possible. If two translation units attempt to define a name in the same (possibly unnamed or macro) namespace, there will be a collision. That is legal in translation-unit-based code, where one may include one header or another but never both; it does not map cleanly onto the new environment.
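
A contrived example of such a collision (the names are invented):

// a.h
static int scale(int x) { return x * 2; }

// b.h
static int scale(int x) { return x * 3; }

// tu1.cpp includes a.h; tu2.cpp includes b.h. Each translation unit
// compiles cleanly because no single unit ever sees both definitions.
// In a database keyed by qualified name, the two scale() entries collide.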

I am not too familiar with precompiled headers, but I understand that some implementations have restrictions on their use which diminish their usefulness. In particular, on some systems, precompiled headers replace a common prefix of a set of necessary includes. This helps, but tends to result in wasteful inclusion when the code base is ported to a compiler that does not support precompiled headers.

One additional C++ feature would help, posted 14 Oct 2003 at 22:29 UTC by jbuck » (Master)

It would be easier to use C++ as you describe if there were some facility beyond "#include": specifically, a module import. The concept would be that #import foo.h would add all definitions from foo.h to the compiler symbol table, but foo.h itself would be parsed as if it were the very first code encountered. Every other aspect of the language could remain the same, and if careful style rules are followed, code developed under such a scheme could be portable C++ (make sure all headers have include guards and are "self-contained"; severely restrict preprocessor tricks; then just replace #import with #include). Since we lack such a rule, most implementations of precompiled headers for C++ require that the precompiled header come first (and typically this is a header that defines everything, which sucks if the same code is used by a more traditional compiler).
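
A sketch of the style rules in question: every header guarded and self-contained, so that replacing #import with #include (or vice versa) changes nothing (shape.h is an invented example):

// shape.h -- self-contained: it includes everything its declarations need,
// rather than relying on whatever its includer happened to include first
#ifndef SHAPE_H
#define SHAPE_H

#include <string>

class Shape {
public:
    std::string name() const;
};

#endif // SHAPE_H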

Apparently the language standard is already designed in such a way that an implementer could use a module-import approach to implement the standard library (e.g. just directly add certain symbols to the symbol table when the user writes #include <vector>, instead of parsing code).

If not a new language, then you're stuck with C++'s object model, posted 15 Oct 2003 at 04:15 UTC by braden » (Journeyer)

If you aren't thinking in terms of a new language that would be compiled/transformed into C++, then I think you cannot achieve what you want from class access modifiers. C++'s class access modifiers just won't support what you describe.

However, the dependency tracking you want should be doable with a database (or similar collection of metadata). The development environment would need a full-up C++ parser, for starters. The possibilities for assisting with refactoring are definitely interesting.

Your ideas would violate many C++ benefits, posted 15 Oct 2003 at 04:21 UTC by johnnyb » (Journeyer)

I don't like C++. However, I can see that the ideas you are wanting to put into C++ would remove the remaining benefits it _does_ have and you'd simply wind up with something that wasn't any more capable than Java, but had all of the nastiness that comes with C++.

You see, one of the reasons for this interconnected, tangled mess is that with C++, the compiler can perform MANY optimizations that otherwise just wouldn't be possible - things like inline functions, which are specially compiled when possible. For example, if I had this (forgive my syntax):

/* in the header file */
class foo {
public:
    inline void dosomething(int i) {
        for (int b = 1; b < i; b++) {
            /* do stuff here */
        }
    }
};

/* in another file */
foo a;
a.dosomething(3);

With this, the compiler can generate a special, ultrafast version of dosomething(i) which is optimized for the constant 3. In fact, you wouldn't even need the variable; you could simply unroll the loop, remove the function call, and inline the result. This is simply not possible with Java and other languages which use "import-based" definitions. There are many other implications, but they all center around this same concept - C++ allows the compiler to do optimizations which need as much code available to them as possible. If you didn't want the speed anyway, you probably should have been using a different language all along.

Inlining/Java, posted 15 Oct 2003 at 14:12 UTC by elanthis » (Journeyer)

First, C++ has many advantages over Java besides speed. For one, meta-programming. For two, it's capable of doing more than just OO programming, which is good, because many projects don't fit an OO model at all.

Back to speed, though: inlining is just as possible with an "import" system as otherwise. The trick is that the compiled object files would store both machine-compiled code and extra "ready to inline" metadata; i.e., parse trees with some source-level optimization done. When you import a module and use a method/function, the compiler can figure out whether the inlined version is usable, and if so, combine it.

The difference is, unlike Java, this will still require object files to be linked into a real final executable. Java compiles a module once, and that output is exactly what is used during runtime - there is no intermediary.

Still doesn't do it, posted 15 Oct 2003 at 14:34 UTC by johnnyb » (Journeyer)

"The trick is, the compiled object files would store both machine-compiled code, and extra "ready to inline" metadata;"

This is just precompiled headers, then. You don't get the advantages of a Java-style module system, because if the implementation changes from release 0.1 to 0.2, you have to recompile against the new version.

Can work, posted 15 Oct 2003 at 14:45 UTC by elanthis » (Journeyer)

You'd only need to recompile if the inlined functions change. Which is just a fact of life - if you don't want to need recompiles, don't use inline functions.

I find that most inline functions I write are very small and simple, and don't ever really change. If the build system can detect that the functions/methods haven't changed, it could skip rebuilding dependent modules.

Which probably brings up the real pain in C/C++ development - most build tools are "dumb," and work from timestamps rather than actual dependencies (ABI).

This can't be done with an include-style solution, since the includes would have to be parsed (a good chunk of the work of a recompile) to detect whether the ABI has changed; using modules, the ABI fingerprints can be reused for each dependent module.
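
One way to picture such a fingerprint (an illustrative sketch, not a description of any existing tool): hash a module's exported declarations and rebuild dependents only when the hash changes.

#include <cstddef>
#include <string>
#include <vector>

// Fold a module's exported declarations into one number. If it matches
// the value recorded at the last build, dependent modules need no rebuild.
unsigned long abi_fingerprint(const std::vector<std::string>& decls) {
    unsigned long h = 5381;                    // djb2-style string hash
    for (std::size_t i = 0; i < decls.size(); ++i)
        for (std::size_t j = 0; j < decls[i].size(); ++j)
            h = h * 33 + (unsigned char) decls[i][j];
    return h;
}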

Re: Caching, posted 15 Oct 2003 at 18:07 UTC by Xorian » (Master)

Stevey wrote:

One of the biggest gains in speed I've seen when I've been rebuilding Linux kernels and large applications like Mozilla comes from caching compiler output.

Caching of build results is one of the features of Vesta.

IBM Visual Age, posted 15 Oct 2003 at 22:49 UTC by pphaneuf » (Journeyer)

Didn't Visual Age for C++ work exactly like what you describe?

IBM Visual Age, posted 16 Oct 2003 at 16:14 UTC by dej » (Journeyer)

Well, if it did, I can't find any mention of these features on IBM's web site. They describe the product's standards conformance, but do not describe the IDE at all.
