Despite all of the advances that have been made to C++, the language standard still retains a limitation inherent in C's roots: a source code store in the form of interdependent text files. It's high time this state of affairs was improved.
Back in the early 70s when C was first invented, the Unix system was most adept at storing and working with data in text files. A C program was logically divided into "translation units" that were independently compiled and linked to form the final executable. A pre-processor permitted inclusion of code to ensure that all translation units worked from a common set of data definitions. Under this scheme, source files are dependent on the header files, so a program called "make" was created to manage these dependencies. For better or worse, this system is still in use today. It's almost as old as I am.
With today's large C++ programs, the limitations of this system are evident. Basically, the file is too large a unit of granularity for any dependency management system. This problem manifests itself in a number of ways:
The Eclipse IDE appears to solve this problem for Java. It is capable of determining inter-class dependencies at the data object or method level, and it performs the bare minimum work required to bring a project up to date. This is particularly easy to do for Java for the following reasons:
Editing in this environment is not done on files. Rather, the class browser is an integral part of the system. You select a class from the browser, and then select definitions from the class, which brings the source code up in a window. When you're done editing, the system will compile what you changed, and if that compile goes cleanly, it will compile the minimal set of dependencies. If you've ever used HP Basic on a 286 c. 1988 (remember those days? Those machines were FAST!) you'll recognize this paradigm somewhat.
Eventually one will want to export the code so that it can be built using a conventional toolchain. This is easily done; all the system does is export the database as a collection of flat files (ahem, translation units), each with a comment header, includes for the minimal set of dependencies, and then the definitions. Each class gets a .cpp and a .h file.
Import is similar - you read in the translation units and each definition gets its own entry in the database, as well as the source that gave rise to it. Comments at the top of files, and comments between definitions are likely lost in the translation.
As far as I can tell, this system doesn't exist yet. It sounds like just the ticket for MS Visual C++ .NET to implement, but if their latest IDE works like this, Microsoft's web pages certainly don't mention it.
Can this system be developed as open-source? I write language translators for fun and profit, but C++ is still far beyond my abilities. In any event, I certainly don't have time to develop this thing. It is at least as complex as GNU CC, GDB and Kaffe OpenVM put together, and that's an awful lot of complex code.
Does this thing exist?
So what you really want is a new language that gets translated into C++. There's nothing inherently wrong with that; but it will present some problems.
The upshot of this is that your new language isn't very attractive to a C++ developer unless a fair number of other developers are already using it to create libraries. That's not a novel problem for a nascent language to be faced with—and obviously some overcome it.
But in this case, I'm skeptical that enough value could be added. It sounds to me like a good chunk of what you want could be accomplished with Java by compiling it to machine code and aggressively optimizing away runtime casts where it's safe to do so.
C++ mainly provides class-based inheritance, which is essentially fixed from the start of a program's design. Features like the ones mentioned above are well beyond what C++ was designed for. There may be exceptions where runtime types can be reallocated along with their associated symbols, but the general C++ population is firmly tuned to the class-based approach, so I doubt this could be implemented in C++. The article didn't specify design-time or run-time behavior, but my guess is that it is design-time based, which definitely makes sense.
What the author is really after is something close to a language that provides inheritance by prototype, coupled with a database backend for storing definitions and constants.
Definitely a very good idea. Thanks for writing the article.
GCC appears to support precompiled headers, which is something that I didn't realise until I went searching for it.
One of the biggest gains in speed I've seen when I've been rebuilding Linux kernels and large applications like Mozilla comes from caching compiler output.
There are several projects which do this: ccache and Compiler Cache. (There are also the networked builds you can set up with distcc, which rock if you have the resources for them.)
But to address the meat of your post there are some techniques which can be used now in C++ for speeding things up, and hiding changes from classes.
Late binding is a perfect example of this, as you mention for Java code, there's nothing stopping you from writing C++ which uses introspection and dynamic loading of loosely coupled objects - as in COM if I dare mention it.
I've gone over a lot of large projects and analysed build dependencies, reordering them and changing inheritance hierarchies to minimize coupling and the attendant recompiles. It's not the sexiest work, and it would be nice if tools could be written with Source Navigator or similar to do the job more automatically.
A refactoring tool for C++ which could reorder code to minimize recompiles should be possible ..
Precompiled header support in GCC is planned for the 3.4 release, which probably won't be out until the end of the year. However, there is working code today in CVS. If you're interested, you can try playing with snapshots or CVS versions, but it's not production-level just yet.
Exporting from the database to a collection of translation units acceptable to a standard-conforming compiler can be achieved without loss of information.
Importing translation units into the database is not necessarily possible. If two translation units attempt to define a name in the same (possibly unnamed or macro) namespace, then you will have a collision. This situation can arise in conventional code, where a source file may include one header or another but not both; it does not map cleanly into the new environment.
I am not too familiar with precompiled headers, but I understand that some implementations have restrictions on their use which diminish their usefulness. In particular, on some systems, precompiled headers replace a common prefix of a set of necessary includes. This helps, but tends to result in wasteful inclusion when the code base is ported to a compiler that does not support precompiled headers.
It would be easier to use C++ as you describe if there were some facility beyond "#include", specifically, a module import. The concept would be that #import foo.h would add all definitions from foo.h to the compiler symbol table, but foo.h itself would be parsed as if it were the very first code encountered. Every other aspect of the language could remain the same, and if careful style rules are followed, code developed under such a scheme could be portable C++ (make sure all headers have include guards and are "self contained"; severely restrict preprocessor tricks; then just replace #import with #include). Since we lack such a facility, most implementations of precompiled headers for C++ require that the precompiled header come first (and typically this is a header that defines everything, which sucks if the same code is used by a more traditional compiler).
Apparently the language standard is already designed in such a way that an implementer could use a module-import approach to implement the standard library (e.g. just directly add certain symbols to the symbol table when the user writes #include <vector>, instead of parsing code).
If you aren't thinking in terms of a new language that would be compiled/transformed into C++, then I think you cannot achieve what you want from class access modifiers. C++'s class access modifiers just won't support what you describe.
However, the dependency tracking you want should be doable with a database (or similar collection of metadata). The development environment would need a full-up C++ parser, for starters. The possibilities for assisting with refactoring are definitely interesting.
I don't like C++. However, I can see that the ideas you are wanting to put into C++ would remove the remaining benefits it _does_ have and you'd simply wind up with something that wasn't any more capable than Java, but had all of the nastiness that comes with C++.
You see, one of the reasons for this interconnected, tangled mess is that with C++, the compiler can perform MANY optimizations that otherwise just wouldn't be possible. Things like inlined functions which are specially compiled when possible. For example, if I had this (forgive my syntax):
    /* in the header file */
    class foo {
    public:
        inline void dosomething(int i) {
            for (int b = 1; b < i; b++) {
                /* do stuff here */
            }
        }
    };

    /* in another file */
    foo a;
    a.dosomething(3);
With this, the compiler can generate a special, ultrafast version of dosomething(i) which is optimized for the constant 3. In fact, you wouldn't even need the variable; you could simply unroll the loop, remove the function call, and stick the result in inline. This is simply not possible with Java and other languages with "import-based" definitions. There are many other implications, but they all center around this same concept - C++ allows the compiler to do optimizations which need as much code available to it as possible. If you didn't want the speed anyway, you probably should've been using a different language all along.
First, C++ has many advantages over Java besides speed. For one, meta-programming. For two, it's capable of doing more than just OO programming, which is good, because many projects don't fit an OO model at all.
Back to speed, though: inlining is just as possible with an "import" system as otherwise. The trick is, the compiled object files would store both machine-compiled code and extra "ready to inline" metadata; i.e., parse trees with some source-level optimization done. When you import a module and use a method/function, the compiler can figure out if the inlined version is usable, and if so, combine it.
The difference is, unlike Java, this will still require object files to be linked into a real final executable. Java compiles a module once, and that output is exactly what is used during runtime - there is no intermediary.
"The trick is, the compiled object files would store both machine-compiled code, and extra "ready to inline" metadata;"
This is just precompiled headers, then. You don't get the advantages of a Java-style module system, because if the implementation changes from release 0.1 to 0.2, you have to recompile against the new version.
You'd only need to recompile if the inlined functions change. Which is just a fact of life - if you don't want to need recompiles, don't use inline functions.
I find that most inline functions I write are very small and simple, and don't ever really change. If the build system can detect that the functions/methods haven't changed, it could skip rebuilding dependent modules.
Which probably brings up the real pain in C/C++ development - most build tools are "dumb," and only work on timestamps versus actual dependencies (ABI).
This can't be done with an include style solution, since the includes would have to be parsed (a good chunk of the work of a recompile) to detect if ABI has changed; using modules, the ABI fingerprints can be reused for each dependent module.
Stevey wrote:
One of the biggest gains in speed I've seen when I've been rebuilding Linux kernels and large applications like Mozilla comes from caching compiler output.
Caching of build results is one of the features of Vesta.
Didn't Visual Age for C++ work exactly like what you describe?
Well, if it does I can't find any mention of these features on IBM's web site. They describe the product's standards conformance, but do not describe the IDE at all.