31 Dec 2002 johnnyb   » (Journeyer)

Just wanted to post a recent email exchange I had with an Italian professor about my XML Literate Programming system, because I think there's some good diary material in there :) --

Dear Sir,

First of all, let me make sure you understand the idea behind my project, and literate programming in general.

Literate programming (as you may know) is about programming in such a way that, instead of writing a program for a machine, you are writing a book or paper for a human about writing a program for a machine. This makes software development easier in the long run, for the following reasons:

  • The formal documentation _is_ the program and therefore does not need to be separately maintained
  • The code is organized in a fashion to be understood by a human, allowing easier transition of new maintainers to the project
  • The code is more correct, as each section is described in human form to others, and it is easier for others to see where the description and the code do not agree
  • These factors should lead to better programs, easier transitions between code maintainers, and lower overall cost in the long term.

Now, about my project -

Donald Knuth in his book "Literate Programming" described a system he wrote for doing literate programming called CWEB. Although it was a nice system, I saw shortcomings:

  • It required that you learn TeX, which is all but unused today and is _very_ tricky to get right
  • The system was geared toward a specific programming language and a specific set of TeX macros.

XML Tangle solves these problems by using a standard tag language (XML), and also being both programming language and DTD-agnostic.

Early versions of XML Tangle required a specific tagset. The current version, available at http://literatexml.sourceforge.net/, uses Processing Instructions, so you can use any tag set you want, including DocBook, HTML, TEI, etc. In fact, I've heard that future versions of Microsoft Word may be XML-based, so it may even be possible to use their format (although, if they do not allow you to directly modify the processing instructions, you probably will not be able to use the Microsoft Word _application_ even if you can use the Microsoft Word _format_/DTD).
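To make the Processing Instruction approach concrete, here is a minimal sketch of how a tangler can ignore element names entirely and look only at PIs, which is what makes it DTD-agnostic. This is not XML Tangle's code. The PI targets `chunk-begin`, `chunk-end` and `chunk-ref` are hypothetical names invented for this illustration, and the sketch is written for Python 3 using the standard `xml.sax` module.

```python
import xml.sax

# Hypothetical processing instructions (NOT XML Tangle's real ones):
#   <?chunk-begin NAME?> ... code ... <?chunk-end?>   define (or extend) a chunk
#   <?chunk-ref NAME?>                                splice another chunk in here
# Element names are never inspected, so any DTD (DocBook, HTML, TEI...) works.

class TangleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.chunks = {}     # chunk name -> list of text pieces and ("ref", name) tuples
        self.current = None  # name of the chunk currently being collected

    def processingInstruction(self, target, data):
        if target == "chunk-begin":
            self.current = data.strip()
            # Reusing a name appends to the existing chunk
            self.chunks.setdefault(self.current, [])
        elif target == "chunk-end":
            self.current = None
        elif target == "chunk-ref" and self.current is not None:
            self.chunks[self.current].append(("ref", data.strip()))

    def characters(self, content):
        # Only text inside a chunk is code; everything else is prose
        if self.current is not None:
            self.chunks[self.current].append(content)


def expand(chunks, name, active=()):
    """Recursively replace chunk references with the referenced code."""
    if name in active:
        raise ValueError(f"recursive chunk reference: {name}")
    out = []
    for piece in chunks[name]:
        if isinstance(piece, tuple):
            out.append(expand(chunks, piece[1], active + (name,)))
        else:
            out.append(piece)
    return "".join(out)


def tangle(xml_text, root="main"):
    handler = TangleHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return expand(handler.chunks, root)


doc = """<article>
<para>The program prints a greeting.</para>
<programlisting><?chunk-begin main?>def main():
    <?chunk-ref body?>
<?chunk-end?></programlisting>
<para>The body is a single print call.</para>
<programlisting><?chunk-begin body?>print("hello")<?chunk-end?></programlisting>
</article>"""

print(tangle(doc))
```

The prose can be reordered freely around the chunks because the tangled output is assembled only by following references from the root chunk. The sketch does not re-indent multi-line chunks that are spliced into an indented position. A real tool would need to handle that.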

Also note that this program needs an extra patch, which I haven't committed yet, to run under Python 2. Python 1.5 (or whatever ships with Red Hat 7.3) works fine with it.

Now to your specific questions:

I'm interested in thinking about that subject... So I ask you if you have extended your ideas, and particularly if you think that TDT has to be conserved or abandoned...

My main question is whether this whole thing could be accomplished using the SGML Architectural Engine instead of my specialized program. Since I've already made it DTD-agnostic, I don't need to conserve or abandon it :) I do think my selection of processing instructions is adequate, at least for now.

My main gripe about the system is that there are no easy-to-use tools for programming with it. Since I started using it for a book I'm writing, I've found typing all of the processing instructions tedious. A good XML GUI or Emacs mode for doing this would be great, if I ever find the time to write one.

I also think that I should add language-specific hooks, especially for formatting and indexing, but I don't have any concrete ideas about these yet.

One thing that is definitely deficient in this system is the ability to move from a non-literate program to a literate one. One of the greatest things about the Perl programming language is that you can start writing a program as a quick hack, and then gradually morph it into a beautifully structured object-oriented program. Perl helps you move through the stages of program development. For most programs, you do not know the structure when you sit down to write them, so you crank out something mostly procedural. Then, as you develop the program, the objects, systems, and subsystems become clear. Perl makes it easy to refactor your procedural code into objects and classes, and therefore helps you move your program from the mess it started out as to a beautiful structure. My literate system has no similar ability to help people go from an "illiterate" program to a "literate" one. It is only really useful for programs that start out in a literate state. If you have any suggestions on how I can fix that, they would be greatly appreciated!
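One possible first step is a script that splits an existing program into one chunk per top-level definition and puts a placeholder paragraph in front of each. The author could then start writing prose around code that already works. This is offered purely as a hypothetical sketch. It is not a feature of XML Tangle, and the processing-instruction names `chunk-begin`, `chunk-end` and `chunk-ref` are invented for this example. It handles Python source using the standard `ast` module and needs Python 3.8+ for `ast.get_source_segment`.

```python
import ast
from xml.sax.saxutils import escape

ROOT = "@root"  # '@' cannot appear in a Python identifier, so no name clash

def skeleton(source: str) -> str:
    """Turn a Python module into a skeleton XML document with one
    (hypothetical) chunk per top-level statement or definition."""
    tree = ast.parse(source)
    parts = ["<article>"]
    names = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            name = node.name
        else:
            # Other statements get a synthetic name from their line number
            name = f"toplevel-{node.lineno}"
        names.append(name)
        # Decorators and comments are NOT included in this source segment
        code = ast.get_source_segment(source, node)
        parts.append(f"<para>TODO: explain {escape(name)}.</para>")
        parts.append(
            f"<programlisting><?chunk-begin {name}?>{escape(code)}\n"
            f"<?chunk-end?></programlisting>"
        )
    # A root chunk that references every piece in the original order
    refs = "".join(f"<?chunk-ref {n}?>\n" for n in names)
    parts.append(f"<programlisting><?chunk-begin {ROOT}?>{refs}<?chunk-end?></programlisting>")
    parts.append("</article>")
    return "\n".join(parts)

print(skeleton("def f():\n    return 1\n\nx = f()\n"))
```

The script only removes the mechanical step of typing the processing instructions. The explanatory prose, which is the literate part, still has to be written by hand. Reordering the chunks into a presentation order would be a later, manual refactoring step.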

I think I've answered that one. Let me know if I left anything out.

Now for your next question:

Moreover, I ask you if "the fact that EXCEL and ACCESS accept XML documents quite well" could give rise to a different approach to programming: the case of EXCEL is quite eloquent: using macros could transform a non-programmer into a programmer... for limited purposes (the majority, in practice...)

This question deserves quite an extended answer, as there are several questions hidden in there.

First of all, accepting XML documents is not the issue. In fact, using XML to program with isn't the issue either. XML is currently being overused. XML is useful for the following reasons -

For the XML author -

  • It's easy to write - the syntax is simple with few, regularized exceptions. In addition, everyone already knows HTML
  • Tags are (sometimes) flexible, depending on the application
  • Trees match how people normally think

For the programmer -

  • Don't have to re-invent the wheel when developing a file format
  • Already have half a parser written
  • Tree structures are easy to manipulate (although they are not as flexible as relational structures)
  • File format problems are easier to debug because of the plain-text nature

XML is essentially the new syntax for S-expressions.
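The following small sketch illustrates both the "half a parser" point and the S-expression comparison. It is only an illustration and not part of XML Tangle. Python's standard `xml.etree.ElementTree` does all of the parsing, and a few lines then print the resulting tree in S-expression form. Attributes and mixed-content tail text are ignored, and quotes inside text are not escaped, to keep it short.

```python
import xml.etree.ElementTree as ET

def to_sexpr(elem) -> str:
    """Render an element tree as an S-expression: (tag "text" child ...).
    Attributes and tail text are ignored in this sketch."""
    parts = [elem.tag]
    if elem.text and elem.text.strip():
        parts.append('"' + elem.text.strip() + '"')
    for child in elem:
        parts.append(to_sexpr(child))
    return "(" + " ".join(parts) + ")"

doc = ET.fromstring("<recipe><title>Bread</title><step>Mix</step><step>Bake</step></recipe>")
print(to_sexpr(doc))  # (recipe (title "Bread") (step "Mix") (step "Bake"))
```

The nesting is identical in both notations. The differences are mostly surface syntax: closing tags repeat the element name, and XML adds attributes as a second kind of child.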

Now on to your question -

The ability to accept an XML document does not really mean much. I am not aware of the specifics of those two programs, but accepting XML isn't that helpful if it only works with a rigid, complicated DTD. It's just a different file format, which doesn't benefit the end user much - it just simplifies Microsoft's job of writing a parser.

In addition, it does not solve the problems solved by Literate Programming - namely, the ability to communicate programs clearly to other human beings. Excel macros are especially difficult in this regard, because each formula has to be inspected individually, and for some cells it may not even be apparent that a formula exists. Documentation within a spreadsheet is especially difficult.

Is Excel a good tool? Sure, it's excellent for simple things. It's especially good for non-programmers who want to test out new ideas and formulas without hiring a programmer. However, it does not scale very well. I would consider it a prototyping tool, useful for learning what you want, and useful in production for fewer than 5 users. Past that, it requires a disciplined programmer to manage and administer it so that it is usable in multiple environments, interacts well with other programs, is properly tested and robust, and is well-documented. No large operation should base itself on undocumented code.

Many of the same problems apply to Access as well. Personally, I do not like Access, because most of the things novices use it for are easier to do in a spreadsheet. Then, for tasks of medium or greater complexity, it is really the wrong tool. A true relational database should be used, with its architecture tied to the data sources available within the rest of the organization. One of the biggest problems I've found within the industry is individuals who create Access databases thinking that they have created a "real" database and program, and who believe that it should only take their IT personnel a few minutes to post it as an application for all users. The IT person then finds that the fields are nothing but varchars, there are no constraints, and the whole thing is full of bugs and a real mess. Add to that the scalability limits of Access, and you can see how Access can be problematic.
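The "no constraints" complaint is easier to see with an example of the rules a properly designed schema enforces on its own. This minimal sketch uses the `sqlite3` module from Python's standard library only because it needs no server. SQLite is not the full relational database the text has in mind, and its column types are only loosely enforced, so the point here rests on the `NOT NULL`, `UNIQUE`, `CHECK` and `FOREIGN KEY` constraints. The tables and columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE customer (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE CHECK (email LIKE '%_@_%')
);
CREATE TABLE invoice (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    amount      NUMERIC NOT NULL CHECK (amount > 0)
);
""")

def try_insert(sql, params):
    """Run one INSERT in its own transaction; report whether the schema accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

print(try_insert("INSERT INTO customer (id, email) VALUES (?, ?)", (1, "a@example.com")))
print(try_insert("INSERT INTO invoice (customer_id, amount) VALUES (?, ?)", (99, 10)))  # no such customer
print(try_insert("INSERT INTO invoice (customer_id, amount) VALUES (?, ?)", (1, -5)))   # negative amount
```

Because the rules live in the schema, every program that touches these tables gets the same checks. A database built from unconstrained varchar fields leaves each application to reinvent them, or to skip them.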

In every case I've seen, if Excel is not enough for that user's application, they need a true programmer to develop it.

Although I believe that special-purpose tools which give non-programmers the benefits of a small programming environment are excellent, I also think that a larger problem today is the dumbing down of the current generation of programmers. I have met far too many people

  • who call themselves database programmers without knowing any relational theory
  • who call themselves computer programmers but do not understand anything about how computer languages, the operating system, and the hardware interact. They can only program in Visual Basic, and get totally lost and confounded for weeks if they hit any irregular issue.
  • who call themselves system administrators but only really know how to download and install patches

The list goes on and on. Current programmers also have no skills in speaking, listening, writing, or documentation, which means that whenever new programmers come in, they either face a huge learning curve before they are useful, or they have to rewrite large pieces from scratch. The inability to communicate is hurting program quality to a large degree, because programmers cannot adequately understand, listen to, and talk about the needs and desires of their users in order to make an application that works well for them.

Finally, programmers today are losing their background in technical areas too, and the quality of their tools is suffering. For example, because programmers do not know relational database theory, database vendors are producing tools that drift further and further from the relational model rather than closer to it. Vendors are producing programming languages like C++ and Visual Basic which hinder programming more than they help - leaving out fundamental language features like closures that have been known for decades.
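A closure is a function that captures variables from the scope where it was defined, so it can carry private state without a class or a global variable. Here is a minimal Python sketch for readers who have not met closures. The `make_counter` name is just for illustration.

```python
from typing import Callable

def make_counter(start: int = 0) -> Callable[[], int]:
    """Return a function that remembers `count` between calls.
    The inner function closes over the local variable `count`."""
    count = start

    def increment() -> int:
        nonlocal count  # rebind the captured variable, not a new local
        count += 1
        return count

    return increment

a = make_counter()
b = make_counter(10)
print(a(), a(), b())  # 1 2 11
```

Each call to `make_counter` creates a fresh `count`, so `a` and `b` keep independent state even though neither is an object with fields.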

Anyway, sorry for my rant, but it happens :)

Let me know if you have any other questions. If you need help learning my Literate Programming toolchain I'd be happy to help. Also, if you are interested, I'm writing a book on programming called "Programming from the Ground Up". The current draft is available from

http://savannah.nongnu.org/files/?group=pgubook

I'm also writing a second book about programming languages (it's essentially a Scheme interpreter written in assembly language in a literate style - I chose assembly language so that the reader could see how the low-level constructs combine to create high-level constructs). The current draft of that is available from my hard drive :)

Let me know if you have any questions!
