I just wanted to post a recent email exchange I had with an Italian professor about my XML literate programming system, because I think there is some good diary material in there :) --
Dear Sir,
First of all, let me make sure you understand the idea behind my project,
and literate programming in general.
Literate programming (as you may know) is about programming in such a way
that instead of writing a program for a machine, you are writing a book or
paper for a human about writing a program for a machine. This makes
software development easier in the long run for the following reasons:
- The formal documentation _is_ the program and therefore does not need to
be separately maintained
- The code is organized in a fashion to be understood by a human, allowing
easier transition of new maintainers to the project
- The code is more correct, as each section is described in human form to
others, and it is easier for others to see where the description and the
code do not agree
- Together, these factors should lead to better programs, easier
transitions between code maintainers, and lower overall cost in the long term.
Now, about my project -
In his book "Literate Programming", Donald Knuth described CWEB, a system
he wrote for doing literate programming. Although it was a nice system, I
saw shortcomings:
- It required that you learn TeX, which few people use today outside of
academic publishing, and which is _very_ tricky to get right
- The system was geared toward a specific programming language and a
specific set of TeX macros.
XML Tangle solves these problems by using a standard tag language (XML)
and by being both programming-language-agnostic and DTD-agnostic.
Early versions of XML Tangle required a specific tagset. The current
version, available at http://literatexml.sourceforge.net/, uses processing
instructions, so you can use any tag set you want, including DocBook,
HTML, TEI, etc. In fact, I've heard that future versions of Microsoft
Word may be XML-based, so it may even be possible to use their format
(although, if Word does not let you edit the processing instructions
directly, you probably will not be able to use the Microsoft Word
_application_ even if you can use the Microsoft Word _format_/DTD).
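To make the processing-instruction idea concrete, here is a minimal tangler sketch in modern Python 3. It is not XML Tangle's actual code: the PI targets `chunk-start`, `chunk-end`, and `chunk-ref` are hypothetical names chosen for illustration, and XML Tangle's real instruction set may differ. Because only processing instructions mark the code, the surrounding tags (`<doc>`, `<p>`, `<pre>` here) can come from any DTD.

```python
import xml.sax


class TangleHandler(xml.sax.ContentHandler):
    """Collect code chunks marked by hypothetical <?chunk-start NAME?> ... <?chunk-end?> PIs."""

    def __init__(self):
        super().__init__()
        self.chunks = {}      # chunk name -> list of pieces: text or ("ref", name)
        self.current = None   # name of the chunk currently being collected

    def processingInstruction(self, target, data):
        name = data.strip()
        if target == "chunk-start":
            self.current = name
            self.chunks.setdefault(name, [])   # repeated starts append to the same chunk
        elif target == "chunk-end":
            self.current = None
        elif target == "chunk-ref" and self.current is not None:
            self.chunks[self.current].append(("ref", name))

    def characters(self, content):
        # Only text inside a chunk is code; everything else is documentation.
        if self.current is not None:
            self.chunks[self.current].append(content)


def expand(chunks, name, seen=()):
    """Recursively replace chunk references with the referenced chunk's text."""
    if name in seen:
        raise ValueError(f"recursive chunk reference: {name}")
    out = []
    for piece in chunks[name]:
        if isinstance(piece, tuple):
            out.append(expand(chunks, piece[1], seen + (name,)))
        else:
            out.append(piece)
    return "".join(out)


def tangle(xml_text, root="main"):
    handler = TangleHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return expand(handler.chunks, root)


# The "body" chunk is explained after "main" but referenced from it.
EXAMPLE = """<doc>
<p>The program prints a greeting from a separately described body.</p>
<pre><?chunk-start main?>def main():
<?chunk-ref body?>
main()
<?chunk-end?></pre>
<p>The body is a single print call.</p>
<pre><?chunk-start body?>    print("hello")<?chunk-end?></pre>
</doc>"""

if __name__ == "__main__":
    print(tangle(EXAMPLE))
```

References are resolved only after the whole document is parsed, so a chunk can be explained in whatever order reads best to a human, which is the point of literate programming.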
Also note that this program needs an extra patch, which I haven't
committed yet, to run under Python 2. Python 1.5 (or whatever ships with
Red Hat 7.3) works fine with it.
Now to your specific questions:
> I'm interested in thinking about that subject. So I ask you whether you
> have extended your ideas, and particularly whether you think that the DTD
> has to be conserved or abandoned...
My main open question is whether this whole thing could be accomplished
using the SGML Architectural Engine instead of my specialized program. As
I've already made it DTD-agnostic, I don't need to conserve or abandon the
DTD :) I do think my selection of processing instructions is adequate, at
least for now.
My main gripe about the system is that there are no easy-to-use tools for
programming with it. Since I started using it for a book I'm writing, I've
found typing all of the processing instructions tedious, and a good XML
GUI or Emacs mode for doing this would be great, if I ever find the time
to write one.
I also think that I should add language-specific hooks, especially for
formatting and indexing, but I don't have any concrete ideas about these
yet.
One thing that is definitely deficient in this system is the ability to
move from a non-literate program to a literate one. One of the greatest
things about the Perl programming language is that you can start writing a
program as a quick hack and then gradually morph it into a beautifully
structured object-oriented program. Perl helps you move through the
stages of program development. For most programs, you do not know the
structure when you sit down to write them, so you crank out something
mostly procedural. Then, as you develop it, the objects, systems, and
subsystems begin to become clear. Perl makes it easy to refactor your
procedural code into objects and classes, and therefore helps you move
your program from the mess it started out as to a beautiful structure. My
literate system has no similar ability to help people go from an
"illiterate" program to a "literate" program. It is only really useful for
programs that start out in a literate state. If you have any suggestions
on how I can fix that, they would be greatly appreciated!
I think I've answered that one. Let me know if I left anything out.
Now on to your next question:
> Moreover, I ask you whether "the fact that EXCEL and ACCESS accept XML
> documents quite well" could give rise to a different approach to
> programming. The case of EXCEL is quite eloquent: using macros could
> transform a non-programmer into a programmer... for limited purposes (the
> majority, in practice...)
This question deserves quite an extended answer, as there are several
questions hidden in there.
First of all, accepting XML documents is not the issue. In fact, using
XML to program with isn't the issue either. XML is currently being
overused. XML is useful for the following reasons -
For the XML author -
- It's easy to write - the syntax is simple, with few, regularized
exceptions. In addition, most people already know HTML
- Tags are (sometimes) flexible, depending on the application
- Trees match how people normally think
For the programmer -
- Don't have to re-invent the wheel when developing a file format
- Already have half a parser written
- Tree structures are easy to manipulate (although they are not as
flexible as relational structures)
- File format problems are easier to debug because of the plain-text
nature
XML is essentially the new syntax for S-expressions.
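The S-expression comparison can be shown directly: any XML tree maps onto nested lists. The sketch below uses Python's standard `xml.etree.ElementTree` and prints the result loosely following the SXML convention, where attributes go in an `(@ ...)` list. The function names are made up for this illustration, and it strips surrounding whitespace from text nodes, which is a simplification.

```python
import xml.etree.ElementTree as ET


def to_sexpr(elem):
    """Convert an element into a nested tuple: (tag, attributes, child, ...)."""
    children = []
    if elem.text and elem.text.strip():
        children.append(elem.text.strip())
    for child in elem:
        children.append(to_sexpr(child))
        # Text that follows a child element is stored in that child's .tail.
        if child.tail and child.tail.strip():
            children.append(child.tail.strip())
    return (elem.tag, dict(elem.attrib), *children)


def render(sexpr):
    """Print the nested tuple in Lisp-like notation."""
    if isinstance(sexpr, str):
        return '"' + sexpr.replace('"', '\\"') + '"'
    tag, attrs, *children = sexpr
    parts = [tag]
    if attrs:
        pairs = " ".join(f"({key} {render(value)})" for key, value in attrs.items())
        parts.append(f"(@ {pairs})")
    parts.extend(render(child) for child in children)
    return "(" + " ".join(parts) + ")"


if __name__ == "__main__":
    doc = ET.fromstring('<book title="PGU"><chapter>Intro</chapter><chapter>Registers</chapter></book>')
    print(render(to_sexpr(doc)))
```

For the sample document this prints `(book (@ (title "PGU")) (chapter "Intro") (chapter "Registers"))`: the same tree, with angle brackets traded for parentheses.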
Now on to your question -
The ability to accept an XML document does not really mean much. I am not
aware of the specifics of those two programs, but it really isn't that
helpful if they only accept documents in a rigid, complicated DTD. It's
just a different file format, which really doesn't benefit the end-user
that much - it just simplifies Microsoft's job of writing a parser.
In addition, it does not solve the problems solved by literate
programming - namely, the ability to communicate programs clearly to other
human beings. Excel macros are especially difficult in this regard,
because each formula has to be inspected individually, and for some fields
it may not even be apparent that a formula exists. Documentation within a
spreadsheet is especially difficult.
Is Excel a good tool? Sure, it's excellent for simple things. It's
especially excellent for non-programmers who want to test out new ideas
and formulas without hiring a programmer. However, it does not scale very
well. I would consider it a prototyping tool, which is useful for learning
what you want, and useful in production for fewer than 5 users. Past
that, it requires a disciplined programmer to manage and administer it so
that it is usable in multiple environments, interacts well with other
programs, is properly tested and robust, and is well-documented. No large
operation should base itself on undocumented code.
Many of the same problems apply to Access as well. Personally, I do not
like Access because most of what novices can use it for, they can do more
easily in a spreadsheet. And for tasks of medium or greater complexity, it
is really the wrong tool. A true relational database should be used, with
its architecture tied to the data sources available within the rest of the
organization. One of the biggest problems I've found in the industry is
individuals who create Access databases thinking that they have created a
"real" database and program, and who believe that it should only take
their IT personnel a few minutes to post it as an application for all
users. The IT person then finds that every field is nothing but a varchar,
there are no constraints, and the whole thing is full of bugs and a real
mess. Add to that the scalability limits of Access, and you can see how
Access can be problematic.
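For contrast, here is a minimal sketch of what declared types and constraints buy you: the database itself rejects bad data instead of silently storing it. It uses SQLite through Python's standard `sqlite3` module only because it needs no server; the `customer`/`invoice` schema and the `make_db`/`try_insert` helpers are hypothetical. SQLite is itself loosely typed, which is why the sketch adds an explicit `typeof()` check on the amount column.

```python
import sqlite3

# Hypothetical schema: typed columns plus constraints instead of "all varchars".
SCHEMA = """
CREATE TABLE customer (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL CHECK (length(name) > 0),
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE invoice (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    amount      NUMERIC NOT NULL
                CHECK (typeof(amount) IN ('integer', 'real') AND amount >= 0)
);
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn, sql, params):
    """Run one INSERT in its own transaction and report whether the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


if __name__ == "__main__":
    conn = make_db()
    add_invoice = "INSERT INTO invoice (customer_id, amount) VALUES (?, ?)"
    print(try_insert(conn, "INSERT INTO customer (name, email) VALUES (?, ?)", ("Ada", "ada@example.com")))
    print(try_insert(conn, add_invoice, (1, -5)))      # negative amount
    print(try_insert(conn, add_invoice, (1, "abc")))   # not a number
    print(try_insert(conn, add_invoice, (99, 10)))     # no such customer
```

Each rejected insert is exactly the kind of bug that ends up in an unconstrained Access table.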
In every case I've seen, if Excel is not enough for that user's
application, they need a true programmer to develop it.
Although I believe that special-purpose tools which allow non-programmers
the benefits of a small programming environment are excellent, I also
think that a larger problem today is the dumbing down of the current
generation of programmers. I have met far too many people
- who call themselves database programmers without knowing any relational
theory
- who call themselves computer programmers but do not understand anything
about how computer languages, the operating system, and the hardware
interact. They can only program in Visual Basic, and get totally lost and
confounded for weeks if they hit any irregular issue.
- who call themselves system administrators but only really know how to
download and install patches
The list goes on and on. Many current programmers also have no skills in
speaking, listening, writing, or documentation, which means that whenever
new programmers come in, they either have a huge learning curve before
they are useful, or they just have to rewrite large pieces from scratch.
The inability to communicate is affecting program quality to a large
degree, because programmers cannot adequately understand, listen to, and
talk about the needs and desires of their users in order to make an
application that works well for them.
Finally, programmers today are losing their background in technical areas,
too, and the quality of their tools is suffering. For example, because
programmers do not know relational database theory, database vendors are
producing tools that are further and further from the relational model,
rather than closer to it. Vendors are producing programming languages like
C++ and Visual Basic that are more of a hindrance to programming than a
help, leaving out fundamental language features like closures that have
been known about for decades.
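For readers who haven't met closures: a closure is a function that captures variables from the scope where it was defined and keeps them alive after that scope has returned. A minimal illustration in modern Python 3 (chosen only because XML Tangle itself runs on Python; `make_counter` and `make_scaler` are made-up names for the example):

```python
from typing import Callable


def make_counter(start: int = 0) -> Callable[[], int]:
    """Return a function that keeps and updates its own private count."""
    count = start

    def next_value() -> int:
        nonlocal count  # rebind the variable captured from make_counter's scope
        count += 1
        return count

    return next_value


def make_scaler(factor: float) -> Callable[[float], float]:
    """Return a function that multiplies by the captured factor."""
    return lambda x: x * factor


if __name__ == "__main__":
    counter = make_counter()
    other = make_counter(10)
    print(counter(), counter(), other(), counter())  # each counter has independent state
    print(make_scaler(2.0)(4))
```

Each call to `make_counter` produces a function with its own independent `count`, with no class or global variable needed. Getting the same effect in a language without closures means writing an object by hand.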
Anyway, sorry for my rant, but it happens :)
Let me know if you have any other questions. If you need help learning my
Literate Programming toolchain I'd be happy to help. Also, if you are
interested, I'm writing a book on programming called "Programming from the
Ground Up". The current draft is available from
http://savannah.nongnu.org/files/?group=pgubook
I'm also writing a second book, which is about programming languages (it's
essentially a Scheme interpreter written in assembly language in a
literate style - I chose assembly language so that the reader could see
how the low-level constructs combine to create high-level constructs).
The current draft of that is available from my hard drive :)
Let me know if you have any questions!