Older blog entries for johnnyb (starting at number 49)

Just wanted to post a recent email exchange I had with an Italian professor over my XML Literate Programming system, because I think I had some good diary material in there:) --

Dear Sir,

First of all, let me make sure you understand the idea behind my project, and literate programming in general.

Literate programming (as you may know) is about programming in such a way that instead of writing a program to a machine, you are instead writing a book or paper to a human about writing a program to a machine. This makes software development easier in the long run for the following reasons:

  • The formal documentation _is_ the program and therefore does not need to be separately maintained
  • The code is organized in a fashion to be understood by a human, allowing easier transition of new maintainers to the project
  • The code is more correct, as each section is described in human form to others, and it is easier for others to see where the description and the code do not agree
  • These factors should lead to better programs, and easier transitions between code maintainers, and less overall cost for the long-term.

Now, about my project -

Donald Knuth in his book "Literate Programming" described a system he wrote for doing literate programming called CWEB. Although it was a nice system, I saw shortcomings:

  • It required that you learn TeX, which is all but unused today and is _very_ tricky to get right
  • The system was geared toward a specific programming language and a specific set of TeX macros.

XML Tangle solves these problems by using a standard tag language (XML), and also being both programming language and DTD-agnostic.

Early versions of XML Tangle required a specific tagset. The current version, available at http://literatexml.sourceforge.net/ uses Processing Instructions, so you can use any tag set you want, including DocBook, HTML, TEI, etc. In fact, I've heard that future versions of Microsoft Word may be XML-based, so it may be possible to even use their format (although, if they do not allow you to directly modify the processing instructions, you probably will not be able to use the Microsoft Word _application_ even if you can use the Microsoft Word _format_/DTD).

Also note that this program takes an extra patch that I haven't committed yet to run under Python 2. Python 1.5 (or whatever ships with Red Hat 7.3) works fine with it.

Now to your specific questions:

I' m interested in thinking to that subject.. So I ask you if you have extended your ideas and particularly if you think that TDT has to conserved or abandoned ..

My main wondering is whether or not this whole thing can be accomplished using the SGML Architectural Engine instead of my specialized program. As I've already made it DTD-agnostic, I don't need to conserve or abandon it :) Although I do think my selection of processing instructions is adequate at least for now.

My main gripe about the system is that there are no easy-to-use tools for programming with it. As I've started using it for a book I'm writing, I've found typing all of the processing instructions tedious, and a good XML GUI or emacs mode for doing this would be great, if I ever find the time to write one.

I also think that I should add language-specific hooks, especially for formatting and indexing, but I don't have any concrete ideas about these yet.

One thing that is definitely deficient in this system is the ability to move from a non-literate program to a literate one. One of the greatest things about the Perl programming language is that you can start writing a program as a quick hack, and then gradually morph it into a beautifully structured object-oriented program. Perl helps you move through the stages of program development. Most programs you do not know the structure when you sit down to write it, so you crank out something mostly procedural. Then, as you develop it, the objects, systems, and subsystems begin to be clear. Perl makes it easy to refactor your procedural code into objects and classes, and therefore helps you move your program from the mess it started out as to a beautiful structure. My literate system has no similar ability to help people go from an "illiterate" program to a "literate" program. It is only really useful for programs which start out in a literate state. If you have any suggestions on how I can fix that, they would be greatly appreciated!

I think I've answered that one. Let me know if I left anything out.

Now with your next question:

Moreover I ask you if "the fact that EXCEL and ACCES accept quite well XML documents " could give rise to a different approach to programming: The case of EXCEL is quite eloquent: using macros could transform a non programmer in a programmer..for limited pourposes (the majorit.. in practice..)

This question deserves quite an extended answer, as there are several questions hidden in there.

First of all, accepting XML documents is not the issue. In fact, using XML to program with isn't the issue either. XML is currently becoming overly used. XML is useful for the following reasons -

For the XML author -

  • It's easy to write - the syntax is simple with few, regularized exceptions. In addition, everyone already knows HTML
  • Tags are (sometimes) flexible, depending on the application
  • Trees match how people normally think

For the programmer -

  • Don't have to re-invent the wheel when developing a file format
  • Already have half a parser written
  • Tree structures are easy to manipulate (although they are not as flexible as relational structures)
  • File format problems are easier to debug because of the plain-text nature

XML is essentially the new syntax for S-expressions.

Now on to your question -

The ability to accept an XML document does not really mean much. I am not aware of the specifics of those two programs, but it really isn't that helpful if it only allows it with a rigid, complicated DTD. It's just a different file format, which really doesn't benefit the end-user that much - it just simplified Microsoft's process of writing a parser.

In addition, it does not solve the problems solved by Literate Programming - namely the ability to communicate programs clearly to other human beings. Excel macros are especially difficult in this regard, because each formula has to be inspected individually - and some fields may not even be apparent that a formula exists. Documentation within a spreadsheet is especially difficult.

Is Excel a good tool? Sure, it's excellent for simple things. It's especially excellent for non-programmers who want to test out new ideas and formulas without hiring a programmer. However, it does not lend itself to scale very well. I would consider it a prototyping tool which is useful for learning what you want, and useful for production for fewer than 5 users. Past that it requires a disciplined programmer to manage and administer so that it is usable in multiple environments, interacts well with other programs, is properly tested and robust, and is well-documented. No large operation should base itself on undocumented code.

Many of the same problems are applicable to Access as well. Personally, I do not like Access because for the things that most novices can use it for, they can do it easier on a spreadsheet. Then, for tasks of medium or larger complexity, it is really the wrong tool. A true relational database should be used, with its architecture tied to the data sources available within the rest of the organization. One of the biggest problems I've found within the industry are individuals who create Access databases thinking that they have created a "real" database and program, and believe that it should only take their IT personnel a few minutes to post as an application for all users. The IT person then finds that their fields are nothing but varchars, there are no constraints, and it is full of bugs, and is a real mess. Add to that the scalability limits of Access, and you can see how Access can be problematic.

In every case I've seen, if Excel is not enough for that user's application, they need a true programmer to develop it.

Although I believe that special-purpose tools which allow non-programmers the benefits of a small programming environment are excellent, I also think that a larger problem today is the dumbing down of the current generation of programmers. I have met far two many people

  • who call themselves database programmers without knowing any relational theory
  • who call themselves computer programmers but do not understand anything about how computer languages, the operating system, and the hardware interact. They can only program in Visual Basic, and get totally lost and confounded for weeks if they hit any irregular issue.
  • who call themselves system administrators but only really know how to download and install patches

The list goes on and on. Current programmers also have no skills in speaking, listening, writing, or documentation, which means that whenever new programmers come in, they either have a huge learning curve before they are useful, or they just have to rewrite large pieces from scratch. The inability to communicate is affecting program quality to a large degree, becuase programmers cannot adequately understand, listen, and talk about the needs and desires of their users in order to make an application that works well for them.

Finally, programmers today are losing their background in technical areas, too, and the quality of their tools are suffering. For example, programmers do not know relational database theory, so database vendors are producing tools that are further and further from the relational model, rather than closer to it. Vendors are producing programming languages like C++ and Visual Basic which are more hindrances to programming than helps - leaving out fundamental language features like closures that have been known about for decades.

Anyway, sorry for my rant, but it happens :)

Let me know if you have any other questions. If you need help learning my Literate Programming toolchain I'd be happy to help. Also, if you are interested, I'm writing a book on programming called "Programming from the Ground Up". The current draft is available from

http://savannah.nongnu.org/files/?group=pgubook

I'm also writing a second book which is about programming languages (it's essentially a Scheme interpretter written in assembly language in a literate style - I chose assembly language so that the reader could see how the low-level constructs combine to create high-level constructs). The current draft of that is available from my hard drive :)

Let me know if you have any questions!

14 Oct 2002 (updated 19 Dec 2002 at 17:36 UTC) »

It's been a long time since I posted here, especially since I posted something that was simply a repost of something I said in a different board.

Anyway, that's because my life has been quite busy. I finally got the job I desired (it's a Linux-mostly job, thought not a Free Software one - but I may be able to change that over the long run). I'm working to start the IT Services branch of a new company.

Some Sleight-of-Hand

Ballmer is pretty tricky with words. You have to read them carefully and know the situation to see the lies. For example, take this paragraph:

"We do not anticipate offering software on Linux. Nobody pays for software on Linux. Even StarOffice, sold by Sun, was originally a free product. And IBM, arguably the No. 1 player in the Linux market, promotes Linux to big users, but does not actually sell Linux"

Notice he starts by talking about selling software on Linux, but the points he makes about IBM are about Linux itself, not selling software on Linux. In fact, every counterexample he used is actually an instance of people selling software on Linux. However, in the Ballmer reality distortion field, this somehow proves his point. It is true, IBM does not sell Linux. The do sell MANY APPLICATION that run on Linux, which was what the paragraph was presumably about. As is usual, Microsoft is trying to scare people out of providing applications on Linux.

The problem with anything that Microsoft says is that much of it is either deception or outright lies. For example, they point out Linux's system requirements as too large for embedded work here http://www.microsoft.com/windows/Embedded/xp/evaluation/compare/notlinux.asp yet their own product requires an even larger footprint.

Anyway, it's true Microsoft needs to come up with a new way of providing value, because the programming community at large has largely obsoleted most of their technology. Open-source isn't about taking on Microsoft or anything like that - it's about professionals working together to further their profession. When the entire worldwide professional computer community works together, there's nothing that other companies can really do to stand in their way - they either need to join or become obsolete.

On my Linux box, I play games, create 3D animations, write music, develop applications, manage my personal finances, write email, create presentations and so on. Why would I (or anyone else) need Microsoft's help for $700 per computer (OS + Office)?

While more and more people start realizing that the software industry at large has been massively overcharging them, the market will flood toward companies who a) are part of the global development community, and b) leverage their knowledge from the community to provide products that provide much more value than they cost. That is the open-source way.

The following is my response to the article "is a circle an ellipse" at dbdebunkings.com:

First of all, let me say that I love your site. I am a much better-informed person for reading it.

Secondly, let me say that you, like most people even in the OO community, have missed what inheritance is really about. As you know, many people in the database community have missed what databases are all about, so it's shouldn't come as a surprise to you that other computer fields have the same problems.

The article focused on inheritance as a type/subtype relationship. There are two ways to define type/subtype relationships - by restriction or generalization. Neither of these are useful for object-oriented programming.

Object-oriented programming is defined by one thing - interfaces. The ability for an object to "do" something. In your circle/ellipse example, you did not list any operations for them to "do", so it is impossible to tell what kind of relationship they would have in the object-oriented programming world. For example, if they both implemented a function called "enlarge", then they would both implement the interface Enlargeable. In fact, squares and pentagons could be part of this category as well. With this in place, an application can enlarge an object independent of what type it is, or what data it carries.

My philosophy teacher always asked "how do you define a chair"? He gave possible examples - a flat board with four legs and a back. However, this did not work for three-legged chairs. Any definition along these lines would fail for some chair invented somewhere. The bean-bag chairs really threw people. However, this is because it needed an operational definition - a chair is something that you sit on. Or, in computer terms, implements the SitOn interface. Object-oriented programming gives operational definitions, and thus makes itself a very powerful tool in programming.

Interestingly, this makes object-oriented programming completely orthogonal to databases, since one deals with the operational characteristics and the other deals with the data and data dependencies of a system.

Anyway, I wrote a short paper on this in college - you can read it at

http://www.eskimo.com/~johnnyb/computers/FactoringInheritance/

23 Jul 2002 (updated 23 Jul 2002 at 03:07 UTC) »

Just finished a new release of xmltangle - doing literate programming in XML. It differs from previous attempts at this as it uses processing instructions for handling the data instead of tags and attributes. I do plan on creating a few more processing instructions to map elements and attributes to the Literate functions, kind of like an architectural engine. You can find the code at

http://literatexml.sourceforge.net/

My response to a question on a Yahoo newsgroup:

Do YOU see Linux making a big impact on the desktop?

**********

I certainly do. Microsoft relies entirely on the OEMs to survive. In addition, Microsoft's licensing has totally prevented OEMs from differentiating themselves in any way other than price - making their margins horrible. OEMs want to get out and Linux is the best way. The problem is who will be the first to jump. Wal-Mart may well set the standard here and cause a windfall. Remember - Wal-Mart can afford to sell cheaper than ANY other.

On the business front, the value proposition of Linux on the desktop is real and credible. In fact, CIOs are going to be called to account for the places they AREN'T using open software very soon. The only real problem is that people underestimated the time scale required for this change to happen.

However, the first place the desktop revolution is going to happen is schools. This is happening right now all over the country. The early-adopters are starting all over the country. Microsoft has made the donated PCs running Windows essentially illegal - the only real option for donated PCs under Microsoft's new rules is running Linux. Using Linux in a terminal environment, you can do a computer lab with REALLY OLD donated PCs for $6,000 including server, cabling, etc that runs OpenOffice, Mozilla, and all the other wonderful open-source programs.

With Wine's derivatives being as good as they are now, schools can even use their existing software they have purchased.

I think of this in terms of chemistry - the potential energy here is magnanimus, however, it takes a lot of extra energy to start the reaction, unless there is a catalyst of some sort. It's just a simple amount of time before Linux has the buildup it needs to start the reaction.

Reminder to myself to rant about XML styling languages and the current state of XML publishing.

My thoughts on a recent newsforge article:

To the problem of Linux being hard to install, I say phooey. Linux and Windows both have trouble installing, just on different systems. When it does install, Linux is usually easier.

To the problem of third-party package management - that is actually a real problem. There are several reasons and solutions:

1) The dists have done a good job of kitchen-sinking their distribution. Most of what anyone needs is already in the distribution, thus leaving little reason for people writing the applications to make it easy-to-install.

2) package formats cannot contain dependencies. Unfortunately, the current package formats do not have support (that I'm aware of) for shipping all of their dependencies. This can be solved by shell scripts, but because of #1 most people don't bother. However, maybe a universal install script might be a good project for someone. Have a script for collecting dependencies into a .shar file, and extend the .shar file to actually run the installer, which will install whatever dependencies are needed.

3) what is the operating system? The blurring of lines between operating system and add-on is great in Linux. Is Red Hat Linux it's own operating system? Is GNOME it's own operating system? How should someone communicate what operating system a package runs under? I think we need to stop viewing Linux as an operating system, and start viewing the distributions as operating systems. The distributions define their compaitibility requirements, and therefore should be what developers aim their packages for, since it is a stable, known quantity. I know you'll say LSB, but I don't think that the LSB is good for Linux.

Remember how Windows does this - they basically stick a new version of Windows with Office. Have you seen how many .dll's it updates? They basically replace the entire Windows infrastructure when you install Office. Therefore, I think the best way is for developers to ship the dependencies with the programs. Maybe they should also statically link more of their libraries. Maybe the libraries/toolchains should make it easier to choose which libraries are statically/dynamically compiled.

Creating a new section of my web site for a potential consulting business. I'm kind of wavering between staying where I am (EDS), going out on my own, and going with my friend Chris. There's a lot of potential I could do with Chris, but I know he has no clue about free software. Using free software wouldn't be a problem for certain projects, but writing free software I'm sure would be a problem. However, it might be doable over a period of time.

Doing my own consulting business might be good on the side of my EDS job, especially since I work 2nd shift. I'm pretty free until 3, unless you count my family :)

As of right now, I can't do consulting by itself simply because my son requires good insurance (lots of health problems).

Anyway, my site for the theoretical consulting biz is at

http://www.eskimo.com/~johnnyb/BartsonsConsulting/

The "solving problems" link has a semi-manifesto on which I plan on basing the business around.

Anyway, I have to wait until my wife pops out child #2 anyway (should be any day now), so I can kind of stick it in the back of my mind until life normalizes a little bit :)

Anyway, not sure what I want to do. In addition, I want to be spending more time in the church, but I don't see that happening to a great extent anytime soon.

This is my response to a post on NewsForge about whether or not programmers will get less money with the GPL:

Remember. Your can fulfill ALL of your obligations to the GPL by simply providing them with source on the CD that is covered by the GPL. What this means:

* for consulting work, you get paid the same.

* if an industry group wants you to write an industry-specific application, you get paid the same.

* if you are selling boxed software, you basically will still get paid the same. Why?

a) noone will be able to download it until after people start buying it in the stores, since that's where the source code is.

b) the value of the software in the store is greater than the value of it in cyberspace. Why? Because of the salesperson who can help the user find the software they need.

c) people respect brand names. It's human nature. Build a good brand and your brand will sell at higher prices than the same product from other brands.

d) only offer support on the purchased product.

Doing so will also reduce your R&D time because you will have users who will send you patches for things they want, if they are skilled an interested enough.

Now, if you are overcharging for your product anyway, sure, the market will correct you. However, I wouldn't be proud to say I made a living extorting customers because they didn't have any alternatives.

Users want freedom. They are even willing to pay for it. Remember, a lot of new computer sales were made to young people so they could get a piece of the "free" music. They paid thousands of dollars to get "free" music. Think about that. The money isn't the problem - the freedom is.

Add to this that very, very few people in the U.S. consider piracy to be unethical, and you remove any reason why the GPL would reduce sales on products offered at a reasonable price.

40 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!