Older blog entries for simos (starting at number 4)

DocBook XML and creating printable documents (like PDF).

Is that an interesting topic? Well, it sure is. I'll go into detail, in layman's terms, so it's approachable.

XML is a versatile markup language that you can use to represent almost any information. You typically enclose pieces of data in tags, as in <name>Simos</name>. These tags are custom and signify what it is that they contain. Because XML is this versatile, you need a so-called "schema", a description of the available tags, for the type of document you want to represent.
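To make the tag idea concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The <person> wrapper element is made up for illustration; only the <name> tag comes from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <person> is an invented wrapper element,
# <name>Simos</name> is the example tag from the text.
source = "<person><name>Simos</name></person>"

# Parse the string into a tree of elements.
root = ET.fromstring(source)

# Look up the child element by its tag and read the data it encloses.
name = root.find("name")
print(root.tag, name.tag, name.text)
```

The parser only checks that the tags are well-formed and properly nested. Whether <name> is allowed inside <person> at all is what a schema would describe.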

There is a standardisation process of schemata (plural of schema) for different domains at xml.org and specifically at their registry page.

XML is used in many places in open-source software, and the most common use is documentation, where DocBook XML is used. For example, see The Linux Documentation Project (TLDP), which has standardised on DocBook XML (if you remember, it used to be LinuxDoc some years back).

Suppose you have a document written in DocBook XML. With tools you can convert it to other presentation formats such as plaintext, HTML (+variants), PostScript, PDF and so on.

For the first two the process is quite easy, as the tags are either stripped (plaintext) or converted to other tags (HTML). Your text editor or Web browser can display the result, and both do a good job of rendering Unicode characters as well.
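As a rough illustration of the "tags are stripped" case, the sketch below turns a tiny DocBook-like fragment into plain text with Python's standard library. The sample document and the to_plaintext helper are hypothetical, and real converters do much more (lists, tables, entities); this only shows the basic idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document using a few common DocBook element names.
DOCBOOK = """<article>
  <title>Hello</title>
  <para>Some <emphasis>marked-up</emphasis> text.</para>
</article>"""


def to_plaintext(xml_source):
    """Drop all tags and keep the text, one block per top-level element."""
    root = ET.fromstring(xml_source)
    blocks = []
    for child in root:
        # itertext() yields the text of the element and of everything nested in it.
        text = "".join(child.itertext())
        # Collapse runs of whitespace left over from the XML indentation.
        blocks.append(" ".join(text.split()))
    return "\n\n".join(blocks)


print(to_plaintext(DOCBOOK))
```

Since Python strings are Unicode, the same function works unchanged for text in non-Latin scripts, which matches the observation that plaintext and HTML output handle Unicode well.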

For PostScript or PDF the story is a bit different. It works relatively well with Latin-based scripts. For example, see Docbook bits, which shows how to set up your system with Fedora Core 2. No need for compilation; simply install the available RPM packages. For non-Latin scripts it's not so easy.

To convert from DocBook XML to PDF you need two programs: one that takes your DocBook XML source file and applies a stylesheet, producing a Formatting Objects (FO) intermediate file that contains both content and presentation information, and another that takes the FO file and converts it to PDF.

The first program is an XSLT engine and the second an FO engine. There are several engines of each kind, listed at XSL Engines. We mentioned Docbook bits above; it uses xsltproc to convert DocBook XML to FO and then passivetex to convert FO to PDF (or PostScript). Another combination is Xalan and FOP (example). A third option is xmlroff, which can do both jobs: it starts from the XML source and a stylesheet and produces PDF. xmlroff is interesting because it uses Pango (yeah!) to render fonts (example with sample text in Greek, Russian, Arabic and Tamil).
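The two-step process can be sketched as a small Python wrapper around the command-line tools. This is only a sketch under assumptions: it pairs xsltproc with FOP (a mix of the combinations named above, not one the text describes), the stylesheet path is a guess that differs between distributions, and the build_commands and convert helpers are hypothetical names.

```python
import subprocess
from pathlib import Path

# Assumption: where the DocBook XSL "fo" stylesheet lives.
# This path varies by distribution; adjust it for your system.
FO_STYLESHEET = "/usr/share/xml/docbook/stylesheet/docbook-xsl/fo/docbook.xsl"


def build_commands(docbook_file, stylesheet=FO_STYLESHEET):
    """Return the two command lines: XML -> FO, then FO -> PDF."""
    src = Path(docbook_file)
    fo_file = src.with_suffix(".fo")
    pdf_file = src.with_suffix(".pdf")
    # Step 1: the XSLT engine applies the stylesheet and writes the FO file.
    xslt_cmd = ["xsltproc", "-o", str(fo_file), stylesheet, str(src)]
    # Step 2: the FO engine renders the FO file to PDF.
    fo_cmd = ["fop", "-fo", str(fo_file), "-pdf", str(pdf_file)]
    return [xslt_cmd, fo_cmd]


def convert(docbook_file, stylesheet=FO_STYLESHEET):
    """Run both steps in order, stopping at the first failure."""
    for cmd in build_commands(docbook_file, stylesheet):
        subprocess.run(cmd, check=True)
```

Calling convert("guide.xml") would leave guide.fo and guide.pdf next to the source, provided both tools and the stylesheet are installed. Keeping the command construction separate from the execution makes it easy to swap in a different FO engine.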

To sum up, what the community needs is a way to create quality PDF and PostScript files from DocBook XML for any language (assuming there is a font), with a process that is easy to follow (like Docbook bits) and with the necessary tools available as distribution packages (RPM, DEB, etc).

10 Aug 2004 (updated 10 Aug 2004 at 23:14 UTC) »

Quite a few interesting articles today.

To start with, Linux and Patent Risks talks about the 283 patents that the Linux kernel might infringe. The study comes from Dan Ravicher, a patent attorney who, apart from his main job, represents the Free Software Foundation pro bono. The patent system can be abused very badly and Dan is working towards its reform. People should know how it works, as ignorance is costly. Read the article!

Reading http://www.Groklaw.net daily helps you understand what's going on with the world's favourite litigation choo-choo train. There are also articles on other legal aspects of open-source software. Behind the amazing work of Groklaw is Pamela Jones, a journalist with a paralegal background.

Martin Taylor is the top Linux strategist of M$. He is supposed to understand open-source software well and he uses his skills to help his company fight. His techniques are ultra-competitive; it reminds me of one of SC0's executives, who would cut his wrist if he knew that the sight of blood would make you faint (in order to beat you). The article is "Microsoft sings a new tune on Linux", at http://www.msnbc.msn.com/id/5614334/.

Read the interview with Eugenia Loli-Queru, of OSNews fame.

1 May 2004 (updated 1 May 2004 at 10:25 UTC) »

This week there was a conference in the US called XDevConf, about developments in X and, by extension, Linux on the desktop. Federico Mena-Quintero kept notes and made them available on his blog, at [first], [second] and [third]. This year (2004) is considered the year of the desktop for Linux (and of the Monkey for many as well).

Reactivated my advogato account.

Actually I lost the password and I then realised (thanks Google.com) that there is no formal method to retrieve it or erase it. If you forget it, you are stuck!

I am retiring the simos74 account. Yes, there is no way to remove an account either.

First post to my diary, 10th June, 2003.
