Recent blog entries for simos

4 Dec 2004 (updated 9 Dec 2004 at 20:33 UTC) »

Here we see how to send fully localised e-mail from PHP. Similar steps can be taken with other scripting languages.

You need to configure PHP on the server to use the mbstring extension. In popular distributions it ships alongside PHP. Verify that, in addition to the package "php", you have "php-mbstring" installed.
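A quick way to check from PHP itself whether the extension is available is sketched below. The variable name $hasMbstring is only illustrative.

```php
<?php
// Ask the running PHP build whether the mbstring extension is loaded.
$hasMbstring = extension_loaded('mbstring');

echo $hasMbstring
    ? "mbstring is loaded\n"
    : "mbstring is missing; install the php-mbstring package\n";
```

Running this with the command-line PHP tells you about the CLI build only. The Web server module may be configured differently, so it is worth checking both.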

Then configure the default language settings for mbstring, so that your scripts stay simpler. Since we opt for Unicode and UTF-8, these settings are universal. The following section should already exist in your distribution's php.ini (it does on my Fedora Core 2), but commented out.

/etc/php.ini

[mbstring]
; language for internal character representation.
; -> "Neutral" is for Unicode.
mbstring.language = Neutral

; internal/script encoding.
; Some encodings cannot work as internal encoding.
; (e.g. SJIS, BIG5, ISO-2022-*)
mbstring.internal_encoding = UTF-8

; http input encoding.
; -> Set up your Web pages to be in utf-8 and specify utf-8 in HTML headers.
; -> Sometimes Apache is configured to send the encoding in the HTTP headers.
; -> ..which is set to iso-8859-1 (Debian?). Don't omit that setting.
mbstring.http_input = UTF-8

; http output encoding. mb_output_handler must be
; registered as output buffer to function
mbstring.http_output = UTF-8

; enable automatic encoding translation according to
; mbstring.internal_encoding setting. Input chars are
; converted to internal encoding by setting this to On.
; Note: Do _not_ use automatic encoding translation for
;       portable libs/applications.
mbstring.encoding_translation = On

; automatic encoding detection order.
; auto means
mbstring.detect_order = auto

; substitute_character used when character cannot be converted
; one from another
; -> if character code not found, show as U+xxxx. Good for finding issues.
mbstring.substitute_character = long
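If you cannot edit php.ini (on shared hosting, say), two of these settings can also be set from inside a script. The following is a minimal sketch, assuming the script file itself is saved as UTF-8. mb_language('uni') is the runtime counterpart of the Unicode language setting, and mb_internal_encoding() that of mbstring.internal_encoding.

```php
<?php
// Select the Unicode language profile: UTF-8 mail, Base64-encoded.
mb_language('uni');

// Treat all strings handled by mb_* functions as UTF-8.
mb_internal_encoding('UTF-8');

// With UTF-8 as the internal encoding, lengths are counted in characters, not bytes.
echo mb_internal_encoding(), "\n";
echo mb_strlen('Θέμα'), " characters, ", strlen('Θέμα'), " bytes\n";
```

The http_input, http_output and encoding_translation settings affect request handling before your script runs, so this sketch does not attempt to replace them.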

Finally, you can send localised e-mail through the following script. You can either execute it from the command line (php mailtest.php) or add it as a Web page and visit it.

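<?php
// mb_send_mail() and mb_encode_mimeheader() come from the mbstring extension;
// no extra includes are needed.

// Non-ASCII display names in address headers must be MIME-encoded by hand.
// An encoded name must not be wrapped in double quotes (RFC 2047).
$from = "From: " . mb_encode_mimeheader('Αυτά είναι ελληνικά') . " <root@localhost>";
$to = mb_encode_mimeheader('Παραλήπτης') . " <somemail@gmail.com>";

// mb_send_mail() encodes the subject and body itself, following mbstring.language.
$subject = 'Θέμα';
$body = 'Περιεχόμενο του γράμματος';

mb_send_mail($to, $subject, $body, $from);
?>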

The strings in Unicode (utf-8) I am using for the example should be Greek to you.
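To see what actually travels in the mail headers, you can encode a name and decode it back. This is only a sketch: it passes the charset and the "B" (Base64) transfer encoding explicitly so it does not depend on your php.ini, and the variable names are illustrative.

```php
<?php
mb_internal_encoding('UTF-8');

$name = 'Παραλήπτης';

// Produce an RFC 2047 "encoded-word": =?charset?B?base64-data?=
$encoded = mb_encode_mimeheader($name, 'UTF-8', 'B');
echo $encoded, "\n";

// Decoding gives back the original string in the internal encoding.
$decoded = mb_decode_mimeheader($encoded);
echo $decoded, "\n";
```

The encoded form is plain ASCII, which is why it survives mail servers that know nothing about Greek.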

DocBook XML and creating printable documents (like PDF).

Is that an interesting topic? Well, it sure is. I'll go into detail, in layman's terms, so it's approachable.

XML is a versatile markup language that you can use to represent almost any information. You typically enclose pieces of data in tags, such as with <name>Simos</name>. These tags are custom and signify what they contain. Because XML is so versatile, you need a so-called "schema", a description of the available tags for the type of document you want to represent.
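As a small illustration, here is how such a tag can be read back with PHP's SimpleXML extension. The wrapping <person> element is made up for this example; only <name>Simos</name> comes from the text above.

```php
<?php
// A tiny XML document; <person> is a hypothetical wrapper element.
$xml = '<person><name>Simos</name></person>';

// Parse the string into an object whose properties mirror the tags.
$doc = simplexml_load_string($xml);

// Casting an element to string yields its text content.
$name = (string) $doc->name;
echo $name, "\n";
```

Nothing here knows what a "person" is. That knowledge lives in the schema, which this sketch does not check.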

There is a standardisation process of schemata (plural of schema) for different domains at xml.org and specifically at their registry page.

XML is used in many places in open-source software, most commonly for documentation. Here, DocBook XML is used. For example, see The Linux Documentation Project (TLDP), which has standardised on DocBook XML (if you remember, it used to be LinuxDoc some years back).

Suppose you have a document written in DocBook XML. With the right tools you can convert it to other presentation formats such as plaintext, HTML (+variants), PostScript, PDF and so on.

For the first two the process is quite easy, as the tags are either stripped (plaintext) or converted to other tags (HTML). Your text editor or Web browser can be used to display these, and they do a good job of displaying Unicode characters as well.
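The two easy conversions can be sketched in a few lines. This is a toy, not how real DocBook tools work: the sample document and the three-element tag mapping are made up, and a real converter would use an XSLT stylesheet rather than string replacement.

```php
<?php
// A made-up DocBook-like fragment, kept on one line so there are no whitespace nodes.
$docbook = '<article><title>Θέμα</title><para>Some <emphasis>important</emphasis> text.</para></article>';

$doc = new DOMDocument();
$doc->loadXML($docbook);

// Plaintext: drop the tags, keep the text of each top-level element on its own line.
$lines = [];
foreach ($doc->documentElement->childNodes as $node) {
    $lines[] = $node->textContent;
}
$plain = implode("\n", $lines);

// HTML: convert each DocBook tag to an HTML tag (illustrative mapping only).
$map = [
    '<title>' => '<h1>',    '</title>' => '</h1>',
    '<para>' => '<p>',      '</para>' => '</p>',
    '<emphasis>' => '<em>', '</emphasis>' => '</em>',
];
$html = strtr($docbook, $map);

echo $plain, "\n", $html, "\n";
```

The Greek title passes through both conversions untouched, since nothing here has to know about fonts or glyphs.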

For PostScript or PDF the story is a bit different. It works relatively well with Latin-based scripts. For example, see Docbook bits, which shows how to set up your system with Fedora Core 2. No need for compilation, simply install the available RPM packages. For non-Latin languages it's not so easy.

To convert from DocBook XML to PDF you need two programs: one that takes your DocBook XML source file and applies a stylesheet, producing a Formatting Objects (FO) intermediate file that contains both content and presentation information, and another that takes the FO file and converts it to PDF.

The first program is an XSLT engine and the second an FO engine. There are several such engines for both programs, listed at XSL Engines. We mentioned Docbook bits above; it uses xsltproc to convert DocBook XML to FO and then passivetex to convert FO to PDF (or PostScript). Another combination is to use Xalan and FOP (example). A third option is xmlroff, which can do both jobs: start from the XML source and stylesheet and produce PDF. xmlroff is interesting because it uses Pango (yeah!) to render fonts (example with sample text in Greek, Russian, Arabic and Tamil).
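To make the first step concrete, here is a sketch of the XSLT stage using PHP's XSLTProcessor class (from the xsl extension, which is built on the same libxslt as xsltproc). The document and the stylesheet are toys written for this example; the real DocBook stylesheets are far larger. The second step, FO to PDF, still needs an FO engine and is not shown.

```php
<?php
if (!extension_loaded('xsl')) {
    exit("The xsl extension is required for this sketch.\n");
}

// A made-up, minimal DocBook-like source document.
$docbook = <<<'XML'
<article>
  <title>Γεια σου</title>
  <para>Hello, world.</para>
</article>
XML;

// A toy stylesheet that maps <title> and <para> to FO blocks.
$stylesheet = <<<'XSL'
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/article">
    <fo:root>
      <fo:layout-master-set>
        <fo:simple-page-master master-name="page">
          <fo:region-body/>
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="page">
        <fo:flow flow-name="xsl-region-body">
          <xsl:apply-templates select="title|para"/>
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>
  <xsl:template match="title">
    <fo:block font-size="18pt"><xsl:value-of select="."/></fo:block>
  </xsl:template>
  <xsl:template match="para">
    <fo:block><xsl:value-of select="."/></fo:block>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xml = new DOMDocument();
$xml->loadXML($docbook);
$xsl = new DOMDocument();
$xsl->loadXML($stylesheet);

// Apply the stylesheet to the source; the result is the FO intermediate file.
$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);
$fo = $proc->transformToXml($xml);

echo $fo;
```

The output carries both the content (the text) and the presentation (page layout, font size), which is what the FO engine needs to lay out pages.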

To sum up, what the community needs is a way to create quality PDF and PostScript files from DocBook XML for any language (assuming there is a font). The process should be easy to follow (like Docbook bits), and distributions should have the necessary tools available as packages (RPM, DEB, etc.).

10 Aug 2004 (updated 10 Aug 2004 at 23:14 UTC) »

Quite a few interesting articles today.

To start with, Linux and Patent Risks talks about the 283 patents that the Linux kernel might infringe. The study comes from Dan Ravicher, a patent attorney who, apart from his main job, represents the Free Software Foundation pro bono. The patent system can be abused very badly and Dan is working towards its reform. People should know how it works, as ignorance is costly. Read the article!

Reading http://www.Groklaw.net daily helps you understand what's going on with the world's favourite litigation choo-choo train. There are also articles on other legal aspects of open-source software. Behind the amazing work of Groklaw is Pamela Jones, a journalist with a background in paralegal matters.

Martin Taylor is the top Linux strategist of M$. He is supposed to understand open-source software well, and he uses his skills to help his company fight. His techniques are ultra-competitive; they remind me of one of SC0's executives, who would cut his wrist if he knew that the sight of blood would make you faint (in order to beat you). The article is "Microsoft sings a new tune on Linux", http://www.msnbc.msn.com/id/5614334/ in your browser.

Read the interview with Eugenia Loli-Queru, of OSNews fame.

1 May 2004 (updated 1 May 2004 at 10:25 UTC) »

This week there was a conference in the US called XDevConf, about developments in X and, by extension, Linux on the desktop. Federico Mena-Quintero kept notes and made them available on his blog, at [first], [second] and [third]. This year (2004) is considered the year of the desktop for Linux (and of the Monkey for many as well).

Reactivated my advogato account.

Actually I lost the password and I then realised (thanks Google.com) that there is no formal method to retrieve it or erase it. If you forget it, you are stuck!

I am retiring simos74. Yes, there is no way to remove an account either.

First post to my diary, 10th June, 2003.
