Internationalization guidelines (request for comments)

Posted 29 Feb 2000 at 17:52 UTC by monniaux

GNU and Gnome software is getting better and better at internationalization (i18n) and localization (l10n). Yet there are still thorny issues to sort out.

Nowadays, more often than not, free software tends to try to speak foreign languages, reflecting the fact that the free software development community is spread across dozens of countries around the world. For instance, if I have set the environment variable LANG to fr_FR on a RedHat version 6 machine, I get the following behavior:

[monniaux@quatramaran monniaux]$ ls --help
Usage: ls [OPTION]... [FICHIER]...
Afficher les informations au sujet des FICHIERS (du répertoire
courant par défaut). Trier les entrées alphabétiquement si aucune
des options -cftuSUX ou --sort n'est utilisée.

-a, --all afficher les noms cachés débutant par .

Most users in non English-speaking countries will be delighted to see computers at least trying to speak their mother tongue. Some people, particularly in the United States, tend to believe that most people around the world speak English, at least as a foreign language. This is a myth. The reality is rather the following:

  • In developing countries, people who have to interact with foreigners, especially at tourist resorts, speak just enough English for the interaction (for instance, they might know the few words necessary to sell you food).
  • In developed countries, people learn English at school and tend to forget it afterwards (do you remember all the things you were taught in school?). Of course, there are disparities: for instance, northern Europeans, whose languages are spoken by only a few million people worldwide, tend to speak English better than the Italians.
In both cases, only a minority of the population is fluent enough in English to cope with lengthy documentation, especially if it uses lots of technical words (these tend not to be found in standard bilingual dictionaries).

I shall not deal here with the technical issues involved, such as the use of Gettext. I will instead focus on simple practices that can make internationalization and localization better. Internationalization (i18n) prepares a program for localization (l10n): it isolates all the country- or language-dependent parts of a program so that l10n can adapt them for particular countries or languages.

I see quite a few thorny issues, on which I am going to give my opinion:

  • What not to translate: often, overzealous i18n'ers list for translation some internal error messages that are completely incomprehensible to the end user, whether or not he is a native English speaker. Internal error messages should be marked as such, even in the English version. Only the "Internal Error" headline should be translated; the error message itself should not. Why? A translation would mean nothing to the end user, and would hamper efforts to find the bug (are you prepared to receive a bug report featuring the Russian translation of "not enough linked lists in heap"?).

  • There are often many more country-specific traits in a program than one would think at first sight. Let me quote a few:
    • postal addresses and phone numbers are often formatted somewhat differently between countries. I have seen all too many Web sites refusing French postal addresses for want of a "state" field, even though France does not use states for postal addressing, and even though those sites claimed to serve international customers. Furthermore, French postal codes go before the name of the city, not after. For instance, a French mail address might look like:

      Martineaud SA
      M. Henri Martin
      13, rue du Moulin Vert
      75361 PARIS cedex 11
      Now imagine that this person gets his own address refused by the program because he did not specify a state!

      I do not advocate putting a database of postal address and phone number formats in programs. Instead, programs or WWW sites should accept free-form addresses. Let us stop second-guessing users and assume that they can at least write addresses correctly.

    • Another country-specific trait is the use of pictograms or jokes. Pictograms often refer to cultural traits (like gestures meaning "it's OK"; the gesture meaning "it's OK" for Americans can mean "it's a zero, absolute crap" to the French) or to road signs (which are different between Europe and the US). Puns cannot be translated easily. Furthermore, jokes often make references to events and people totally unknown outside the country of the author: would an English-speaking Canadian understand that "eat apples" is a reference to a joke about the French presidential campaign in 1995? Likewise, references such as "Beam me up!" are meaningful only to those who have watched an English version of Star Trek.

    • Locales already try to deal with currency units. Even more annoying are length units and paper formats. Inches and the US Letter paper format are unknown to many people around the globe, who use centimeters and A4. Not only should programs be able to accommodate both standards, but this should be a customizable item. Even better, the default value should be locale-dependent.

  • Other issues rather have to do with language itself:
    • All too often, mysterious vocabulary, coined-up terms and the like prevent translators from working efficiently. They also often prevent native speakers from understanding what is being dealt with. If specialized terms and coined-up words are necessary, they should be explained in the documentation.

      Translators should try to find the commonest translation of a word, possibly looking at the major commercial products in the same field. When the foreign version of the word is more common, they should stick to it, regardless of the official term.

    • Translators should really be careful not to translate text into gibberish. The GNU libc French locale contained (or maybe still contains) some ridiculously translated strings. I think that, when unsure, one should abstain from translating. Such matters are best left to those who master the target language and have a good command of the source language.
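The first point above - translating only the "Internal Error" headline and leaving the diagnostic itself in English - can be sketched with gettext. A minimal sketch: the function name and message text are invented for illustration, and `_` stays an identity function until a translation catalog is actually bound.

```python
import gettext

# Without a bound catalog, _() returns its argument unchanged.
_ = gettext.gettext

def report_internal_error(diagnostic):
    # Only the headline is marked for translation; the diagnostic
    # stays in English so bug reports remain readable upstream.
    return "%s: %s" % (_("Internal error"), diagnostic)

print(report_internal_error("not enough linked lists in heap"))
```

Because the diagnostic string is never wrapped in `_()`, it is never extracted into the .po file, so translators cannot accidentally localize it.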

I would be very much obliged for some constructive comments, because I am quite sure I have overlooked many other thorny aspects. Notably, the use of Unicode and composite characters looks like a must for future developments. I would like people speaking Japanese, Korean or other languages with large scripts to give their insights on this subject.

Second guessing users on their addresses, posted 29 Feb 2000 at 20:55 UTC by Uruk » (Apprentice)

I agree with you on the point that including databases of postal codes in programs is bogus - I'd just like to point out, though, that for the most part, the reason address fields are usually so chopped up, with input required in all fields, is the way the info is stored. It's not that the programmer doesn't trust the user; it's just much easier to deal with the address internally if you break it up into a bunch of pieces and store it that way, à la a relational database, rather than putting it into one "block" of info and then going through the added pain of extracting pieces.

I think most of the negative aspects of the way things like that are done are due to the limitations in the program put there (not necessarily on purpose) by the programmer - not that the programmer doesn't trust the user.
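The trade-off described in this comment can be resolved the way the article suggests: store the address verbatim as one block and validate only that something was entered. A minimal sketch, with an invented function name, using the French address from the article:

```python
def accept_address(text):
    # Keep the address exactly as the user wrote it: no state
    # field, no postal-code pattern, no assumed line order.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty address")
    return "\n".join(lines)  # one free-form block, stored verbatim

french = """Martineaud SA
M. Henri Martin
13, rue du Moulin Vert
75361 PARIS cedex 11"""
print(accept_address(french))
```

The cost is that you can no longer query by city or region without parsing; the benefit is that no valid address anywhere in the world gets rejected.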

Hmm, aren't you somewhat generalizing a French problem here?, posted 1 Mar 2000 at 08:46 UTC by Radagast » (Journeyer)

As a general followup both to the Advogato's number article which touched on this subject, and to this, I think you're trying to make this problem much bigger than it is. And the main reason is, I believe, that you're French.

Basically, France has a foreign language problem. There is no other country in Western Europe where comprehension of English is so poor, and resistance to learning foreign languages so strong. There are doubtless historical, political and cultural reasons for this, but the fact remains: French programmers are handicapped in the international community, because they're unable to write and comment code in the language the rest of the community uses. I'm not a native English speaker, I'm Norwegian, but both in Norway and here in Mexico (a country notorious for how small a percentage of the population is fluent in English), programmers tend to know enough English to write and comment code "properly", and also to comprehend English documentation and tools. However, a French programmer living in Mexico whom we were working with was unable to do this.

Similarly, my mother, who is 50 years old, manages to use English language Win98 on her home computer, and most office workers in Norway run at least a few English language programs on their computers, seemingly without problems.

So I must repeat my earlier opinion: making free software understand and produce international scripts, through projects like Pango, is much more important than actual translation efforts. As for the French, I hope and trust they will at some point see that their xenophobia is becoming a liability, and change the attitude. I'm afraid it'll take time, though, and until then French programmers and the French software industry will be at a disadvantage.

Cultural and language problems..., posted 1 Mar 2000 at 09:23 UTC by caolan » (Master)

I do agree that there is nothing more annoying than input forms that have no idea of differences from the writer's own cultural background. In Ireland we just do not have postal codes outside of Dublin, and there the numbers range from 1 to 16 (or thereabouts)! Again and again you hit the problem, especially on the web, where the form will reject the address because the concept of not having a postal code breaks the logic, and of course the state issue is another nonsensical problem.

Another thing about free software and localization is a thought I had some time back reading about Iceland, where the population is considered too small for many commercial companies to provide translations of their software into the native language, which the Icelanders are trying hard to maintain. It strikes me that free software makes it very easy for speakers of smaller and more ignored languages to do their own localization of software, bringing native language versions of software to speakers of Tamil and Thai, to name two that have caught my attention. (Aside: I have my doubts about the common belief that the web and greater connectivity in general will finally wipe out all the smaller languages.) But there is the other issue mentioned in the article, the problem of translating when an appropriate word or phrase is missing. There appear to be two reasons that this might happen:

  1. You just don't know the words.
    Basically, the language into which you translate really must be your own native language; if it isn't, there is just going to be trouble. For myself, reading translations into English, I find that even if the non-native English-speaking translator's English is very good, there are just small jarring things - fine for tech documentation, but I think that for areas such as user guides it is problematic to say the least.
  2. The language does not have matching concepts at all! This is a real problem.
    There have been some attempts made to translate software to Irish. But very few of us speak it natively anymore; finding native speakers with enough interest in free software to do it (and the necessary technical skills to know what the English means) is difficult. The most serious problem, though, is that the vocabulary has more or less frozen in a pre-industrial state. So "log in", "software", "program" etc. are translated to very clumsy words. But then again, maybe every new word appears clumsy for a while until it catches on. We'll see, I suppose.

English and Arabic, posted 1 Mar 2000 at 09:49 UTC by rakholh » (Journeyer)

I strongly agree. I often have addresses rejected because I need to input a 'state' or I have to insert a 'zip code' (mail gets delivered in Egypt WITHOUT zip codes - this is related to illiteracy, by the way).

Recently I also encountered another problem. They expect 10-digit phone numbers (intl. code + number). This is a problem because different countries have different lengths (some small countries have 6 digits + a 3-digit intl. code) - my mobile has a 2-digit country code, a 2-digit city code and then 7 digits - a total of 11. I noticed this on the TWA site, by the way.

There are also two types of computer users (generally): the 'developers' and the 'users' (actually, I could classify computer users into much more specific types and I think I could write an essay/article about that - but let's assume that everything falls under these two broad categories). Developers need to know English. No doubt about that. They might need to know English /very/ well to be good programmers too. All computer languages are in English or are based on English. A lot of computer terms also stem from English, and everything else is a translation (buffer, core, RAM (Random Access Memory), etc.)

For the users, they don't care what the developer does. They want a LOCALIZED version. This is a two-step process: first i18n, then l10n (by different people/groups, usually). This is VERY important in certain areas. I firmly believe that Linux has not gotten wide-spread popularity in the Middle East because of this. Companies/Governments/Shops/Businesses - they all want ARABIC stuff. The official language is ARABIC, so all their legal stuff is in Arabic. A /HUGE/ selling point of Linux and other OpenSource stuff is that it's /FREE/ - you can download all programs off the net. They love this here - but then you have to tell them "You can't have Arabic" - so they stick with 'Arabic Windows'. The fundamental problem with i18n/l10n is DIFFERENT SCRIPTS. The Latin alphabet is used in A LOT of languages (i.e. even if you don't have those little accents on top of the characters you can still generally understand French). But you can't write Arabic in English. Actually, the IRC world has been a pioneer in this department - they have invented ways of writing Arabic in English :) For example, they use English equivalents of letters (to make the sound) and use numbers to represent the 'missing' letters - the two most popular ones are '3' and '7' (since they kind of 'look' like Arabic characters). Obviously, though, this is not a REAL solution (i.e. you can't have your official business or government documentation in that - and you need to know English (at least the alphabet) to understand it).
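The IRC convention described above - digits standing in for Arabic letters with no Latin equivalent - amounts to a simple substitution table. A sketch with an invented function name; the two mappings shown are the common ones the comment mentions:

```python
# '3' stands for the letter ain (U+0639) and '7' for hah (U+062D),
# chosen because the digits roughly resemble the Arabic glyphs.
CHAT_TO_ARABIC = {"3": "\u0639", "7": "\u062d"}

def decode_chat(text):
    # Replace the digit stand-ins; pass everything else through.
    return "".join(CHAT_TO_ARABIC.get(ch, ch) for ch in text)

print(decode_chat("mar7aba"))  # the '7' becomes an Arabic letter
```

As the comment says, this is a workaround for missing script support, not a real solution: the Latin letters are phonetic approximations, and the mapping is a community habit rather than any standard.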

I believe Pango is the /BIGGEST/ step in this department. Being able to NATIVELY show different scripts is a big plus. I know you can do CJK right now in X (is it possible on the console?). You can't do Arabic in X - there are a few very limited applications that can: AraMosaic (an Arabic web browser) and a proprietary X application (free as in beer - it's a word processor). They also sell a 'toolkit' of some sort so you can write Arabic applications (I think the word processor is statically linked to the toolkit, but you (developer and user) need to buy the toolkit for other applications). Anyway, Arabic also has an 'arabized' console called 'acon' - you run it, and this software lies in between your terminal and your screen. It recognizes certain (Arabic) codes on your screen and renders them appropriately. So I think you can mix & match Arabic/English on the console thanks to acon. I also believe it's free. Arabic is sorely missing a 'free' (libre) renderer in X.

The answer to my comments is: PANGO :)

making the English version clearer, posted 1 Mar 2000 at 11:19 UTC by monniaux » (Journeyer)

"As for the French, I hope and trust they will at some point see that their xenophobia is becoming a liability, and change the attitude."

Ever been in the metro in Paris? You would then notice that most signs are subtitled in two languages, including English. Even the conservative Senate has a Web site available in several languages. I therefore do not think there is some kind of concerted effort to ward off English. The problem is perhaps in the way foreign languages are taught; pretending to have magic recipes to solve such complex educational problems is pretentious at best.

My experience is that most people with a reasonable level of education (my mother, for instance) can deal with application menus in English, as long as the vocabulary does not become obscure and/or very specialized. That is why I recommend that, even in the English version, obscure terminology should be kept to a minimum. If for the sake of brevity or precision some obscure terminology must be used, its meaning should be recalled in easily accessible online help.

However, even people who can cope with an application in English do not necessarily want to read entire documentation in English. Declaring that free software should be available only to those who master English is akin to declaring that it should be reserved for those Unix-savvy enough to know how to set up custom initialization scripts. Proprietary software vendors have long understood that users want software that does its task without requiring too much learning. That includes not requiring them to know too much about the internals of the system, and not requiring them to learn foreign languages.

I agree that translation is not the only answer. Making the English version clearer could be very valuable too. All too often, programs and documentation contain abbreviations, jokes and allusions that are not essential to the use of the system; such "clever" language greatly hampers comprehension. Please note that having learnt a foreign language at school does not imply having seen 15 years of TV in that language.

For further reading, Jakob Nielsen has written some interesting papers on internationally oriented WWW sites and international usability testing.

My reflection on the matter, posted 1 Mar 2000 at 12:31 UTC by ajkroll » (Journeyer)

English is the language of business.

Programming is very much a business.

Internationalization is a neat idea; it works well if your mother language has words that can be exchanged without losing meaning, especially where syntax is concerned.

My thoughts on developing? Screw it, you'd best learn English if you're going to do programming on a computer. You can better communicate ideas in English to most of us who are developers anyhow. Not to go without mentioning that 99.9% of programming languages are based on English words and syntax.

As far as the French language in particular goes, it might be really great for law because it is very concise and clear. It has no place in something that I personally regard as abstract art. The art of programming is very abstract. It almost requires an abstract language and should continue to do so. For example, if it were all concise, it would stagnate very quickly. Abstract thought is what fuels new ideas and concepts. Sometimes a little confusion is good because it forces others to spawn entirely new projects.

Arabic is a mystery to me, so I have no thoughts in regard to that, but hey, English does borrow your symbols and concepts for number representation... :-)

Enuff jabbering, I have a very small footprint, highly portable TCP/IP stack to complete.


"normal users" should use their native language, programmers should use English, posted 1 Mar 2000 at 13:45 UTC by Raphael » (Master)

I see that some of the replies posted here tend to confuse two categories of users: the average user (think about your mother) and the programmers.

I think that it is obvious that the programmers must know English. Furthermore, I firmly believe that a (very) good knowledge of English is a prerequisite for becoming a good programmer. See my comment (#9) attached to this previous story for more details about this. In summary, any student in computer science can learn the basics without knowing English, but it is not possible to go very far like that, because many APIs are written in English, as well as many of the best reference books and most of Open Source code, which is probably the best way to learn new tricks.

But things are different for the users who do not intend to write any code and who use the computer as a simple tool to get their job done or to have some fun (usually without knowing what is really inside the beast). As others have pointed out, most of these users have a very limited knowledge of English, if they know it at all.

I live in Belgium, a country that has three official languages: Flemish (Dutch), French, and a bit of German (again, see my previous comment for more background info about me). Many young children start by learning one of the other official languages of the country before learning English. Some of them learn English afterwards and manage to get quite good at it, but others don't. So, contrary to what some people might think, it is not true that almost all kids nowadays learn English at school. Why should they anyway? If they intend to get a job as an accountant, social worker, or anything that involves contacts with the local population but not with foreigners, it is more important to know the languages spoken locally than to know English (except when you go on vacation, but then knowing Spanish or Portuguese might be equally important). In many cases, this is actually required: the majority of administrative jobs in Belgium require knowledge of Flemish and French only. The same is true for countries that have more than one official language (like Belgium or Switzerland) and for countries that have strong local dialects sufficiently different from the official language.

If English is your third or fourth language, chances are that you cannot understand it very well. And nobody should blame you for that.

Even for those who understand English reasonably well, this often involves some extra effort: a novice user has to remember that the "Save" option under the "File" menu is the thing that makes sure that her work is not lost. People who are not so familiar with English must in addition remember what these words mean. I took a simple example, but think about the meaning of options such as "merge visible layers" or "round rectangular selection" in the GIMP. Their meaning should be obvious for a native English speaker, but not for someone who has to translate word for word. Even if they know the individual words, this implies an extra memorization effort that makes the tool harder to use.

Besides, if everything I wrote above were complete nonsense, why would all commercial software companies invest so much in translating their programs into as many languages as possible?

Good Article, posted 1 Mar 2000 at 19:19 UTC by idcmp » (Journeyer)

A very well written, easy to read, and to-the-point article. Having applications show up in your native tongue is a very important issue, and I think those of us who speak English and are used to using English apps don't realize what we take for granted.

Boot up your system with LANG=fr for a day and realize that anything you can read in English, whoever is really using that tool cannot.

Tried using foreign language tools, posted 1 Mar 2000 at 20:03 UTC by DarkBlack » (Apprentice)

I've tried turning on a different language, namely fr_FR, to see how well the tools that I use on a regular basis are translated. What I noticed is that in many GNOME applications, the translated text is not really a phrase or sentence in French. I know that the official French language does not have anywhere near the number of words that English has, but this is ridiculous.

Not to pick on Balsa (many other GNOME apps are just as bad), but many of the preferences options are quick phrases that are not easily translated into other languages. The same goes for toolbar text and hints, and for menus. I think that application writers need to make sure that they are clear in what they are trying to communicate to the user.

Precisions on several issues, posted 2 Mar 2000 at 13:00 UTC by monniaux » (Journeyer)

First of all, reading some comments I fear I was not clear enough about the software parts I want internationalized and localized. I think that

  • text and behavior seen by end users should be i18n'ed & l10n'ed;
  • programming documentation may be translated (even though nearly all programmers can more or less read English, reading lengthy documentation in a foreign language can be tiresome);
  • source code, comments and language keywords should be in English.

From the feedback I've received on Advogato and by mail, it seems indeed that, apart from language issues, there are lots of annoying behaviors caused by software assuming that some country-specific or language-specific standards are universal:

  • Several of us have mentioned the stupid behavior of software and WWW sites enforcing US-style addresses (requiring a state & ZIP code, for instance, but this is not the only issue) or US-style phone numbers.

  • Eric Moreau reports that French-language Windows sets the keyboard as AZERTY (standard in France) even though Quebec uses special QWERTY keyboards. All the same, French-language versions of some software assume that the paper size is ISO A4 (international standard, used by nearly everybody outside North America) while Quebec uses US Letter.

    The issue here is that locale-specific information is not only a question of language, but also of many "cultural" issues and local standards. The locale for French-speaking Canada is fr_CA, and differences such as the ones cited above have to be taken into account. There are similar issues between the United States (en_US) and the United Kingdom (en_GB).

I think that defaults for paper size, measurement units (length, temperature...) and the like should be set according to the locale. Be sure to consider the full locale, not only the language part. However, such settings should also be fully customizable. There are some peculiar situations (for instance, people from one country working temporarily in another country and using an operating system fitted for their language) where some settings must be taken from the locale and some others overridden. Imagine an American working in France: he wants his software in English, but he also wants A4 paper to fit the printer.
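The mixed situation described above maps naturally onto per-category settings with individual overrides, much as glibc does with locale categories such as LC_PAPER and LC_MEASUREMENT. A minimal sketch of the fallback logic, with an invented function name and a simplified environment:

```python
def setting(category, environ):
    # A specific category (e.g. LC_PAPER) wins over the general
    # LANG default, so one item can differ from the rest.
    return environ.get(category) or environ.get("LANG", "C")

# An American working in France: English messages, A4 paper.
env = {"LANG": "en_US", "LC_PAPER": "fr_FR"}
print(setting("LC_MESSAGES", env))  # en_US, inherited from LANG
print(setting("LC_PAPER", env))     # fr_FR, explicitly overridden
```

The point is structural: every locale-dependent default needs its own override knob, because no single locale string describes this user correctly.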

There have been some comments on Arabic. I'd be delighted to hear more details about Arabic (apart from the fact that it's written right-to-left). Namely:

  • how is Arabic text entered on "mainstream" operating systems (Windows, Mac) - is it a simple keyboard layout like AZERTY, or does it need syllable composition like Japanese Canna?
  • how does software handle changes of direction in the middle of text?

English and Arabic - Take two :), posted 2 Mar 2000 at 14:55 UTC by rakholh » (Journeyer)

Arabic is relatively simple when it comes to keyboard stuff. We have 28 letters - you guys have 26. There is a slight problem, though - our alphabet is in script format, so it makes a big difference whether a letter is at the start/end/middle of a word. Actually, the software handles that, but there are a few exceptions (and thus those keys are on the keyboard). We also have something called "tashkeel", which are symbols that are put on letters to change the way they are pronounced (like accents in French). Anyway, the keyboards sold basically have the English letters and Arabic letters on the keys, using the remaining space for some symbols for letters (i.e. if you want the symbols, go into English mode). What Microsoft does is have a little 'docklet' thingy (next to the time) that says 'En'; you click on it (or right-click) and you can choose Arabic.

As for direction changes in the middle of text - well, that is a surprisingly complex issue for such a simple idea :) There is a whole ruleset that does it. Furthermore, each ruleset does it slightly differently. There is a Unicode standard, and I believe Microsoft does its own thing; Netscape does its own thing too, I think (although Mozilla might be Unicode compliant). By the way, even though the Unicode thing is the theoretical "standard", I find that it does not make sense (but apparently it does to everyone else). KDE had screenshots a while back that show Hebrew/English text directions.
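The per-character classification that such a ruleset starts from can be seen in Python's unicodedata module: every character carries a bidirectional category, and the Unicode bidirectional algorithm reorders runs of text based on these categories. A small illustration (the three sample characters are chosen arbitrarily):

```python
import unicodedata

# Each character's bidirectional category drives the reordering:
# 'L' = left-to-right letter, 'AL' = Arabic letter (right-to-left),
# 'EN' = European number.
for ch in ("a", "\u0627", "7"):  # Latin a, Arabic alif, digit 7
    print(repr(ch), unicodedata.bidirectional(ch))
```

Digits being 'EN' rather than 'AL' is exactly why numbers embedded in Arabic text run left-to-right while the surrounding letters run right-to-left - the hard part of the algorithm is resolving the boundaries between such runs.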

Furthermore, Arabic has another problem :) Here we have an Islamic calendar (which is not really used to 'schedule' stuff). It is lunar based, but it is /worse/ than the Hebrew calendar (also lunar based): the Hebrew calendar can be determined /algorithmically/ - ours isn't - it is based on MOON SIGHTINGS. i.e. if it is a very cloudy end of month or something, the date can shift. There is no way you can determine the months for an Islamic calendar - there is always the possibility of error :) Thankfully, though, they only "check" the moon sightings twice a year (during religious festivals (Eid)). I'm gonna try and work on it after Eid this year (which falls at the exact same time as GUADEC, by the way) so the calendar can be 'stable'. See calendar- archives for more info :)

Western European Focus, posted 3 Mar 2000 at 00:17 UTC by alan » (Master)

I think it speaks volumes that everyone is discussing Western or Central European languages. We have a lot to do beyond that. Different writing directions and fonts are an obvious beginning.

Also, calling something a 'French problem' neither makes it go away in France nor solves it. It isn't a problem for just France. Icelanders are very proud of their language. "Use English" isn't an acceptable answer. It's a deep cultural thing.

And once you get to Japan, English is a big barrier. I'm happy that I have folks translating some of my articles into Japanese. There is a HUGE barrier between Japan and Europe/USA in free software. I only really got a feel for the scale of it when I got the PC110 Japanese palmtop. You try finding and configuring a special X server when the notes are in a language you cannot read and the links aren't all obvious to follow.

Similarly, we have a Linux/PC98 porting project. Anyone here heard of it? Probably not - because you need to speak Japanese to follow it.

I would really love to have EN<->JP autotranslation tools, even ones as bad as Babelfish.


RFC on natural language and programming language, posted 3 Mar 2000 at 08:11 UTC by graydon » (Master)

While I agree fully that writing internationalized software is important (indeed, it forces you to think about layering and abstraction issues you probably ought to be thinking about anyway), this thread has brought up a different interesting issue.

When we write code, we are not actually writing "in English". We are writing in some programming language, which has bits of English and bits of discrete math and type theory strewn through it. I have found it pleasantly surprising that in some cases I can exchange code fragments with a person I cannot communicate with about, say, the weather (albeit with some difficulty translating verbs and nouns - at least you can look such things up in dictionaries). This phenomenon was even more pronounced when reading math texts in another language, where the mechanism of proof is so similar and the notation so precise and terse that the issue of not speaking the language of the narration seems like much less of a problem than when, say, one tries to read a newspaper.

Comparing this situation to the deep culture shock of being in a society where one cannot read or hear anything, it seems clear that the programming community has a bit of a leg up in getting to know one another cross-culturally since we can at very least get code ideas from one another.

I wonder in which cultures, natural languages and programming languages people have found the language barrier mitigated by the ability to express things in code. Code has a much simpler grammar and vocabulary. If you can translate the identifiers a person is using, it seems possible to even use the code as a makeshift basic communication system. Are certain languages better or worse for this? I recall reading a page which stated that FORTH was a very "Korean-friendly" language due to its postfix form, but that seems somewhat of a surface issue. Has anyone here ever programmed in a language written/designed by a culture you consider "foreign"? Western Europeans and Americans invented most of the languages I use commonly, I think, but I don't know for certain.

Installation instructions (Japanese etc...), posted 3 Mar 2000 at 12:32 UTC by monniaux » (Journeyer)

I agree with Alan that Japanese is indeed difficult to handle:

  • complex input methods (phonetic input converted into a choice among several ideograms),
  • multibyte character sets.

I would like to hear some details about these two points. I know the Emacs-MULE kanji and kana input system. I know that other applications must use software such as kinput2. How are such things handled in Gtk+?

This leads me to another issue: documentation. A Japanese-speaking friend asked me to install a Japanese-aware Linux. The problem was the same as Alan's: nearly all the available documentation for this is in Japanese! I don't speak Japanese, and my friend doesn't know computer terms, even when they are English words in Japanese pronunciation.

The lesson to draw from this problem is that even though some software is meant for users speaking a certain language, documentation (at least for installation) should also be available in English. Two reasons:

  • the user is not necessarily the installer: people install software for friends; system administrators often have to deal with foreign guests or students learning foreign languages;
  • even if the user speaks the said foreign language, he or she does not necessarily know the necessary computer vocabulary.

Quality of translation, posted 3 Mar 2000 at 21:11 UTC by monniaux » (Journeyer)

Sorry to be so verbose these days.

The quality of some free software translations is not adequate. It seems that translations, including in software such as GNU libc, Gnome and KDE, suffer from the following problems:

  • poor spelling (some scheme to use ispell on .po files would be neat);

  • difficulty parsing English:
    • Some English phrases are ambiguous: for instance, "Load Info" can be understood as "Information about the load" or "Load the information". GNU gettext provides for comments to be inserted by programmers to help translators, but this feature is little used; currently translators have to find a way to launch the dialog box or menu containing the text in order to get the context;

    • Some translators forget that "A B" in English means "B <preposition> A" (like "Window info" means "Information on the window") and not "A <preposition> B" (such badly translated constructs are often to be found in French translations).

  • poorly chosen translations:

    Even though a bilingual dictionary says one word can be translated by another, those words are not necessarily equivalent; for instance, some software (GNU info, if I remember correctly) says "Fouille infructueuse" when a search has failed. Fouille does mean search, but in the sense of either archaeological digging or a police search of a suspect; the correct translation is recherche.

    Ridiculous translations such as the one quoted above can often be avoided by following simple rules:

    • do not translate into a language unless you speak it perfectly; leave it to native speakers;
    • do not just look words up in the dictionary, but try to understand the context;
    • try to be homogeneous (the same concept should always be translated the same way);
    • use the commonly accepted terminology for the concept, possibly observing mainstream applications and operating systems.

Poor Translation, posted 5 Mar 2000 at 02:08 UTC by alan » (Master)

Being a native English speaker (well, at least in most eyes; some people aren't sure that my Birmingham accent counts as anything but Caveman), I normally get to avoid translation problems.

Reading SuSE manuals is something I recommend for people who want to get an idea of what it must be like. These are good translations, but you will find random quirks in them, and stray bits of German or German screenshots.

Do misspellings in the original cause problems, too?, posted 5 Mar 2000 at 23:38 UTC by Telsa » (Master)

Like many native English speakers (in the UK, at least), I have a very meagre grasp of other languages. I have recently got into attempts to document things. I have learned some things: bad puns do not translate; using a word with two meanings is a mistake unless you clarify which meaning you intend; and simple grammar is good. (Yes, I know I'm bad at the last one.)

One thing I do wonder: how much does misspelling matter? I can follow some French and the occasional tiny piece of German. This usually involves heavy use of a dictionary. When people use contractions, I find it hard. But sometimes they use words which are not in my dictionary at all. I am never sure whether they're real words that my dictionary is too polite to mention. Or whether they're misspellings.

Is the same true for other languages? I imagine it must be. How much of a problem is that? I know I sometimes have trouble with some writing in English. But am I patronising other people (who tend to be a lot better at my language than I am at theirs), or is it important to check spelling first?

Personally speaking, I hate bad spelling and grammar from people who should know better, but that's not really the question. The question is how hard it makes things for people who aren't native speakers. Or whether it does at all?

resources, posted 7 Mar 2000 at 19:34 UTC by rillian » (Master)

It would be good if there were more documentation available on i18n issues. Tomohiro KUBOTA wrote an excellent description of Japanese issues, now part of the Debian documentation project. I'd suggest that anyone here who's familiar with a particular locale/language contribute a section; so far there's only Japanese and Spanish. The current document is available through the Debian documentation project. I think it would be especially valuable to have sections on languages with non-Roman scripts.

There's also a website that came out of the same effort, but it seems to be stalled at this point.

On an unrelated note, I can really relate to monniaux's complaint about documentation. I've had the same problem installing Chinese support on our lab computers. The Debian packages presumably work, but I can't even tell if the help file is being properly displayed! Fortunately, these things are easy to solve through collaboration with a literate speaker.

As a point of comparison, I tried out the Arabic, Chinese and Japanese support on a friend's iMac last week. Much as has been described for Win32, there's an extra menu that lets you select your input method (this applies to switching Roman keyboard layouts as well). This didn't seem to switch the localization of any of the applications; it just let you enter text in a different language, and you can mix scripts freely. There were some input-specific menus which were localized--and for some languages also available in English--things like character dictionaries and the ability to choose among various input methods and (I think) character encodings. Generally I found it much easier to deal with than the options under Linux, but still far from ideal.

Moving even further afield, does anyone know about handwriting recognition? I've seen little 3x5 cm drawing tablets sold around here, apparently specifically for Chinese input. The character dictionary is of course much bigger, but I'd think the well-defined order of strokes would help a lot with recognition. Can anyone confirm this? OTOH, a cursive script like Arabic is probably much harder than Roman, where we can at least print.

I18n, dos and don'ts, posted 17 Mar 2000 at 15:59 UTC by ingvar » (Master)

Being of a foreign persuasion (Swedish) and fully functional in at least Swedish and hopefully English, there are a few pitfalls and annoyances to think of when translating anything (especially from English to Swedish, that being what I am familiar with):

  • Dictionary translations Do Not Work. There are many words in English that have multiple correspondences in Swedish (and vice versa).
  • A prompt with well-defined "yes"/"no" semantics in English doesn't necessarily have them in another language (no good examples, alas).
  • At times, a bad translation is worse than none.

I have a feeling that a good way of doing a software translation is to first make a quick-and-dirty dictionary atta^Wtranslation, then hand it over to several people who are bilingual in the source and target languages.

And, to really make my life hard, I hereby volunteer to do what I can, if someone needs help in translating to Swedish.

Phone Numbers, posted 28 Mar 2000 at 04:04 UTC by dcs » (Master)

To those complaining about phone numbers... the 11-digit international phone number is an international standard. If you expect to be able to make/receive international phone calls without problems, you ought to observe it. That's one reason why some countries which used ad-hoc schemes for cellular phone numbers later changed them to conform to the 11-digit universal number.
