caolan is currently certified at Master level.

Name: Caolan McNamara
Member since: 2000-02-07 09:11:47
Last Login: 2007-05-23 17:18:38

Homepage: http://www.skynet.ie/~caolan

Notes:

I sometimes write stuff

Projects

Articles Posted by caolan

Recent blog entries by caolan

fakemail is handy

fakemail is really handy for debugging mail problems. When I was debugging some email merge stuff in LibreOffice recently, the bug needed a couple of hundred emails generated in quick succession to trigger, and fakemail made that easy.

Syndicated 2011-11-21 13:00:05 from Caolan McNamara

libexttextcat 3.2.0

Released libexttextcat 3.2.0 (Extended Text Categorization, used to guess the language that input text is written in). It can be found in this download dir. There are no code changes from 3.1.1, but it adds a large collection of extra language signatures, so libexttextcat now supports nearly the same set of languages as LibreOffice. The exceptions are languages that LibreOffice supports but which don’t have a convenient UDHR translation to use as a basis for generating a language fingerprint.

Syndicated 2011-11-13 22:41:59 from Caolan McNamara

CTL/CJK format character previews

As Lior Kaplan demonstrated at LibreOffice 2011 Paris, our format character preview really sucks for CTL and CJK users. If no CTL/CJK text is selected then no CTL sample text is shown, and the CJK sample text is taken from the fontname itself. Many font names are just Latin text, so they give no indication of what the font looks like in the script/language that is actually being written.

e.g. the old dialog for CTL only previews some Western text if no text is selected, with no attempt to show any sample CTL text, or even the CTL fontname. For CJK it additionally shows the fontname of the CJK font in the preview, which isn’t helpful if the CJK fontname contains no CJK glyphs.

Simply adding the CTL fontname wouldn’t help much, seeing as the fontname is David CLM. So, for now, reusing the preview text from the font drop-down as a first stab at “doing the right thing” gives me…

Code for all this is mostly in svtools/source/misc/sampletext.cxx, where there is now a hugely over-engineered set of heuristics to guess the script a font is best tuned for. There are also various functions to generate suitable text depending on what we have to go on: just the font, the font plus the language, or just the language. They also differ in what we want back: a short identifier to classify which script a font might be good at rendering, or a longer sequence of sample text for a font preview.

It’s probably best to drop rendering the fontname in the Western case for the text preview and use some sample text there too, at least for the mixed Western+CTL+CJK case, as it’s confusing to have a font name rendered alongside some sample text in another font.

Syndicated 2011-10-21 10:59:53 from Caolan McNamara

PhagsPa and Tai Le, sample text?

Looking through my fonts that are clearly tuned for a single specific script, there remain two scripts that niggle me as I don’t have suitable sample text for them, i.e. PhagsPa and Tai Le. I’m looking for a short snippet of sample text in each of those scripts which is suitable to stick into the font drop-down preview. Ideally something fairly equivalent to “Alphabet”, “Script”, “PhagsPa/Tai Le” or “Tibetan/Tai Lü”.

Syndicated 2011-10-19 22:29:50 from Caolan McNamara

libexttextcat: text guessing feature

LibreOffice inherited a text language guesser, based on textcat from wise-guys.nl and extended by Jocelyn Merand to basically handle UTF-8 text. This is the thing that suggests what language your text might really be in when you right-click on some misspelled text and choose to set the language.

We’ve now spun this off as a standalone libexttextcat. We fixed up some conversion problems from the original selection of 8-bit encodings and generated new language fingerprints in other cases, which should give better results for various languages and allow us to enable checking for some languages which were disabled until now.
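To give a feel for what a language fingerprint is, here is a rough sketch, assuming the fingerprints are frequency-ranked lists of character n-grams as in the original textcat approach. The top_trigrams function is a hypothetical helper of mine, not part of libexttextcat, and it only counts byte trigrams, so it is far cruder than what createfp produces.

```shell
#!/bin/sh
# Hypothetical helper, not part of libexttextcat: print the N most
# frequent byte trigrams of stdin, one "count trigram" pair per line.
top_trigrams() {
    LC_ALL=C awk '
        {
            s = "_" $0 "_"          # mark line boundaries with "_"
            gsub(/ /, "_", s)       # treat spaces as word boundaries too
            for (i = 1; i + 2 <= length(s); i++)
                count[substr(s, i, 3)]++
        }
        END { for (g in count) print count[g], g }
    ' | LC_ALL=C sort -k1,1nr -k2,2 | head -n "${1:-10}"
}

# Example: the three most frequent trigrams of a tiny sample
printf 'banana bandana\n' | top_trigrams 3
```

Guessing a language then comes down to comparing the ranked list for the input text against the stored ranked list for each candidate language and picking the closest match.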

The current list of languages it attempts to detect can be seen here

Here’s a plausible process to add your favourite language to it, given git clone git://anongit.freedesktop.org/libreoffice/libexttextcat and bootstrapping from the insanely-translated UDHR, using Abkhaz as an example.


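cd libexttextcat/langclass/ShortTexts/
# fetch the Abkhaz UDHR translation
wget http://unicode.org/udhr/d/udhr_abk.txt
# skip the English header (first six lines), name the result using BCP-47
tail -n +7 udhr_abk.txt > ab.txt
cd ../LM
# generate the language fingerprint from the sample text
../../src/createfp < ../ShortTexts/ab.txt > ab.lm
# register the fingerprint in the configuration file
echo ab.lm ab--utf8 >> ../fpdb.conf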

Then update the check target in src/Makefile.am and run make check to confirm that ShortTexts/ab.txt is detected as ab.
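One thing to watch: the echo step appends to fpdb.conf unconditionally, so re-running the recipe leaves a duplicate entry behind. A minimal sketch of a guard follows, where register_fp is a hypothetical helper of mine (not something shipped with libexttextcat) and a temporary directory stands in for the real langclass directory.

```shell
#!/bin/sh
# Hypothetical helper: append a fingerprint entry to a config file only
# if that exact line is not already present.
register_fp() {
    conf=$1
    entry=$2
    # -x: match whole lines only, -F: treat the entry as a fixed string
    grep -qxF -e "$entry" "$conf" 2>/dev/null ||
        printf '%s\n' "$entry" >> "$conf"
}

# Demonstration in a scratch directory rather than the real tree
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
register_fp "$tmp/fpdb.conf" "ab.lm ab--utf8"
register_fp "$tmp/fpdb.conf" "ab.lm ab--utf8"   # second call is a no-op
wc -l < "$tmp/fpdb.conf"
```

In the recipe this would replace the final echo line, e.g. register_fp ../fpdb.conf "ab.lm ab--utf8".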

I’ll remove the need for a configuration file in a later version, and convert the result to a BCP-47 tag. For the moment it remains a drop-in replacement for the original solution, which necessitates retaining the slightly odd language tag syntax.

Syndicated 2011-09-28 14:10:11 from Caolan McNamara


caolan certified others as follows:

  • caolan certified hp as Master
  • caolan certified raph as Master
  • caolan certified alan as Master
  • caolan certified lewing as Master
  • caolan certified miguel as Master
  • caolan certified jmason as Journeyer
  • caolan certified jwz as Journeyer
  • caolan certified joey as Journeyer
  • caolan certified jab as Journeyer
  • caolan certified sterwill as Journeyer
  • caolan certified cuenca as Journeyer
  • caolan certified shaver as Master
  • caolan certified MJ as Journeyer
  • caolan certified slogan as Journeyer
  • caolan certified alecm as Master
  • caolan certified aoliva as Master
  • caolan certified btenison as Journeyer
  • caolan certified hpa as Master
  • caolan certified Marcus as Master
  • caolan certified valen as Apprentice
  • caolan certified samth as Journeyer
  • caolan certified erAck as Journeyer
  • caolan certified Malkuse as Apprentice
  • caolan certified martinicus as Journeyer
  • caolan certified sander as Journeyer
  • caolan certified cinamod as Master
  • caolan certified hub as Journeyer
  • caolan certified wlach as Master

Others have certified caolan as follows:

  • bombadil certified caolan as Journeyer
  • mjs certified caolan as Journeyer
  • alan certified caolan as Journeyer
  • jmason certified caolan as Journeyer
  • duncan certified caolan as Master
  • jab certified caolan as Journeyer
  • mblevin certified caolan as Journeyer
  • Jody certified caolan as Master
  • andrei certified caolan as Journeyer
  • bernhard certified caolan as Journeyer
  • btenison certified caolan as Master
  • billf certified caolan as Journeyer
  • camber certified caolan as Journeyer
  • jrennie certified caolan as Journeyer
  • nils certified caolan as Journeyer
  • claudio certified caolan as Journeyer
  • cenobyte certified caolan as Journeyer
  • valen certified caolan as Journeyer
  • cuenca certified caolan as Journeyer
  • samth certified caolan as Journeyer
  • ole certified caolan as Journeyer
  • jules certified caolan as Journeyer
  • thomasq certified caolan as Master
  • tja certified caolan as Journeyer
  • nixnut certified caolan as Journeyer
  • manu certified caolan as Journeyer
  • yakk certified caolan as Master
  • pixelbeat certified caolan as Journeyer
  • jelly certified caolan as Master
  • inri certified caolan as Journeyer
  • nny certified caolan as Journeyer
  • erAck certified caolan as Master
  • sander certified caolan as Journeyer
  • martinicus certified caolan as Master
  • juhtolv certified caolan as Master
  • cinamod certified caolan as Master
  • hub certified caolan as Master
  • ariya certified caolan as Master
  • AlanHorkan certified caolan as Master
  • wlach certified caolan as Master
  • lerdsuwa certified caolan as Master
  • kclayton certified caolan as Master
  • adl certified caolan as Master
  • janneke certified caolan as Journeyer
  • yosch certified caolan as Master
