Older blog entries for caolan (starting at number 195)

8000 commits

ohloh reckons that this week the count of commits in LibreOffice belonging to me hit 8000, accumulated over the last 12 years or so.

I thought I’d sample each per-thousand rollover to see what they were…

Commit 8000: A minor startup time improvement and code simplification
Commit 7000: fix dnd crash. Generic bug fixing of fdo#39950
Commit 6000: callcatcher: remove unused code. Removing a few hunks of code that get compiled into the product, but that nothing calls. Some of the callcatcher foo which we use to trim the fat off LibreOffice
Commit 5000: Generic bug-fixing from grovelling over abrt traces, rhbz#710004 band-aid for immediate crash in IsAlignPossible.
Commit 4000: Work around a weird-ass warning from a minor compiler bug, gcc#47679.
Commit 3000: Fix BSD uno bridges. We merged the various uno bridges together for the various unix platforms that use gcc, to reduce the burden of maintaining so many. So I needed to add the little register return quirk of the BSD platforms.
Commit 2000: Silence the (then) new gcc 4.5 warnings in our code
Commit 1000: Documented FSPA anchor values should override escher attributes when different. Efforts to get object positioning right on .doc import
Commit 1: MSOffice Controls {Im|Ex}port. Apparently my first post-StarOffice commit. Getting those “OCX” controls imported from MSOffice file formats.
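
For anyone wanting to repeat the per-thousand sampling on their own history, here is a minimal sketch. It assumes you already have your commits as a list, oldest first (for instance from something like git log --reverse --author=... --format=%H); the function name milestones is made up for illustration and is not how ohloh counts.

```python
def milestones(commits, step=1000):
    """Pick commit 1 and every step-th commit from a list ordered oldest first.

    Returns a dict mapping the 1-based ordinal to the commit at that position.
    """
    picks = {1: commits[0]} if commits else {}
    for ordinal in range(step, len(commits) + 1, step):
        # Ordinals are 1-based, list indices are 0-based.
        picks[ordinal] = commits[ordinal - 1]
    return picks


if __name__ == "__main__":
    # Stand-in data; real input would be commit hashes.
    fake_history = [f"c{i}" for i in range(1, 2501)]
    for ordinal, commit in sorted(milestones(fake_history).items(), reverse=True):
        print(f"Commit {ordinal}: {commit}")
```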

Does this mean I’m an awesomely productive coder versus everybody else? Nah, not really. For one thing, we started off with CVS, went through mercurial and ended up with git, and there’s generally a lot of difference in how many commits you generate with commit-unfriendly CVS vs git, which makes you commit gung ho.

And there’s differences in commit style from one person to another too of course. I tend to generate a lot of commits because I like to refactor and code in “see my train of thought so you (ok me when I have to revisit it) can see where I went wrong if I do” steps rather than dump in a single commit that affects a hundred interconnected things. But it’s all the same amount of code at the end of the day.

Another wrinkle is that various development rules ended up hiding the true ownership of a lot of older commits. e.g.
a) Pre day-0-release commits were all flattened of course. I only worked on StarOffice a short while before that event, so that’s a fairly small amount for me, but presumably a truly frightening number of commits for e.g. jp
b) For a while we worked by committing only to cvs branches which release engineering would merge into master, e.g. this commit is an example, which is why the Hamburg release engineers hold unbeatable commit rates :-)
c) And later the burden of committing to OpenOffice.org for non Sun staff became almost impossible to bear, e.g. provide install sets for Windows and Linux, get a QA volunteer to QA the install set for you and sign off on it. Which was all pretty hard to do given the speed of the one or two windows buildbots available for the purpose and the limited number of QA people. Much easier to just dump the patch into bugzilla and see if someone inside the bunker could take care of all that for you, e.g. commits like this

Syndicated 2012-05-20 20:12:33 from Caolan McNamara

shiny langtag library

liblangtag looks very nice. I wonder if there’s anything in my abandonware localehelper that might be useful to stuff in there. Maybe some of the locale to langtag mapping stuff.

Syndicated 2012-03-13 13:15:34 from Caolan McNamara

libreoffice help ported to clucene

From the things that make me happy department. Years ago our help documentation source was parsed with a bunch of java tools. At the time gcj was the only possibility for us in RHEL/Fedora and the build time for all localized langpacks that we included was about 26 hours in our build system.

Which was a bit depressing.

So I rewrote it in c++, taking super care to keep the same JavaHelp-derived format and so forth. Which brought build times down to about 10 hours.

Which made me happy.

At some stage though, it was then decided to index our help with lucene, which brought back java as a dependency both for building help and for searching it at run-time.

Which made me sad again, though openjdk was the default for us at this stage, so it wasn’t as much of a pain. That’s also why you have that perceptible lag when you first search for a term in help.

But now, for LibreOffice 3.6, Gert van Valkenhoef has ported our lucene code to clucene. helpcontent builds faster, and there’s no lag on searching for something in help.

Which made me happy.

Distros that want to use --with-system-clucene will need to build and install clucene’s contribs-lib.

Syndicated 2012-03-08 12:21:41 from Caolan McNamara

cross-compiling LibreOffice for windows (mingw32) under Fedora

Dave Tardon’s new howto on cross-compiling LibreOffice under Fedora to target mingw32: http://dtardon.fedorapeople.org/mingw/

Syndicated 2012-03-06 11:55:37 from Caolan McNamara

syncfonts is handy

When debugging font related stuff it’s typical that the problem can only be triggered by a specific set of fonts. Here’s a rough-and-ready syncfonts script which, when given the output of fc-list -v, will try to install the fonts that are missing and remove the extraneous ones via yum, which works for the common case.
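
The core of that idea is just a set difference between two font lists. Here is a minimal sketch of that step only, not the syncfonts script itself: it assumes the fc-list -v dumps contain lines of the form family: "Name"(s), takes only the first family name on each such line, and stops short of mapping families to packages for yum. The names families and plan_sync are made up for illustration.

```python
import re

# Assumed shape of a family line in `fc-list -v` output:  family: "DejaVu Sans"(s)
FAMILY_RE = re.compile(r'^\s*family:\s*"([^"]+)"')


def families(fc_list_verbose):
    """Collect the first family name from each family line of a dump."""
    found = set()
    for line in fc_list_verbose.splitlines():
        match = FAMILY_RE.match(line)
        if match:
            found.add(match.group(1))
    return found


def plan_sync(wanted_dump, local_dump):
    """Return (families to install, families to remove) to match wanted_dump."""
    wanted, local = families(wanted_dump), families(local_dump)
    return sorted(wanted - local), sorted(local - wanted)


if __name__ == "__main__":
    reporter = '\tfamily: "David CLM"(s)\n\tfamily: "DejaVu Sans"(s)\n'
    mine = '\tfamily: "DejaVu Sans"(s)\n\tfamily: "Liberation Serif"(s)\n'
    missing, extra = plan_sync(reporter, mine)
    print("install:", missing)
    print("remove:", extra)
```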

Syndicated 2012-02-29 14:49:49 from Caolan McNamara

fakemail is handy

For debugging mail problems, e.g. when debugging some emailmerge stuff in LibreOffice recently, fakemail is really really handy when you have a bug which requires generating a couple of hundred emails in quick succession to trigger.

Syndicated 2011-11-21 13:00:05 from Caolan McNamara

libexttextcat 3.2.0

Released libexttextcat 3.2.0 (Extended Text Categorization, used to guess the language that input text is written in). It can be found in this download dir. No code changes from 3.1.1, but it adds a large collection of extra language signatures to give libexttextcat nearly the same language support as LibreOffice, modulo languages that LibreOffice supports which don’t have a convenient UDHR translation to use as a basis to generate a language fingerprint.

Syndicated 2011-11-13 22:41:59 from Caolan McNamara

CTL/CJK format character previews

As Lior Kaplan demonstrated at LibreOffice 2011 Paris, our format character preview really sucks for CTL and CJK users. If no CTL/CJK text is selected then no CTL sample text is shown, and the CJK sample text is from the fontname itself. Many font names are just Latin text, so they give no indication of what the font looks like in the script/language actually being written.

e.g. Old dialog for CTL, will only preview some Western text if no text is selected, no attempt to show any sample CTL text, or even the CTL fontname. For CJK it will additionally show the fontname of the CJK font in the preview, which isn’t helpful if the CJK fontname contains no CJK glyphs.

Simply adding the CTL fontname wouldn’t help much, seeing as the fontname is David CLM. So, as a first stab at “doing the right thing”, currently reusing the preview text used in the font-dropdown gives me…

Code for all this is mostly in svtools/source/misc/sampletext.cxx, where there is now a hugely over-engineered set of heuristics to guess the script a font is best tuned for, plus various functions to generate suitable text depending on whether all we have is the font, the font+language, or just the language, and on whether we want a short identifier to classify what script a font might be good at rendering or a longer sequence of sample text for a font preview.

Probably best to drop rendering the fontname in the Western case for the text preview and use some sample text there too, at least for the mixed Western+CTL+CJK case, as it’s confusing to have a font name rendered and some sample text in another font.

Syndicated 2011-10-21 10:59:53 from Caolan McNamara

PhagsPa and Tai Le, sample text ?

Looking through my fonts that are clearly tuned for a single specific script, there remain two scripts that niggle me as I don’t have suitable sample text for them. i.e. PhagsPa and Tai Le. I’m looking for a short snippet of sample text in those scripts which is suitable to stick into the font drop down preview. Ideally something fairly equivalent to “Alphabet”, “Script”, “PhagsPa/Tai Le” or “Tibetan/Tai Lü”.

Syndicated 2011-10-19 22:29:50 from Caolan McNamara

libexttextcat: text guessing feature

LibreOffice inherited a text language guesser, based on textcat from wise-guys.nl and extended by Jocelyn Merand to basically handle UTF-8 text. This is the thing that makes the suggestions as to what language your text might really be in when you right click on some misspelled text and choose set language.

We’ve now spun this off as a standalone libexttextcat, fixed up some conversion problems from the original selection of 8bit encodings, and generated new language fingerprints in other cases. That should give better results for various languages, and allow us to enable checking for some languages which were disabled until now.
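
For the curious, textcat-style guessers work by comparing ranked character n-gram profiles. Here is a toy sketch of that general technique (rank the most frequent n-grams, then score by "out-of-place" rank distance). It is not the libexttextcat code, and the function names, the 400-entry cutoff and the two tiny sample texts are all assumptions picked for illustration; real fingerprints are built from much more text.

```python
from collections import Counter


def ngram_profile(text, max_n=5, top=400):
    """Return the most frequent character n-grams (n = 1..max_n), best first."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # underscores mark the word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Most frequent first; ties broken alphabetically so the order is stable.
    return sorted(counts, key=lambda gram: (-counts[gram], gram))[:top]


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams unknown to the language get a maximum penalty."""
    rank = {gram: position for position, gram in enumerate(lang_profile)}
    penalty = len(lang_profile)
    return sum(
        abs(position - rank[gram]) if gram in rank else penalty
        for position, gram in enumerate(doc_profile)
    )


def guess_language(text, fingerprints):
    """Return the language whose fingerprint is closest to the text's profile."""
    doc = ngram_profile(text)
    return min(fingerprints, key=lambda lang: out_of_place(doc, fingerprints[lang]))


# Toy fingerprints built from one UDHR-like sentence per language.
SAMPLES = {
    "en": "all human beings are born free and equal in dignity and rights",
    "de": "alle menschen sind frei und gleich an würde und rechten geboren",
}
FINGERPRINTS = {lang: ngram_profile(text) for lang, text in SAMPLES.items()}

if __name__ == "__main__":
    print(guess_language("the rights of human beings", FINGERPRINTS))
    print(guess_language("die rechte der menschen", FINGERPRINTS))
```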

The current list of languages it attempts to detect can be seen here

Here’s a plausible process to add your favourite language to it, given git clone git://anongit.freedesktop.org/libreoffice/libexttextcat and bootstrapping from the insanely-translated UDHR, using Abkhaz as an example.

cd libexttextcat/langclass/ShortTexts/
wget http://unicode.org/udhr/d/udhr_abk.txt
#skip english header, name result using BCP-47
tail -n+7 udhr_abk.txt > ab.txt
cd ../LM
../../src/createfp < ../ShortTexts/ab.txt > ab.lm
echo ab.lm ab--utf8 >> ../fpdb.conf

Then update the check target in src/Makefile.am and run make check to confirm that ShortTexts/ab.txt is detected as ab.

I’ll remove the necessity of a configuration file in a later version, and convert the result to a BCP-47 tag. For the moment it remains a drop in replacement for the original solution which necessitates retaining the slightly odd language tag syntax.
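
As a rough idea of what that later conversion might look like, here is a small sketch. It assumes the odd tag is three hyphen-separated fields, language, country and encoding, which is what the ab--utf8 example above suggests; the function name to_bcp47 and the pt-BR-utf8 input are hypothetical and only there for illustration.

```python
def to_bcp47(fp_tag):
    """Turn a fingerprint tag such as 'ab--utf8' into a BCP-47 style tag.

    Assumed layout: language-country-encoding, where country may be empty.
    The encoding field is dropped, since BCP-47 has no place for it.
    """
    # Pad so that tags with fewer than three fields still unpack cleanly.
    language, country, _encoding = (fp_tag.split("-", 2) + ["", ""])[:3]
    if country:
        return f"{language}-{country.upper()}"
    return language


if __name__ == "__main__":
    for tag in ("ab--utf8", "pt-BR-utf8"):  # second tag is a made-up example
        print(tag, "->", to_bcp47(tag))
```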

Syndicated 2011-09-28 14:10:11 from Caolan McNamara
