Older blog entries for djcb (starting at number 157)

seek & destroy

In my last entry I wrote a bit about optimizing my little project. One other significant optimization I found was inode-sorting, from an idea I got from some old postings on the mutt mailing list.

The idea is as follows: some file systems, in particular ext3, support hashed b-trees to speed-up lookups in large directories (paper). That's nice for finding particular files. However, as a side-effect, when you scan full directories (as mu does when indexing), you might get the entries back in a rather chaotic order. If you then try to open the files in that order, you suffer from long seek times, and consequently, bad performance.

The solution is to sort the dir entries by their inode (in ascending order), and then open the corresponding files in that order. This is what mu (mu-index) does by default, starting with version 0.3. You can turn it off with --tune-sort-inodes=0, but there is usually little need for that, as the overhead of sorting is negligible.

So, what difference does it make? Answer: it depends on how the files are laid out; if you already get your files back in their 'natural order', there won't be much difference - this is what happens on my main machine. But, on another (old) machine where the files are not in that order, the improvements are substantial: I found that indexing 1500 message in 25 seconds without inode-sorting, goes down to 15 seconds with inode-sorting; a nice 40% improvement.

Note(1): this works for ext3 directories with dir_index enabled; there's a HOWTO. There are other file systems that have similar features, but I haven't tested those. Note(2): This optimization is not very useful for flash-based file systems, as they don't really care in what order you open files.

Syndicated 2008-10-22 18:31:00 (Updated 2008-10-24 14:46:47) from djcb

chasing time


As discussed before, I am working on a little hobby project called mu, for indexing/searching e-mail messages in maildirs. As a true hobby project, it's about finding things out. I'll take notes as I go along.

indexing

One important part of indexing and searching is.... indexing. Indexing (in this context) is the operation of recursively going through a maildir, analyzing each message file, and storing the results in a database. In mu's case, there are actually two databases, one SQLite-database and one Xapian-database (a really interesting tool - to be discussed later).

Indexing may take a considerable amount of time; mu version 0.1 took 192 seconds (on average) to index 10000 messages in my testing corpus. And this version did not even support the Xapian database. Indexing involves reading from disk, querying the database to see if the message is already there, and if not, storing the message metadata. Because of this scheme, re-indexing of the same 10000 messages only takes about 5 seconds (with re-indexing, only modified/new messages need to be indexed).

The full indexing operation probably does not happen very often, for most people. Still, I think it's very worthwhile to try and make it faster. Nobody likes to wait for 192 seconds, even once - and during development, I need to do a full index rather often. Another important reason is that optimizing software is simply interesting - which is a main motivator for a hobby project.

So, let's see how we can make this a bit faster; here I'll only discuss some of the database-related optimizations.

transactions

As mentioned, mu stores the indexing data in two databases; one SQLite-database and one Xapian-database. Both of these databases know the concept of a transaction. By default, SQLite puts every query in a separate transaction. This is very safe, but also quite expensive. When indexing messages, there is no risk of data loss, so it's quite reasonable to increase the transaction size. And this makes things a lot faster. Between mu version 0.1 to 0.2, I increased the default from one transaction per message (3 queries) to one transaction per 100 messages. This made indexing more than 2.5 times faster -- see the table below. This improvement is even more impressive when considering that I also added full-text search, indexing message bodies as well (this is what Xapian is for).

For Xapian transactions, the default value I chose is 1000 transactions -- but the performance effects are much smaller. So, my 'optimal' values, are 100 and 1000, respectively. I found that transactions bigger than that don't improve the performance very much, but of course still affect memory usage. You can tune these with --tune-sqlite-transaction-size and --tune-xapian-transaction-size. The defaults should be just fine for the normal desktop use case - still, if you need a less memory-hungry but slower version, that is possible too. See the mu-index(1) man page for details.

pragmatic

Another area for performance are SQLite's PRAGMA-statements. Some useful ones are PRAGMA synchronous= (which you can influence with --tune-synchronous and PRAGMA temp_store=, which you can tune with --tune-temp-store. Again, see the mu-index(1) man page for details.

It turns out that PRAGMA synchronous allows for some improvement. This setting determines whether SQLite does it writes in a synchronous way. It's faster (and slightly less safe, but the notes at the end of this blog entry). From the table below, it seems that PRAGMA temp_store does not make much difference in this case. This PRAGMA determines where we store temporary (non-committed) results. Some testing suggests this is because, when we do not enable synchronous writing (above), even the 'file' temp_store never physically hits the disk, due to caching by the kernel.

results

Having optimization options tunable through command line options is really useful. Software optimization, especially from what your read online, seems to be a field full of myths, outdated 'facts' and placebo-effects. And even if the information is correct, it may not apply to your use case. The only thing you can do is measure it. And with command line-options I can easily do that, as well as see how various combinations of optimizations perform.

Here's a table with the results for indexing 10000 messages with version 0.3. Between all the runs, I used

# sync && echo 3 > /proc/sys/vm/drop_caches
to flush the caches. That's a critical step - the kernel caches a lot of data, which makes subsequent runs much faster if you don't flush the caches. And that is not what I wanted to measure.







msg/sqlite tx
msg/xapian tx
synchronous sqlite
temp store sqlite
time (s)
notes
1
1
full
file
1536
1
1
normal
default
182
similar to defaults for mu 0.1, but faster
100
1000
full
file
73
100
1000
no
file
68
100
1000
no
memory
68
default for mu 0.3
10000
10000
no
memory
67

As an example, the default for mu version 0.3 is equivalent to:
./mu-index --tune-sqlite-transaction-size=100 --tune-xapian-transaction-size=1000  --tune-synchronous=0 --tune-temp-store=2 ~/data/testmaildir
Again, see the mu-index(1) manpage for details.

Note, these optimizations are a good strategy for indexing data, that is, generating data from data that is already safely stored somewhere else. If anything goes wrong, we can always restart the indexing later. However, if your database stores data that cannot easily be retrieved again afterwards (say, that one occurrence of the Higg's Boson in your particle accelerator), you would want to be a bit more careful.


There are some more optimizations possible; some I have even implemented, such as inode-sorting, which is documented in the mu-index(1) man page. To be discussed some other time.

Syndicated 2008-10-18 14:06:00 (Updated 2008-10-19 16:39:41) from djcb

it's all greek to me

It's been a while since my last blog entry... I haven't done much work on modest lately, but it is in safe hands. I did start a new little hobby project though; it's called mu, and it's a collection of command line tools to index / search e-mails stored in Maildirs. It doesn't run on N8x0 (yet), but I guess it wouldn't be very hard to port it. Of course, this kind of software has been written before - but for a hobby project, that does not really matter. It's all about trying things out.

I am taking notes about the things I learn as I go along... there's a lot of optimization stuff to discuss but unfortunately, it's too much to fit into this blog entry... will write about that later. I am off to Greece now -- to corrupt the youth of Athens. I hope I can understand the people; I taught myself a little bit, but rumours have it that the language has changed quite a bit in the last 2500 years...

And not to forget: happy birthday, GNU. 25 years... I may not always agree with RMS, but he deserves the greatest respect for his accomplishments. A George Bernard Shaw quote comes to mind:


"Reasonable people adapt themselves to the world. Unreasonable people attempt to adapt the world to themselves. All progress, therefore, depends on unreasonable people.".

Syndicated 2008-09-28 08:45:00 (Updated 2008-09-28 09:30:02) from djcb

my name is nobody

It's great to see the improvements in the modest e-mail client. For most people, there should be little reason still to use the old email client. Great thanks to all involved -- my friends from Spain, Belgium, Germany and elsewhere; Vivek, Mox, and all users, contributors etc. It's good to mention contributors sometimes; I found the CNN Money-article about the N810 development team a bit off-balance in that respect - it would have been nice to include some people who write the software, too.

Anyway, back to modest. I'm sure that someone, somewhere is missing some feature that is essential to them. While usability and feature-richness are not necessarily conflicting, in practice they often are (I know, I'm a mutt-user!) But, no excuses -- in my (slightly biased) opinion, it's a nice little e-mail client and a great improvement. And with the code being open and free, there is nothing stopping people from firing up their favorite text editor and start hacking on their missing pet feature.

My personal role in the modest-project will diminish a bit. I'll be slaying some new dragons - still Nokia, open source, yadayada; I'll write a bit more about that in the near future. I feel that modest will continue its life in trusted hands, and of course I'll keep an eye on that ;-)

Syndicated 2008-05-18 10:13:00 (Updated 2008-05-18 10:57:41) from djcb

the thing that should not be

Just a short note: due to an unfortunate regression, Modest (version: W16 release) does not work with SSL/TLS, breaking providers such as Gmail. See bug 3084. The reason was that what we tested with, differs slightly with the Chinook environment, and so this one fell through the cracks. Mea culpa... Anyhow, the problem has been fixed. If you build things yourself, get the latest (tinymail and modest) and all will work fine. If you don't want to do that, you'll have to wait until Monday; unfortunately, we can't do anything before that.

Once more, apologies from the Modest team for the inconvenience.

Syndicated 2008-04-18 13:53:00 (Updated 2008-04-18 14:04:49) from djcb

images and words


These are interesting times... I just found out that the next revision of our internet tablet will have WiMAX-support. Rest assured - modest will support that as well.

Also, recently I have been studying the (very much recommended!) work of Edward Tufte, on the visualization of data, and how modern technology is great at obfuscating real meaning behind snappy graphs. Still people are trying to generate meaningful (or sometimes just pretty) pictures out of masses of data. One of the masses of data being email messages. Check these great post on FlowingData which show many different visualization of email data. For a more practical example, look at MailTrends, which analyzes the emails in your Gmail account for your.

et tu, emacs?


I was very happy to see prebuild Emacs packages for Maemo. I wonder if my instructions are still valid, especially regarding key-bindings on the N810. Anyway, I'd be interested in the next steps in integration Emacs with the platform. I'd like to connect the HW-zoom buttons to zooming the fonts in Emacs, and maybe marry the emacs-server setup with the application menu -- ie., don't use new emacs instances for new files, but instead use new buffers in the existing instance. Now all I need is a little time.

Syndicated 2008-03-31 21:31:00 (Updated 2008-04-01 20:06:59) from djcb

recreation day


Photo from Evergrey-concert, yesterday 20.03. Excellent music from the Swedish rockers; I've known them for years, but this was the first time I saw them live. Great concert, very talented band, and they were nice enough to do an autograph session afterwards; I even got into a picture with the guys -- slightly embarrassing...

Time for an update... Our beloved modest e-mail client is humming along happily. We're putting a lot of energy of testing all kinds of use cases, as well as weird error conditions. Modest/tinymail contain quite some code (in total around 240K lines), so there are a lot of things to test. Anyway, I was already quite happy with our first bèta release, back in December. And modest has seen solid and consistent improvement since, every single week (with a few regressions thrown in to keep things interesting...).

Also, I have been quite happy with my emacs-on-N810. It has turned my N810 is a versatile PDA. I'm slowly capturing the power of org-mode in Emacs (see the 25 minute video), which is an amazing way to handle todo-lists, GTD and so on.

Then, there is so much happening in free software land, it's hard to keep track of it, even if just looking at the level of fundamental tools. Some things that I found quite interesting:


  • Dehydra/GCC is a plugin for gcc (Javascript!) built within the context of the Mozilla project. Mozilla uses an object system called XPCOM, which is 'inspired' by Microsoft's COM. However, times have changed, and in many place in the huge Mozilla codebase, this XPCom is seen unnecessary bloat and complexity. For example, in COM-style, one uses the return value of a method for error checking; the 'real' return value comes as an outparam. However, in many cases (see DeCOMtamination), it's much better to use a normal return value, and use e.g. exceptions for error handling. Now, try to do that automatically, taking into account possibly misuse of outparams -- sometimes, sed/awk/perl are just not enough. And that is where DeHydra comes in.
  • gold, the new & improved GNU linker. It's good to see that even classic tools like ld are still being improved -- and quite significantly in this case, esp. for speed. What we're still waiting for is link-time optimization, which can significantly speed-up programs, e.g. by making sure the most used functions are in the same memory page.
  • quagmire (giggedigig!), finally an autotools-replacement projects that seems actually capable of doing so. The initial goals is to replace automake and libtool with a bunch of GNU make macros. By simply requiring GNU make instead of a 'normal' make, a lot of the hackery autotools disappears. Another nice thing is that it understands pkg-config, which simplifies another set of problems. Apparently, the longer term goal is to replace autoconf as well. And, given designer Tom Tromey's track record, the future looks bright.

Syndicated 2008-03-21 14:37:00 (Updated 2008-03-21 17:18:58) from djcb

come together


I've returned to Helsinki after visiting the FOSDEM-conference in Brussels. Before anything else, I'd like to thank and compliment the organizers for creating a great conference. And not just the organizers, all the volunteers that made it another great FOSDEM. The amount of work that goes into something like this cannot be overestimated, and it all went very smooth. If anything, it was too succesful, so many people...

Anyway, I had the chance to meet a great many old friends, as well make new ones. It's fantastic to see all the free software projects that improve things all over the software stack. Kernel, console, X, web services, funky UI bling, end-user applications, embedded software,... So much combined brain power, pushing the envelope of free software.

I did a presentation (should be available soon) of our own little addition to that, the modest e-mail client. Although there was some delay (Murphy!), I was quite happy with my talk. And it was particularly interesting to talk to modest users - what do they like, what do they miss, and so on. Overall, we've been blessed with very helpful users, and with there assistance, we were able to kill quite a number of bugs which would have very hard to fix otherwise. See our resolved buglist in our bugzilla for some great examples of that.

Anyway, overall FOSDEM made me quite happy -- such a gathering of smart people and great software, promising a lot of good things for the free software future.

Note, screenshot is of the unstable, proof-of-concept GNOME desktop-version of modest, courtesy of dape.

Syndicated 2008-02-25 20:34:00 (Updated 2008-02-25 20:44:10) from djcb

heeding the call

This coming weekend, a big part of our multinational modest-team (Sergio, Berto, Philip and Dirk) will be at the FOSDEM conference in Brussels. There will even be a presentation about the modest e-mail client. So, if you have any questions, suggestions or even some constructive(!) criticism, this is your chance! You can use IM if you're looking for me (diggler[at]gmail.com).

Last year, I had a great time at FOSDEM, with many, many interesting people as well as the nice atmosphere (food and drinks) in Brussels. Hope to see many of you there!

Syndicated 2008-02-21 16:16:00 (Updated 2008-02-21 16:16:40) from djcb

waiting for 22


Achilles had Patroclus. Don Quixote had Sancho Panza. Michael Jackson has Bubbles. And I have emacs. On my N810. A while ago, I already wrote about it. I even showed some screenshot of emacs running in scratchbox. But, I didn't take the final step - getting it to run on an actual N810. Recently, I tried to get that to work. Well, that was frustrating... Some hackish instructions follow. They may or may not work for you -- try at your own risk :)

  • First I tried to simply rebuild the Emacs23-packages in scratchbox; that failed, because the compilation somehow crashes QEMU;
  • Then I tried Emacs 22 instead... but the problem remained;
  • So, I decided that maybe I should try to compile the package it on the N810 itself. Again, that failed. One of many problems: package building requires a real grep, and if you try to install it, it wants to remove the whole busybox environment; sigh. I fought the system - the system won...

But, all was not lost. There are prebuilt Debian packages of Emacs22 available; I took the armel-packages from there, and tried to install them. That almost worked. Almost, because the size of emacs is almost legendary. It did not fit on my root file system on the N810.

We're nearing the solution though; I copied the contents of the .debs to a directory emacs810 (with mc); then I copied this directory to the MMC-card of the N810 (/media/mmc2/). I set some symlinks, ie.


# ln -s /media/mmc2/emacs810/usr/share/emacs /usr/share/emacs
# ln -s /media/mmc2/emacs810/usr/share/emacs22 /usr/share/emacs22
# ln -s /media/mmc2/emacs810/usr/bin/emacs22-gtk /usr/bin/emacs22
# ln -s /media/mmc2/emacs810/usr/share/applications/emacs22.desktop /usr/share/applications/hildon/

Also, you'll need to install libungif4g. That should do the trick, and emacs should show up in your Extras-menu. And we can run emacs! Victory is mine!

Well, almost. In emacs, a very useful key is the Meta-key, usually mapped to the Alt-key of your keyboard. But of course, there's no alt-key on the N810-keyboard. Instead, I decided to remap the Chr-key. I'd like to remap it in my .emacs, but I haven't been able to do so.

Anyway, as a first start, I added these to /usr/share/X11/xkb/symbols/nokia_vndr/rx-44 (the xkb keyboard mapping):


key <SPCE> { [ space, space, Tab, space ] };
key <COMP> { [ Meta_L, Meta_L, Multi_key, Meta_L ] };

modifier_map Mod1 { Meta_L };

The first one will turn Fn-Spc into Tab, which is very useful for completion (for some reason, I couldn't get M-i working in the minibuffer). The second one will turn the Chr key into the M-key (obviously, you can't run emacs without that), with Fn-Chr giving the old Chr key. Not sure what it will break - it's black magic.

Ok, that's it. These steps should be cleaned-up, pre-packaged and made single-click-available. Anyway, the steps above should hopefully get you a working hand held emacs. Happy hacking!

Syndicated 2008-02-18 21:01:00 (Updated 2008-02-18 21:40:45) from djcb

148 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!