Older blog entries for gobry (starting at number 27)

replication, now (really!)

A long time ago, I wished I could have all my personal data (addressbook essentially) available in read-write mode from any of my work places (linux laptop, home iMac, linux at work), and also shared with my wife.

As this remained an itch to scratch for too long, I finally decided to see how far I could solve it myself. So I've taken replication 101 (mostly following references from the interesting white paper from the unison project), and experimented a bit in Python with simple ideas.

The result fits my needs (I can read and write from several places; the system handles propagation and updates, and reports conflicts), but is still far from either complete (I still need to put some sugar on the conflict resolution procedure, to finish the Addressbook.app client, ...) or polished (the implementation is certainly neither space nor time efficient).
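To give an idea of what "handles propagation and reports conflicts" can mean, here is a minimal sketch in the spirit of the unison approach: each replica is compared against the state recorded at the last synchronization. This is not the actual code of the tool; the function name `reconcile` and the dictionary layout are assumptions made for illustration, and record values are assumed never to be `None`.

```python
def reconcile(archive, a, b):
    """Three-way reconciliation of two replicas.

    archive: records as they were after the last synchronization.
    a, b:    the current records on each replica.
    Each argument maps a key to a value; a missing key means "no record".
    Returns (merged, conflicts).
    """
    merged, conflicts = {}, {}
    for key in sorted(set(archive) | set(a) | set(b)):
        base, va, vb = archive.get(key), a.get(key), b.get(key)
        if va == vb:
            winner = va                 # unchanged, or same edit on both sides
        elif va == base:
            winner = vb                 # only b changed: propagate b
        elif vb == base:
            winner = va                 # only a changed: propagate a
        else:
            conflicts[key] = (va, vb)   # both changed differently: report it
            continue
        if winner is not None:          # None here means the record was deleted
            merged[key] = winner
    return merged, conflicts


archive = {"alice": "555-0100", "bob": "555-0101"}
laptop = {"alice": "555-0199", "bob": "555-0101"}   # alice edited
imac = {"alice": "555-0100", "carol": "555-0102"}   # bob deleted, carol added
merged, conflicts = reconcile(archive, laptop, imac)
```

Conflicting records are left out of `merged` on purpose, so that nothing is overwritten before someone has looked at both versions.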

At least I feel better now :-)

BTW, if you want to play with it, it's available here:


It will probably randomly discard your data, crash your network and repaint your bedroom, but if you wish to test it, feel free.

Haskell is driving me nuts. I really like its expressiveness, but lately I hit a problem: my short program (a log parser), which used to run with a constant memory footprint (thanks to good advice regarding strict data types), started to suck up more than 100 MB again. The culprit seems to be my introduction of a unit test in the module, which steps on the toes of the optimizer. Am I just unlucky, or is any Haskell programmer supposed to understand in deep detail how the compiler optimizes one's code?

In the meantime, I also dug a bit into the bibliography on replication techniques for disconnected devices. So far, I'm looking in the direction of Harmony and Rumor.

17 Dec 2004 (updated 17 Dec 2004 at 09:03 UTC) »
e8johan: this is almost covered by standard HTTP headers (you can ask for a page only if it has changed since a given date, or check its ETag). But indeed, compared to NNTP for instance, it's still not very scalable.
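For the record, the two HTTP mechanisms mentioned here are the `If-Modified-Since` and `If-None-Match` request headers. A small sketch of how a client could build them, assuming it kept the ETag and the time of its previous fetch (the helper names are made up for the example):

```python
from email.utils import formatdate


def conditional_headers(etag=None, last_fetch=None):
    """Build request headers asking the server to skip an unchanged page.

    etag:       validator string saved from a previous response, if any.
    last_fetch: Unix timestamp of the previous fetch, if any.
    """
    headers = {}
    if etag is not None:
        headers["If-None-Match"] = etag
    if last_fetch is not None:
        # HTTP dates are always expressed in GMT.
        headers["If-Modified-Since"] = formatdate(last_fetch, usegmt=True)
    return headers


def needs_refresh(status):
    """A 304 (Not Modified) reply has no body: the cached copy is still good."""
    return status != 304
```

The client still has to poll every feed, which is why this scales less well than a push-style protocol.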

dcoombs: while you're already trained for slow execution, why don't you feed your program to valgrind? it's the tool that saved the day when I was still programming in C/C++... :-) And given what it actually does, it's not _that_ slow.

Job: still working on a nice project based on CDSware. It's a good document management platform which is now mainly in Python but with some remaining parts in PHP. The team has a really interesting sensibility regarding high-level languages (not only Python, but also the functional family), which helps in thinking in terms of "the right tool for the job", and not in terms of "the hype of the day". They managed to get very good performance when searching almost 1M documents, with complex queries running in less than 1 s, by using boolean vectors from Numerical Python, serialized in a MySQL database.
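The boolean vector idea can be illustrated roughly as follows. This is not CDSware code: the sketch uses plain Python integers as bit vectors instead of Numerical Python arrays so that it runs with the standard library alone, and the names `build_index` and `matching_ids` are invented for the example. Each word maps to a vector with one bit per document, and a complex query becomes a handful of bitwise operations.

```python
def build_index(docs):
    """Map each word to an integer whose bit i is set when docs[i] contains it."""
    index = {}
    for i, text in enumerate(docs):
        for word in set(text.lower().split()):
            index[word] = index.get(word, 0) | (1 << i)
    return index


def matching_ids(bits):
    """Decode a bit vector back into a sorted list of document numbers."""
    return [i for i in range(bits.bit_length()) if bits >> i & 1]


docs = [
    "python document management",
    "php legacy code",
    "python search engine",
]
index = build_index(docs)
everything = (1 << len(docs)) - 1   # vector with one bit set per document

# Query: python AND NOT management
hits = index["python"] & (everything & ~index["management"])
```

Here `matching_ids(hits)` gives `[2]`, the only document mentioning "python" but not "management". The cost of a query depends on the number of terms rather than on how many documents match, which is presumably where the speed comes from.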

Released version 1.0.4 of Garlic. From the outside, it's another web-based bookmark manager, but besides being a personal itch I had to scratch, it also serves as a testbed for several things:

- Pybliographer 1.3: this branch uses a bsddb backend to store the data, and has its own file format for exchange. So garlic is useful to test if someone would actually like to code against pyblio-1.3, and if the code works in situations it was not especially designed to handle.

- twisted: in this version of garlic, a companion application is able to parse RSS feeds and insert them in the bookmark manager (in a dedicated folder), via twisted's RPC system. Other features might be added in a similar way; it's just that I really wanted this one for my personal use :-)

- quixote: a simple yet elegant way to generate HTML code in very natural Python. I tested nevow a while ago (nevow is the "official" web templating solution on twisted), but it was evolving too fast at that time for someone who was also learning twisted :-) I'll give it another try soon, as it is really, really elegant.

Another area I'd like to explore is some high-level testing framework for GUIs (esp. python-gtk). Certainly a tricky issue, but so far I've seen no higher-level approach than using X to simulate events. Of course, a higher level probably means limiting oneself to a given toolkit. But using signals and playing with the widget tree seems to offer more power.

Don't know how I could mix that into garlic however :-]

forrest: I've updated the entry on the site. There is a moderation process, so the modified entry is not yet displayed.
Hacking: I now use MacOSX as my primary platform, as I still haven't replaced my laptop, and I could not connect the nice new hard drive I bought in order to have a Debian partition... My Firewire controller must have melted or something; I cannot connect any peripheral to it anymore. So my disk is connected via USB 1, and is used as a backup (rsyncX to an HFS+ partition) and Arch revision library (UFS, as I need case sensitivity). This story is getting more and more expensive (broken laptop, broken firewire which seems to be part of the motherboard, additional external drive...)

BTW, I've now compiled almost everything I need via fink (and even got roped into becoming maintainer of the recode module, which is broken out of the box for my work on pybliographer). If no other piece of hardware dies in the following days, I might even be able to work on actual stuff...

The laptop is really dead, but so far I haven't replaced it. This gives me the opportunity to check if MacOSX is a complete replacement for my needs. So far, I've mostly been annoyed by the case-insensitive filesystem (I had completely forgotten that issue until arch spat out an error message, due to a filename clash in a patch), and by some packages that don't compile out of the box. I wish I could resize an HFS+ partition without losing its content, in order to have some room for a Debian partition...

New job: I've found a very attractive job opportunity, involving Python, open source software and interaction with many interesting people. More details to come.


Just dropped my laptop (an old Compaq Presario) on the floor... The screen now remains snow white, but the rest seems to have survived. At least I'll be able to extract its content. I store in arch or back up almost everything I use regularly, but one never knows... I'll try to see if there is some hope for the screen tomorrow; I don't think it's a good idea to open it just before going to bed...

I just hate buying unnecessary expensive stuff, not because I don't appreciate new toys, but rather because I really don't like making myself the target of the many marketing tools that come with these kinds of activities.

Ok, so does anyone have a good suggestion for a Linux laptop? No need for high performance, rather something light, silent, and robust...

A new release on the stable branch, no big surprises. Plans are quite clear for an early preview of the development code; now I just need to find some time to actually put some code in it :-)

Regarding arch being more centralized than subversion, it's rather the opposite :-) First, unless Gna! forces you to do so, more than one person can commit to an arch repository. The development pattern you refer to is the one where there is an "official" tree, which collects patches from other trees. Note that anyone can start a mirror of your tree, or a branch with local fixes which is kept in sync with the official tree. The difference with subversion is that these people can also see their changes merged into the official version with no problem (i.e. they won't see their patch applied a second time to their own branch when they update from the official tree, for instance).

So you can imagine multiple topologies depending on your needs. For instance if you have many small contributors and a few more regular developers, you might imagine having a first level of integrators that take the contributions, and you only merge from those.
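The "patch is not applied a second time" property can be pictured with a toy model, assuming each tree simply remembers the identifiers of the patches it already contains. This is only an illustration of the bookkeeping, not how arch stores things, and the `Tree` class and its identifier format are invented for the example.

```python
class Tree:
    """A toy branch that remembers which patches it already contains."""

    def __init__(self, name):
        self.name = name
        self.commits = 0   # number of patches created in this tree
        self.log = []      # identifiers of every patch present, in order

    def commit(self):
        """Record a new local patch and return its identifier."""
        self.commits += 1
        patch_id = f"{self.name}--patch-{self.commits}"
        self.log.append(patch_id)
        return patch_id

    def merge_from(self, other):
        """Apply only the patches from `other` that are missing here."""
        missing = [p for p in other.log if p not in self.log]
        self.log.extend(missing)
        return missing


official = Tree("official")
mine = Tree("mine")

official.commit()
mine.merge_from(official)    # my branch catches up with the official tree
mine.commit()                # a local fix
official.merge_from(mine)    # the official tree picks up my fix
official.commit()
update = mine.merge_from(official)
```

The last merge only brings in the new official patch: the local fix comes back as part of the official history, but since its identifier is already in my log it is skipped. The integrator topology described above is the same operation repeated one level up.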

3 Mar 2004 (updated 3 Mar 2004 at 14:40 UTC) »
Pybliographer: the mailing list is now available via NNTP thanks to Gmane. I didn't know this service until someone asked me if it could be used for pyblio. Nifty.

Work: struggling with cygwin and mingw to bring our company build tools (intended for embedded development) to an XP box... The tools themselves are in Python, which gives me at least some hope, but it's messy anyway.

PHP: even if I dislike the language, I managed to merge Wordpress and Gallery in the same web site, with a common visual appearance. Hopefully I won't have too many changes to make, as I feel like walking on eggs (what's the idiomatic English equivalent of this?)
