Older blog entries for nomis (starting at number 8)

23 Aug 2002 (updated 23 Aug 2002 at 21:16 UTC)
spam - multi language problems

The Graham/Bayes approach to spam filtering is interesting and seems to work quite well. However, it seems to have fairly major issues with multi-language mail, and I am not sure how to fix this in a convenient manner.

I get lots of "good" English and German mail, but there is far more English spam than German spam in my inbox. As a result, a word that should appear in nearly every German mail, e.g. "ein", appears rarely in spam mails and much more frequently in good mails. Suddenly a word that should be neutral for detecting spam becomes evidence for a good mail. In the case of "ein" the spam probability is 0.05 in my database.
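To make the skew concrete, here is a minimal sketch of the per-word probability as described in Paul Graham's "A Plan for Spam" (good counts doubled, result clamped to 0.01..0.99, words seen fewer than five times ignored). The function name and all counts below are made up for illustration; they are not the numbers from my database.

```python
def token_spam_probability(good_count, bad_count, n_good, n_bad):
    """Spam probability of one word, following the formula in "A Plan for Spam".

    good_count / bad_count: occurrences of the word in good / spam mail.
    n_good / n_bad: number of good / spam mails in the training set.
    Returns None for words that were seen too rarely to be judged.
    """
    g = 2 * good_count          # good occurrences are doubled to bias against false positives
    b = bad_count
    if g + b < 5:
        return None
    good_ratio = min(1.0, g / n_good)
    bad_ratio = min(1.0, b / n_bad)
    return max(0.01, min(0.99, bad_ratio / (good_ratio + bad_ratio)))


# Hypothetical counts: "ein" is common in good (German) mail,
# but rare in spam because most of the spam is English.
p_ein = token_spam_probability(good_count=400, bad_count=20, n_good=1000, n_bad=1000)
print(p_ein)
```

With counts like these the word ends up far below the neutral 0.5, although it says nothing about spam and only something about the language.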

It is not that bad because I do not get much German spam. However, it seems like a fundamental problem to me, and it most probably cannot be addressed without separate databases and a way to determine what language a mail is written in (this can most probably work the same way as distinguishing between spam and non-spam). However, the training/sorting work would increase significantly - I usually don't sort my mails by language...
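A rough sketch of how the language detection could work in the same word-counting style: train one word-count table per language, then pick the language whose table explains the mail best. This is plain naive Bayes with add-one smoothing, not something I have actually built; the function names and the tiny training texts are hypothetical.

```python
import math
from collections import Counter


def train(samples):
    """samples: dict mapping a language label to a list of example texts."""
    return {lang: Counter(w for text in texts for w in text.lower().split())
            for lang, texts in samples.items()}


def guess_language(text, models):
    """Return the label whose word counts best explain the text."""
    vocab_size = len(set().union(*models.values()))

    def score(counts):
        total = sum(counts.values())
        # add-one smoothing so unseen words do not produce log(0)
        return sum(math.log((counts[w] + 1) / (total + vocab_size))
                   for w in text.lower().split())

    return max(models, key=lambda lang: score(models[lang]))


# Toy training data, for illustration only.
models = train({
    "de": ["das ist ein test", "ich habe ein haus"],
    "en": ["this is a test", "i have a house"],
})
print(guess_language("ein haus", models))
print(guess_language("a house", models))
```

The guessed label would then select which spam database to use for the mail. The extra training effort mentioned above remains: somebody has to provide sorted examples for each language first.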

On the other hand, the very same effect is useful for me with CJK mails - I don't speak any of these languages, so there are no "good" CJK mails in my inbox. It is perfectly reasonable that the filter classifies them as spam...

Hi everybody. Having just watched Episode II I want to point all GUAD3C visitors who had a chance to visit Sevilla to two photos:

Have fun :-)

I did a good thing. I made The GIMP depend on libart. Now the conversion from a polygon to a selection gives really great results. They sucked before (left side).

Now let's see where we can use it for other fun stuff...

As I am interested in typography, I could not resist replying to raph's challenge with his lebe-Font.

From some sheets I have seen, the letters most likely missing on your specimen sheet are: J, U, w and maybe K.

However, these letters look great in your font. You certainly did a good job. The only letter that I think does not fully match is the U, which should IMHO have a more circular arch (not sure about the English terminology) and be a little less slanted.

And maybe the stem pointing to the upper right of the K should be more uniform - most of the other letters have quite uniform stems.

Anyway: I like your font very much.

Long time no diary... However, a reference in raph's diary reminded me to publish something I did some weeks ago. I had a look at ClearType and was disappointed too. So I wrote a small Script-Fu for Gimp which scales an image down, optimized for LCD displays. It is just a rough sketch and sometimes produces colorful borders, but the results are definitely worth a look.
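For readers who wonder what "optimized for LCD displays" can mean, here is a minimal sketch of the basic sub-pixel idea. It is written in Python and is not the Script-Fu mentioned above. It assumes a panel with horizontal R-G-B stripes and a grayscale source image given as plain lists; the function name is made up.

```python
def subpixel_downscale(gray_rows):
    """Scale a grayscale image down to a third of its width.

    gray_rows: list of rows, each a list of intensities 0..255.
    Three neighbouring source columns become the red, green and blue
    sub-pixel of one output pixel (assumes an R-G-B stripe layout).
    Columns that do not fill a complete triple are dropped.
    """
    out = []
    for row in gray_rows:
        usable = len(row) - len(row) % 3
        out.append([(row[x], row[x + 1], row[x + 2])
                    for x in range(0, usable, 3)])
    return out


# A sharp edge that does not fall on a pixel boundary:
# the first output pixel becomes pure blue instead of gray.
print(subpixel_downscale([[0, 0, 255, 255, 255, 255]]))
```

The colored pixel at the edge is exactly the kind of colorful border mentioned above. Smoothing the source horizontally before picking the sub-pixels would be one way to reduce it, at the cost of some sharpness.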

Gimpcon is over. Thanks to Sven and Mitch for organizing this great event.

I try to certify all people I've met face to face. So I had to certify a lot of Gimp developers. All of them are really nice people :-)

The meeting was very constructive. Gimp 2.0 will really rock!

I am desperately searching for people who can help me at the Gimp booth at the biggest Linux fair in Europe, the LinuxTag in Stuttgart. I got some sponsors for our booth, so I will most likely be able to provide flight and room for one person from the US or maybe two people from Europe. The most urgent need is on the weekend (1st and 2nd July 2000). Our booth will have two or three computers, lots of Wacom tablets and maybe a projector. Probably a lot of people will want to learn something about Gimp :-)

If you are a Gimp power user or a Gimp developer, please mail me. You could help me a lot.

Braunschweiger Linuxtage last weekend. Great event with a lot of interesting people. Met Martin Baulig, who gave two talks about Gnome, and Tim Janik, who gave a presentation about GTK+ 1.4. Robert J. Chassell from the FSF was there too and talked a lot about Free Software (of course) and society. Very interesting and entertaining.

I talked about Gimp for nearly the whole two days. The local LUG had a booth with Gimp on a Xinerama three-monitor display and two Wacom tablets. It was sometimes hard to find the mouse pointer, but it worked... :-)

Met lots of other interesting people. Special thanks to Bjørn Bürger and his WG for providing a place to sleep.

Last weekend I visited the first Vintage Computer Festival Europe in Munich. It was a pretty cool event with lots of interesting old hardware. My favourites were:

  • A Tektronix terminal with a storage vector display
  • A mechanical calculation machine with motor. Division really rocks. Make sure you place it on a stable table :-)
  • A self-built tube computer - the MUNIAC. The creator tried to build a "modern" tube-computer. It is not yet completed but is already able to count...

It is somewhat frustrating that my knowledge of old computers is really limited. At least I can now claim that I wrote a program on the very first Commodore PET 2001 model - Yeah, right:

10 PRINT "ICH BIN EIN COMMODORE PET 2001! ";
20 GOTO 10

Definitely cool!
