26 Nov 2001 thayer   » (Observer)

Ifile: ifile is a neat program for automatically filtering email into appropriate folders, such as spam. Its design seems to be aimed specifically at mh/exmh and slocal (which I use).

It seems to rely on simple word analysis, without much regard for the semantic or syntactic structure of email. Email comes into slocal and is sorted based on ifile's simple database mapping folders to word occurrences. It's smart enough to learn when you refile a piece of mail.
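
To make that concrete, here is a rough Python sketch of a folder-to-word-occurrence table that learns from refiles. This is not ifile's code: the class name FolderWordModel, the tokenizer, and the smoothed log-probability scoring are all my own assumptions, chosen only to illustrate the idea.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


class FolderWordModel:
    """Hypothetical folder -> word -> occurrence-count table."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def learn(self, folder, text):
        """Credit every word of the message to the given folder."""
        for word in tokenize(text):
            self.counts[folder][word] += 1

    def unlearn(self, folder, text):
        """Take back the credit, never dropping a count below zero."""
        for word in tokenize(text):
            if self.counts[folder][word] > 0:
                self.counts[folder][word] -= 1

    def refile(self, old_folder, new_folder, text):
        """Moving a message shifts its words from one folder to the other."""
        self.unlearn(old_folder, text)
        self.learn(new_folder, text)

    def guess(self, text):
        """Return the folder whose word counts best match the message."""
        vocabulary = {w for words in self.counts.values() for w in words}
        best_folder, best_score = None, -math.inf
        for folder, words in self.counts.items():
            total = sum(words.values())
            if total == 0:
                continue  # nothing is known about this folder any more
            # Add-one smoothing so an unseen word does not zero out a folder.
            score = sum(
                math.log((words[w] + 1) / (total + len(vocabulary)))
                for w in tokenize(text)
            )
            if score > best_score:
                best_folder, best_score = folder, score
        return best_folder
```

Calling learn("spam", ...) and learn("work", ...) on a few messages and then guess(...) on a new one picks the folder with the closest word profile; refile(...) is the step that corresponds to correcting a bad guess by hand.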

I was hoping it would identify spam for me, but having tried it for a couple of weeks, I'm turning it off. Unfortunately, it's not appropriate for the way I file my email. I can imagine situations where it's useful, but it's got several problems:

  • refile is slightly busted. I get an occasional "Not able to open..." (I use nmh and mh-e emacs-mode, which might not all interact well)
  • treats words equally whether in the header or not
  • doesn't notice how slocal rules caused things to be filed. I'd like it to learn from the rules I have already. (Need to periodically run knowledge_base.mh for this reason.)
  • seems to pick up folders which weren't in my .folders, namely my OLD/inbox and ARCHIVE/inbox. (I auto migrate stuff over a month old, and then archive and compress after two months.)
  • I'd like it to treat my manual refiles as extra important in its database.
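
Two of these wishes, treating header words differently from body words and giving manual refiles extra weight, could look roughly like the Python sketch below. Everything here is hypothetical: the function name weighted_words, the choice of headers, and the weights 3 and 5 are made-up values for illustration, not anything ifile provides.

```python
import email
import re
from collections import Counter

HEADER_WEIGHT = 3          # assumed value: header words count three times
MANUAL_REFILE_WEIGHT = 5   # assumed value: manual refiles count five times
HEADERS = ("From", "To", "Subject")


def words(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def weighted_words(raw_message, manual=False):
    """Return word weights for one message.

    Header words are tagged with their header name (for example
    "subject:meeting") so they stay distinct from body words.
    """
    msg = email.message_from_string(raw_message)
    counts = Counter()
    for name in HEADERS:
        for word in words(msg.get(name, "")):
            counts[f"{name.lower()}:{word}"] += HEADER_WEIGHT
    body = msg.get_payload()
    if isinstance(body, str):  # skip multipart messages in this sketch
        for word in words(body):
            counts[word] += 1
    if manual:
        # A human decision is stronger evidence than an automatic one.
        counts = Counter({k: v * MANUAL_REFILE_WEIGHT for k, v in counts.items()})
    return counts
```

The resulting weights would be added to the destination folder's entry in the word database instead of a flat count of one per word.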

One day, I'd like to get around to making some minor adjustments and fixes/updates. It seems like parts of it should be a standard part of the unix toolkit, such as word frequency analysis. Here are some guidelines if you're considering using it:

  • Use it if you use inc instead of using slocal/maildelivery. All your email comes to one inbox and gets filed by you. It will do a pretty good job of guessing how you file.
  • Don't use it if you programmatically filter. ifile is good for human text, but it won't save you much if you deal with highly structured email. You'll have to continue to write rules, and ifile will often get confused.
