Older blog entries for jmason (starting at number 62)

Long time no post; I've been posting everything to my weblog for quite a while instead. ;)

However, I'm back, for 1 posting only; this weblog posting details how we don't need PKI to do an anti-spam web of trust. Advo is the One True Home of WoT schemes, everyone knows that ;), hence I'm posting a link here. But I'd really love it if some of the WoT gurus could do some brain-dumping on how distribution of a WoT could work...

I just posted this on taint.org, but it's the kind of thing I usually post to this Advo diary instead (since it deals with code and free software). So I'll post it here too ;)

<bigwig> is a really interesting new design for web services. A month or 2 ago, I was thinking about web app languages, like perl/CGI, PHP, servlets, HTML::Mason, etc., and I realised that the big problem was the requirement imposed by the web environment itself; most "interesting" operations often have a UI that needs to take place over several pages, and each page has to

  • unmarshal the user's CGI params, decode them, and check them for validity and security problems;
  • open the database;
  • perform actions;
  • fill out the HTML template (I'm assuming nobody's insane enough to still use embedded HTML-in-code!);
  • insert "next step" form data in that template;
  • send that back to the user;
  • save a little state to the database;
  • then exit, and forget all in-memory state.

Compared to most interactive programs today, it's clear that this is a totally different, and much more laborious, way to write code. The nearest thing in trad apps is the "callback" style of dealing with non-blocking I/O, ie. what we used before we could (a) use threads, (b) use processes, or (c) wrap it all up in a friendlier library. It just screams complexity.
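To make that concrete, here's roughly what one page of such a multi-page operation looks like as a bog-standard Perl CGI script. A minimal sketch, not code from any real app: the params, table names and template are invented for illustration.

    #!/usr/bin/perl -w
    # one page-worth of a multi-page flow: everything below gets repeated,
    # in some form, for every single page of the UI
    use strict;
    use CGI;
    use DBI;
    use HTML::Template;

    my $q = CGI->new;

    # unmarshal and validate the CGI params (names invented)
    my $item = $q->param('item') || '';
    die "bad item id\n" unless $item =~ /^\d+$/;

    # open the database and perform the action
    my $dbh = DBI->connect('dbi:mysql:shop', 'user', 'pass', { RaiseError => 1 });
    my ($name, $price) = $dbh->selectrow_array(
        'SELECT name, price FROM items WHERE id = ?', undef, $item);

    # fill out the HTML template, embedding the "next step" form data
    my $tmpl = HTML::Template->new(filename => 'confirm.tmpl');
    $tmpl->param(name => $name, price => $price, item => $item, step => 'confirm');

    # send it back to the user
    print $q->header, $tmpl->output;

    # save a little state, then exit and forget all in-memory state
    $dbh->do('UPDATE sessions SET last_item = ? WHERE id = ?',
             undef, $item, scalar $q->cookie('session'));
    $dbh->disconnect;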

<bigwig> fixes that:

Rather than producing a single HTML page and then terminating as CGI scripts or Servlets, each session thread may involve multiple client interactions while maintaining data that is local to that thread.

They call it The Session-Centered Approach.
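For contrast, here's the same sort of flow written in the session-centred style. This is not <bigwig> syntax, just a Perl-flavoured sketch with a made-up show_page() standing in for "send this page, suspend the session thread, resume with the user's reply" -- but it shows the point: one linear flow per visitor, with state kept in ordinary local variables across interactions.

    #!/usr/bin/perl -w
    # NOT <bigwig>: a toy sketch of the session-centred idea, with show_page()
    # faking the "suspend until the user submits the next form" primitive
    use strict;

    sub show_page {
        my ($template, %fields) = @_;
        print "[would render $template and wait for the reply]\n";
        return { item => 42, confirmed => 1 };    # canned reply for the sketch
    }

    sub place_order { print "[would place order for item $_[0]]\n" }

    sub checkout_session {
        my $reply = show_page('pick_item.tmpl');              # interaction 1
        my $item  = $reply->{item};

        $reply = show_page('confirm.tmpl', item => $item);    # interaction 2
        return unless $reply->{confirmed};

        place_order($item);               # $item stayed in a local the whole
        show_page('thanks.tmpl', item => $item);  # time; nothing marshalled
    }

    checkout_session();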

It gets better. They also include built-in support for input validation, HTML output validation, compilation and compile-time code checking, and it's GPLed free software. This is really good stuff. Next time I have to write a web app, I'll be using this.

Found via sweetcode.

Doc quotes:

The great teacher John Taylor Gatto said this about how he learned to truly teach:

I dropped the idea that I was an expert, whose job it was to fill the little heads with my expertise, and began to explore how I could remove those obstacles that prevented the inherent genius of children from gathering itself.

s/children/users/, for all those people who take the BOFH stories a bit literally.

10 Dec 2001 (updated 10 Dec 2001 at 02:40 UTC)
Replies:

thom: travelling? I was in Sydney for ~6 months, went to Melbourne for two weeks, stayed in Byron for 3 days, then stopped in Brisbane for 7 weeks... Now I'm off back to Sydney... Some traveller, me.

I can top that. Thailand, 3 weeks, Melbourne, er, 4 months ;). I am planning to get moving again soon though, otherwise my Official Traveller Certification will be rendered null and void. Also I'll be thoroughly pissed off if I don't get to try out my new diving sk1llz on the Great Barrier Reef while I'm over here.

gary: Saw Chris Raettig's journal the other day and was impressed; his rationale for an email based journal is appealing [..].

I use a mail-based system to update taint.org, my (other) blog. It works nicely -- the main point was to see how useful WebMake would be for blogging -- but I'm now reverting to using Advogato for diary stuff, keeping taint.org for interesting newsy snippets. Go figure. I reckon it's because Advogato's more of a diary-based community, whereas taint.org is kinda out on its own, and just doesn't seem like a good place to keep a proper journal.

ask: Jabber [..] Maybe it's just crappy clients, but it doesn't seem stable enough to run anything too critical on.

That's the same feeling I got about 6 months ago. :( Let's hope it polishes up some time soon. Open source != crappy packaging!

MailMan-to-RSS:

Oh -- another new hack I forgot to mention, which someone might find useful. MailMan-to-RSS, a little script (and a vaguely nice-looking demo site) which scrapes MailMan list archives and creates an RSS feed of the last 10 posted messages. Handy for following lists using any of the various portal systems (and Evolution!) that support RSS.
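For the curious, the feed-generation half of a hack like this is only a few lines of Perl with XML::RSS. Just an illustrative sketch: the list details and messages below are invented, and the real script gets them by scraping the archive pages.

    #!/usr/bin/perl -w
    # sketch: turn a list of scraped messages into an RSS feed with XML::RSS
    use strict;
    use XML::RSS;

    # invented sample data; the real thing scrapes these from the archives
    my @messages = (
        { subject => 'Re: 2.0 release plans', url => 'http://example.com/msg00123.html' },
        { subject => 'patch: fix fh leak',    url => 'http://example.com/msg00124.html' },
    );

    my $rss = XML::RSS->new(version => '1.0');
    $rss->channel(
        title       => 'example-list archive',
        link        => 'http://example.com/pipermail/example-list/',
        description => 'most recent messages posted to example-list',
    );

    # the real script limits this to the 10 most recent messages
    foreach my $msg (@messages) {
        $rss->add_item(title => $msg->{subject}, link => $msg->{url});
    }

    print $rss->as_string;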

BTW as the page sez, if you would like to see a list RSS-ized, and don't have a server to host the RSS on, mail me and I'd be happy to add it to the scraped list on taint.org.

Oh look, thom is travelling 'round Oz too! cool. Must get back to the "travelling" part myself someday soon though.

BTW forgot to plug the Sub-Pixel Font Positioning on UNIX mini-HOWTO I wrote a few weeks ago. It really just ties up the loose ends in XFree86 4 doco, covers the implementation details, and gives UNIX-users something to point at when Windoze lusers rattle on about ClearType. ;) Sub-pixel positioning is a total beaut tweak for a laptop screen.
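If you just want the punchline without reading the HOWTO: on a typical LCD with RGB sub-pixel ordering, the knobs are the Xft settings. From memory (so treat this as an approximation, and check the HOWTO for the real XFree86 4 details), the X-resources version looks something like:

    ! from-memory sketch -- see the mini-HOWTO for the authoritative details
    Xft.antialias: 1
    Xft.rgba:      rgb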

Kevin is talking about kittens. Why is everyone going on about cats this week? I miss my cat :(

I'm travelling 'round the world at the mo', and apparently the one thing travelling people always miss is their pets. Statistically speaking, I've backed this up with several drunken conversations with other pet-craving travellers.

Patches and Contributed Code

Here's an interesting one. I've written a few free-software apps in the past, and recently SpamAssassin has taken off. It's very much sysadmin-oriented, being a mail filter for spam which works well as a system-wide filter.

This has illustrated that there's a big difference between the two audiences, app users and sysadmins: sysadmins will regularly hack the code to "scratch their itch" and send back a patch, whereas patches don't often come from users.

Interesting...

My ghod, it's been a while since I updated the diary. Things I've done since then:

  • wrote SpamAssassin, a mail filter that identifies spam using text analysis. Using its rule base, it runs a wide range of heuristic tests on mail headers and body text.

    This is pretty neat. It does a good job of differentiating spam from not-spam without too many false positives or negatives; and it's a proper Perl module, so it can be plugged into other mail delivery or filtering systems quite easily (at some stage ;). There's a rough sketch of what that could look like at the end of this list.

    I've been using something similar for a long time, but I eventually decided to reinvent the wheel. The end result is pretty good so IMHO it was worth it.

  • Helped start up Ireland Offline, a new organisation campaigning to sort out Ireland's internet backwater status and bring fat pipes to the people. This is going well... lots of interest, press and support, and some great people involved.

  • Decided to move to Australia ;) Yep, despite getting involved in Ireland Offline, I'm heading off to Melbourne in a month's time. Haven't really figured out the job situation there, but hopefully it shouldn't be too tricky getting hold of one. If anyone reading is in a position to hire a UNIX guru (hey, I'm allowed to plug myself for this), give us a mail.

  • Sitescooper: not an awful lot of news here; Plucker support is pretty good now, and I've put its caching subsystem on a diet in preparation for a move to a new server for the Nightly Scoops site.

    The scoops page is an interesting situation. Every night, a cron job runs off and downloads pages from 136 sites (typically the ones that have clear-ish terms allowing redistribution of their content). The sitescooper script is run 5 times, for the 5 output formats that site provides. Since sitescooper caches these pages in a per-format cache (which allows it to run diffs on pages to see what's changed) as well as a shared cache (which ensures the network is only accessed once for each page), that was 6 copies of each page.

    The cache is expired every few days, removing pages older than a month or so; still, it was getting pretty big. I've now implemented a Singleton pattern for the cache usage, which brings it down to 1 ref-counted copy of each page and 6 pointers (there's a toy sketch of the idea at the end of this list). After a few weeks of this, the cache disk usage is running at about 120 megs, down from about 800.

    This unfortunately may still be too much for the poor overburdened colocated server I use, especially since I'll be on the other side of the world. :( As a result the list of sites on the page may need another diet. We'll see...

  • WebMake: lots of new stuff in the pipeline. It now supports plugins, which are library files that can define library functions for the inline perl code, and -- since I've added tag-definition support -- a plugin can also add new tags, for use either in the HTML input documents, or in the WebMake .wmk XML file itself. Who needs taglibs? ;)

    This has allowed lots of new features, without messing up the core. It's been in the released version for a while.

    However, a brand-new feature, not released yet, is IMHO neater. It's "edit-in-browser" support, which is long overdue.

    This is really just a CGI script and a set of modules, allowing a WebMake site to be managed in a web browser; the user logs in using traditional htpasswd authentication, picks a WebMake site (ie. a .wmk file), and can then pick bits of content from the file and edit them in a textbox. It also has a directory browser/file manager for the tags that load content from a directory tree, like contents and media.

    Once they're done editing, they can build the site (using WebMake, obviously), and -- the really neat bit -- check their changes into CVS.

    Since CVS support is built-in, this means that I can update my sites from anywhere in the world, with a web browser, or do it quickly at the command-line from anywhere I have the sites checked out -- at home, in work, etc. It also gives a bonus in that it makes site replication super-easy -- just cvs checkout and it's done. And it's free. CVS is cool.

    So I'm just documenting this up, grabbing screenshots etc., and then I'll release it.
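Two footnotes to the list above, in code form.

First, the SpamAssassin item: here's roughly what calling it as a Perl module from a delivery script could look like. The method names are from memory and may not match whatever version you have, so treat it as a sketch rather than gospel.

    #!/usr/bin/perl -w
    # rough sketch: filter one message (on stdin) through SpamAssassin;
    # method names approximate
    use strict;
    use Mail::SpamAssassin;

    my $raw = do { local $/; <STDIN> };        # slurp the whole message

    my $sa     = Mail::SpamAssassin->new();
    my $mail   = $sa->parse($raw);
    my $status = $sa->check($mail);

    if ($status->is_spam()) {
        open my $spool, '>>', "$ENV{HOME}/mail/caughtspam" or die "open: $!";
        print $spool $raw;
    } else {
        print $raw;                            # pass clean mail through
    }

Second, the sitescooper cache change is really just reference-counting, and here's a toy sketch of the shape of it (nothing to do with the actual sitescooper code): one shared copy of each page body, with the per-format caches holding references to it plus a count, so the body can be reaped when the last format lets go.

    #!/usr/bin/perl -w
    # toy ref-counted shared page cache (not sitescooper's code)
    use strict;

    my %shared;    # url => { body => $text, refs => $count }

    sub cache_page {       # called once per (url, format) pair
        my ($format_cache, $url, $body) = @_;
        $shared{$url} ||= { body => $body, refs => 0 };
        $shared{$url}{refs}++;
        $format_cache->{$url} = \$shared{$url}{body};   # a pointer, not a copy
    }

    sub expire_page {      # drop one format's entry; reap on the last drop
        my ($format_cache, $url) = @_;
        delete $format_cache->{$url} or return;
        delete $shared{$url} if --$shared{$url}{refs} == 0;
    }

    # 6 caches (5 output formats plus the shared network cache) share one body
    my @caches = map { +{} } 1 .. 6;
    cache_page($_, 'http://example.com/news.html', 'x' x 10_000) for @caches;
    expire_page($_, 'http://example.com/news.html') for @caches;
    print %shared ? "still cached\n" : "body reaped\n";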

Just certified Dave Brownell as a Master, seeing as he's one of those guys who just keeps cropping up in the most interesting projects.

Still need to do a proper diary update at some stage...

Sitescooper 3.0.2 released -- and about time too ;)
