Older blog entries for ingvar (starting at number 295)

Been a while since the last post. Work has been hectic, what with having to battle through the amendments that turned my pre-takeover contract into my new post-takeover contract. Mostly, it seemed to come down to the legal department just not getting FOSS; once the whole "he's doing this for fun?" clicked, they didn't seem to have a problem anymore.

Two papers finished off; both were declined by Conference #1, but are now submitted for the consideration of Conference #2.

Two essays finished off, one on data structures and time complexity and one on electronic fora.

OK, as an addendum to my previous post: I ended up screen-scraping what I needed, parsing the data I wanted out of it and generating SQL statements to (later) populate a database with. It would probably have been more elegant to connect to the database and insert the data directly, but a FORMAT call is quite convenient, as it were.

The screen-scraper was built using DRAKMA to fetch the pages and then some substring functions to extract the data I needed. Estimated 30 minutes of Lisp coding and testing, then a further "lots" of actual scraping.
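The Lisp code isn't reproduced here, but the general shape (find text between two markers, then turn each extracted value into one SQL statement) might look something like this Python sketch. The start/end markers, table name and column name are hypothetical, and the fetching step (DRAKMA's job in the original) is left out so the extraction can be exercised on a literal string.

```python
def between(text: str, start_marker: str, end_marker: str, pos: int = 0):
    """Return (substring, position after end_marker), or (None, -1) if not found."""
    start = text.find(start_marker, pos)
    if start == -1:
        return None, -1
    start += len(start_marker)
    end = text.find(end_marker, start)
    if end == -1:
        return None, -1
    return text[start:end], end + len(end_marker)


def extract_all(text: str, start_marker: str, end_marker: str) -> list[str]:
    """Collect every (stripped) substring found between successive marker pairs."""
    items, pos = [], 0
    while True:
        item, pos = between(text, start_marker, end_marker, pos)
        if item is None:
            return items
        items.append(item.strip())


def sql_literal(value: str) -> str:
    """Quote a string as an SQL literal by doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def insert_statements(table: str, column: str, values: list[str]) -> list[str]:
    """Emit one INSERT per value, to be written to a file and loaded later."""
    return [f"INSERT INTO {table} ({column}) VALUES ({sql_literal(v)});" for v in values]


if __name__ == "__main__":
    page = "<td>O'Brien</td><td> Smith </td>"  # stand-in for a fetched page
    for statement in insert_statements("people", "name", extract_all(page, "<td>", "</td>")):
        print(statement)
```

Doubling single quotes is the standard escaping for SQL string literals, which is all a generate-a-file-now, load-it-later approach needs; inserting directly through a database driver would use parameterised queries instead.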

But my main musing for today is something I've noticed recently in my Apache logs. It seems as if there's an active business in "referring page" spam. I haven't run the numbers, but from eyeballing the logs, I am seeing at least a couple of page fetches per day where the "referring page" field is one of several URLs that trigger my wetware "this is spam" detection. I wonder what the reasoning behind it is. Maybe they're banking on sites publishing their stats publicly?
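Actually running the numbers, rather than eyeballing, could be done with a rough Python sketch like the one below, which tallies external referrers per day. It assumes the log uses Apache's standard "combined" LogFormat; `own_host` is a placeholder for the site's own hostname, and lines that don't match the pattern are simply skipped.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Apache "combined" format: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
# Group 1 captures the day part of %t, group 2 the (possibly escaped) Referer field.
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[([^:\]]+)[^\]]*\] "(?:[^"\\]|\\.)*" \d{3} \S+ "((?:[^"\\]|\\.)*)"'
)


def referrers_per_day(lines, own_host: str) -> Counter:
    """Count (day, referrer) pairs, skipping empty referrers and self-referrals."""
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue  # not a combined-format line
        day, referrer = match.groups()
        if referrer in ("", "-"):
            continue  # no referrer sent
        try:
            host = urlsplit(referrer).hostname
        except ValueError:
            host = None  # malformed URL: certainly not our own site
        if host == own_host:
            continue
        counts[(day, referrer)] += 1
    return counts
```

Something like `referrers_per_day(open("access.log"), "www.example.org").most_common(20)` (file name and hostname hypothetical) would then list the most frequent candidates for closer inspection.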

Borderline silly question: is there an easily-navigated (or searchable) repository of vulnerability reports that can list things within a time span? Last time, I ended up going through the BugTraq mailing list archive, but if someone has already collated specific vulnerabilities by "first reported date", it'd make things slightly easier for me.

Looks as if SecurityFocus have the raw data, but (alas) no obvious navigational features to let me do what I want.

Yes, it's that time again: Snooper Annual Report! MUCH less than a year since the last one, but probably about a year before the next time it gets done. This time, it also spans exactly one calendar year and overlaps slightly with the tail end of the last report's interval.

Recently (as in the last couple of years, not as in the last few weeks), publicly available Common Lisp libraries have undergone not only an explosion in numbers, but a rather bizarre change in release model. More and more libraries are essentially only available as "check out the latest version from VersionControlSystemOfChoice".

Since I am a writer of assorted nonsense, I wrote a short piece on this, trying to articulate why I find this less than ideal and how it could, possibly, be turned from less-than-ideal to much better.

Personally, I try to release my own stuff as versioned tarballs, with an ASDF system definition carrying a matching version number. I suspect I should modify my release packager script to also update a list of available releases, instead of having a couple of static pages I almost never edit (note: the packaging script makes some rather rash assumptions about the organisation of your source code and relies on a couple of magic files being up-to-date; your source code is probably not organised like mine is).
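The core of such a packager can be quite small. The Python sketch below is not the packaging script mentioned above; it assumes the system definition lives in `<system>.asd` at the top of the source directory and carries a literal `:version "..."` string, and it names the output `<system>-<version>.tar.gz` so the tarball and the ASDF version always agree.

```python
import re
import tarfile
from pathlib import Path

# Matches a literal version string in an ASDF system definition, e.g. :version "0.3.1".
# IGNORECASE because the Lisp reader would also accept :VERSION.
VERSION_RE = re.compile(r':version\s+"([^"]+)"', re.IGNORECASE)


def asdf_version(asd_text: str) -> str:
    """Pull the version string out of the text of a .asd file."""
    match = VERSION_RE.search(asd_text)
    if match is None:
        raise ValueError("no literal :version found in system definition")
    return match.group(1)


def make_release(source_dir: str, system: str, out_dir: str = ".") -> Path:
    """Pack source_dir into <system>-<version>.tar.gz, version taken from <system>.asd."""
    source = Path(source_dir)
    version = asdf_version((source / f"{system}.asd").read_text(encoding="utf-8"))
    release = f"{system}-{version}"
    tarball = Path(out_dir) / f"{release}.tar.gz"
    with tarfile.open(str(tarball), "w:gz") as tar:
        tar.add(str(source), arcname=release)  # unpacks into a versioned directory
    return tarball
```

`out_dir` should point outside the source tree, or the tarball ends up trying to include itself. Appending the returned file name to a list of available releases is where the "update the download page" step would hook in.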

It seems as if NOCtool keeps spawning side projects. I'm currently in the early stages of another support library for it (more details when the code is closer to "usable").

In other news, Twitter syndication makes me unhappy, especially when it goes from Twitter to LiveJournal, then on to Advogato. If I wanted to read those posts, I would already be on Twitter.

Useless text-coding idea #n (but as it is cute, I shall ignore this and pretend it actually has some use).

Imagine a text where each word is viewed as an integer. Split the text into words, for example at whitespace (this leaves punctuation as part of words, but that is not a critical problem). Each word can then be converted to an octet vector (say, using UTF-8 encoding, since that seems so popular these days), and that octet vector, in turn, can be viewed as either a big-endian or little-endian 8n-bit unsigned integer, where n is the number of octets.

Being an integer, it can be decomposed into its prime factors, and these can then be emitted in some suitable order using a simple framing protocol (say, each factor as a 16-bit "prime ordinal", with 0 as the delimiter between words).

Obviously, this restricts you from expressing words that happen to be prime, unless they're within the first 65535 primes (and, more generally, any word with a prime factor beyond the 65535th prime). I haven't actually run any tests on this to see how it works out on real text. But, other than being useless, I think it has cuteness potential.
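For the curious, here is a throwaway Python sketch of the scheme. The choices it makes are assumptions, not part of the description above: big-endian conversion, factors emitted in ascending order, 1-based prime ordinals (2 is ordinal 1, leaving 0 free as the delimiter) and big-endian 16-bit output words. Factoring is plain trial division over a table of the first 65535 primes, and words with a factor outside that table raise an error. Text containing NUL characters would not survive the round trip, since leading zero octets vanish in the integer.

```python
import struct

MAX_ORDINAL = 65535  # 16-bit ordinals; 0 is reserved as the word delimiter


def first_primes(count: int, bound: int = 1_000_000) -> list[int]:
    """Sieve of Eratosthenes; there are 78,498 primes below 10**6, so 65535 fit."""
    sieve = bytearray([1]) * (bound + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, int(bound ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytes(len(range(i * i, bound + 1, i)))
    return [i for i in range(bound + 1) if sieve[i]][:count]


PRIMES = first_primes(MAX_ORDINAL)
ORDINAL = {p: i + 1 for i, p in enumerate(PRIMES)}  # 2 -> 1, 3 -> 2, 5 -> 3, ...


def prime_ordinals(n: int) -> list[int]:
    """Factor n by trial division over PRIMES, returning ordinals in ascending order."""
    if n < 2:
        raise ValueError(f"{n} has no prime factorisation")
    ordinals = []
    for p in PRIMES:
        if p * p > n:
            break  # whatever is left of n is 1 or a prime
        while n % p == 0:
            ordinals.append(ORDINAL[p])
            n //= p
    if n > 1:
        if n not in ORDINAL:  # leftover factor lies beyond the 65535th prime
            raise ValueError("word has a prime factor too large for a 16-bit ordinal")
        ordinals.append(ORDINAL[n])
    return ordinals


def encode(text: str) -> bytes:
    """Split at whitespace, read each UTF-8 word as a big-endian integer, emit its factors."""
    out = bytearray()
    for word in text.split():
        n = int.from_bytes(word.encode("utf-8"), "big")
        for ordinal in prime_ordinals(n):
            out += struct.pack(">H", ordinal)
        out += struct.pack(">H", 0)  # end-of-word delimiter
    return bytes(out)


def decode(data: bytes) -> str:
    """Multiply the primes back together and turn each product into a word again."""
    words, product = [], 1
    for (ordinal,) in struct.iter_unpack(">H", data):
        if ordinal == 0:
            length = (product.bit_length() + 7) // 8
            words.append(product.to_bytes(length, "big").decode("utf-8"))
            product = 1
        else:
            product *= PRIMES[ordinal - 1]
    return " ".join(words)
```

As a small example, "a" is the octet 97, which is the 25th prime, so `encode("a")` is the ordinal 25 followed by the delimiter, i.e. `b"\x00\x19\x00\x00"`. Any one- or two-octet word is always encodable, since all its prime factors are below 65536; longer words may or may not be.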

Seems as if the latest addition to the spam controls has done something for the comments section. Out of 128 attempts to post, only 10 resulted in a user-visible comment. Unfortunately, all 10 of those were spam, and I have no idea how many of the other 118 weren't. I'm currently doodling on a mod-queue system, so I can observe these things in a more "what gets trapped, what doesn't" fashion, which should let me experiment with and tune the predictors.
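One possible way for a mod-queue to make "what gets trapped, what doesn't" visible is to store every predictor's verdict alongside the moderator's eventual decision, then tally hits and misses per predictor. The Python sketch below is a hypothetical design, not the system being doodled on: predictor names are made up, and the queue lives in memory rather than in whatever storage the site actually uses.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Submission:
    text: str
    verdicts: dict[str, bool]        # predictor name -> "this looks like spam"
    is_spam: Optional[bool] = None   # set by the moderator; None while still queued


@dataclass
class ModQueue:
    items: list[Submission] = field(default_factory=list)

    def add(self, text: str, verdicts: dict[str, bool]) -> Submission:
        """Queue a submission together with every predictor's opinion of it."""
        item = Submission(text, verdicts)
        self.items.append(item)
        return item

    def pending(self) -> list[Submission]:
        """Submissions still waiting for a human decision."""
        return [s for s in self.items if s.is_spam is None]

    def predictor_stats(self) -> dict[str, Counter]:
        """Per predictor, count true/false positives and negatives over moderated items."""
        stats: dict[str, Counter] = {}
        for s in self.items:
            if s.is_spam is None:
                continue  # no ground truth yet
            for name, flagged in s.verdicts.items():
                correct = flagged == s.is_spam
                outcome = f"{'true' if correct else 'false'}_{'positive' if flagged else 'negative'}"
                stats.setdefault(name, Counter())[outcome] += 1
        return stats
```

Keeping every verdict, rather than just the final accept/discard decision, is what allows a predictor to be evaluated (or retuned) without it actually being switched on.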

About two weeks ago, I slapped some marginal spam protection onto my essays site. It worked, briefly, but as all it did was verify that there was an MX or an A record for the domain of the submitted email address (silently discarding the submission otherwise, as every failed "you are not spam" check does), I have now instituted another.

Alas, I was a bit over-eager with the cleanup this time and I am considering putting up a static page with "commercial content archiving policy" stating that, yes, you can post spam and have it stored, at a cost of UKP 10 per day, per message. This is, of course, a more modern version of my spam archiving policy from the mid-90s (no, I received neither apologies nor money, but I did have great fun sending out a couple of emails asking for it, referencing said web page; I also know that it was adapted by some other people).

You know? I suddenly realised what I consider important in mailing list software (with moderation, filters and other goodies): the ability to transparently impose administrative policy on multiple lists in one go.

The background is that I have several Mailman lists on common-lisp.net ("several" being, I believe, 8) and they do, of course, get spam. But I essentially do NOT use any of Mailman's spam-fighting measures (there are a few "just drop" non-subscriber from-address regexps and a "moderate all non-subscriber posts" setting), because while all the lists get spam, any measure needs to be duplicated across all the lists to be worth the effort, and that is WAY too much like actual work.

Of course, as these things go, I fully expect that someone will tell me "install the Blahonga module and you get it" or "buh, all you need to do is frob around in the config and it's there!", and that would be fine, too. Until then, I will keep deleting 1-5 spam emails from 2-6 lists daily (no, not all lists are hit in parallel, and some are more attractive than others; it's all probably worthy of being data-collected and charted).

4 Nov 2008 (updated 4 Nov 2008 at 15:04 UTC)

Just had a look through the comments that had been posted to my essays site and disappointingly, every single one was a spam attempt.

I'll probably have to go back through the web logs and see if there were any more attempts to post, so as to determine if my (rather weak) anti-spam measures actually work.

Some quick looking indicates "no, not at all". The measure I've taken is (essentially) to have a hidden field, initialised to an empty string, with a name that (ideally) should trick a screen scraper into filling it in, and to simply not file the comment if that field comes back non-empty.
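The honeypot measure described above amounts to very little code. In this Python sketch the field name `website` is a hypothetical choice meant to look tempting to a form-filling bot, and the field is hidden from human visitors with CSS; the actual site may hide it differently.

```python
import html

HONEYPOT_FIELD = "website"  # hypothetical name, chosen to look tempting to form-filling bots


def comment_form(action: str) -> str:
    """Render a comment form whose honeypot input is hidden from human visitors via CSS."""
    return (
        f'<form method="post" action="{html.escape(action)}">\n'
        f'  <div style="display:none">\n'
        f'    <input type="text" name="{HONEYPOT_FIELD}" value="" autocomplete="off">\n'
        f"  </div>\n"
        f'  <textarea name="comment"></textarea>\n'
        f'  <input type="submit" value="Post">\n'
        f"</form>\n"
    )


def should_file(form: dict[str, str]) -> bool:
    """File the comment only if the honeypot field came back absent or empty."""
    return form.get(HONEYPOT_FIELD, "").strip() == ""
```

One limitation: a bot that submits the form's fields exactly as served, leaving the empty value alone, passes this check untouched, which may be part of why it traps so little.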
