Older blog entries for Waldo (starting at number 159)

I've been a PostNuke developer -- in the sense that I have CVS access, I'm on the developers' list, etc. -- for about a month. I've yet to contribute a thing, and I think I've posted a grand total of once. It's not because I'm lazy, or uninterested. Instead, it's because I feel alienated, confused, overwhelmed and mismatched. I'd just like to write some code, but I can't find any place to get started, and I don't want to get caught up in the mini-holy wars. The thought of another fork is incredibly frustrating, because it appears that a fork would be a consequence of personal and political issues, and not so much technical ones. I know that all of these things are normal and reasonable in any project -- I've been on the wrong end of this problem before, lord knows -- but it's serving as a real roadblock to me in this case.

I'm hoping that it will occur to somebody that the PostNuke team needs to break down into a whole mess of little teams. I'd like to squash bugs on the core code (that is, not the bazillion ridiculous little modules). That's all that I want to do on the project right now. Maybe I'll figure out some way to do that.

I started class again this week. Just a Biology 101, Biology 101 Lab, and a U.S. Government class. (Can you say "prerequisites?") I have class three nights out of the week, in three-hour blocks, so there goes my prime programming time until December.
26 Aug 2002 (updated 26 Aug 2002 at 15:58 UTC) »
I got my 5GB iPod on Friday. Six hours later, my girlfriend's puppy ate the headphones.
Hackensaw Boys
Holy shit, Lolindrath, you've heard of the Hackensaw Boys? We're used to Local Boy Makes Good stories here in Charlottesville (Dave Matthews Band, among others), but the Hacks weren't a bunch I'd had pegged for that particular headline. :) Crazy.
I got my iPod in the mail today. I haven't gotten home to plug it into one of my Macs yet, but I love it. Very sexy. Wonderful presentation. Just in time for my trip to Boston, which I leave for tomorrow morning.
Digital Audio Content Authentication
As record labels become more aggressive about propagating false MP3s of songs (creating a file of the same size in bytes, the same length in minutes and seconds, and the same title, but containing garbage), it is inevitable that file-sharing networks will come up with a method of fighting back.

I propose the use of a partial-match authentication system. I'll say right up front that I know virtually nothing about this concept, but I believe it's likely to be altogether achievable. Thus far, tracking of audio and images, Digimarc-style, has involved embedding a digital fingerprint in the file. This is good when the original creator of the information makes use of it, but when data has multiple points of origin (i.e., many people ripping and sharing tracks), such a system is of no use for authenticating the data.

Instead, it would be more desirable to derive a unique string for a song based not merely on the track length and the name, but on the actual content of the song. If the track data as regards the actual music can be broken down into a short string of data, perhaps somewhere in the realm of 64 bytes, it will enable comparisons between tracks for purposes of determining whether or not they match. This is not any sort of a digital signature in the traditional sense, as it is never applied in the first place. It's simply an extraction of the data. We'll call this the authentication string. This string will need to be constructed in a manner such that two strings that are extremely similar are likely from extremely similar versions of the same audio file. A song that is encoded once at 192kbps and once at 128kbps should provide very similar authentication strings.
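To make that a little more concrete, here's a rough sketch in Python of what I mean. Everything in it is my own assumption: the function names, the 512-bit (64-byte) size, and especially the scheme itself, which just records whether each slice of the track is louder than the one before it. A real system would surely need something smarter (probably based on the frequency content), and this assumes the MP3 has already been decoded into a plain list of sample values.

```python
FINGERPRINT_BITS = 512  # 64 bytes, the size floated above


def auth_string(samples, bits=FINGERPRINT_BITS):
    """Reduce decoded audio samples to a short authentication string.

    Toy scheme (an assumption, not an established algorithm): split the
    track into bits + 1 equal frames and emit a 1 bit whenever a frame
    has more energy than the frame before it.
    """
    frame_len = len(samples) // (bits + 1)
    if frame_len == 0:
        raise ValueError("track too short to fingerprint")
    energies = [
        sum(s * s for s in samples[i * frame_len:(i + 1) * frame_len])
        for i in range(bits + 1)
    ]
    out = bytearray(bits // 8)
    for i in range(bits):
        if energies[i + 1] > energies[i]:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def similarity(a, b):
    """Fraction of bits that match between two authentication strings."""
    if len(a) != len(b):
        raise ValueError("authentication strings must be the same length")
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return 1.0 - differing / (len(a) * 8)
```

The idea is that two encodings of the same song should score close to 1.0 with similarity(), while a garbage file with the right name and length should land somewhere around 0.5, which is what you'd expect from two unrelated strings of bits.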

Now, this authentication string is not useful on its own. If Gnutella were modified to generate this data for every shared track, the information would be meaningless without a data source to compare it to. This is where a trust metric, of sorts, comes into play. Gnutella clients would generate this information for each track and, rather than storing it in an ID3 tag, store that data separately from the tracks. The servers (and perhaps the clients) would build up a database of the authentication strings for songs spotted on the network. This stateful database would track previously-spotted authentication strings for an MP3, along with a voting-style system of currently-available MP3s, and perhaps even weight various authentication strings based on the total number of files shared by the owner and other, similar criteria. Whatever the nature of that trust metric, it would obviously have to be set up in a manner that would prevent the RIAA from poisoning the well.
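Here's an equally rough Python sketch of the voting database. The class name, the method names, the 32-bit matching threshold and the logarithmic weighting are all things I've made up for illustration; nothing like this exists in Gnutella as far as I know. It groups near-identical authentication strings together, gives each peer one vote per group, and weights that vote by how many files the peer shares.

```python
import math
from collections import defaultdict


def bit_distance(a, b):
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


class SightingDatabase:
    """Tallies which authentication strings peers report for each track."""

    def __init__(self, max_distance=32):
        # Strings within max_distance bits count as the same recording.
        self.max_distance = max_distance
        # track name -> list of [representative string, {peer_id: weight}]
        self._clusters = defaultdict(list)

    @staticmethod
    def peer_weight(files_shared):
        # Logarithmic, so one peer with a huge library can't swamp the vote.
        return 1.0 + math.log10(1 + files_shared)

    def _matches(self, representative, auth):
        return (len(representative) == len(auth)
                and bit_distance(representative, auth) <= self.max_distance)

    def report(self, track, auth, peer_id, files_shared):
        """Record that peer_id is sharing track with this string."""
        weight = self.peer_weight(files_shared)
        for representative, votes in self._clusters[track]:
            if self._matches(representative, auth):
                votes[peer_id] = weight  # one vote per peer per cluster
                return
        self._clusters[track].append([auth, {peer_id: weight}])

    def trust(self, track, auth):
        """Share of the weighted vote held by the cluster matching auth."""
        clusters = self._clusters.get(track, [])
        total = sum(sum(votes.values()) for _, votes in clusters)
        if total == 0:
            return 0.0
        for representative, votes in clusters:
            if self._matches(representative, auth):
                return sum(votes.values()) / total
        return 0.0
```

Note that this does nothing to stop somebody from running a thousand fake peers that all vote for the garbage file. That's the well-poisoning problem, and it's the hard part that the sketch leaves untouched.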

I have no idea if somebody has already come up with a system like that. I obviously only know about the concepts behind this in the loosest of terms, so I'm not of any use in the development of a system of this nature. But I do expect that, short of some sort of data-authentication system being put into place, file sharing systems will be spammed into oblivion by the recording industry.

I just returned from a week at the beach (Emerald Isle, NC, USA) a couple of days ago, and now I'm leaving for a weekend in Boston this Friday morning. It's a lot of driving, and doesn't allow me to get much work done on software and such. Hypothetically, I could do stuff between now and Friday, but by the time that I get caught up on regular ol' life things, it'll be time to leave again.
I went to the doctor today to get an ingrown toenail operated on. This one has been in a bad way since 1997. It's the third operation that I've had on my feet for ingrown toenails. The good news is that my big toes only have a single un-operated-on corner remaining, so I've only got one part left to get ingrown. :) Anyhow, it's quite tender, wrapped up quite thoroughly in bandages. It's on my left foot, making shifting gears on my motorcycle impossible, and so I am without transportation, save for the borrowed kind.
Paul Graham's Spam Plan
Like many others, I'm fond of Paul Graham's suggestions regarding what's to be done about spam. I particularly like the probability basis for ranking, as opposed to the arbitrary numbering system that SpamAssassin uses, and I'm also fond of the overall concept of not ignoring the importance of legitimate-mail recognition. I think that it would be good for somebody with more time than I have right now to write a program that could take a mail file (mbox, IMAP, whatever) and run Paul's Bayesian filter on it to extract the hash tables that he describes. Then those hash tables could be sent back to him for analysis. I store all of my spam in a spam folder, as I have for years, so I suspect that data would prove useful. I also have a folder for mail marked "Family," which contains years of correspondence with my extended family. That would surely also prove useful in developing a decent image of what communications look like for people. If I could run a program that would quickly generate some files that I could send to Paul for analysis, I'd be happy to do so.
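For what it's worth, the extraction half of that program might look something like this in Python, using the standard mailbox module. This is only a sketch under my own assumptions: the function names and the JSON output format are invented, and the tokenizing rule (letters, digits, dashes, apostrophes and dollar signs make up a token, and all-digit tokens are dropped) is my reading of Paul's description rather than his actual code. It only builds the two tables of token counts; it doesn't compute any probabilities.

```python
import json
import mailbox
import re
from collections import Counter

# Letters, digits, dollar signs, apostrophes and dashes form a token.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def tokens(text):
    """Split text into tokens, ignoring tokens that are all digits."""
    return [t for t in TOKEN_RE.findall(text) if not t.isdigit()]


def message_text(msg):
    """Flatten a message's headers and text parts into one string."""
    parts = [f"{key}: {value}" for key, value in msg.items()]
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True)
            if payload:
                parts.append(payload.decode("latin-1", errors="replace"))
    return "\n".join(parts)


def count_mbox(path):
    """Build the token -> occurrence count table for one mbox file."""
    counts = Counter()
    messages = 0
    box = mailbox.mbox(path)
    try:
        for msg in box:
            counts.update(tokens(message_text(msg)))
            messages += 1
    finally:
        box.close()
    return {"messages": messages, "tokens": dict(counts)}


def export_tables(spam_path, good_path, out_path):
    """Write both tables to a JSON file that could be sent along."""
    tables = {"spam": count_mbox(spam_path), "good": count_mbox(good_path)}
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(tables, fh)
    return tables
```

Running export_tables() against my spam folder and my "Family" folder would produce a single file containing nothing but words and counts, which seems like a reasonable thing to mail to somebody.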
Writes sye:
Where's Waldo?

Right here.

Girlfriend, Meet Mac
My girlfriend, who I've gradually been converting into a Mac user for the past few years, has asked to borrow one for a while. She's currently running a rather-nice Dell and dual-booting between Windows 98 and Mandrake Linux. Now that she's finished with school, she mostly needs a computer for Internet access and digital photography. (Mostly pictures of friends and family and pets and such.) Now that she's seen iPhoto on Mac OS X, the deal's as good as sealed. I'm loaning her my Rev. A iMac (which I got the moment they came out, as a gift from a good friend.) I just wiped Yellow Dog Linux from that system last week and did a clean install of Mac OS X, the first time I've had a Mac sans OS 9. I think it will be perfect for her.
31 Jul 2002 (updated 31 Jul 2002 at 04:42 UTC) »
I got my replacement Asus A7V-266e in the mail today from Asus, 28 days after I express-mailed the previous one back to them. This is the replacement for the replacement, the third thus far. Maybe this one will work?


Tomorrow (well, in 30 minutes EST) I turn 24. I'm having a hard time even caring. I don't know why. I think it's because I'm in the beginning of a long chain of uninteresting birthdays. I guess I get to rent cars at 25, but that's not particularly interesting. It's not until 30 that there's any notable digit change. I've decided to celebrate the occasion by taking a flying lesson at the local airport, and then Amber and I will go up to a Dave Matthews Band concert in upstate Virginia, for which I have appropriately-fabulous tickets.

E-Mail Filtering: In Danger of Being Commercialized

The first e-mail filter that's on the right track has been announced. The software product, made by Banter, doesn't yet have a name. The idea is that it sorts e-mail in your in box based on a whole mess of criteria on a scale of 1-100. It's the criteria that's the trick, of course, but they claim to have licked that. This is obviously necessary, as the current state of e-mail is quickly rendering it useless, as I'm sure many of you can sympathize with. The problem? This company has a patent on this type of natural language technology, and they're selling this software as part of a $75,000 package. I'm sure when they break it out as its own package, it'll be an enterprise plug-in to Microsoft's mail server. (IIS, I guess it's called. Or maybe that's just HTTP. I don't know.)

I'm worried. I lack both the time and the proper skills to launch an open source counterpart, but I'm gravely concerned that nobody else will do so. It's essential that an RFC be created to describe a public-domain, standards-based approach to a similar system such that everybody can benefit from this mail system. I'd hate for this system to be the de facto standard 24 months down the line, and find OSS advocates once again playing catch-up.


I've been doing a lot of PHP recently. Not only because it's necessary, but as a warm-up for getting involved in PostNuke. I need to get back into the swing of things.
I used ICQ in 1997ish, and I didn't like it, mostly because it meant that Dave Matthews Band fans could interrupt me at any time and beg me for backstage passes, something that I'm not even capable of providing them with. However, I've read so much about Jabber in the past year or so that I just had to download it. I'm running Fire.app on my PowerBook, and I'll run some sort of a Jabber on my Linux box when Asus finally sends me my motherboard. So far I've only done the whole IM thing, but eventually I'll get into the guts of the system. From what I've seen so far, it seems quite elegant.


I've joined the PostNuke team, more or less. I've made an account on the developer site, joined the mailing list, and e-mailed them to notify them that I intend to contribute. I've been reading the mailing list (which doesn't appear to be available as a digest, but Procmail is doing the trick right now), and it'll take me a week or two to find my sea legs, I'm sure.

OSS Advocacy

Today, I visited a large insurance company that shall remain anonymous. Let's just call them Insurance Company. Insurance Company is working with Proprietary Software Developer, who makes policy issuance software. The firm that I work with, a surplus lines brokerage, has offered to beta-test Proprietary Software Developer's new Web-Enabled(tm) version of Proprietary Software. So I spent five hours in Richmond, Virginia (USA) today in a meeting with The Powers That Be at Insurance Company, going over the basic concepts of Proprietary Software to see how it can fit in at my company. There are so many flaws in this system that it borders on silly, but the biggest is that they've more or less locked themselves into this Proprietary Software.

Now, Insurance Company has been around for a long time. So this move to the Web-Enabled system is not something that they take lightly. What would lead them to transition themselves from paper to digital is fairly obvious to all of us, I imagine. However, what would make them think that it's a good idea to become entirely dependent on Proprietary Software Developer is a complete mystery to me. Proprietary Software Developer has been in business for about a decade. They've shown a strong bent towards gouging their customers, with licensing fees upwards of $40,000/year. I put this question to Insurance Company: Who is going to be in business longer, Insurance Company, or Proprietary Software Developer? They felt, of course, that they would be. Which made me wonder why they would make their company entirely dependent on Proprietary Software. They didn't have an answer to this.

After about four hours of discussion, I delivered an impassioned speech on the merits of open source and free software, and attempted to convince them to release their XML standard into the public domain, set up an XML-RPC server, and allow brokerages to develop the software. This was lost on them in a tremendous number of ways. Why would they give away their valuable (in their eyes) XML standard to their competition? Why should they cooperate with people that they'd like to defeat? Why would brokerages develop software?

Try as I might, I made little more than a dent in their two basic misconceptions. And these are the misconceptions that most BigCos (credit to Dave Winer) have:

  1. Their idea/technology/standard is so incredibly valuable that they can't possibly share it with anybody under any circumstances.
  2. Cooperation with competition is inherently damaging.

If they would take their largely-pathetic innovations and combine them with similar innovations of just a few other insurance companies, then they'd have something close to a standard. A standard would allow software developers of all sizes and shapes to develop software that would work for many insurance companies, brokers and agents and, if the standard continued to grow, perhaps a tremendous portion of the industry as a whole. A rising tide lifts all boats, after all. But I guess Insurance Company didn't get that memo.

I'm so eager to get my Asus motherboard back -- I've just sent back the one that they sent me to replace the one that I bought that was broken. (Phew.) I'm desperate to have a system that I can use for Python and PHP development, notably to be able to start making contributions (beyond bug filings and such) to PostNuke. My motives are largely itch-scratching; the bugs on cvillenews.com are really very irritating. But rather than fix them for myself every version, I'd like to get the changes into the source.

Anyhow, just gripin'.
