Older blog entries for robogato (starting at number 17)

2 Feb 2007 (updated 3 Feb 2007 at 21:38 UTC) »

Advogato Status Report

A new rev of mod_virgule code went live today. See the changelog for the details.

This rev adds FOAF files to our user profiles, helping to make Advogato part of the Semantic Web. Each account profile page has a visible FOAF link as well as an auto-discovery meta link that points to a foaf.rdf file for that account. At present the FOAF files have minimal properties. The FOAF standard allows for some additional features that will probably be added over time. For now, outbound trust certifications are converted to foaf:knows properties and inbound certs are ignored. Project relations are exported as foaf:currentProject properties. To get an idea of what you can do with FOAF, try using the DISCO Hyperdata Browser to view the FOAF file of an Advogato seed account such as Raph's (see also the FOAFer result for the same file).
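
To make the mapping concrete, here is a minimal sketch in Python (not mod_virgule's C code) of how outbound certs and project relations could be turned into foaf:knows and foaf:currentProject properties. The `build_foaf` helper, the example.org URLs, and the exact document layout are assumptions for illustration only; the real foaf.rdf files may be structured differently.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("foaf", FOAF)


def build_foaf(nick, outbound_certs, projects, base="http://example.org/person/"):
    """Build a small FOAF document (hypothetical layout) for one account."""
    root = ET.Element(f"{{{RDF}}}RDF")
    person = ET.SubElement(
        root, f"{{{FOAF}}}Person", {f"{{{RDF}}}about": f"{base}{nick}/foaf.rdf#me"}
    )
    ET.SubElement(person, f"{{{FOAF}}}nick").text = nick
    # Each outbound certification becomes one foaf:knows property.
    # Inbound certs are deliberately not exported.
    for other in outbound_certs:
        knows = ET.SubElement(person, f"{{{FOAF}}}knows")
        known = ET.SubElement(knows, f"{{{FOAF}}}Person")
        ET.SubElement(known, f"{{{FOAF}}}nick").text = other
    # Each project relation becomes one foaf:currentProject property.
    for url in projects:
        ET.SubElement(
            person, f"{{{FOAF}}}currentProject", {f"{{{RDF}}}resource": url}
        )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_foaf("alice", ["bob", "carol"], ["http://example.org/proj/demo/"]))
```

The account names and project URL in the usage line are made up; only the two property names come from the description above.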

In addition to the new FOAF badge, you may have noticed some other very minor changes on the user profile. I've done a little HTML clean up and correction. The old, ugly RSS image has been replaced with the standard feed icon established by the Mozilla Foundation. Combined with our new RSS 2.0 feeds, this almost makes it look like Advogato is a modern website. :-)

Among other minor changes, trust certifications now include a date stamp. This will allow the future addition of date-dependent trust features such as age-based certificate expiration for inactive users.

All of the admin functionality of mod_virgule has been moved to a single base URL where it can be password protected. This includes the diagnostics page and crank pages for diary ratings, trust metrics, and the aggregator. Several of these pages were security risks either by leaking information about the server configuration or by being CPU intensive enough to be useful for DoS attacks.

Certification dialog

cdfrey notes in his blog:

"I just noticed something new in the advogato pages. When looking at a user, you get the following warning:

Note: By certifying a user you are making a public statment that you know this person and can vouch for their identity.

When did this happen?

I must disagree with this sudden pseudo-gpg keysigning level of certification, especially since this warning is now retroactively applied to people's previous certifications, by mere virtue of being tacked on the bottom of the list."

The new text appeared on Oct 1, 2006 when Advogato was migrated to the newer version of mod_virgule. The message is hard-coded in the module that creates the user profile page and was originally added, not for Advogato, but for robots.net some years before.

On robots.net, the users are not all programmers and many don't have previous experience with any sort of trust metrics. As a whole, the user base had begun to view the trust metric system as nothing more than a group-powered method of allowing other users to post on the site. As a result there was a huge amount of cert inflation (even compared to Advogato) with a large percentage of the user base tending toward Master certification. Many users were automatically certifying all new users as Masters, assuming this would allow them to post and therefore improve the community. In reality, it just increased the noise and spam level, of course.

I experimented with a variety of short messages under the cert dialog to impress upon people that by certifying someone, they bore some responsibility for the results. This particular message seemed to have the most dramatic effect and, over time, solved our problem.

I agree it's unnecessary for Advogato since most users here understand to one degree or another what the trust metric is for. I'll take a look at making this page more easily configurable on a site-by-site basis. That will allow us to use different text on Advogato or remove the message altogether.

With regard to the actual meaning, I didn't intend for "know this person" to mean only that you've met them in person, in meatspace. You might also know them in some other online capacity outside of Advogato. You might know them through email, IRC, another website, etc. In some cases, you might even get to know them by reading their blog on Advogato long enough to feel comfortable expressing some trust for them. I assume Raph meant something similar in his original cert instructions when he says to certify "free software developers you know". My understanding of the trust metric is that you're certifying to the community that you trust the subject really is who they claim to be (at least to the extent that they claim to be a member of the free software community).

Advogato Status Report

I'm working on more code improvements but it will probably be next week before anything interesting emerges. In the meantime...

FAQ: I've added the beginnings of an Advogato FAQ to the site to help cut down on the time I spend answering emails. At present, there's no index and the questions are roughly in order of how frequently they're asked. (Okay, one or two I just made up. I suppose they're Frequently Imagined Questions!) Have a look and don't hesitate to point out errors or new questions that need to be added.

FOAF: Does anyone have any strong opinions on FOAF? Someone requested we add a FOAF file for each profile and this looks like it would be relatively easy to do. I'm not entirely sure I grok what the point of it is though. Does anyone, anywhere actually use FOAF RDF files for anything useful? Would it be a Good Thing if we suddenly add 10,000 people to the FOAF-o-sphere?

Article Quality: One thing that still seems to need fixing on Advogato is the quality of articles posted on our home page. At present every trusted Advogato user has the freedom to post articles. Unfortunately, not every trusted Advogato user has the ability to post relevant, quality articles. Is there a way to enforce quality without taking away everyone's freedom to post? For background see these two previous Advogato discussions on this subject:

Advogato Status Report

If you haven't been following the saga of Netscape and the RSS 0.91 DTD, here's the summary: On Jan 12 the folks at Deviceforge noticed that Netscape had removed the DTD from their website sometime after Jan 1, 2007. After Slashdot picked up on it, enough people complained that even someone at Netscape acknowledged the problem.

Yesterday, we got an official pronouncement from Netscape. They've agreed to restore the DTD but only until July 1, 2007 after which it will be removed again. Why? According to Netscape, your application shouldn't be "relying on the availability of a static document on a third-party Web server" like, say, a DTD. It's not clear what will happen to RSS 0.91 after July 1. Maybe Netscape will transfer their copyright on the DTD to the W3C and the URL will change. Maybe everyone will have to update their RSS software to ignore the DTD. Maybe everyone will stop using RSS 0.91. Who knows.

Why do we care? Because mod_virgule has always generated RSS 0.91 feeds for the articles on the main page and the user blog feeds. Most RSS readers don't bother to check the DTD but many do, and if the DTD is gone, no more Advogato feed. There was already a task on the ToDo list to bump all our feeds to RSS 2.0, so I did that today as it seemed like the easiest way to bypass the whole issue. All Advogato feeds are now RSS 2.0. I also added some of the optional tags that make life easy for aggregators like guid and pubDate.
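
For readers unfamiliar with the optional tags, the following Python sketch shows what an RSS 2.0 item carrying guid and pubDate might look like. The `rss_item` helper and the example URL are hypothetical, and using the permalink as the guid is an assumption, not a description of what mod_virgule emits.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def rss_item(title, link, published):
    """Return one RSS 2.0 <item> as a string; `published` must be a UTC datetime."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    # guid gives aggregators a stable identity for the entry.
    # Assumption: the permalink doubles as the identifier.
    guid = ET.SubElement(item, "guid", {"isPermaLink": "true"})
    guid.text = link
    # pubDate uses the RFC 822 style date format with a four-digit year.
    ET.SubElement(item, "pubDate").text = format_datetime(published, usegmt=True)
    return ET.tostring(item, encoding="unicode")


if __name__ == "__main__":
    when = datetime(2006, 12, 7, 1, 37, tzinfo=timezone.utc)
    print(rss_item("Example entry", "http://example.org/diary/1", when))
```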

15 Jan 2007 (updated 15 Jan 2007 at 22:11 UTC) »

Advogato Status Report

A new rev of mod_virgule code went live today. See the changelog for the details. This upgrade required taking Advogato offline for about an hour to modify the XML database.

Until today, mod_virgule has stored timestamps in the XML data store that reflected the server's local time zone. The code then made assumptions about the time zone when rendering articles, posts, or RSS feeds. Prior to 3pm, 1 October, 2006, the server's local time zone was US Pacific time. When Advogato got transferred to our hosting facility, the new server was using the US Central time zone. This created a further complication because of the two hour time shift. Adding the blog aggregator made things worse because 99% of the incoming blog feeds use UTC timestamps.

Having to juggle three time zones on a regular basis was creating a bit of a headache for me. I decided it was time to get things under control before the code got so complicated that only a Time Lord from Gallifrey could understand it. So mod_virgule now uses UTC for everything. The code changes were relatively straightforward but normalizing Advogato's rather large XML data store was another matter. I wrote a Perl program that recursively scanned Advogato's 30,000+ XML files looking for timestamps in several different formats and adjusted them to UTC (which required a different offset depending on whether they were recorded before or after 3pm, 1 Oct, 2006). That's the reason for the brief downtime.
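
The core of that normalization can be sketched as follows. This is Python rather than the Perl actually used, and it is deliberately simplified: the `CUTOFF` constant, the fixed offsets, and the `to_utc` helper are assumptions for illustration. A real conversion would also have to handle daylight saving transitions and the several timestamp formats found in the data store.

```python
from datetime import datetime, timedelta

# Assumption: stored timestamps are naive server-local times, and the
# server switched from US Pacific to US Central at this local time.
CUTOFF = datetime(2006, 10, 1, 15, 0)

# Assumption: fixed daylight-time offsets (both zones were on DST at the
# cutoff). Timestamps recorded during standard time would need -8 / -6.
PACIFIC_OFFSET = timedelta(hours=-7)
CENTRAL_OFFSET = timedelta(hours=-5)


def to_utc(local):
    """Convert a naive server-local timestamp to naive UTC."""
    offset = PACIFIC_OFFSET if local < CUTOFF else CENTRAL_OFFSET
    # local = utc + offset, so utc = local - offset.
    return local - offset


if __name__ == "__main__":
    print(to_utc(datetime(2006, 9, 30, 9, 0)))   # recorded on the old server
    print(to_utc(datetime(2006, 10, 2, 9, 0)))   # recorded on the new server
```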

So, anyway, we're back up and everything should be working the same as always aside from being on UTC time rather than Central time. If anyone notices any breakage, let me know.

5 Jan 2007 (updated 7 Jan 2007 at 02:47 UTC) »

Advogato Status Report

The first new rev of mod_virgule code for 2007 went live today. See the changelog for the details. Basically, it's all bug fixes.

The important one is a rewrite of the diary entry storage code. For users whose posts arrive via syndication, the new code will allow local editing and XML-RPC editing without the save wiping out all the extra XML tags that store syndication state info. This bug was causing the occasional duplicate of syndicated posts (and it's why I warned against mixing local blogging with syndicated blogging when we turned on the aggregator).

Update: Hmmmm... okay, there's still at least one other problem with mixed local and syndicated blogging that can lead to duplicated entries. I'll see if I can track it down soon...

Update 2: Fixed what should be the last issue causing problems for mixed posting. It may actually be safe now. Unfortunately, I discovered one more cause of duplicated posts. There's an RSS variant that retroactively alters the post time of an entry each time it's edited, which confuses our simple little aggregator into thinking it's a new post. Working on a fix now. The world would be such a nicer place if everyone used a sane syndication method like Atom...

Update 3: RSS feeds with shifting date stamps should now be handled a little better. At least if the feed in question has unique item identifiers (some do, some don't - you never know what you'll get with RSS).
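
The idea behind that fix can be illustrated with a short Python sketch: prefer the item's unique identifier when deciding whether a post has been seen, and only fall back to the date when no identifier exists. The `item_key` and `new_items` helpers and the dict-based item format are hypothetical, not the aggregator's actual data structures.

```python
def item_key(item):
    """Return a stable identity for a feed item (a plain dict here)."""
    guid = item.get("guid")
    if guid:
        # With an identifier, a changed date stamp no longer matters.
        return ("guid", guid)
    # Without one, the date is part of the identity, so an edited
    # date stamp will still look like a brand new post.
    return ("fallback", item.get("link"), item.get("pubDate"))


def new_items(seen, items):
    """Return the items not seen before and record them in `seen`."""
    fresh = []
    for item in items:
        key = item_key(item)
        if key not in seen:
            seen.add(key)
            fresh.append(item)
    return fresh


if __name__ == "__main__":
    seen = set()
    print(len(new_items(seen, [{"guid": "post-1", "pubDate": "first"}])))
    print(len(new_items(seen, [{"guid": "post-1", "pubDate": "edited"}])))
```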

Advogato Status Report

A new rev of mod_virgule code went live today. See the changelog for the details - but only if you're really bored. There were only very, very minor changes. With the holidays coming up, I'm not sure how much time I'll have to work on the code over the next couple of weeks. So don't expect any spectacular new features.

What would be nice is seeing one shiny new article posted on Advogato before the end of December. If any Advogato users presented at the recent OSDC and have an interesting paper, maybe you could post it here as an article. Just a thought.

Advogato and Greenhouse Gas

I noticed pphaneuf's post about Second Life, computer power consumption and the relation to CO2 emissions. I may not have mentioned before that the server Advogato is hosted on now, and our entire little facility, is powered by 100% wind-generated power. We recently got our EPA Green Power Partner approval. I've never calculated the electricity used by just Advogato but overall we use about 4,000 kWh per month. According to most estimates I've seen, this translates into 6,000 - 8,000 pounds of CO2 that we avoid putting into the air each month. And we aren't the first. I've seen several other hosting facilities that have gone to 100% non-polluting power providers. Here in Texas, it's actually saving us money too, since the cost of wind tends not to be affected much by the rising cost of gas and coal. So maybe some of the Second Life users should ask about that.

7 Dec 2006 (updated 7 Dec 2006 at 01:37 UTC) »

Advogato Status Report

A new rev of mod_virgule code went live today. See the changelog for the details.

I've added support for a couple of additional RSS variants with ever more unusual date stamp formats. In theory the RSS pubDate tag is supposed to use the date format described in RFC822. The first problem is that RFC822 allows a lot of variation. The second problem is that RFC822 specifies a two-digit year. For obvious reasons most RSS feeds use a four-digit year. Mod_virgule's first line of defense is to call the Apache APR routine apr_date_parse_rfc(), which will parse all date strings that actually comply with RFC822, plus nine variants that are not strictly RFC822 compliant but are commonly seen in the wild. So far, at least one common blogging app, Blosxom, produces a pubDate field that is not RFC822 compliant and can't be parsed by apr_date_parse_rfc(). I've added a custom strptime() call that handles these. A patch for the Apache APR folks is in the works.
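
The "standard parser first, custom strptime() second" approach can be sketched in Python, with `email.utils.parsedate_to_datetime` standing in for apr_date_parse_rfc(). The `parse_pubdate` helper and the fallback format are assumptions for illustration; the fallback pattern is made up and is not the format Blosxom actually produces.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical non-RFC822 pattern, tried only when the standard parser fails.
FALLBACK_FORMATS = ["%Y-%m-%d %H:%M:%S"]


def parse_pubdate(value):
    """Return a UTC datetime for a pubDate string, or None if unparseable."""
    parsed = None
    try:
        # First line of defense: the standard RFC 822 style parser.
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Second line of defense: explicit strptime() patterns.
        for fmt in FALLBACK_FORMATS:
            try:
                parsed = datetime.strptime(value.strip(), fmt)
                break
            except ValueError:
                continue
    if parsed is None:
        return None
    if parsed.tzinfo is None:
        # Assumption: a date without a zone is treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


if __name__ == "__main__":
    print(parse_pubdate("Thu, 07 Dec 2006 01:37:00 GMT"))
    print(parse_pubdate("2006-12-07 01:37:00"))
```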

Some RSS feeds don't have a pubDate tag at all. Instead they have a date tag which, instead of RFC822, contains an RFC3339 formatted date string. This is actually much nicer, since it's a slightly more sane format and is the same one used in Atom feeds, so we already have code for handling it.
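
For comparison, an RFC3339 date string needs far less guesswork. The Python sketch below is only an illustration of the format, not the parser mod_virgule uses; the `parse_rfc3339` helper is hypothetical and handles just the common forms with a "Z" suffix or a numeric offset.

```python
from datetime import datetime, timezone


def parse_rfc3339(value):
    """Parse a common RFC 3339 timestamp and return it as a UTC datetime."""
    text = value.strip()
    # Older Python versions do not accept a trailing "Z", so spell out
    # the equivalent numeric offset before parsing.
    if text.endswith(("Z", "z")):
        text = text[:-1] + "+00:00"
    # RFC 3339 requires an offset, so the result is timezone-aware.
    return datetime.fromisoformat(text).astimezone(timezone.utc)


if __name__ == "__main__":
    print(parse_rfc3339("2006-12-07T01:37:00Z"))
    print(parse_rfc3339("2006-12-06T19:37:00-06:00"))
```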

Speaking of Atom, the mod_virgule aggregator now supports the old, deprecated Atom v0.3 feeds in addition to the current Atom v1.0 standard.

So here's what we support right now:

  • Atom 0.3
  • Atom 1.0
  • RSS 0.91 *(only if optional pubDate or date tags are included)
  • RSS 0.92 *(only if optional pubDate or date tags are included)
  • RSS 2.0
  • RDF Site Summary 0.9 *(untested)
  • RDF Site Summary 1 *(all variants seen so far work)
  • RDF Site Summary 1.1 *(untested)

I wish I could support the RSS 0.91/0.92 feeds that don't have any sort of time or date stamps at all but it would require some reworking of the code in the aggregator that sorts out which posts are new and which have been seen before. In most cases RSS 0.91/0.92 allows the use of both date and pubDate, so if you make sure those tags are included, things should work fine. Otherwise, your best bet is to use something a little more recent like RSS 2.0 or Atom 1.0.

The other update this week was a performance improvement. Each hour the trust metric and blog interest eigenvector ratings are recalculated. The eigenvector recalculation takes several minutes to complete. In the past the process held a read lock on the XML database for the entire run, preventing any other process from taking a write lock. This caused some operations on Advogato to block (such as clicking on the "Read more..." link of articles, which writes an update to the user's "last read" pointers). This problem is now fixed. The site should seem significantly less sluggish at the top of the hour when the update runs. The eigenvector processing now releases the read lock and gives up its time slice, then re-acquires the lock on each iteration. The total processing time is slightly longer (from 3 minutes to 3.25 minutes) but during that time the site can be used normally without feeling slow.
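
The locking pattern can be sketched in Python as below. It is a loose analogy only: a plain `threading.Lock` stands in for the XML database's read lock, `time.sleep(0)` stands in for giving up the time slice, and a generic power iteration stands in for the actual rating calculation. The names `db_lock` and `power_iteration` are made up for illustration.

```python
import threading
import time

# Stand-in for the database read lock (assumption: a simple mutex).
db_lock = threading.Lock()


def power_iteration(matrix, iterations=50):
    """Approximate the dominant eigenvector, releasing the lock each pass."""
    n = len(matrix)
    vec = [1.0 / n] * n
    for _ in range(iterations):
        # Hold the lock only while reading shared data for this iteration.
        with db_lock:
            nxt = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        vec = [x / total for x in nxt]
        # Lock is released here; yield so waiting writers can run.
        time.sleep(0)
    return vec


if __name__ == "__main__":
    print(power_iteration([[2.0, 0.0], [0.0, 1.0]]))
```

The trade-off matches the one described above: re-acquiring the lock on every pass adds a little overhead, but writers never wait longer than a single iteration.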

Advogato Status Report

A new rev of mod_virgule code went live Wednesday, with some additional fixes going live last night. See the changelog for the details.

This release adds support to the aggregator for blog entry updates via syndicated feeds. As far as I can tell, only Atom supports updates in any obvious way. In theory, it should be possible to detect updates to RSS or RDF Site Summary feeds by doing a diff on the content of the entry in the feed against the local copy or by making some other type of guess but it didn't seem worth the trouble right now (patches accepted, of course). Meanwhile, updates should work fine if you're using Atom. For an example see Zaitcev's blog. The Advogato date stamp and "updated" date stamp reflect the time at which the original post and the update respectively hit Advogato. The date stamps in the syndication link at the bottom of the entry reflect the times claimed in the Atom feed for the original post and update. All times have been converted to server local time (currently CST but I feel a change coming...).
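
For anyone tempted to write that patch, one way to guess at updates in RSS or RDF Site Summary feeds is to compare a digest of the entry's content against a digest saved when the entry was first stored. The Python sketch below shows the idea; the `content_digest` and `is_updated` helpers are hypothetical, and nothing like this exists in the aggregator today.

```python
import hashlib


def content_digest(text):
    """Return a digest of the entry text, ignoring whitespace-only changes."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_updated(stored_digest, feed_text):
    """True if the feed's copy of the entry differs from the stored copy."""
    return content_digest(feed_text) != stored_digest


if __name__ == "__main__":
    saved = content_digest("Original post body.")
    print(is_updated(saved, "Original   post body."))
    print(is_updated(saved, "Original post body, now edited."))
```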

It looks like we've now got 10 ex-Advogatoans who've returned to the recentlog via the syndication feature. Hopefully more will follow as word gets out that it's available.

As Professor Farnsworth likes to say, "Good news, everyone". The mod_virgule codebase is now in a Subversion repository. The latest changelog can be found in mod_virgule/trunk/ChangeLog. If you want to submit any patches, make them against the code in mod_virgule/trunk. Release versions can be found in mod_virgule/tags. To check out the latest development code:

svn checkout http://svn.dprg.org/repos/mod_virgule/trunk

Or to get the current release:

svn checkout http://svn.dprg.org/repos/mod_virgule/tags/1.41-20061201
