Older blog entries for robogato (starting at number 21)

Advogato Status Report

New mod_virgule code is live today on Advogato. See the changelog for the details. No new release yet, though. I'm hoping I'll find time to finish up a couple of additional things before the next release.

The feed aggregator can now handle RSS/Atom feeds that include the blog content as unescaped XHTML within the feed XML tree instead of as escaped content within a single XML node. This seems like a risky approach since the slightest markup error in the blog's XHTML renders the whole feed invalid and unparsable. Worse, the particular Atom feed that brought this problem to light, generated by Blogger, appears to randomly alternate between the two methods. One post is carried as normal escaped content within the entry node and the next is shoved in as an unescaped tree of XHTML tags. But who am I to argue with Blogger? If it exists in the wild and doesn't appear to violate the standards, I'll try to make mod_virgule handle it correctly.
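For the curious, here is a rough sketch of handling both encodings in one code path. It is written in Python rather than mod_virgule's C and is not the actual aggregator code. The Atom 1.0 namespace, the sample entries, and the `entry_content` helper are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # assumed: the feed uses the Atom 1.0 namespace

def entry_content(entry):
    """Return the body of an Atom entry as a markup string (hypothetical helper)."""
    content = entry.find(ATOM + "content")
    if content is None:
        return ""
    if len(content) == 0:
        # Escaped form: the XML parser has already unescaped the markup into text.
        return (content.text or "").strip()
    # Unescaped form: the markup arrives as child elements, so serialize them.
    parts = [content.text or ""]
    parts.extend(ET.tostring(child, encoding="unicode") for child in content)
    return "".join(parts).strip()

# Two made-up entries carrying the same post in the two styles.
ESCAPED = ('<entry xmlns="http://www.w3.org/2005/Atom">'
           '<content type="html">&lt;p&gt;hello&lt;/p&gt;</content></entry>')
INLINE = ('<entry xmlns="http://www.w3.org/2005/Atom"><content type="xhtml">'
          '<div xmlns="http://www.w3.org/1999/xhtml"><p>hello</p></div>'
          '</content></entry>')

for doc in (ESCAPED, INLINE):
    print(entry_content(ET.fromstring(doc)))
```

Note that the serialized XHTML comes back with namespace prefixes attached, so a real aggregator would still need a clean-up pass before storing it.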

I've added support for the foaf:mbox_sha1sum field in the FOAF files output by mod_virgule. This field is an SHA-1 hash of the user's email address. It's used as an identifier by some FOAF applications. There is also a group working on a SpamAssassin plugin and email whitelist database that will use trust metrics and FOAF data collected from community sites like Advogato. The email field in the user profile used to be optional, so if you're an old-time Advogato user, check your profile and make sure your email address is included. Actually, everyone ought to make sure their email address is current, just in case you need to use the password reminder some day.
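If you want to check what your own value should look like, here is a minimal Python sketch. The FOAF vocabulary defines the hash over the mailto: URI form of the address. The `mbox_sha1sum` function name, the placeholder address, and the whitespace trimming are my assumptions here, not a description of the mod_virgule code.

```python
import hashlib

def mbox_sha1sum(email):
    """SHA-1 hex digest of the mailto: URI for an address, per the FOAF vocabulary."""
    uri = "mailto:" + email.strip()  # assumed: only surrounding whitespace is trimmed
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()

digest = mbox_sha1sum("user@example.org")  # placeholder address
print(digest)  # 40 hexadecimal characters
```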

Blog (diary) pages are now template based rather than hard coded HTML generated by mod_virgule. The blog page template includes the new page header.

Barbara Irwin of the Victoria Linux Users Group emailed to let us know they've added Advogato to the Loads of Linux Links (LOLL) directory. The LOLL directory looks like an interesting collection of Linux links. Check it out.

Google turned down Advogato's Summer of Code mentor application. While disappointing, this didn't come as a total shock. There's no official organization behind mod_virgule, it's a very small project, and it still seems to be viewed as dead or dying by a few people. That's okay, maybe next year. In the meantime, I'm going to continue working to bring mod_virgule up to date.

There are several badly needed features that are going to require some major code refactoring and code cleanup. One of the Summer of Code ideas was directly related to this. The existing code base desperately needs improved commenting and documentation. I'd really like to see the comments normalized to Doxygen style and comments added to all the currently uncommented sections of the code. Having better comments and documentation would really help with future refactoring of the code and would also lower the barrier for new developers who need to understand how mod_virgule works. Any volunteers? Adding and rewriting code comments doesn't require extensive programming skill (though you will need to be able to read and understand some less than beautiful C code).

There are other SoC mod_virgule ideas that I'd still like to see someone help with. Even without Google funding, it's still good experience and might even be fun. If you think you might be interested in helping out, take a look at the ideas list and let me know.

Advogato Status Report

A new rev of mod_virgule went live yesterday on Advogato. See the changelog for the details.

With all the articles being posted lately, the need to edit an article to correct mistakes and typos resurfaced. The article code is a bit scary and looks way overdue for a complete rewrite. But until then, I've added one more kludge to allow editing. Articles are now editable by the author for a period of 30 days after they're posted. (If you can't fix your typos in 30 days, you probably never will!) Articles that have been edited will include a revision date in the article header.
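The rule itself is simple enough to sketch in a few lines. This is Python for illustration only (the real check lives in mod_virgule's C code), and the `can_edit` name and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EDIT_WINDOW = timedelta(days=30)  # the 30 day limit mentioned above

def can_edit(article_author, user, posted_at, now=None):
    """True if `user` wrote the article and it is still inside the edit window."""
    now = now or datetime.now(timezone.utc)
    return user == article_author and now - posted_at <= EDIT_WINDOW

posted = datetime(2007, 3, 1, 12, 0, tzinfo=timezone.utc)
print(can_edit("alice", "alice", posted, now=posted + timedelta(days=10)))
print(can_edit("alice", "alice", posted, now=posted + timedelta(days=31)))
```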

Otherwise, mostly small changes this time around. The much maligned certification dialog text inherited from robots.net has been toned down to something more minimal. I made a few very minor security enhancements to the new accounts page. A CSS clear:both style was added to the recentlog post headers. This fixes the bug that allowed floated images in a post to overlap the next post. I've migrated a few more pages to the new header style.

I made a few minor tweaks to the profile pages to help control bandwidth wastage and security problems. Untrusted users no longer have RSS feeds or FOAF RDF support on their user profiles. This is to prevent abuse by spammers but will also help cut down on bandwidth slightly. The biggest change is that RSS feeds don't exist until an account has at least one diary entry. This removes about 9,000 RSS feeds that were empty (but still being checked several times an hour by a hundred different aggregators).

I've banned a misbehaving web robot, named VoilaBot, used by a French search engine. Despite retrieving our robots.txt file several thousand times per day, it appears to ignore it. This robot was using gigabits of our bandwidth (up to 10% of the total so far this month). We get no inbound traffic from this search engine in return (which isn't surprising since Advogato isn't a French language site).

I've also banned several other robots that appeared to be harvesting email addresses for spammers. One of these had an agent string only one character different than pipeman's XML-RPC client. A typo on my part blocked him for a few hours. Sorry about that.

Google Summer of Code Mentor Application

I filed a mentor application in Advogato's name for the 2007 Google Summer of Code. If Google accepts it, I'm hoping maybe we can recruit a student or two to help with some of the mod_virgule work.

18 Feb 2007 (updated 18 Feb 2007 at 14:41 UTC) »
Advogato Status Report

A new rev of mod_virgule code went live today. See the changelog for the details. Lots of minor bug fixes and a couple of more interesting changes. Even one hardware note: I've doubled the RAM on the server from 1GB to 2GB.

Long Lost Trust Certifications Restored

You may have noticed some additional inbound or outbound trust certifications on your page or slight changes in your certification level this week thanks to some repairs done to the XML datastore. This would be a good time to go through your certs and make sure you've certified everyone you want to and no one you don't want to.

Over the 8 years Advogato has been online, it has suffered through several semi-catastrophic events including disk failures and power supply failures. There was also a mod_virgule bug triggered under disk-full conditions that truncated many user account profiles a year or so ago. The result of these past catastrophes was the complete loss of a few user profiles and minor corruption of many others. Usually, enough of the profile XML file remained (or could be restored) to allow a user to log in but some or all of the trust metric certifications and other data were lost. For a while the corrupt profiles could cause mod_virgule to segfault during a trust metric update (that bug has been fixed for a while). The most noticeable side-effect is missing or incorrect certs on the profile page.

One of the interesting things about the way the trust certs are stored in the XML database is that each cert is recorded in the profile of both the issuer and subject. This means it's possible to reconstruct a lost cert provided one of the two records still exists. Well, I finally got a chance to write some code to do that. I've written a new mod_virgule function to analyze the user profiles, find these sorts of problems, and repair them when possible. In addition to restoring lost certs, the new code also looks for invalid XML, missing profiles, certs to or from non-existent accounts, and a few other forms of corruption that are known to occur occasionally.

The result?

1115 missing outbound certs records restored
1264 missing inbound certs records restored
17 other misc profile corruption problems fixed
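The idea behind the cert repair can be sketched as follows. This is a simplified Python model, not the mod_virgule function: it assumes each profile is a dict with an "out" map (subject to level) and an "in" map (issuer to level), and the `repair_certs` name is made up for the example.

```python
def repair_certs(profiles):
    """Restore cert records that survive on only one side (hypothetical sketch).

    Returns (restored_outbound, restored_inbound).
    """
    restored_out = restored_in = 0
    # An outbound record whose inbound twin is missing restores the inbound side.
    for issuer, profile in profiles.items():
        for subject, level in profile["out"].items():
            other = profiles.get(subject)
            if other is not None and issuer not in other["in"]:
                other["in"][issuer] = level
                restored_in += 1
    # An inbound record whose outbound twin is missing restores the outbound side.
    for subject, profile in profiles.items():
        for issuer, level in profile["in"].items():
            other = profiles.get(issuer)
            if other is not None and subject not in other["out"]:
                other["out"][subject] = level
                restored_out += 1
    return restored_out, restored_in

profiles = {
    "alice": {"out": {"bob": "Master"}, "in": {}},
    "bob": {"out": {}, "in": {}},  # lost both of its records
    "carol": {"out": {}, "in": {"bob": "Journeyer"}},
}
print(repair_certs(profiles))
```

Certs that point at accounts which no longer exist are simply skipped here. The real code reports those as a separate kind of corruption.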

One side effect of all this is that all those missing certs will now be included in the trust metric computations again. So there have probably been a few changes in certification levels.

Consistent Page Headers on the way

One persistent category of Advogato complaints I get is about the inconsistent page layout. Some pages have menus at the top, other pages have the menu at the bottom. Sometimes the menu is centered, sometimes it's right justified. Most pages don't have a logo or even the name of the site on them, which makes it confusing if you arrive from a search engine anywhere but the index page. On the other hand, I feel like I have to balance the need for an updated, consistent page layout with Advogato's historically minimal design. So I'll try to take things slow and not make any major changes overnight. I've created a standard page header and page layout that should address the consistency issues without drastically altering the appearance of the site.

Over time, I'll try to get the new header on every page so the site begins to look a little more consistent. There are still a few pages with hard-coded HTML generated by mod_virgule. Making these remaining pages template-based will require code changes. One other nice result of finally getting the last few parts of mod_virgule fully template-based is that we should be able to purge the last non-standard HTML and maybe even bring the site up to full XHTML standards compliance.

As part of the page header improvements, I've converted the Advogato logo from GIF to PNG. The new logo has the same dimensions but the filesize is about 20% smaller, saving us a little bandwidth. I've also added a Google Coop AJAX-based search widget to provide a site search function, another frequent request. The new layout can be seen on the people page and a few other pages so far. You may also notice some new stats on the people page - this is another handy use of the new user account analysis code.

Advogato Articles

I was pleased to get all the emails and comments on my GNU/FSF news summary. I'd still like to find a volunteer who's willing to put together a summary like this every month.

I was also very pleased to see other new articles posted by mjg59, fxn, and lkcl. The ACPI article got picked up by Linux Today and generated more hits than any other article in the last several months. If we could generate a few articles like that every month, we'd be well on the way to making Advogato a more interesting and useful site.

PyCon and Advogato

PyCon is coming to Dallas, where the Advogato site is hosted. Is anyone up for some type of Advogato get-together during the conference (Feb 23-25)? If you'll be in town and want to meet some fellow Advogato users, email me and we'll work out the details.

Advogato's Aggravation

I've been pondering the problem of what to do about Advogato's article section on the main page. Aside from the various bugs and feature requests I've been working on, the single most common complaint I've seen about the site is the low quality of the articles. As I mentioned in an earlier post, this problem has been brought up before.

It seems to me that rather than worry too much about how to prevent the occasional bad articles, we should focus on how to encourage useful and interesting articles. The first step is to find a definition of what useful and interesting mean in the context of Advogato.

Obviously, articles about software design, standards, or related topics are always interesting. If you're working on a paper or a talk for an upcoming FOSS conference, consider posting a freely licensed draft as an article to get feedback. The occasional interview, question, insight, or advice from someone in the community can also be interesting. Unfortunately, past experience shows we can't expect many of these types of articles. That still leaves a pretty big gap that will likely be filled by noise if it isn't used for something more interesting.

There are already plenty of sites like Slashdot where one can find vaguely FOSS-related links to news stories. I don't think Advogato should go the route of becoming yet another aggregator of recycled news stories. While that's an easy solution and would probably generate a lot of traffic, it's not why we're here. In one of Raph's early postings about Advogato, he said the purpose of the site is "to bring a group of people closer together, not to generate hits."

Robogato's Revelation

What is it that makes Advogato different from other Free Software/Open Source web communities? Most sites focus on a very particular FOSS sub-community: GNU, Apache, BSD, KDE, Mozilla, RedHat, Debian, FreeDesktop/X.Org, Perl, Python (to name just a few). Often, members of each community aggregate around each other, ignoring or forgetting what's going on in the larger FOSS community. Advogato, on the other hand, has active members from almost all these communities. This is one place where we can read each other's blogs and find out what's going on in other parts of the FOSS community.

When I realized what a unique position Advogato is in, it became obvious to me that one useful and interesting thing we can do is use the articles section to inform each other of what our respective communities have been doing on a weekly or monthly basis. Often the volume of news, blogs, and websites in each sub-community makes it difficult for an outsider to stay up to date.

As an illustration of this, I'm reminded of the LKML. The volume of the list makes it impossible for me to keep up - I simply don't have the time. However, I used to enjoy reading the Kernel Traffic summaries regularly so I'd have some idea of what the Linux developers were up to. Sadly, Kernel Traffic is no more. Likewise, there have been similar efforts to summarize activity in other communities (e.g. Brave GNU World, This Month in BSD, the gcc newsletter, WineHQ news, etc). Most of these are defunct, being replaced by dozens of individual websites, blogs, and mailing lists.

What I propose is recruiting Advogato users from each of the many FOSS communities to write and post a periodic summary of significant events in their respective groups. I'm willing to work with these volunteers to devise a useful format and a system for assembling the reports. This will take some time to get going so I think the best plan is to focus on the communities one by one, working out the system and getting things started, then moving on to the next group. As a start, I've written an example summary of the GNU project's activities this month. I've worked out where to get the information and how to assemble it into a simple format. I'll post it shortly as an article. What I need now is just one volunteer willing to contribute an hour of their time once a month to assemble and post a GNU update. Who's up for the job?

The next question is what FOSS community would you like to see a monthly summary of next? Ruby? Perl? BSD? I need suggestions and volunteers. gato@advogato.org

2 Feb 2007 (updated 3 Feb 2007 at 21:38 UTC) »

Advogato Status Report

A new rev of mod_virgule code went live today. See the changelog for the details.

This rev adds FOAF files to our user profiles, helping to make Advogato part of the Semantic Web. Each account profile page has a visible FOAF link as well as an auto-discovery meta link that points to a foaf.rdf file for that account. At present the FOAF files have minimal properties. The FOAF standard allows for some additional features that will probably be added over time. At present, outbound trust certifications are converted to foaf:knows properties. Inbound certs are ignored. Project relations are exported as foaf:currentProject properties. To get an idea of what you can do with FOAF, try using the DISCO Hyperdata Browser to view the FOAF file of an Advogato seed account such as Raph's (see also the FOAFer result for the same file).
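To make the mapping concrete, here is a small Python sketch that turns a list of outbound certs and project links into a FOAF document. It only illustrates the mapping described above. The `build_foaf` function, the example.org URLs, and the exact shape of the output are assumptions and will not match the files mod_virgule writes.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("foaf", FOAF)

def build_foaf(nick, outbound_certs, projects, base="http://example.org/person/"):
    """Build a minimal FOAF document for one account (hypothetical sketch)."""
    root = ET.Element(f"{{{RDF}}}RDF")
    person = ET.SubElement(root, f"{{{FOAF}}}Person",
                           {f"{{{RDF}}}about": f"{base}{nick}/foaf.rdf#me"})
    ET.SubElement(person, f"{{{FOAF}}}nick").text = nick
    # Each outbound cert becomes a foaf:knows property; inbound certs are ignored.
    for subject in outbound_certs:
        knows = ET.SubElement(person, f"{{{FOAF}}}knows")
        other = ET.SubElement(knows, f"{{{FOAF}}}Person")
        ET.SubElement(other, f"{{{FOAF}}}nick").text = subject
    # Each project relation becomes a foaf:currentProject property.
    for url in projects:
        ET.SubElement(person, f"{{{FOAF}}}currentProject",
                      {f"{{{RDF}}}resource": url})
    return ET.tostring(root, encoding="unicode")

print(build_foaf("raph", ["alice", "bob"], ["http://example.org/proj/demo/"]))
```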

In addition to the new FOAF badge, you may have noticed some other very minor changes on the user profile. I've done a little HTML clean up and correction. The old, ugly RSS image has been replaced with the standard feed icon established by the Mozilla Foundation. Combined with our new RSS 2.0 feeds, this almost makes it look like Advogato is a modern website. :-)

Among other minor changes, trust certifications now include a date stamp. This will allow the future addition of date-dependent trust features such as age-based certificate expiration for inactive users.

All of the admin functionality of mod_virgule has been moved to a single base URL where it can be password protected. This includes the diagnostics page and crank pages for diary ratings, trust metrics, and the aggregator. Several of these pages were security risks either by leaking information about the server configuration or by being CPU intensive enough to be useful for DoS attacks.

Certification dialog

cdfrey notes in his blog:

"I just noticed something new in the advogato pages. When looking at a user, you get the following warning:

Note: By certifying a user you are making a public statment that you know this person and can vouch for their identity.

When did this happen?

I must disagree with this sudden pseudo-gpg keysigning level of certification, especially since this warning is now retroactively applied to people's previous certifications, by mere virtue of being tacked on the bottom of the list."

The new text appeared on Oct 1, 2006 when Advogato was migrated to the newer version of mod_virgule. The message is hard-coded in the module that creates the user profile page and was originally added, not for Advogato, but for robots.net some years before.

On robots.net, the users are not all programmers and many don't have previous experience with any sort of trust metrics. As a whole, the user base had begun to view the trust metric system as nothing more than a group-powered method of allowing other users to post on the site. As a result there was a huge amount of cert inflation (even compared to Advogato) with a large percentage of the user base tending toward Master certification. Many users were automatically certifying all new users as Masters, assuming this would allow them to post and therefore improve the community. In reality, it just increased the noise and spam level, of course.

I experimented with a variety of short messages under the cert dialog to impress upon people that by certifying someone, they bore some responsibility for the results. This particular message seemed to have the most dramatic effect and, over time, solved our problem.

I agree it's unnecessary for Advogato since most users here understand to one degree or another what the trust metric is for. I'll take a look at making this page more easily configurable on a site-by-site basis. That will allow us to use different text on Advogato or remove the message altogether.

With regard to the actual meaning, I didn't intend for "know this person" to mean only that you've met them in person, in meatspace. You might also know them in some other online capacity outside of Advogato. You might know them through email, IRC, another website, etc. In some cases, you might even get to know them by reading their blog on Advogato long enough to feel comfortable expressing some trust for them. I assume Raph meant something similar in his original cert instructions when he says to certify "free software developers you know". My understanding of the trust metric is that you're certifying to the community that you trust the subject really is who they claim to be (at least to the extent that they claim to be a member of the free software community).

Advogato Status Report

I'm working on more code improvements but it will probably be next week before anything interesting emerges. In the meantime...

FAQ: I've added the beginnings of an Advogato FAQ to the site to help cut down on the time I spend answering emails. At present, there's no index and the questions are roughly in order of how frequently they're asked. (Okay, one or two I just made up - I suppose they're Frequently Imagined Questions!) Have a look and don't hesitate to point out errors or new questions that need to be added.

FOAF: Does anyone have any strong opinions on FOAF? Someone requested we add a FOAF file for each profile and this looks like it would be relatively easy to do. I'm not entirely sure I grok what the point of it is though. Does anyone, anywhere actually use FOAF RDF files for anything useful? Would it be a Good Thing if we suddenly add 10,000 people to the FOAF-o-sphere?

Article Quality: One thing that still seems to need fixing on Advogato is the quality of articles posted on our home page. At present every trusted Advogato user has the freedom to post articles. Unfortunately, not every trusted Advogato user has the ability to post relevant, quality articles. Is there a way to enforce quality without taking away everyone's freedom to post? For background see these two previous Advogato discussions on this subject:

Advogato Status Report

If you haven't been following the saga of Netscape and the RSS 0.91 DTD, here's the summary: On Jan 12 the folks at Deviceforge noticed that Netscape had removed the DTD from their website sometime after Jan 1, 2007. After Slashdot picked up on it, enough people complained that even someone at Netscape acknowledged the problem.

Yesterday, we got an official pronouncement from Netscape. They've agreed to restore the DTD but only until July 1, 2007 after which it will be removed again. Why? According to Netscape, your application shouldn't be "relying on the availability of a static document on a third-party Web server" like, say, a DTD. It's not clear what will happen to RSS 0.91 after July 1. Maybe Netscape will transfer their copyright on the DTD to the W3C and the URL will change. Maybe everyone will have to update their RSS software to ignore the DTD. Maybe everyone will stop using RSS 0.91. Who knows.

Why do we care? Because mod_virgule has always generated RSS 0.91 feeds for the articles on the main page and the user blog feeds. Most RSS readers don't bother to check the DTD but many do, and if the DTD is gone, no more Advogato feed. There was already a task on the ToDo list to bump all our feeds to RSS 2.0, so I did that today as it seemed like the easiest way to bypass the whole issue. All Advogato feeds are now RSS 2.0. I also added some of the optional tags that make life easy for aggregators like guid and pubDate.
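For anyone writing their own feed generator, this Python sketch shows roughly what an RSS 2.0 item with those optional tags looks like. It is not the mod_virgule output code. The `rss_item` helper and the example.org link are invented for the example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
import xml.etree.ElementTree as ET

def rss_item(title, link, published):
    """Build one RSS 2.0 <item> with guid and pubDate (hypothetical helper)."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    # A stable guid lets aggregators recognize an entry they have already seen.
    guid = ET.SubElement(item, "guid", isPermaLink="true")
    guid.text = link
    # RSS 2.0 dates use the RFC 822 style; `published` must be an aware UTC datetime.
    ET.SubElement(item, "pubDate").text = format_datetime(published, usegmt=True)
    return item

item = rss_item("Advogato Status Report",
                "http://example.org/diary/1",  # placeholder URL
                datetime(2007, 1, 15, 22, 11, tzinfo=timezone.utc))
print(ET.tostring(item, encoding="unicode"))
```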

15 Jan 2007 (updated 15 Jan 2007 at 22:11 UTC) »

Advogato Status Report

A new rev of mod_virgule code went live today. See the changelog for the details. This upgrade required taking Advogato offline for about an hour to modify the XML database.

Until today, mod_virgule has stored timestamps in the XML data store that reflected the server's local time zone. The code then made assumptions about the time zone when rendering articles, posts, or RSS feeds. Prior to 3pm, 1 October, 2006, the server's local time zone was US Pacific time. When Advogato got transferred to our hosting facility, the new server was using the US Central time zone. This created a further complication because of the two-hour time shift. Adding the blog aggregator made things worse because 99% of the incoming blog feeds use UTC timestamps.

Having to juggle three time zones on a regular basis was creating a bit of a headache for me. I decided it was time to get things under control before the code got so complicated that only a Time Lord from Gallifrey could understand it. So mod_virgule now uses UTC for everything. The code changes were relatively straightforward but normalizing Advogato's rather large XML data store was another matter. I wrote a Perl program that recursively scanned Advogato's 30,000+ XML files looking for timestamps in several different formats and adjusted them to UTC (which required a different offset depending on whether they were recorded before or after 3pm, 1 Oct, 2006). That's the reason for the brief downtime.
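The core of the conversion looks something like the Python sketch below. This is not the Perl program I actually ran: it skips the file scanning and timestamp parsing entirely, and it assumes the 3pm cutover is compared against the stored local value as-is. The `to_utc` name and the zone identifiers are choices made for this example, and it needs Python 3.9 or later with time zone data installed.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

PACIFIC = ZoneInfo("America/Los_Angeles")  # old server's zone
CENTRAL = ZoneInfo("America/Chicago")      # new server's zone
CUTOVER = datetime(2006, 10, 1, 15, 0)     # assumed: compared as a naive local stamp

def to_utc(local_stamp):
    """Convert a naive stored timestamp to UTC using the zone in force when it was written."""
    zone = PACIFIC if local_stamp < CUTOVER else CENTRAL
    return local_stamp.replace(tzinfo=zone).astimezone(timezone.utc)

print(to_utc(datetime(2006, 9, 1, 12, 0)))  # written on the Pacific server
print(to_utc(datetime(2007, 1, 5, 12, 0)))  # written on the Central server
```

Using named zones rather than fixed offsets means daylight saving time is handled by the zone database instead of by hand.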

So, anyway, we're back up and everything should be working the same as always aside from being on UTC time rather than Central time. If anyone notices any breakage, let me know.

5 Jan 2007 (updated 7 Jan 2007 at 02:47 UTC) »

Advogato Status Report

The first new rev of mod_virgule code for 2007 went live today. See the changelog for the details. Basically, it's all bug fixes.

The important one is a rewrite of the diary entry storage code. For users whose posts arrive via syndication, the new code will allow local editing and XML-RPC editing without the save wiping out all the extra XML tags that store syndication state info. This bug was causing the occasional duplicate of syndicated posts (and it's why I warned against mixing local blogging with syndicated blogging when we turned on the aggregator).

Update: Hmmmm... okay, there's still at least one other problem with mixed local and syndicated blogging that can lead to duplicated entries. I'll see if I can track it down soon...

Update 2: Fixed what should be the last issue causing problems for mixed posting. It may actually be safe now. Unfortunately, I discovered one more cause of duplicated posts. There's an RSS variant that retroactively alters the post time of an entry each time it's edited, which confuses our simple little aggregator into thinking it's a new post. Working on a fix now. The world would be such a nicer place if everyone used a sane syndication method like Atom...

Update 3: RSS feeds with shifting date stamps should now be handled a little better. At least if the feed in question has unique item identifiers (some do, some don't - you never know what you'll get with RSS).
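The general approach is to identify an entry by its unique identifier rather than by its date. Here is a rough Python sketch of that idea, not the aggregator's C code. The dict-based items, the `merge_feed_items` name, and the fallback to link plus title when a feed has no guid are all assumptions for illustration.

```python
def merge_feed_items(seen, items):
    """Return only items not seen before, keyed by guid when one exists (hypothetical)."""
    new_items = []
    for item in items:
        # Prefer the feed's own identifier; the date is deliberately not part of the key.
        key = item.get("guid") or (item.get("link"), item.get("title"))
        if key not in seen:
            new_items.append(item)
        seen[key] = item  # keep the latest copy, so edits update the stored entry
    return new_items

seen = {}
first = [{"guid": "tag:example.org,2007:1", "title": "Hello", "pubDate": "Jan 5"}]
edited = [{"guid": "tag:example.org,2007:1", "title": "Hello", "pubDate": "Jan 7"}]
print(len(merge_feed_items(seen, first)))   # first fetch
print(len(merge_feed_items(seen, edited)))  # same entry with a shifted date
```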
