Older blog entries for rcaden (starting at number 87)

Book Giveaway: Teach Yourself Java in 24 Hours

My newest book, Sams Teach Yourself Java in 24 Hours, Fifth Edition, recently hit bookstores. The book is a for-absolute-beginners guide to programming Java, and this excerpt from the Q&A at the end of chapter one shows how much license I get from the publisher to have fun with the series:

Q. Do you only answer questions about Java?

A. Not at all. Ask me anything.

Q. Okay, why is Prince mad at the Foo Fighters?

A. Prince is unhappy that the Foo Fighters performed a cover of his song "Darling Nikki" and released it as a B-side in Australia. He told Entertainment Weekly they should write their own tunes and wouldn't let the band release it in the United States. This became a pretty meaningless distinction as the song became a radio hit around the globe and was played regularly during their concerts.

When Prince performed at Super Bowl XLI a few years later, he covered the Foo Fighters' "Best of You," an artistic decision that surprised the Foo Fighters as much as everybody else.

"It was pretty amazing to have a guy like Prince covering one of our songs," Foo Fighters drummer Taylor Hawkins told MTV, "and actually doing it better than we did."

Although playing someone else's music is an odd way to exercise a grudge, this was a better option for the 5-foot-2 Prince than challenging the band to a fight.

Every chapter ends with one reader question that has bupkiss to do with Java. I used to be the Fort Worth Star-Telegram's Ed Brice, an answer man who fielded random questions, so old habits die hard.

My book has been fully updated for Java 6 and has new chapters on JAX-WS and game programming. I have 20 copies I'd like to give to people who want to learn Java, and there's still time for me to mail them before Christmas.

If you know someone who wants to learn Java, or you can make a convincing case for why Santa owes you this book after the year 2009 you just endured, please leave a comment here on Workbench or in a Twitter post to rcade. Make sure I have some means of contacting you, so I can get the address of the person getting the book.

I'm planning on mailing these out on Wednesday morning in the pre-Christmas scrum at the post office. I will mail the books directly to the people receiving them and can put your name and address as the sender and wrap them if necessary. No one needs to know I was involved.

Please note that I'm expecting the people who get this free book to teach themselves Java in a single contiguous 24-hour period. For too long, Sams has coddled readers who devote one hour a day to a subject and learn it at their leisure.

Syndicated 2009-12-14 18:10:10 from Workbench

Saving Bandwidth on RSS Feed Details

With the current interest in rssCloud and PubSubHubbub (PuSH), I've been thinking about all the bandwidth that's consumed by the RSS elements that describe the feed. When a client requests an RSS feed 10 times in one day, it gets the basic details of the feed over and over again. When clients request the Workbench feed, they get 1,800 characters containing optional RSS elements that I haven't changed in years, except for the PuSH element I added last month. Workbench has 1,900 feed subscribers, so if they average 10 checks a day, they're consuming 32 megabytes every day on information they know already.
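The arithmetic behind that estimate, as a quick PHP check. It assumes one byte per character and no gzip compression, neither of which the numbers above account for:

```php
<?php
// Back-of-the-envelope estimate of bandwidth spent re-sending unchanged
// channel elements. Assumes 1 byte per character and no compression.
function daily_channel_overhead(int $chars, int $subscribers, int $checksPerDay): int
{
    // Bytes per day = size of the static channel data x requests per day.
    return $chars * $subscribers * $checksPerDay;
}

$bytes = daily_channel_overhead(1800, 1900, 10);
$mib = $bytes / (1024 * 1024); // binary megabytes
printf("%d bytes per day (about %.1f MiB)\n", $bytes, $mib);
// 34200000 bytes per day (about 32.6 MiB)
```

That's where the 32 megabytes comes from when a megabyte is counted as 1,048,576 bytes; in decimal units it's 34.2 MB.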

James Holderness directed me to RFC3229+feed, a method to request partial RSS feeds that omit items a client has already seen. That's useful and has been adopted by some feed publishers and clients, but as far as I can determine, the approach still sends all of the channel elements that describe the feed itself. I wanted to float an idea here to see if it would be useful:

<rssboard:feedDetails>
  http://ekzemplo.com/feedinfo.rss
</rssboard:feedDetails>

This channel-level RSS element identifies a URL that contains the full details about the feed. The details would be expressed as an RSS feed without any item elements.

An optional ttl attribute could contain the number of days the publisher would like clients to cache the information before checking it again:

<rssboard:feedDetails ttl="30">
  http://ekzemplo.com/feedinfo.rss
</rssboard:feedDetails>

A feed publisher who wished to make use of this could move all channel elements except for title, link, description and atom:link to the detail URL. Title, link and description are required in RSS, and atom:link identifies the feed's URL so it can't be moved.
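Here's a minimal PHP sketch of how a client might honor the proposed element. The proposal doesn't define a namespace URI for the rssboard prefix, so RSSBOARD_NS is a placeholder, and the one-day default when ttl is missing is my own assumption; find_feed_details() and details_are_stale() are hypothetical names:

```php
<?php
// Sketch of a client honoring the proposed rssboard:feedDetails element.
// The namespace URI is a placeholder: the proposal doesn't define one.
const RSSBOARD_NS = 'http://example.com/rssboard-ns'; // hypothetical

// Returns [detailsUrl, ttlDays], or null if the feed lacks the element.
function find_feed_details(string $rss): ?array
{
    $xml = simplexml_load_string($rss);
    if ($xml === false) {
        return null;
    }
    $xml->registerXPathNamespace('rb', RSSBOARD_NS);
    $nodes = $xml->xpath('/rss/channel/rb:feedDetails');
    if (!$nodes) {
        return null;
    }
    $node = $nodes[0];
    // Assumption: with no ttl attribute, re-check the details once a day.
    $ttlDays = isset($node['ttl']) ? (int) $node['ttl'] : 1;
    return [trim((string) $node), $ttlDays];
}

// True when cached details fetched at $fetchedAt should be downloaded again.
function details_are_stale(int $fetchedAt, int $ttlDays, int $now): bool
{
    return ($now - $fetchedAt) >= $ttlDays * 86400;
}
```

A client would store the details along with the time it fetched them and only request feedinfo.rss again once details_are_stale() returns true.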

Syndicated 2009-10-13 13:01:25 from Workbench

PubSubHubbub is a Lot Easier Than It Sounds

I've begun digging into PubSubHubbub, the real-time RSS update protocol created by Brad Fitzpatrick and Brett Slatkin of Google and Martin Atkins of Six Apart. I was under the impression that it's harder for RSS publishers to use than the RSSCloud Interface, but that isn't the case. The specification is simple and precisely written, adopting conventions like RFC 2119 that make a spec considerably easier to understand, and it communicates using basic HTTP requests.

I wrote the software that runs the Drudge Retort, so I decided to add Pubbub support to it this morning to see how it works. (I'm tripping over the name "PubSubHubbub" like crazy, both when I write and speak, so I'm giving the protocol a nickname.) Pubbub delegates all the work required for update notification to a server called a hub. Google offers a hub at http://pubsubhubbub.appspot.com/ that's free for use by all feed publishers, so I'm relying on it.

First, I added a link element to the Retort's RSS feed that identifies the feed's update hub:

<atom:link rel="hub" href="http://pubsubhubbub.appspot.com" />

Because this element comes from the Atom namespace, I had to make sure it was declared in the feed's top-level RSS element:

<rss version="2.0"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:atom="http://www.w3.org/2005/Atom"
  xmlns:sitemap="http://www.sitemaps.org/schemas/sitemap/0.9">

The xmlns:atom attribute is the Atom declaration. I already was using an Atom element in the feed, so I didn't need to change this.
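A subscriber discovers the hub by looking for that element. This PHP sketch uses SimpleXML's XPath support and assumes the atom:link sits directly inside channel, as it does in the Retort's feed; find_hub_url() is a hypothetical name, not part of any library:

```php
<?php
// Finds the PubSubHubbub hub advertised in an RSS 2.0 feed through
// <atom:link rel="hub" href="..."/>. Returns null when there isn't one.
const ATOM_NS = 'http://www.w3.org/2005/Atom';

function find_hub_url(string $rss): ?string
{
    $xml = simplexml_load_string($rss);
    if ($xml === false) {
        return null;
    }
    // Bind a prefix to the Atom namespace for this query.
    $xml->registerXPathNamespace('atom', ATOM_NS);
    // rel and href are unprefixed attributes, so they need no namespace.
    $hrefs = $xml->xpath('/rss/channel/atom:link[@rel="hub"]/@href');
    return $hrefs ? (string) $hrefs[0] : null;
}
```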

When a new story is posted on the Retort, the Pubbub hub must be notified that a change has occurred. This is handled by sending a ping to the hub with the URL of one or more feeds that have been updated.

I've written an open source Weblog Pinger library in PHP, so I upgraded it to support these pings. A Pubbub ping employs HTTP requests (REST) instead of XML-RPC, the protocol used by Weblogs.Com and similar services. I wrote a new function, ping_rest(), that can send a ping to any Pubbub server.
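For the curious, a publish ping boils down to a form-encoded POST carrying hub.mode=publish and one hub.url parameter per updated feed. The functions below are a hypothetical sketch, not the ping_rest() function in my library, and they assume PHP's HTTP stream wrapper is available (allow_url_fopen enabled):

```php
<?php
// Hypothetical helper: builds the form-encoded body of a publish ping.
function build_publish_body(array $feedUrls): string
{
    $pairs = ['hub.mode=publish'];
    foreach ($feedUrls as $url) {
        // Each updated feed gets its own hub.url parameter.
        $pairs[] = 'hub.url=' . rawurlencode($url);
    }
    return implode('&', $pairs);
}

// Sends the ping as an HTTP POST; returns true on a 2xx status.
// A hub that accepts the ping answers 204 No Content.
function send_publish_ping(string $hub, array $feedUrls): bool
{
    $context = stream_context_create(['http' => [
        'method' => 'POST',
        'header' => "Content-Type: application/x-www-form-urlencoded\r\n",
        'content' => build_publish_body($feedUrls),
        'ignore_errors' => true, // keep the status line on 4xx/5xx replies
        'timeout' => 10,
    ]]);
    $response = @file_get_contents($hub, false, $context);
    if ($response === false || !isset($http_response_header[0])) {
        return false;
    }
    // The status line looks like "HTTP/1.1 204 No Content".
    return (bool) preg_match('#^HTTP/\S+\s+2\d\d#', $http_response_header[0]);
}
```

The body is assembled by hand because http_build_query() would turn an array of URLs into hub.url[0], hub.url[1] and so on, which isn't what the protocol expects.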

By the time I was done, I'd spent an hour on the code and a few hours testing it out. So now when I post a new item on the Retort, Google's Pubbub server sends the full text of the item to all readers that support the protocol. This is faster and simpler than RSSCloud, which tells readers to request the feed again.

To give you an idea of how fast Pubbub can be, when I posted a new story on the Retort, it showed up 20 seconds later on FeedBurner, one of the first RSS services to support the protocol.

Syndicated 2009-09-17 20:01:54 from Workbench

RSSCloud Should Not Be Controlled by One Person

I posted a call for comments last night on RSS-Public, the mailing list of the RSS Advisory Board, asking what people think the board should do in response to the ongoing effort to revise the RSSCloud Interface.

The interface has been a part of the RSS specification since the publication of RSS 0.92 in December 2000. It determines how software can use the cloud element in an RSS feed to connect to a server that offers real-time notifications when the feed has been updated. In a nutshell, here's how it works:

  • A user subscribes to an RSS feed that has a cloud element which identifies its cloud server.
  • The user's RSS reader contacts the cloud server, asking to be notified when the feed is updated.
  • When the feed has been updated, the software publishing the feed sends a ping to the cloud server.
  • The cloud server sends a notification to the IP address of all RSS readers that asked for updates.
  • The RSS readers immediately request the feed.

Aside from pings, which are sent using XML-RPC, cloud communications can be sent using XML-RPC, SOAP or REST.
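To make the first step concrete, here's a PHP sketch that reads a channel's cloud element and works out where a reader should register. The sample values come from the RSS 2.0 specification's example; parse_cloud() and the keys of the array it returns are hypothetical:

```php
<?php
// Reads the channel's <cloud> element and returns where a reader should
// register for notifications, using the attributes RSS 2.0 defines.
function parse_cloud(string $rss): ?array
{
    $xml = simplexml_load_string($rss);
    if ($xml === false) {
        return null;
    }
    $clouds = $xml->xpath('/rss/channel/cloud');
    if (!$clouds) {
        return null;
    }
    $cloud = $clouds[0];
    return [
        // The reader contacts this URL to ask for notifications.
        'endpoint' => sprintf('http://%s:%d%s', (string) $cloud['domain'],
            (int) $cloud['port'], (string) $cloud['path']),
        'procedure' => (string) $cloud['registerProcedure'],
        'protocol' => (string) $cloud['protocol'], // xml-rpc, soap or http-post
    ];
}
```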

Dave Winer recently began an effort to revise RSSCloud, persuading WordPress founder Matt Mullenweg to adopt the still-in-progress proposal on all 7.5 million blogs hosted on WordPress.Com. Winer has made three significant changes to the interface.

First, he changed the fifth parameter of a notification request on the REST interface to a series of named url parameters (url1, url2, and so on upwards), each containing the URL of a feed monitored by the cloud.

Next, he added a new ping format to contact cloud servers using REST.

Finally, he has proposed adding a sixth parameter to the notification request, but only for REST requests. The sixth parameter, called domain, identifies a server that will receive notification updates from the cloud server. It's an alternative to using the IP address for notifications.
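Here's a sketch of how a reader might build the revised REST request, with one url parameter per feed and the optional domain. Only the url1, url2 and domain names come from the changes described above; the request's other parameters are passed in through $base rather than guessed at, and build_notify_params() is a hypothetical name:

```php
<?php
// Builds the query string for a REST notification request using the
// revised url1, url2, ... naming and the optional domain parameter.
// $base carries the request's other parameters, which aren't modeled here.
function build_notify_params(array $base, array $feedUrls, ?string $domain = null): string
{
    $params = $base;
    foreach (array_values($feedUrls) as $i => $url) {
        $params['url' . ($i + 1)] = $url; // url1, url2, ...
    }
    if ($domain !== null) {
        // Notifications go to this host instead of the caller's IP address.
        $params['domain'] = $domain;
    }
    return http_build_query($params);
}
```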

Winer, the lead author of several versions of the RSS specification and one of the best-known authorities on syndication, is making these changes unilaterally.

Because RSSCloud has been a part of RSS for nine years, I thought it wise for the board to decide what, if anything, it should do regarding this effort. My personal belief is that it's extremely unwise to give a single developer the authority to revise this interface and author its specification.

Ideally, a group should decide what changes should be made to the next version of RSSCloud. This group could be the RSS Advisory Board, which deliberates in public and has 10 members from across the RSS development community, or it could be an ad-hoc group formed strictly to work on the effort.

As a member of the board for five years, I've had a lot of experience dealing with the consequences of a specification process that is closed to public participation and drafted with imprecise language. It leads to situations like the long-running battle over the enclosure element, which carries podcasting files and other multimedia over RSS. As described in the board's RSS Best Practices Profile, the RSS specification doesn't make clear whether an item can contain more than one enclosure. Developers disagree over what the specification means, so interoperability suffers as some allow more than one enclosure and others don't.

I realize that I'm tilting at windmills to suggest that Winer let the RSS Advisory Board get anywhere near the effort. Jon and Kate have a better chance of getting together. But as developers such as Mullenweg implement RSSCloud, they should insist that the revision process take place in public and involve a group of software developers and feed publishers who have the power to approve or reject each change. The group should write the specification together.

Letting Winer make all the decisions by fiat will just buy years of arguments over what his spec means and why no one should ever be allowed to change it.

Syndicated 2009-09-15 20:47:36 from Workbench

There's a Reason RSSCloud Failed to Catch On

WordPress and Dave Winer are working together to bring real-time, Twitter-style updates to RSS feeds using the cloud element and the accompanying RSSCloud Interface. Yesterday, WordPress added RSS cloud support to "all 7.5 million blogs on WordPress.com." Winer's documenting the ongoing work at RSSCloud.org.

Although some tech sites are reporting this as a new initiative, cloud has been around since RSS 0.92 in December 2000. I was getting real-time RSS updates as a Radio UserLand blogger back then, and it was a great feature.

However, there's a reason that UserLand turned off cloud support in its products several years ago and shut down all of its cloud notification servers. The approach has massive scaling and firewall issues.

To explain why, it's worth looking at an example. I publish the Drudge Retort, which has around 16,000 subscribers, including 1,000 who get the feeds using desktop software on their home computers. If I add cloud support and all of my subscribers have cloud-enabled readers, each time I update the Retort, my cloud update server will be sending around 1,050 notifications to computers running RSS readers -- 1,000 to individuals and 50 to web-based readers.

That's just for one update. The Retort updates around 20 times a day, so my server would send 21,000 notifications a day using XML-RPC, SOAP or REST.
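The totals multiply out simply, since every update fans out to every registered reader. A trivial PHP version of the calculation, using the Retort's figures above (daily_notifications() is a hypothetical name):

```php
<?php
// Daily outbound notifications for a cloud server: each update triggers
// one notification per reader that asked to be told about the feed.
function daily_notifications(int $desktopReaders, int $webReaders, int $updatesPerDay): int
{
    return ($desktopReaders + $webReaders) * $updatesPerDay;
}

echo daily_notifications(1000, 50, 20), "\n"; // 21000
```

Each of those notifications is an outgoing connection that can stall on a timeout, which is where the cost described next comes from.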

On Internet servers it's extremely expensive to request data from clients, in terms of CPU time and networking resources. You have to make a connection to the computer, wait for a response and deal with timeouts from servers that are unavailable or blocked by a firewall. Every time I've tried to do something like this, it ends up being a huge server bottleneck and I remove the functionality. Last year, I crashed my servers for several days, and the cause was outgoing network connections that supported trackback. I no longer support trackback.

RSSCloud also requires that all desktop software receiving cloud notifications function as a web server. So if an RSS reader like BottomFeeder or FeedDemon adds cloud support, it must show its users how to open a firewall port to accept these incoming requests and possibly forward that port in their router as well. UserLand's attempt to put web servers on user desktops failed because it was too cumbersome to support. Back when I was writing the book Radio UserLand Kick Start and working closely with UserLand developers, their biggest customer service issue was helping users open up their firewalls so that Radio UserLand could act as a web server.

I don't mean to be a dark cloud, because this functionality could be a nice improvement for web-based RSS readers, letting services like Google Reader and Bloglines receive much quicker updates than they get from hourly polling.

But if the effort to make RSS real time extends to desktop software and mobile clients, cloud won't work. I think that RSS update notification would require peer-to-peer technology and something like XMPP, the protocol that powers Jabber instant messaging.

Syndicated 2009-09-08 16:25:42 from Workbench

Sharing Blog Posts on Your Facebook Profile

Over the past few months, I've gotten back into contact with more than a dozen old friends and coworkers through Facebook. After blogging for nine years, I prefer hanging out here on Workbench over social networking sites, but I'm beginning to feel like an anachronism. It's easier for people to keep up with their BFFs on sites like Facebook than to visit a bunch of personal blogs, even with the help of RSS and a feed reader.

I recently began linking my posts on Facebook using Simplaris Blogcast, a Facebook application that posts the title and link of blog posts to your Facebook profile. You can manually post items from your blog, pull them automatically from an RSS feed or ping Simplaris with each new post.

For reasons unknown, Simplaris Blogcast stopped pulling items automatically from my feed a month ago. To get automatic posts working again, I've updated my weblog ping library for PHP so that it can ping Blogcast each time I post on Workbench.

Blogcast uses the same ping protocol as Weblogs.Com. Before you can use the Weblog-Pinger library in a PHP script, you must add Blogcast to your Facebook account and retrieve your ping info, which contains a ping URL with an ID unique to your account. In the example URL http://blogcast.simplaris.com/ping/0dd8dfad5c842b600091ba/, the ID is 0dd8dfad5c842b600091ba. You'll need this ID when sending a ping, as in this example code:

require_once('weblog_pinger.php');
$pinger = new Weblog_Pinger();
$pinger->ping_simplaris_blogcast($post_title, $post_link, "0dd8dfad5c842b600091ba");
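If you'd rather paste in the whole ping URL, a small helper can pull the ID out of it. This is a hypothetical convenience function, not part of the Weblog-Pinger library, and it assumes every ping URL follows the /ping/ID/ pattern of the example above:

```php
<?php
// Extracts the account ID from a Blogcast ping URL shaped like
// http://blogcast.simplaris.com/ping/<ID>/ (pattern taken from the example).
function blogcast_id_from_url(string $pingUrl): ?string
{
    $path = parse_url($pingUrl, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }
    $parts = explode('/', trim($path, '/'));
    // Expect exactly two segments: "ping" followed by the ID.
    if (count($parts) !== 2 || $parts[0] !== 'ping' || $parts[1] === '') {
        return null;
    }
    return $parts[1];
}
```

The result can be passed as the third argument to ping_simplaris_blogcast().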

Once Blogcast has successfully received a ping, the application's Update Mode setting will switch to Ping Automatic.

The code's available under the open source GPL license. If it worked, this post will show up on my Facebook profile.

Syndicated 2009-02-15 18:14:13 from Workbench

Obama's White House Adopts Atom Format

I became the first subscriber on Bloglines to the feed for the new White House web site, which launched at 12:00 p.m. as Barack Obama became the 44th president of the United States. As a syndication dork, I was interested to discover that the feed employs Atom as its format:

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>White House.gov Blog Feed</title>
  <link href="http://www.whitehouse.gov" />
  <updated>2009-01-20T12:05:25Z</updated>
  <author><name>EOP</name></author>
  <id>urn:uuid:ca4baafc-b6bc-45e5-9144-79c5289d9518</id>
  <entry>
    <title>A National Day of Renewal and Reconciliation</title>
    <link href="http://www.whitehouse.gov/blog/a_national_day_of_renewal_and_reconciliation/" />
    <id>urn:uuid:ca4baafc-b6bc-45e5-9144-79c5289d9518</id>
    <updated>2009-01-20T17:01:00Z</updated>
    <summary>President Barack Obama's first proclamation.</summary>
  </entry>
</feed>

The Atom feed passes the Feed Validator, but there are four issues that trigger warning messages:

  • Your feed appears to be encoded as "utf-8", but your server is reporting "US-ASCII"
  • Missing atom:link with rel="self"
  • Two entries with the same id: urn:uuid:ca4baafc-b6bc-45e5-9144-79c5289d9518 (4 occurrences)
  • Two entries with the same value for atom:updated: 2009-01-20T17:01:00Z
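The duplicate-ID warning is easy to reproduce locally. This PHP sketch counts every id in an Atom feed, at the feed level and in each entry, and reports the ones used more than once; duplicate_atom_ids() is a hypothetical name, and it checks only ids, not the other three warnings:

```php
<?php
// Reports Atom ids that appear more than once, counting the feed-level
// <id> together with each entry's <id>. Returns [id => occurrences].
function duplicate_atom_ids(string $atom): array
{
    $xml = simplexml_load_string($atom);
    if ($xml === false) {
        return [];
    }
    $xml->registerXPathNamespace('a', 'http://www.w3.org/2005/Atom');
    $counts = [];
    foreach ($xml->xpath('/a:feed/a:id | /a:feed/a:entry/a:id') ?: [] as $id) {
        $value = trim((string) $id);
        $counts[$value] = ($counts[$value] ?? 0) + 1;
    }
    // Keep only the ids used more than once.
    return array_filter($counts, fn (int $n): bool => $n > 1);
}
```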

When he has the time, President Obama can address these issues pretty quickly.

First, the XML declaration should reflect the actual encoding transmitted by the White House server:

<?xml version="1.0" encoding="US-ASCII"?>

Alternatively, the feed should be published using the UTF-8 encoding.

Next, the feed needs a link element with a rel="self" attribute identifying the feed's own URL:

<link rel="self" href="http://www.whitehouse.gov/feed/blog/" />

Finally, steps should be taken so that each feed entry has a unique ID. I recommend using the tag URI format, which for the White House could produce id elements like this:

<id>tag:whitehouse.gov,2009:1</id>

The final number in the id element should be a unique number, such as the index number of a blog entry.
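Generating those IDs takes a single line. This PHP sketch follows the tag URI syntax from RFC 4151 (tag:authority,date:specific); entry_tag_uri() is a hypothetical name, and the authority and date must stay fixed once published, or every entry's ID would change:

```php
<?php
// Builds a tag URI (RFC 4151) for a blog entry: tag:authority,date:specific.
// $index is the entry's unique number, such as its database index.
function entry_tag_uri(string $authority, string $date, int $index): string
{
    return sprintf('tag:%s,%s:%d', $authority, $date, $index);
}

echo entry_tag_uri('whitehouse.gov', '2009', 1), "\n"; // tag:whitehouse.gov,2009:1
```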

The new White House site promises more feeds to come, but describes them as RSS feeds:

RSS is an acronym for Really Simple Syndication or Rich Site Summary. It is an XML-based method for distributing the latest news and information from a website that can be easily read by a variety of news readers or aggregators.

Either this is an error -- Atom feeds are not in RSS format, of course -- or Obama's effort towards national reconciliation includes the combatants in the RSS/Atom war.

Syndicated 2009-01-20 19:23:48 from Workbench

Creating PHP Web Sites with Smarty

I recently relaunched SportsFilter using the site's original web design on top of new programming, replacing a ColdFusion site with one written in PHP. The project turned out to be the most difficult web application I've ever worked on. For months, I kept writing PHP code only to throw it all out and start over as it became a ginormous pile of spaghetti.

Back in July, SportsFilter began crashing frequently and neither I nor the hosting service was able to find the cause. I've never been an expert in ColdFusion, Microsoft IIS or Microsoft SQL Server, the platform we chose in 2002 when SportsFilter's founders paid Matt Haughey to develop a sports community weblog inspired by MetaFilter. Haughey puts a phenomenal amount of effort into the user interface of his sites, and web designer Kirk Franklin made a lot of improvements over the years to SportsFilter. Users liked the way the site worked and didn't want to lose that interface. After I cobbled together a site using the same code as the Drudge Retort, SportsFilter's longtime users kept grasping for a delicate way to tell me that my design sucked big rocks.

PHP's a handy language for simple web programming, but when you get into more complex projects or work in a team, it can be difficult to create something that's easy to maintain. The ability to embed PHP code in web pages also makes it hard to hand off pages to web designers who are not programmers.

I thought about switching to Ruby on Rails and bought some books towards that end, but I didn't want to watch SportsFilter regulars drift away while I spent a couple months learning a new programming language and web framework.

During the Festivus holidays, after the family gathered around a pole and aired our grievances, I found a way to recode SportsFilter while retaining the existing design. The Smarty template engine makes it much easier to create a PHP web site that enables programmers and web designers to work together without messing up each other's work.

Smarty works by letting web designers create templates for web pages that contain three things: HTML markup, functions that control how information is displayed, and simple foreach and if-else commands written in Smarty's template language instead of PHP. Here's the template that displays SportsFilter's RSS feed:

<?xml version="1.0" encoding="ISO-8859-1"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
  <channel>
    <title>SportsFilter</title>
    <link>http://www.sportsfilter.com/</link>
    <description>Sports community weblog with {$member_count} members.</description>
    <docs>http://www.rssboard.org/rss-specification</docs>
    <atom:link rel="self" href="http://feeds.sportsfilter.com/sportsfilter" type="application/rss+xml" />
{foreach from=$entries item=entry}
    <item>
      <title>{$entry.title|escape:'html'}</title>
      <link>{$entry.permalink}</link>
      <description>{$entry.description|escape:'html'}</description>
      <pubDate>{$entry.timestamp|date_format:"%a, %d %b %Y %H:%M:%S %z"}</pubDate>
      <dc:creator>{$entry.author}</dc:creator>
      <comments>{$entry.permalink}#discuss</comments>
      <guid isPermaLink="false">tag:sportsfilter.com,2002:weblog.{$entry.dex}</guid>
      <category>{$entry.category}</category>
    </item>
{/foreach}
  </channel>
</rss>

The Smarty code in this template is placed within "{" and "}" brackets. The foreach loop pulls rows of weblog entries from the $entries array, storing each one in an $entry array. Elements of the array are displayed when you reference them in the template -- for example, $entry.author displays the username of the entry's author.

The display of variables can be modified by functions that use the "|" pipe operator. The escape function, used in {$entry.title|escape:'html'}, formats characters to properly encode them for use in an XML format such as RSS. (It's actually formatting them as HTML, but that works for this purpose.)

Because Smarty was developed with web applications in mind, there are a lot of built-in functions that make the task easier. SportsFilter displays dates in a lot of different forms. In my old code, I stored each form of a date in a different variable. Here, I just store a date once as a Unix timestamp value and call Smarty's date_format function to determine how it is displayed.
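For anyone curious what those two modifiers amount to, here's a rough plain-PHP equivalent. It's an approximation rather than Smarty's own code; it assumes the ISO-8859-1 charset the feed declares and formats dates in UTC, where the template uses the server's time zone. The function names are hypothetical:

```php
<?php
// Plain-PHP approximations of escape:'html' and an RSS pubDate format.
function escape_html(string $text, string $charset = 'ISO-8859-1'): string
{
    // Converts & < > " and ' into character references.
    return htmlspecialchars($text, ENT_QUOTES, $charset);
}

function rss_pub_date(int $timestamp): string
{
    // DATE_RSS is "D, d M Y H:i:s O"; gmdate() reports the offset as +0000.
    return gmdate(DATE_RSS, $timestamp);
}
```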

Smarty makes all session variables, cookies, and the request variables from form submissions available to templates. In SportsFilter, usernames are in $smarty.session.username and submitted comments are in $smarty.request.comment. There also are a few standard variables such as $smarty.now, the current time.

To use Smarty templates, you write a PHP script that stores the variables used by the template and then displays the template. Here's the script that displays the RSS feed:

// load libraries
require_once('Smarty.class.php');
require_once('sportsfilter.php');
$smarty = new Smarty(); // not needed if sportsfilter.php already creates $smarty
$spofi = new SportsFilter();

// load data
$entries = $spofi->get_recent_entries("", 15, "sports,");
// round the member count down to the nearest thousand
$member_count = floor($spofi->get_member_count() / 1000) * 1000;

// make data available to templates
$smarty->assign('spofi', $spofi);
$smarty->assign('entries', $entries);
$smarty->assign('page_title', "SportsFilter");
$smarty->assign('member_count', $member_count);

// display output with the same charset the template's XML declaration uses
header("Content-Type: text/xml; charset=ISO-8859-1");
$smarty->display('rss-source.tpl');

Smarty compiles web page templates into PHP code, so if something doesn't work like you expected, you can look under the hood. There's a lot more I could say about Smarty, but I'm starting to confuse myself.

There are two major chores involved in creating a web application in PHP: displaying content on web pages and reading or writing that content from a database. Smarty makes one of them considerably easier and more fun to program. I'm fighting the urge to rewrite every site I've ever created in PHP to use it. That would probably be overkill.

Syndicated 2009-01-14 19:39:18 from Workbench

Peace Declared Between Myself and Sweden

As it turns out, Sweden did not intentionally declare war on my web server earlier this month. Programmer Daniel Stenberg explains how the international incident happened:

A few years ago I wrote up silly little perl script (let's call it script.pl) that would fetch a page from a site that returns a "random URL off the internet." I needed a range of URLs for a test program of mine and just making up a thousand or so URLs is tricky. Thus I wrote this script that I would run and allow to get a range of URLs on each invoke and then run it again later and append to the log file. It wasn't a fancy script, but it solved my task.

The script was part of a project I got funded to work on, that was improving libcurl back in 2005/2006 so I thought adding and committing the script to CVS felt only natural and served a good purpose. To allow others to repeat what I did.

His script ended up on a publicly accessible web site that was misconfigured to execute the Perl script instead of displaying the code. So each time a web crawler requested the script, it ran again, making 2.6 million requests on URouLette in two days before it was shut down.

Stenberg's the lead developer of curl and libcurl, open source software for downloading web documents that I've used for years in my own programming. I think it's cool to have helped the project in a serendipitous, though admittedly server-destroying, way.

To make it easier for programmers to scarf up URouLette links without international strife, I've added an RSS feed that contains 1,000 random links, generated once every 10 minutes. There are some character encoding issues with the feed, which I need to address the next time I revise the code that builds URouLette's database.
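One common way to regenerate a feed no more than once every 10 minutes is to cache the output in a file and rebuild it only when the file gets stale. This is a hypothetical PHP sketch, not URouLette's actual code; $build stands in for whatever assembles the 1,000 links:

```php
<?php
// Rebuilds a cached feed file only when it's missing or older than 10 minutes.
const FEED_MAX_AGE = 600; // seconds

function feed_needs_rebuild(?int $cachedAt, int $now, int $maxAge = FEED_MAX_AGE): bool
{
    return $cachedAt === null || ($now - $cachedAt) >= $maxAge;
}

function serve_feed(string $cacheFile, callable $build): string
{
    clearstatcache(); // make sure filemtime() isn't reading a stale stat cache
    $mtime = is_file($cacheFile) ? filemtime($cacheFile) : null;
    if (feed_needs_rebuild($mtime === false ? null : $mtime, time())) {
        // $build is a placeholder for the code that generates the RSS.
        file_put_contents($cacheFile, $build(), LOCK_EX);
    }
    return (string) file_get_contents($cacheFile);
}
```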

This does not change how I feel about Bjorn Borg.

Syndicated 2008-12-30 16:38:51 from Workbench

Using Treemaps to Visualize Complex Information

I spent some time today digging into treemaps, a way to represent information visually as a series of nested rectangles whose colors are determined by an additional measurement. If that explanation sounds hopelessly obtuse, take a look at a world population treemap created using Honeycomb, enterprise treemapping software developed by the Hive Group:

World population treemap screenshot created by Honeycomb, the Hive Group's treemapping software

This section of the treemap shows the countries of Africa. The size of each rectangle shows that country's population relative to the other countries. The color indicates population density, ranging from dark green (most dense) to yellow (average) to dark orange (least dense). Hovering over a rectangle displays more information about it.
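The nested-rectangle idea is easier to grasp in code. This PHP sketch implements slice-and-dice, the simplest treemap layout, which divides a rectangle into strips whose areas are proportional to each value. I don't know which layout Honeycomb uses (many treemap tools use a squarified variant that avoids long skinny rectangles), and slice_layout() is a hypothetical name:

```php
<?php
// Slice-and-dice treemap layout: splits a rectangle into strips whose
// areas are proportional to each item's value.
// Returns [label => [x, y, width, height]].
function slice_layout(array $values, float $x, float $y, float $w, float $h, bool $vertical = true): array
{
    $total = array_sum($values);
    $rects = [];
    $offset = 0.0;
    foreach ($values as $label => $value) {
        $share = $total > 0 ? $value / $total : 0.0;
        if ($vertical) {
            // Side-by-side strips: width scales with the item's share.
            $rects[$label] = [$x + $offset, $y, $w * $share, $h];
            $offset += $w * $share;
        } else {
            // Stacked strips: height scales with the item's share.
            $rects[$label] = [$x, $y + $offset, $w, $h * $share];
            $offset += $h * $share;
        }
    }
    return $rects;
}
```

To nest the map, such as countries inside continents, call slice_layout() again inside each rectangle with $vertical flipped. A second measurement like population density would then pick each rectangle's color.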

A treemap can be adjusted to make the size and color represent different things, such as geographic area instead of population. You also can zoom in to a section of the map, focusing on a specific continent instead of the entire world. The Honeycomb treemapping software offers additional customization, which comes in handy on a Digg treemap that displays the most popular links on the site organized by section.

By tweaking the Digg treemap, you can see the hottest stories based on the number of Diggs, number of Diggs per minute and number of comments. You also can filter out results by number of Diggs, number of Diggs per minute or the age of the links.

I don't know how hard it is to feed a treemap with data, but it seems like an idea that would be useful across many different types of information. As a web publisher, I'd like to see a treemap that compares the web traffic and RSS readership my sites receive with the ad revenue they generate. The Hive Group also offers sample applications that apply treemaps to the NewsIsFree news aggregator, Amazon.Com products, and iTunes singles. This was not a good day to be a Jonas Brother.

Syndicated 2008-12-23 22:48:54 from Workbench
