Older blog entries for jamesh (starting at number 280)

In Montreal

I’m in Montreal through to the end of next week.  The sub-zero temperatures are quite a change from Perth, where it got up to 39°C on the day I left.

The last time I was here was for Ubuntu Below Zero, so it is interesting seeing the same city covered in snow.

Syndicated 2009-02-24 19:12:16 from James Henstridge

In Hobart

Today was the first day of the mini-conferences that lead up to linux.conf.au later on this week.  I arrived yesterday after an eventful flight from Perth.

I was originally meant to fly out to Melbourne on the red eye leaving on Friday at 11:35pm, but just before I checked in they announced that the flight had been delayed until 4:00am the following day.  As I hadn’t had a chance to check in, I was able to get a pair of taxi vouchers to get home and back.  I only got about 2 hours of sleep though, as they said they would turn off the baggage processing system at 3am.  When I got back to the airport, I could see all the people who had stayed at the terminal spread out with airplane blankets.  A little before the 4:00am deadline, another announcement was made saying the plane would now be leaving at 5:00am.  Apparently they had needed to fly a replacement component in from over east to fix a problem found during maintenance.  Still, it seems it wasn’t the most delayed Qantas flight for that weekend and it did arrive in one piece.

As I had planned to spend a day in Melbourne visiting relatives, it didn’t cause any problems with the flight on to Hobart.  I had been invited to the “Ghosts” dinner, which was to start about an hour after my flight landed, so it was a bit of a rush to get to the university accommodation and then walk down the hill to the restaurant.

The dinner was pretty good, with organisers from all the previous LCA conferences plus the people organising the 2010 conference.  Unfortunately, I was the only one from the 2003 organisers able to attend.  It sounds like the 2010 organisers have things in hand, and the location should be great.

Syndicated 2009-01-19 13:01:21 from James Henstridge

Getting “bzr send” to work with GMail

One of the nice features of Bazaar is the ability to send a bundle of changes to someone via email.  If you use a supported mail client, it will even open the composer with the changes attached.  If your client isn’t supported, then it’ll let you compose a message in your editor and then send it to an SMTP server.

GMail is not a supported mail client, but there are a few work arounds listed on the wiki.  Those really come down to using an alternative mail client (either the editor or Mutt) and sending the mails through the GMail SMTP server.  Neither solution really appealed to me.  There doesn’t seem to be a programatic way of opening up GMail’s compose window and adding an attachment (not too surprising for a web app).

What is possible though is connecting via IMAP and adding messages to the drafts folder (assuming IMAP support is enabled).  So I wrote a small plugin to do just that.  It can be installed with the following command:

bzr branch lp:~jamesh/+junk/bzr-imapclient ~/.bazaar/plugins/imapclient

And then configure the IMAP server, username and mailbox according to the instructions in the README file.  You can then use “bzr send” as normal and then complete and send the draft at your leisure.

One nice thing about the plugin implementation is that it didn’t need any GMail specific features: it should be useful for anyone who has their drafts folder stored on an IMAP server and uses an unsupported mail client.

The main area where this could be improved would be to open up the compose screen in the web browser.  However, this would require knowing the internal message ID for the new message, which I can’t see how to access via IMAP.

Syndicated 2009-01-16 09:19:31 from James Henstridge

Using Twisted Deferred objects with gio

The gio library provides both synchronous and asynchronous interfaces for performing IO.  Unfortunately, the two APIs require quite different programming styles, making it difficult to convert code written to the simpler synchronous API to the asynchronous one.

For C programs this is unavoidable, but for Python we should be able to do better.  And if you’re doing asynchronous event driven code in Python, it makes sense to look at Twisted.  In particular, Twisted’s Deferred objects can be quite helpful.

Deferred

The Twisted documentation describes deferred objects as “a callback which will be put off until later”.  The deferred will eventually be passed the result of some operation, or information about how it failed.

From the consumer side, you can register one or more callbacks that will be run:

def callback(result):
    # do stuff
    return result

deferred.addCallback(callback)

The first callback will be called with the original result, while subsequent callbacks will be passed the return value of the previous callback (this is why the above example returns its argument). If the operation fails, one or more errbacks (error callbacks) will be called:

def errback(failure):
    # do stuff
    return failure

deferred.addErrback(errback)

If the operation associated with the deferred has already been completed (or already failed) when the callback/errback is added, then it will be called immediately. So there is no need to check if the operation is complete before hand.

Using Deferred objects with gio

We can easily use gio’s asynchronous API to implement a new API based on deferred objects.  For example:

import gio
from twisted.internet import defer

def file_read_deferred(file, io_priority=0, cancellable=None):
    d = defer.Deferred()
    def callback(file, async_result):
        try:
            in_stream = file.read_finish(async_result)
        except gio.Error:
            d.errback()
        else:
            d.callback(in_stream)
    file.read_async(callback, io_priority, cancellable)
    return d

def input_stream_read_deferred(in_stream, count, io_priority=0,
                               cancellable=None):
    d = defer.Deferred()
    def callback(in_stream, async_result):
        try:
            bytes = in_stream.read_finish(async_result)
        except gio.Error:
            d.errback()
        else:
            d.callback(bytes)
    # the argument order seems a bit weird here ...
    in_stream.read_async(count, callback, io_priority, cancellable)
    return d

This is a fairly simple transformation, so you might ask what this buys us. We’ve gone from an interface where you pass a callback to the method to one where you pass a callback to the result of the method. The answer is in the tools that Twisted provides for working with deferred objects.

The inlineCallbacks decorator

You’ve probably seen code examples that use Python’s generators to implement simple co-routines. Twisted’s inlineCallbacks decorator basically implements this for generators that yield deferred objects. It uses the enhanced generators feature from Python 2.5 (PEP 342) to pass the deferred result or failure back to the generator. Using it, we can write code like this:

@defer.inlineCallbacks
def print_contents(file, cancellable=None):
    in_stream = yield file_read_deferred(file, cancellable=cancellable)
    bytes = yield input_stream_read_deferred(
        in_stream, 4096, cancellable=cancellable)
    while bytes:
        # Do something with the data.  For this example, just print to stdout.
        sys.stdout.write(bytes)
        bytes = yield input_stream_read_deferred(
            in_stream, 4096, cancellable=cancellable)

Other than the use of the yield keyword, the above code looks quite similar to the equivalent synchronous implementation.  The only thing that would improve matters would be if these were real methods rather than helper functions.

Furthermore, the inlineCallbacks decorator causes the function to return a deferred that will fire when the function body finally completes or fails. This makes it possible to use the function from within other asynchronous code in a similar fashion. And once you’re using deferred results, you can mix in the gio calls with other Twisted asynchronous calls where it makes sense.

Syndicated 2009-01-06 01:18:53 from James Henstridge

Red Bull Air Race

Yesterday, I went to see the finals of the Red Bull Air Race here in Perth.  This was my first time watching the event, since I was over seas at the times it was held the previous two years.

The weather was good, and gave me a good opportunity to play with my camera a bit.  Sometimes you can miss out on the action by trying to take photos, but in this case the camera made it a lot easier to see the planes from the shore.

As the overall winner was decided by points scored over the full series, it wasn’t necessary for the series winner to win the Perth race.  This turned out to be the case, with Hans Arch coming third but winning the series.

Hannes Arch passing through one of the gates in front of the WACA

Hannes Arch passing through one of the gates in front of the WACA

The Perth final ended up being between two English pilots: Nigel Lamb and Paul Bonhomme, with Bonhomme winning the race.

Paul Bonhomme passing through the finishing pylons

Paul Bonhomme passing through the finishing gate

After the race, there was a display by the RAAF Roulettes formation flying team, and a fly over by an F/A-18 Hornet.

F/A-18 Hornet

F/A-18 Hornet

Overall, it was a pretty good day.  I don’t know if I’d have been up for watching the entire two days worth, but the finals were entertaining.  Hopefully I’ll be around for next year’s race.

Syndicated 2008-11-03 07:42:56 from James Henstridge

Re: Continuing to Not Quite Get It at Google…

David: taking a quick look at Google’s documentation, it sure looks like OpenID to me.  The main items of note are:

  1. It documents the use of OpenID 2.0’s directed identity mode.  Yes this is “a departure from the process outlined in OpenID 1.0″, but that could be considered true of all new features found in 2.0.  Google certainly isn’t the first to implement this feature:
    • Yahoo’s OpenID page recommends users enter “yahoo.com” in the identity box on web sites, which will initiate a directed identity authentication request.
    • We’ve been using directed identity with Launchpad to implement single sign on for various Canonical/Ubuntu sites.

    Given that Google account holders identify themselves by email address, users aren’t likely to know a URL to enter, so this kind of makes sense.

  2. The identity URLs returned by the OpenID provider do not directly reveal information about the user, containing a long random string to differentiate between users.  If the relying party wants any user details, they must request them via the standard OpenID Attribute Exchange protocol.
  3. They are performing access control based on the OpenID realm of the relying party.  I can understand doing this in the short term, as it gives them a way to handle a migration should they make an incompatible change during the beta.  If they continue to restrict access after the beta, you might have a valid concern.

It looks like there would be no problem talking to their provider using existing off the shelf OpenID libraries (like the ones from JanRain).

If you have an existing site using OpenID for login, chances are that after registering the realm with Google you’d be able to log in by entering Google’s OP server URL.  At that point, it’d be fairly trivial to add another button to the login page – sites seem pretty happy to plaster provider-specific radio buttons and entry boxes all over the page already …

Syndicated 2008-10-30 08:30:02 from James Henstridge

Streaming Vorbis files from Ubuntu to a PS3

One of the nice features of the PlayStation 3 is the UPNP/DLNA media renderer.  Unfortunately, the set of codecs is pretty limited, which is a problem since most of my music is encoded as Vorbis.  MediaTomb was suggested to me as a server that could transcode the files to a format the PS3 could understand.

Unfortunately, I didn’t have much luck with the version included with Ubuntu 8.10 (Intrepid), and after a bit of investigation it seems that there isn’t a released version of MediaTomb that can send PCM audio to the PS3.  So I put together a package of a subversion snapshot in my PPA which should work on Intrepid.

With the newer package, it was pretty easy to get things working:

  1. Install the mediatomb-daemon package
  2. Edit the /etc/mediatomb/config.xml file and make the following changes:
    • Change the <protocolInfo/> line to set extend=”yes”.
    • In the <extension-mimetype> section, uncomment the line to map “avi” to “video/divx”.  This will get a lot of videos to play without problem.
    • In the <mimetype-upnpclass> section, add a line to map “application/ogg” to “object.item.audioItem.musicTrack”.  This is needed for the vorbis files to be recognised as music.
    • In the <mimetype-contenttype> section add a line to map “audio/L16″ to “pcm”.
    • On the <transcoding> element, change the enabled attribute to “yes”.
    • Add the settings from here to the <transcoding> section.
  3. Edit the /etc/default/mediatomb script and set INTERFACE to the network interface you want to advertise on.
  4. Restart the mediatomb daemon.
  5. Go to the web UI (try opening /var/lib/mediatomb/mediatomb.html in a web browser), and add the directories you want to export.
  6. Test things on the PS3.

Things aren’t perfect though.  As MediaTomb is simply piping the transcoded audio to the PS3, it doesn’t implement seeking on such files, and it seems that the PS3 won’t even let you pause a stream that doesn’t allow seeking.  With a less generalised transcoding backend, it seems like it should be trivial to support seeking in an uncompressed PCM stream though, since the byte offsets can be trivially mapped to sample numbers.

The other problem I found was that none of the recent music I’d ripped showed up.  It seems that they’d been ripped with the .oga file extension rather than .ogg.  This change appears to have been made in bug 543306, but the reasoning seems suspect: the guidelines from Xiph indicate that the files generated by this encoding profile should continue to use the .ogg file extension.

I tried adding some extra mappings to the MediaTomb configuration file to recognise the files without success, but eventually decided to just rename them and fix the encoding profile locally.

A Perfect Media Server

While MediaTomb mostly works for me, it doesn’t do everything I’d like.  A few of the things I’d like out of a media server include:

  1. No need to configure things via a web UI.  In fact, I could do without a web UI all together – something nicely integrated into the desktop would be nice.
  2. No need to set model specific settings in the configuration file.  Ideally it would know how to talk to common players by default.
  3. Supports transcoding and seeking within transcoded files.  Preferably knows what needs transcoding for common players.
  4. Picks up new files in real time.  So something inotify based rather than periodic reindexing.
  5. A virtual folder tree for music based on artist/album metadata. A plain folder tree for other media would be fine.
  6. Cached video thumbnails would be nice too.  The build of MediaTomb in my PPA includes support for thumbnails (needs to be enabled in the config file), but they aren’t cached so are slow to appear.

Perhaps Zeeshan’s media server will be worth trying out at some point.

Syndicated 2008-10-30 02:35:40 from James Henstridge

Thoughts on OAuth

I’ve been playing with OAuth a bit lately. The OAuth specification fulfills a role that some people saw as a failing of OpenID: programmatic access to websites and authenticated web services. The expectation that OpenID would handle these cases seems a bit misguided since the two uses cases are quite different:

  • OpenID is designed on the principle of letting arbitrary OpenID providers talk to arbitrary relying parties and vice versa.
  • OpenID is intentionally vague about how the provider authenticates the user. The only restriction is that the authentication must be able to fit into a web browsing session between the user and provider.

While these are quite useful features for a decentralised user authentication scheme, the requirements for web service authentication are quite different:

  • There is a tighter coupling between the service provider and client. A client designed to talk to a photo sharing service won’t have much luck if you point it at a micro-blogging service.
  • Involving a web browser session in the authentication process for individual web service request is not a workable solution: the client might be designed to run offline for instance.

While the idea of a universal web services client is not achievable, there are areas of commonality between different the services: gaining authorisation from the user and authenticating individual requests. This is the area that OAuth targets.

While it has different applications, it is possible to compare some of the choices made in the protocol:

  1. The secrets for request and access tokens are sent to the client in the clear. So at a minimum, a service provider’s request token URL and access token URL should be served over SSL. OpenID nominally avoids this by using Diffie-Hellman Key Exchange to avoid evesdropping, but ended up needing it to avoid man in the middle attacks. So sending them in the clear is probably a more honest approach.
  2. Actual web service methods can be authenticated over plain HTTP in a fairly secure means using the HMAC-SHA1 or RSA-SHA1 signature methods. Although if you’re using SSL anyway, the PLAINTEXT authentication method is probably not any worse than HMAC-SHA1.
  3. The authentication protocol supports both web applications and desktop applications. Though any security gained through consumer secrets is invalidated for desktop applications, since anyone with a copy of the application will necessarily have access to the secrets. A few other points follow on from this:
    • The RSA-SHA1 signature method is not appropriate for use by desktop applications. The signature is based only on information available in the web service request and the RSA key associated with the consumer, and the private key will need to be distributed as part of the application. So if an attacker discovers an access token (not access token secret), they can authenticate.
    • The other two authentication methods — HMAC-SHA1 and PLAINTEXT — depend on an access token secret. Along with the access token, this is essentially a proxy for the user name and password, so should be protected as such (e.g. via the GNOME keyring).  It still sounds better than storing passwords directly, since the token won’t give access to unrelated sites the user happened to use the same password on, and can be revoked independently of changing the password.
  4. While the OpenID folks found a need for a formal extension mechanism for version 2.0 of that protocol, nothing like that seems to have been added to OAuth.  There are now a number of proposed extensions for OAuth, so it probably would have been a good idea.  Perhaps it isn’t as big a deal, due to tigher coupling of service providers and consumers, but I could imagine it being useful as the two parties evolve over time.

So the standard seems decent enough, and better than trying to design such a system yourself.  Like OpenID, it’ll probably take until the second release of the specification for some of the ambiguities to be taken care of and for wider adoption.

From the Python programmer point of view, things could be better.  The library available from the OAuth site seems quite immature and lacks support for a few aspects of the protocol.  It looks okay for simpler uses, but may be difficult to extend for use in more complicated projects.

Syndicated 2008-10-23 03:46:53 from James Henstridge

Django support landed in Storm

Since my last article on integrating Storm with Django, I’ve merged my changes to Storm’s trunk.  This missed the 0.13 release, so you’ll need to use Bazaar to get the latest trunk or wait for 0.14.

The focus since the last post was to get Storm to cooperate with Django’s built in ORM.  One of the reasons people use Django is the existing components that can be used to build a site.  This ranges from the included user management and administration code to full web shop implementations.  So even if you plan to use Storm for your Django application, your application will most likely use Django’s ORM for some things.

When I last posted about this code, it was possible to use both ORMs in a single app, but they would use separate database connections.  This had a number of disadvantages:

  • The two connections would be running separate transactions in parallel, so changes made by one connection would not be visible to the other connection until after the transaction was complete.  This is a problem when updating records in one table that reference rows that are being updated on the other connection.
  • When you have more than one connection, you introduce a new failure mode where one transaction may successfully commit but the other fail, leaving you with only half the changes being recorded.  This can be fixed by using two phase commit, but that is not supported by either Django or Storm at this point in time.

So it is desirable to have the two ORMs sharing a single connection.  The way I’ve implemented this is as a Django database engine backend that uses the connection for a particular named per-thread store and passes transaction commit or rollback requests through to the global transaction manager.  Configuration is as simple as:

DATABASE_ENGINE = 'storm.django.backend'
DATABASE_NAME = 'store-name'
STORM_STORES = {'store-name': 'database-uri'}

This will work for PostgreSQL or MySQL connections: Django requires some additional set up for SQLite connections that Storm doesn’t do.

Once this is configured, things mostly just work.  As Django and Storm both maintain caches of data retrieved from the database though, accessing the same table with both ORMs could give unpredictable results.  My code doesn’t attempt to solve this problem so it is probably best to access tables with only one ORM or the other.

I suppose the next step here would be to implement something similar to Storm’s Reference class to represent links between objects managed by Storm and objects managed by Django and vice versa.

Syndicated 2008-09-19 06:23:51 from James Henstridge

Transaction Management in Django

In my previous post about Django, I mentioned that I found the transaction handling strategy in Django to be a bit surprising.

Like most object relational mappers, it caches information retrieved from the database, since you don’t want to be constantly issuing SELECT queries for every attribute access. However, it defaults to commiting after saving changes to each object. So a single web request might end up issuing many transactions:

Change object 1 Transaction 1
Change object 2 Transaction 2
Change object 3 Transaction 3
Change object 4 Transaction 4
Change object 5 Transaction 5

Unless no one else is accessing the database, there is a chance that other users could modify objects that the ORM has cached over the transaction boundaries. This also makes it difficult to test your application in any meaningful way, since it is hard to predict what changes will occur at those points. Django does provide a few ways to provide better transactional behaviour.

The @commit_on_success Decorator

The first is a decorator that turns on manual transaction management for the duration of the function and does a commit or rollback when it completes depending on whether an exception was raised. In the above example, if the middle three operations were made inside a @commit_on_success function, it would look something like this:

Change object 1 Transaction 1
Change object 2 Transaction 2
Change object 3
Change object 4
Change object 5 Transaction 3

Note that the decorator is usually used on view functions, so it will usually cover most of the request. That said, there are a number of cases where extra work might be done outside of the function. Some examples include work done in middleware classes and views that call other view functions.

The TransactionMiddleware class

Another alternative is to install the TransactionMiddleware middleware class for the site. This turns on transaction management for the duration of each request, similar to what you’d see with other frameworks giving results something like this:

Change object 1 Transaction 1
Change object 2
Change object 3
Change object 4
Change object 5

Combining @commit_on_success and TransactionMiddleware

At first, it would appear that these two approaches cover pretty much everything you’d want. But there are problems when you combine the two. If we use the @commit_on_success decorator as before and TransactionMiddleware, we get the following set of transactions:

Change object 1 Transaction 1
Change object 2
Change object 3
Change object 4
Change object 5 Transaction 2

The transaction for the @commit_on_success function has extended to cover the operations made before hand. This also means that operations #1 and #5 are now in separate transactions despite the use of TransactionMiddleware. The problem also occurs with nested use of @commit_on_success, as reported in Django bug 2227.

A better behaviour for nested transaction management would be something like this:

  1. On success, do nothing. The changes will be committed by the outside caller.
  2. On failure, do not abort the transaction, but instead mark it as uncommittable. This would have similar semantics to the Zope transaction.doom() function.

It is important that the nested call does not abort the transaction because that would cause a new transaction to be started by subsequent code: that should be left to the code that began the transaction.

The @autocommit decorator

While the above interaction looks like a simple bug, the @autocommit decorator is another matter. It turns autocommit on for the duration of a function call, no matter what the transaction mode for the caller was. If we took the original example and wrapped the middle three operations with @autocommit and used TransactionMiddleware, we’d get 4 transactions: one for the first two operations, then one for each of the remaining operations.

I can’t think of a situation where it would make sense to use, and wonder if it was just added for completeness.

Conclusion

While the nesting bugs remain, my recommendation would be to go for the TransactionMiddleware and avoid use of the decorators (both in your own code and third party components). If you are writing reusable code that requires transactions, it is probably better to assert that django.db.transaction.is_managed() is true so that you get a failure for improperly configured systems while not introducing unwanted transaction boundaries.

For the Storm integration work I’m doing, I’ve set it to use managed transaction mode to avoid most of the unwanted commits, but it still falls prey to the extra commits when using the decorators. So I guess inspecting the code is still necessary. If anyone has other tips, I’d be glad to hear them.

Syndicated 2008-09-01 07:42:39 from James Henstridge

271 older entries...

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!