Older blog entries for jamesh (starting at number 273)

Thoughts on OAuth

I’ve been playing with OAuth a bit lately. The OAuth specification fulfills a role that some people saw as a failing of OpenID: programmatic access to websites and authenticated web services. The expectation that OpenID would handle these cases seems a bit misguided, since the two use cases are quite different:

  • OpenID is designed on the principle of letting arbitrary OpenID providers talk to arbitrary relying parties and vice versa.
  • OpenID is intentionally vague about how the provider authenticates the user. The only restriction is that the authentication must be able to fit into a web browsing session between the user and provider.

While these are quite useful features for a decentralised user authentication scheme, the requirements for web service authentication are quite different:

  • There is a tighter coupling between the service provider and client. A client designed to talk to a photo sharing service won’t have much luck if you point it at a micro-blogging service.
  • Involving a web browser session in the authentication of individual web service requests is not a workable solution: the client might be designed to run offline, for instance.

While the idea of a universal web services client is not achievable, there are areas of commonality between the different services: gaining authorisation from the user and authenticating individual requests. This is the area that OAuth targets.

Although OAuth and OpenID serve different applications, it is still possible to compare some of the choices made in the two protocols:

  1. The secrets for request and access tokens are sent to the client in the clear, so at a minimum a service provider’s request token URL and access token URL should be served over SSL. OpenID nominally avoids this by using Diffie-Hellman key exchange to prevent eavesdropping, but ended up needing SSL anyway to prevent man-in-the-middle attacks. So sending the secrets in the clear and relying on SSL is probably a more honest approach.
  2. Actual web service methods can be authenticated over plain HTTP in a fairly secure manner using the HMAC-SHA1 or RSA-SHA1 signature methods. If you’re using SSL anyway, though, the PLAINTEXT method is probably no worse than HMAC-SHA1.
  3. The authentication protocol supports both web applications and desktop applications. Though any security gained through consumer secrets is invalidated for desktop applications, since anyone with a copy of the application will necessarily have access to the secrets. A few other points follow on from this:
    • The RSA-SHA1 signature method is not appropriate for use by desktop applications. The signature is based only on information available in the web service request and the RSA key associated with the consumer, and the private key will need to be distributed as part of the application. So if an attacker discovers an access token (even without the access token secret), they can authenticate.
    • The other two authentication methods — HMAC-SHA1 and PLAINTEXT — depend on an access token secret. Along with the access token, this is essentially a proxy for the user name and password, so should be protected as such (e.g. via the GNOME keyring).  It still sounds better than storing passwords directly, since the token won’t give access to unrelated sites the user happened to use the same password on, and can be revoked independently of changing the password.
  4. While the OpenID folks found a need for a formal extension mechanism for version 2.0 of that protocol, nothing like that seems to have been added to OAuth.  There are now a number of proposed extensions for OAuth, so it probably would have been a good idea.  Perhaps it isn’t as big a deal, due to the tighter coupling of service providers and consumers, but I could imagine it being useful as the two parties evolve over time.
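To make points 2 and 3 more concrete, here is a minimal Python sketch of OAuth 1.0 request signing using only the standard library. The function names are hypothetical (this is not the API of the library on the OAuth site), the URL is assumed to be already normalised, and repeated parameter names are not handled. Note that the signing key is just the consumer secret and token secret joined with "&": a desktop application that ships its consumer secret leaves the token secret as the only real protection, and PLAINTEXT sends that key verbatim.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def percent_encode(value):
    # OAuth 1.0 leaves only unreserved characters (A-Z a-z 0-9 - . _ ~) unescaped.
    return quote(str(value), safe="~")


def signing_key(consumer_secret, token_secret=""):
    # Both secrets feed the key; the token secret is empty before a token exists.
    return percent_encode(consumer_secret) + "&" + percent_encode(token_secret)


def signature_base_string(method, url, params):
    # Encode each name and value, sort the pairs, then join them as name=value&...
    pairs = sorted((percent_encode(k), percent_encode(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join([method.upper(), percent_encode(url), percent_encode(normalized)])


def hmac_sha1_signature(method, url, params, consumer_secret, token_secret=""):
    # HMAC-SHA1 over the base string, keyed by the combined secrets, base64 encoded.
    base = signature_base_string(method, url, params)
    key = signing_key(consumer_secret, token_secret)
    digest = hmac.new(key.encode("utf-8"), base.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


def plaintext_signature(consumer_secret, token_secret=""):
    # PLAINTEXT just sends the key itself, so it only makes sense over SSL.
    return signing_key(consumer_secret, token_secret)
```

The params mapping is expected to hold both the request's own parameters and the oauth_* protocol parameters (excluding oauth_signature itself); the returned value is what would be sent as oauth_signature.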

So the standard seems decent enough, and better than trying to design such a system yourself.  Like OpenID, it’ll probably take until the second release of the specification for some of the ambiguities to be taken care of and for wider adoption.

From a Python programmer’s point of view, things could be better.  The library available from the OAuth site seems quite immature and lacks support for a few aspects of the protocol.  It looks okay for simpler uses, but may be difficult to extend for use in more complicated projects.

Syndicated 2008-10-23 03:46:53 from James Henstridge

Django support landed in Storm

Since my last article on integrating Storm with Django, I’ve merged my changes to Storm’s trunk.  This missed the 0.13 release, so you’ll need to use Bazaar to get the latest trunk or wait for 0.14.

The focus since the last post was to get Storm to cooperate with Django’s built-in ORM.  One of the reasons people use Django is the range of existing components that can be used to build a site.  This ranges from the included user management and administration code to full web shop implementations.  So even if you plan to use Storm for your Django application, your application will most likely use Django’s ORM for some things.

When I last posted about this code, it was possible to use both ORMs in a single app, but they would use separate database connections.  This had a number of disadvantages:

  • The two connections would be running separate transactions in parallel, so changes made by one connection would not be visible to the other connection until after the transaction was complete.  This is a problem when updating records in one table that reference rows that are being updated on the other connection.
  • When you have more than one connection, you introduce a new failure mode where one transaction may successfully commit but the other fail, leaving you with only half the changes being recorded.  This can be fixed by using two-phase commit, but that is not supported by either Django or Storm at this point in time.
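The visibility problem in the first point can be reproduced with any database that keeps uncommitted changes private to a connection. This sketch uses Python's built-in sqlite3 module purely as a stand-in (an assumption for illustration; the Storm backend targets PostgreSQL and MySQL), with two plain connections playing the roles of the two ORMs. The table and function names are hypothetical.

```python
import os
import sqlite3
import tempfile


def count_accounts(conn):
    # fetchall() runs the statement to completion, so no read lock lingers.
    return conn.execute("SELECT COUNT(*) FROM account").fetchall()[0][0]


def visibility_demo(path):
    setup = sqlite3.connect(path)
    setup.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, name TEXT)")
    setup.commit()
    setup.close()

    # Two independent connections, as when each ORM opens its own.
    storm_like = sqlite3.connect(path)
    django_like = sqlite3.connect(path)
    storm_like.execute("INSERT INTO account (name) VALUES ('alice')")  # not committed
    seen_by_other = count_accounts(django_like)  # separate transaction: row invisible

    # Reusing the same connection means sharing the same transaction.
    seen_by_shared = count_accounts(storm_like)  # row is visible

    storm_like.rollback()
    storm_like.close()
    django_like.close()
    return seen_by_other, seen_by_shared
```

Calling visibility_demo() with a path to a fresh database file returns (0, 1): the row inserted on the first connection is invisible to the second until commit, but visible when the same connection is reused.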

So it is desirable to have the two ORMs sharing a single connection.  The way I’ve implemented this is as a Django database engine backend that uses the connection for a particular named per-thread store and passes transaction commit or rollback requests through to the global transaction manager.  Configuration is as simple as:

DATABASE_ENGINE = 'storm.django.backend'
DATABASE_NAME = 'store-name'
STORM_STORES = {'store-name': 'database-uri'}

This will work for PostgreSQL or MySQL connections: Django requires some additional setup for SQLite connections that Storm doesn’t perform.

Once this is configured, things mostly just work.  As Django and Storm both maintain caches of data retrieved from the database though, accessing the same table with both ORMs could give unpredictable results.  My code doesn’t attempt to solve this problem so it is probably best to access tables with only one ORM or the other.

I suppose the next step here would be to implement something similar to Storm’s Reference class to represent links between objects managed by Storm and objects managed by Django and vice versa.

Syndicated 2008-09-19 06:23:51 from James Henstridge

Transaction Management in Django

In my previous post about Django, I mentioned that I found the transaction handling strategy in Django to be a bit surprising.

Like most object relational mappers, it caches information retrieved from the database, since you don’t want to be constantly issuing SELECT queries for every attribute access. However, it defaults to committing after saving changes to each object, so a single web request might end up issuing many transactions:

Change object 1 Transaction 1
Change object 2 Transaction 2
Change object 3 Transaction 3
Change object 4 Transaction 4
Change object 5 Transaction 5

Unless no one else is accessing the database, there is a chance that other users could modify objects that the ORM has cached across the transaction boundaries. This also makes it difficult to test your application in any meaningful way, since it is hard to predict what changes will occur at those points. Django does provide a few ways to get better transactional behaviour.

The @commit_on_success Decorator

The first is a decorator that turns on manual transaction management for the duration of the function and does a commit or rollback when it completes depending on whether an exception was raised. In the above example, if the middle three operations were made inside a @commit_on_success function, it would look something like this:

Change object 1 Transaction 1
Change object 2 Transaction 2
Change object 3
Change object 4
Change object 5 Transaction 3

Note that the decorator is usually used on view functions, so it will usually cover most of the request. That said, there are a number of cases where extra work might be done outside of the function. Some examples include work done in middleware classes and views that call other view functions.

The TransactionMiddleware class

Another alternative is to install the TransactionMiddleware middleware class for the site. This turns on transaction management for the duration of each request, similar to what you’d see with other frameworks giving results something like this:

Change object 1 Transaction 1
Change object 2
Change object 3
Change object 4
Change object 5

Combining @commit_on_success and TransactionMiddleware

At first, it would appear that these two approaches cover pretty much everything you’d want. But there are problems when you combine the two. If we use the @commit_on_success decorator as before and TransactionMiddleware, we get the following set of transactions:

Change object 1 Transaction 1
Change object 2
Change object 3
Change object 4
Change object 5 Transaction 2

The transaction for the @commit_on_success function has extended to cover the operations made beforehand. This also means that operations #1 and #5 are now in separate transactions despite the use of TransactionMiddleware. The problem also occurs with nested use of @commit_on_success, as reported in Django bug 2227.

A better behaviour for nested transaction management would be something like this:

  1. On success, do nothing. The changes will be committed by the outside caller.
  2. On failure, do not abort the transaction, but instead mark it as uncommittable. This would have similar semantics to the Zope transaction.doom() function.

It is important that the nested call does not abort the transaction because that would cause a new transaction to be started by subsequent code: that should be left to the code that began the transaction.
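The proposed nesting semantics can be sketched as a decorator that tracks nesting depth. This is a hypothetical illustration, not Django's API: TransactionState and transactional are made-up names, the begin/commit/rollback methods only record what would happen, and a doomed transaction is rolled back by the outermost caller (Zope's doomed transactions instead refuse to commit).

```python
import functools


class TransactionState:
    """Hypothetical per-connection transaction state (not a Django class)."""

    def __init__(self):
        self.depth = 0
        self.doomed = False
        self.log = []  # records begin/commit/rollback for illustration

    def begin(self):
        self.log.append("begin")

    def commit(self):
        self.log.append("commit")

    def rollback(self):
        self.log.append("rollback")


def transactional(state):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            outermost = state.depth == 0
            if outermost:
                state.doomed = False
                state.begin()
            state.depth += 1
            try:
                result = func(*args, **kwargs)
            except Exception:
                # Any failure marks the transaction uncommittable but leaves it open.
                state.doomed = True
                raise
            finally:
                state.depth -= 1
                if outermost:
                    # Only the code that began the transaction ends it.
                    if state.doomed:
                        state.rollback()
                    else:
                        state.commit()
            return result
        return wrapper
    return decorator
```

With this decorator, an outer function calling two successful nested functions produces a single begin/commit pair, and a nested failure that the outer function catches still causes the whole transaction to be rolled back, instead of committing half the work or starting a fresh transaction midway.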

The @autocommit decorator

While the above interaction looks like a simple bug, the @autocommit decorator is another matter. It turns autocommit on for the duration of a function call, no matter what the transaction mode for the caller was. If we took the original example and wrapped the middle three operations with @autocommit and used TransactionMiddleware, we’d get 4 transactions: one for the first two operations, then one for each of the remaining operations.

I can’t think of a situation where it would make sense to use, and wonder if it was just added for completeness.

Conclusion

While the nesting bugs remain, my recommendation would be to go for the TransactionMiddleware and avoid use of the decorators (both in your own code and third party components). If you are writing reusable code that requires transactions, it is probably better to assert that django.db.transaction.is_managed() is true so that you get a failure for improperly configured systems while not introducing unwanted transaction boundaries.

For the Storm integration work I’m doing, I’ve set it to use managed transaction mode to avoid most of the unwanted commits, but it still falls prey to the extra commits when using the decorators. So I guess inspecting the code is still necessary. If anyone has other tips, I’d be glad to hear them.

Syndicated 2008-09-01 07:42:39 from James Henstridge

Storm 0.13

Yesterday, Thomas rolled the 0.13 release of Storm, which can be downloaded from Launchpad.  Storm is the object relational mapper for Python used by Launchpad and Landscape, so it is capable of supporting quite large scale applications.  It is seven months since the last release, so there are a lot of improvements.  Here are a few simple statistics:

Metric                   0.12   0.13   Change
Tarball size (KB)         117    155      +38
Mainline revisions        213    262      +49
Revisions in ancestry     552    875     +323

So it is a fairly significant update by any of these metrics.  Among the new features are:

  • Infrastructure for tracing the SQL statements issued by Storm.  Sample tracer implementations are provided to implement bounded statement run times and for logging statements (both features used for QA of Launchpad).
  • A validation framework.  The property constructors take a validator keyword argument, which should be a function taking the arguments (object, attr_name, value) and returning the value to set.  By raising an exception, the function can prevent a value from being set.  By returning something different from its third argument, it can transform values.
  • The find() and ResultSet API has been extended to make it possible to generate queries that use GROUP BY and HAVING.  The primary use case is result sets that contain an object plus some aggregates associated with that object.
  • Some core parts of Storm have been accelerated through a C extension.  This code is turned off by default, but can be enabled by setting the STORM_CEXTENSIONS environment variable to 1.  Although it is disabled by default, it is pretty stable.  Barring any serious problems reported over the next release cycle, I’d expect it to be enabled by default for the next release.
  • The minimum dependencies of the storm.zope.zstorm module have been reduced to just the zope.interface and transaction modules.  This makes it easier to use the per-thread store management code and global transaction management outside of Zope apps (e.g. for integrating with Django).
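As an example of the validation framework, here is a validator following the (object, attr_name, value) convention described above. The validator itself is plain Python and can be tested on its own; the Storm model in the trailing comment is illustrative only and assumes the usual Int and Unicode property classes from storm.locals.

```python
def validate_name(obj, attr_name, value):
    """Hypothetical validator: reject blank names and strip surrounding whitespace."""
    if value is None:
        return value  # allow NULL through unchanged
    if not isinstance(value, str):
        raise TypeError(f"{attr_name} must be text, got {type(value).__name__}")
    stripped = value.strip()
    if not stripped:
        # Raising prevents the assignment from happening.
        raise ValueError(f"{attr_name} must not be blank")
    # Returning something other than `value` transforms what gets stored.
    return stripped


# Attached to a Storm model roughly like this (illustrative, not executed here):
#
#     from storm.locals import Int, Unicode
#
#     class Person(object):
#         __storm_table__ = "person"
#         id = Int(primary=True)
#         name = Unicode(validator=validate_name)
```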

It doesn’t include my Django integration code though, since that isn’t fully baked.  I’ll post some more about that later.

Syndicated 2008-08-29 08:21:20 from James Henstridge

Double the Fist

For anyone that cares, the new series of Double the Fist is starting tonight at 9:30pm on ABC2 (and repeated tomorrow on ABC1 for those who don’t get ABC2).  It has been a long time coming (4 years since the previous series), so it will hopefully be worth it.  I guess it will be available on the internet shortly after for those outside of Australia.

It is also good to see Roy and HG covering the Olympics again, even if it is only on the radio this time rather than television.  The shows are being posted on the website after airing.

Syndicated 2008-08-14 08:32:19 from James Henstridge

In Orlando

I’ve just finished the first day of the Ubuntu online services sprint in Orlando, Florida.  I didn’t repeat last year’s trick of falling asleep at the airport, so the trip was only about 29 hours all up.

We’ve got a great team that I am looking forward to working with, so it’ll be interesting to see what we do over the next little while.

Syndicated 2008-08-05 01:42:14 from James Henstridge

Using Storm with Django

I’ve been playing around with Django a bit for work recently, and it has been interesting to see which choices they’ve made differently from Zope 3.  There were a few things that surprised me:

  • The ORM and database layer defaults to autocommit mode rather than using transactions.  This seems like an odd choice given that all the major free databases support transactions these days.  While autocommit might work fine when a web application is under light use, it is a recipe for problems at higher loads.  By using transactions that last for the duration of the request, the testing you do is more likely to help with the high load situations.
  • While there is a middleware class to enable request-duration transactions, it only covers the database connection.  There is no global transaction manager to coordinate multiple DB connections or other resources.
  • The ORM appears to only support a single connection for a request.  While this is the most common case and should be easy to code with, allowing an application to expand past this limit seems prudent.
  • The tutorial promotes schema generation from Python models, which I feel is the wrong choice for any application that is likely to evolve over time (i.e. pretty much every application).  I’ve written about this previously and believe that migration based schema management is a more workable solution.
  • It poorly reinvents thread local storage in a few places.  This isn’t too surprising for things that existed prior to Python 2.4, and probably isn’t a problem for its default mode of operation.

Other than these things I’ve noticed so far, it looks like a nice framework.

Integrating Storm

I’ve been doing a bit of work to make it easy to use Storm with Django.  I posted some initial details on the mailing list.  The initial code has been published on Launchpad but is not yet ready to merge. Some of the main details include:

  • A middleware class that integrates the Zope global transaction manager.  There doesn’t appear to be any equivalent functionality in Django, and this made it possible to reuse the existing integration code (an approach that has been taken to use Storm with Pylons).  It will also make it easier to take advantage of other future improvements (e.g. only committing stores that are used in a transaction, two phase commit).
  • Stores can be configured through the application’s Django settings file, and are managed as long lived per-thread connections.
  • A simple get_store(name) function is provided for accessing per-thread stores within view code.
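The per-thread store handling can be illustrated with Python's threading.local, which has been available since Python 2.4 (relevant to the earlier complaint about reinventing thread local storage). This is a minimal sketch, not the code published on Launchpad: the STORES mapping and this get_store are hypothetical stand-ins, and sqlite3 connections take the place of Storm stores.

```python
import sqlite3
import threading

# Hypothetical configuration, mirroring the idea of naming stores in settings:
# store name -> database URI (here a sqlite3 path).
STORES = {"main": ":memory:"}

_local = threading.local()


def get_store(name):
    """Return this thread's long-lived connection for `name`, creating it on first use."""
    stores = getattr(_local, "stores", None)
    if stores is None:
        stores = _local.stores = {}
    if name not in stores:
        # Each thread opens its own connection; it is reused on later calls.
        stores[name] = sqlite3.connect(STORES[name])
    return stores[name]
```

Within one thread, repeated calls return the same connection, while each new thread gets its own, so view code can call get_store("main") freely without passing connections around.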

What this doesn’t do yet is provide much integration with existing Django functionality (e.g. django.contrib.admin).  I plan to try and get some of these bits working in the near future.

Syndicated 2008-08-01 09:23:16 from James Henstridge

Metrics for success of a DVCS

One thing that has been mentioned in the GNOME DVCS debate was that it is as easy to do “git diff” as it is to do “svn diff” so the learning curve issue is moot.  I’d have to disagree here.

Traditional Centralised Version Control

With traditional version control systems (e.g. CVS and Subversion) as used by Free Software projects like GNOME, there are effectively two classes of users that I will refer to as “committers” and “patch contributors”:

[Diagram: Centralised VCS users]

Patch contributors are limited to read only access to the version control system.  They can check out a working copy to make changes, and then produce a patch with the “diff” command to submit to a bug tracker or send to a mailing list.  This is where new contributors start, so it is important that it be easy to get started in this mode.

Once a contributor is trusted enough, they may be given write access to the repository, moving them to the committers group. They now have access to more functionality from the VCS, including the ability to checkpoint changes into focused commits, possibly on branches.  The contributor may still be required to go through patch review before committing, or may be given free rein to commit changes as they see fit.

Some problems with this arrangement include:

  • New developers are given a very limited set of tools to do their work.
  • If a developer goes to the trouble of learning the advanced features of the version control system, they are still limited to the read only subset if they decide to start contributing to another project.

Distributed Workflow

A DVCS allows anyone to commit to their own branches and provides the full feature set to all users.  This splits the “committers” class into two classes:

[Diagram: Distributed VCS users]

The social aspect of the “committers” group now becomes the group of people who can commit to the main line of the project – the core developers. Outside this group, we have people who make use of the same features of the VCS as the core developers but do not have write access to the main line: their changes must be reviewed and merged by a core developer.

I’ve left the “patch contributor” class in the above diagram because not all contributors will bother learning the details of the VCS.  For projects I’ve worked on that used a DVCS, I’ve still seen people send simple patches (either from the “xxx diff” command, or as diffs against a tarball release) and I don’t think that is likely to change.

Measuring Success

Making the lives of core developers better is often brought up as a reason to switch to a DVCS (e.g. through features like offline commits, local cache of history, etc).  I’d argue that making life easier for non core contributors is at least as important.  One way we can measure this is by looking at whether such contributors are actually using VCS features beyond what they could with a traditional centralised setup.

By looking at the relative numbers of contributors who submit regular patches and those that either publish branches or submit changesets we can get an idea of how much of the VCS they have used.

It’d be interesting to see the results of a study based on contributions to various projects that have already adopted DVCS.  Although I don’t have any reliable numbers, I can guess at two things that might affect the results:

  1. Familiarity for existing developers.  There is a lot of cross pollination in Free Software, so it isn’t uncommon for a new contributor to have worked on another project beforehand.  Using a VCS with a familiar command set can help here (or using the same VCS).
  2. A gradual learning curve.  New contributors should be able to get going with a small command set, and easily learn more features as they need them.

I am sure that there are other things that would affect the results, but these are the ones that I think would have the most noticeable effects.

Syndicated 2008-07-31 09:40:32 from James Henstridge

DVCS talks at GUADEC

Yesterday, a BoF was scheduled for discussion of distributed version control systems with GNOME.  The BoF session did not end up really discussing the issues of what GNOME needs out of a revision control system, and some of the examples Federico used were a bit snarky.

We had a more productive meeting in the session afterwards where we went over some of the concrete goals for the system.  The list from the blackboard was:

  • Contributor collaboration (i.e. let anyone use the tool rather than just core developers).
  • Distro ⇔ distro and distro ⇔ upstream collaboration.
  • Host GNOME source code repositories
  • Code review
  • Server side hooks
  • Translators: what to do?
  • Enforced checks
  • Offline operations
  • Documentation authors?
  • Support Win32/Mac (important for GTK)

The sys admin tasks were broken down to:

  • MAINTAINERS file syntax checking
  • PO file syntax checking
  • CIA integration.
  • Commits mailing list
  • Check that commit messages are not empty
  • Trigger updates from commits (e.g. the web site module).
  • Release notes tarballs
  • Damned Lies support

It was clear from the discussion that neither Git nor Bazaar satisfied all of the criteria.

The Playground

John Carr did a great job setting up Bazaar mirrors of all the GNOME modules.  This provided an easy way for people to play around with Bazaar.  However, it only gave you half the experience since it didn’t provide a way to publish code and collaborate.

To aid in this, we have set up the bzr-playground.gnome.org machine, which any GNOME developer should be able to use to publish branches based on John’s imports.  Instructions on getting set up can be found on the wiki.  I hope that we will get a lot of people trying out this infrastructure.

We gave a presentation today on some of the things Bazaar provides that could be useful when hacking on GNOME.  Demoing bzr-playground was a bit problematic due to the internet connection problems at the venue, but I think we still showed some useful tools for local collaboration, searching and code review.

Meanwhile, Robert Collins has been working on some of the GNOME sysadmin features that Bazaar was lacking.  Among other things, he got Damned Lies working with both Subversion and Bazaar, with a test installation on the playground machine.

Syndicated 2008-07-08 21:04:01 from James Henstridge

MySQL Announces Move to Bazaar

MySQL has announced their move to Bazaar for version control.  This has been a long time coming, and it is great to finally see it announced publicly.

The published Bazaar branches include 8 years of history going back to MySQL 3.23.22, imported from the BitKeeper repositories.  So you can see a lot more than just the history since the switch: you can use all the normal Bazaar tools to see where the code came from and how it evolved.  Giuseppe Maxia has posted some instructions on how to check out the code for those who are interested.

I haven’t checked extensively, but I wouldn’t be surprised if this is the largest public code base managed with Bazaar.  I’ve known from personal experience working on Launchpad that it is capable of handling large trees, but it is good to have a high profile project to point at as an example now.

Syndicated 2008-06-20 09:31:11 from James Henstridge
