Older blog entries for jamesh (starting at number 264)

MySQL Announces Move to Bazaar

It has been a while coming, but MySQL has announced their move to Bazaar for version control.  It is great to finally see it announced publicly.

The published Bazaar branches include 8 years of history going back to MySQL 3.23.22, imported from the BitKeeper repositories.  So you can see a lot more than just the history since the switch: you can use all the normal Bazaar tools to see where the code came from and how it evolved.  Giuseppe Maxia has posted some instructions on how to check out the code for those who are interested.

I haven’t checked extensively, but I wouldn’t be surprised if this is the largest public code base managed with Bazaar.  I’ve known from personal experience working on Launchpad that it is capable of handling large trees, but it is good to have a high profile project to point at as an example now.

Syndicated 2008-06-20 09:31:11 from James Henstridge

How not to do thread local storage with Python

The Python standard library contains a function called thread.get_ident().  It will return an integer that uniquely identifies the current thread at that point in time.  On most UNIX systems, this will be the pthread_t value returned by pthread_self(). At first look, this might seem like a good value to key a thread local storage dictionary with.  Please don’t do that.

The value uniquely identifies the thread only as long as it is running.  The value can be reused after the thread exits.  On my system, this happens quite reliably with the following sample program printing the same ID ten times:

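# Python 2 (the thread module was renamed to _thread in Python 3)
import thread, threading

def foo():
    # Print the identifier of whichever thread runs this function.
    print 'Thread ID:', thread.get_ident()

# Start ten threads one after another, waiting for each to exit
# before creating the next, so the identifier is free to be reused.
for i in range(10):
    t = threading.Thread(target=foo)
    t.start()
    t.join()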

If the return value of thread.get_ident() was used to key thread local storage, all ten threads would share the same storage. This is not generally considered to be desirable behaviour.

Assuming that you can depend on Python 2.4 (released 3.5 years ago), then just use a threading.local object. It will result in simpler code, correctly handle serially created threads, and you won’t hold onto TLS data past the exit of a thread.
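As a rough sketch of the threading.local approach, here is the same serial-thread experiment written for Python 3 (where get_ident() is available as threading.get_ident()).  The names _storage, worker and run_serial_threads are made up for illustration.  Each thread checks whether it can see a value left behind by an earlier thread before storing its own:

```python
import threading

# One shared object; each thread sees its own private set of attributes.
_storage = threading.local()

def worker(results, index):
    # If storage leaked between threads, a later thread would see "value"
    # already set by an earlier one.
    saw_stale_value = hasattr(_storage, "value")
    _storage.value = index
    results.append((threading.get_ident(), saw_stale_value, _storage.value))

def run_serial_threads(count=10):
    """Run `count` threads one after another and collect what each saw."""
    results = []
    for i in range(count):
        t = threading.Thread(target=worker, args=(results, i))
        t.start()
        t.join()
    return results

if __name__ == "__main__":
    for ident, stale, value in run_serial_threads():
        print("Thread ID:", ident, "stale data seen:", stale, "value:", value)
```

Even if the printed thread IDs repeat, no thread should report stale data, since the storage is tied to the thread itself rather than to its identifier.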

You will save yourself (or another developer) a lot of time at some point in the future. Debugging these problems is not fun when you combine code doing proper TLS with other code doing broken TLS.

Syndicated 2008-06-11 10:00:21 from James Henstridge

Prague

I arrived in Prague yesterday for the Ubuntu Developer Summit.  Including time spent in transit in Singapore and London, the flights took about 30 hours.

As I was flying on BA, I got to experience Heathrow Terminal 5. It wasn’t quite as bad as some of the horror stories I’d heard.  There were definitely aspects that weren’t forgiving of mistakes.  For example, when taking the train to the “B” section there was a sign saying that if you accidentally got on the train when you shouldn’t have it would take 40 minutes to get back to the “A” section.

It is also quite difficult to find water fountains in the terminal, which is inexcusable given that they don’t let people bring their own water bottles.

I had been a bit worried that they’d lose my bag, but it arrived okay in Prague.  Jonathan was not so lucky.

As well as the Ubuntu and Canonical folks, there are a bunch of Gnome developers here, including Ryan, Murray, Olav, David and Lennart.  It will be an interesting week.

Syndicated 2008-05-19 15:14:05 from James Henstridge

bzr commit --author

One of the features I recently discovered in Bazaar is the --author option for “bzr commit”.  This lets you make commits to a Bazaar branch on behalf of another person.  When used, the new revision credits two people: you as the committer and the other person as the author.

While Bazaar does make it easy for non-core contributors to send changes in a form that correctly attributes them (e.g. by publishing a branch or sending a bundle), I doubt we’ll ever see the end of pure patches.  Some cases include:

  • Patches based on a tarball release.   In these cases the contributor likely hasn’t even used the VCS.
  • People send simple diffs from e.g. “bzr diff” since that is sometimes the easiest solution (or what they do by default due to having transferred their knowledge from another VCS).
  • Some people use a VCS bridge so they can work with their favourite VCS.  They might not be able to provide their changes as Bazaar commits due to this.

The --author option lets you commit these changes in a way that credits the contributor for their work.  The author of the change will then be displayed in “bzr annotate” output and credited along with you in the “bzr log” output.

The feature is also used by a number of plugins such as bzr-rebase: if you replay or rebase someone else’s changes, the new revisions will credit you as the committer and the original committer as the author.
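If you apply contributed patches from a script, the commit step could be wrapped along these lines.  This is only a sketch: the helper names are hypothetical, the author string is an example, and it assumes bzr is on the PATH and the patch has already been applied to the working tree.

```python
import subprocess

def author_commit_command(author, message):
    """Build the argv for committing someone else's change with bzr."""
    return ["bzr", "commit", "--author=" + author, "-m", message]

def commit_on_behalf(author, message, branch_dir="."):
    """Run the commit in branch_dir; returns bzr's exit status."""
    return subprocess.call(author_commit_command(author, message),
                           cwd=branch_dir)

if __name__ == "__main__":
    # Example author in the "Name <email>" form conventionally used by Bazaar.
    print(author_commit_command("Jane Doe <jane@example.com>",
                                "Apply patch from Jane"))
```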

Syndicated 2008-05-12 09:19:59 from James Henstridge

SSL caching on Firefox 3

Since upgrading to Ubuntu Hardy, I’ve been enjoying using Firefox 3.  The reduced memory usage has made a lot of other things nicer to use (I don’t feel like I need to buy more memory now).  One thing that is nice to see fixed is caching of SSL content.

In previous versions of Firefox, SSL content was never cached to disk with the default settings.  While you certainly don’t want all SSL content to be written to disk, a lot of it can be cached without problem.  For example, it is important that the CSS and JavaScript for a page be served via SSL to avoid man in the middle attacks (injecting arbitrary active content into a secure page is bad), but there isn’t much harm in caching them to disk: if the attacker can modify the disk cache then SSL probably doesn’t matter much.

Now it was possible to turn on disk caching in Firefox 2 through the browser.cache.disk_cache_ssl hidden option, but it had a serious drawback: the security information for resources was not saved in the disk cache so you’d get a broken padlock if resources were loaded from the cache.

Firefox 3 fixes up the disk cache to record the security information though, so turning on the disk_cache_ssl setting no longer results in a broken padlock.  But what about all the people using Firefox with its default settings (or those who do not want all SSL content cached to disk)?  For these users, the web server can still cause some content to be cached.

By sending the “Cache-Control: public” response header, the server can say that a resource can be stored in the disk cache.  Firefox 3 will respect this irrespective of the disk_cache_ssl setting.  This should bring Firefox back into parity with Internet Explorer with respect to network performance on SSL web sites.
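On the server side, marking a static resource as cacheable might look something like the following WSGI sketch.  The application name and the CSS body are invented for illustration, and the SSL termination itself is left to whatever server hosts the application; the only part taken from the discussion above is the “Cache-Control: public” header.

```python
def static_asset_app(environ, start_response):
    """Serve a small stylesheet and allow it to be stored in the disk cache."""
    body = b"body { color: black; }\n"  # placeholder content
    headers = [
        ("Content-Type", "text/css"),
        ("Content-Length", str(len(body))),
        # Tells the browser this response may be cached, even over SSL.
        ("Cache-Control", "public"),
    ]
    start_response("200 OK", headers)
    return [body]

if __name__ == "__main__":
    # Serve over plain HTTP on localhost for a quick look at the headers.
    from wsgiref.simple_server import make_server
    make_server("localhost", 8000, static_asset_app).serve_forever()
```

Pages containing private data would simply not send the header, so they stay out of the disk cache under the default settings.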

Syndicated 2008-05-01 09:57:44 from James Henstridge

Psycopg migrated to Bazaar

Last week we moved psycopg from Subversion to Bazaar.  I did the migration using Gustavo Niemeyer’s svn2bzr tool with a few tweaks to map the old Subversion committer IDs to the email address form conventionally used by Bazaar.

The tool does a good job of following tree copies and creating related Bazaar branches.  It doesn’t have any special handling for stuff in the tags/ directory (it produces new branches, as it does for other tree copies).  To get real Bazaar tags, I wrote a simple post-processing script to calculate the heads of all the branches in a tags/ directory and set them as tags in another branch (provided those revisions occur in its ancestry).  This worked pretty well except for a few revisions synthesised by a previous cvs2svn migration.  As these tags were from pretty old psycopg 1 releases I don’t know how much it matters.

As there is no code browsing set up on initd.org yet, I set up mirrors of the 2.0.x and 1.x branches on Launchpad to do this:

It is pretty cool having access to the entire revision history locally, and should make it easier to maintain full credit for contributions from non-core developers.

Syndicated 2008-04-28 14:07:50 from James Henstridge

Psycopg2 2.0.7 Released

Yesterday Federico released version 2.0.7 of psycopg2 (a Python database adapter for PostgreSQL).  I made a fair number of the changes in this release to make it more usable for some of Canonical’s applications.  The new release should work with the development version of Storm, and shouldn’t be too difficult to get everything working with other frameworks.

Some of the improvements include:

  • Better selection of exceptions based on the SQLSTATE result field.  This causes a number of errors that were reported as ProgrammingError to use a more appropriate exception (e.g. DataError, OperationalError, InternalError).  This was the change that broke Storm’s test suite as it was checking for ProgrammingError on some queries that were clearly not programming errors.
  • Proper error reporting for commit() and rollback(). These methods now use the same error reporting code paths as execute(), so an integrity error on commit() will now raise IntegrityError rather than OperationalError.
  • The compile-time switch that controls whether the display_size member of Cursor.description is calculated is now turned off by default.  The code was quite expensive and the field is of limited use (and not provided by a number of other database adapters).
  • New QueryCanceledError and TransactionRollbackError exceptions.  The first is useful for handling queries that are canceled by statement_timeout.  The second provides a convenient way to catch serialisation failures and deadlocks: errors that indicate the transaction should be retried.
  • Fixes for a few memory leaks and GIL misuses. One of the leaks was in the notice processing code that could be particularly problematic for long-running daemon processes.
  • Better test coverage and a driver script to run the entire test suite in one go.  The tests should all pass too, provided your database cluster uses unicode (there was a report just before the release of one test failing for a LATIN1 cluster).
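As an illustration of how the new TransactionRollbackError might be used, here is a rough retry helper.  It is a sketch rather than anything shipped with psycopg2: the function name and max_attempts parameter are my own, and the exception class is passed in so the example does not depend on a database being available.  With psycopg2 2.0.7 you would pass psycopg2.extensions.TransactionRollbackError as retry_on.

```python
def run_in_transaction(conn, work, retry_on, max_attempts=3):
    """Call work(conn) and commit, retrying when retry_on is raised.

    conn is any DB-API style connection with commit() and rollback().
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = work(conn)
            # commit() now reports errors the same way as execute(), so a
            # serialisation failure at commit time is caught here too.
            conn.commit()
            return result
        except retry_on:
            conn.rollback()
            if attempt == max_attempts:
                raise  # give up and let the caller see the error
```

Other errors (for example IntegrityError) are deliberately not caught, since retrying the transaction is unlikely to help with those.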

If you’re using previous versions of psycopg2, I’d highly recommend upgrading to this release.

Future work will probably involve support for the DB-API two phase commit extension, but I don’t know when I’ll have time to get to that.

Syndicated 2008-04-15 06:55:50 from James Henstridge

Honey Bock Results

Since bottling the honey bock last month, I have tried two bottles: one last week and one this week. While it is a very nice beer, the honey flavour is not very noticeable. That said, the second bottle I tried had a slightly stronger honey flavour than the first so it might just need to mature for another month or so.

If I was to do this beer again, it would make sense to use a stronger flavoured honey or just use more honey. Then again, perhaps it isn’t worth trying honey flavoured dark beers.

One beer I’d like to make again is Chilli Beer.  I haven’t seen any commercial equivalent to it, and it was great on a hot summer afternoon.  Since there were chilli pieces in the bottles of the last batch, it got hotter as it matured.  It is an interesting experience where taking a sip of the beer cools your mouth down, but it starts heating up again once you swallow.

Syndicated 2008-04-10 12:13:38 from James Henstridge

Using email addresses as OpenID identities (almost)

On the OpenID specs mailing list, there was another discussion about using email addresses as OpenID identifiers. So far it has mostly covered existing ground, but there was one comment that interested me: a report that you can log in to many OpenID RPs by entering a Yahoo email address.

Now there certainly isn’t any Yahoo-specific code in the standard OpenID libraries, so you might wonder what is going on here. We can get some idea by using the python-openid library:

>>> from openid.consumer.discover import discover
>>> claimed_id, services = discover('example@yahoo.com')
>>> claimed_id
'http://www.yahoo.com/'
>>> services[0].type_uris
['http://specs.openid.net/auth/2.0/server',
 'http://specs.openid.net/extensions/pape/1.0']
>>> services[0].server_url
'https://open.login.yahooapis.com/openid/op/auth'
>>> services[0].isOPIdentifier()
True

So we can see that running the discovery algorithm on the email address has resulted in Yahoo’s standard identifier select endpoint. What we’ve actually seen here is the effect of Section 7.2 at work:

3. Otherwise, the input SHOULD be treated as an http URL; if it does not include a “http” or “https” scheme, the Identifier MUST be prefixed with the string “http://”.

So the email address is normalised to the URL http://example@yahoo.com (which is treated the same as http://yahoo.com/), which is then used for discovery. As shown above, this results in an identifier select request so works for all Yahoo users.
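The effect is easy to reproduce with the standard library alone.  The sketch below covers only the quoted rule (it ignores XRIs and the rest of the normalisation steps), and the function names are mine rather than anything from python-openid:

```python
from urllib.parse import urlsplit

def normalize_identifier(user_input):
    """Prefix 'http://' when the input has no http or https scheme."""
    user_input = user_input.strip()
    if not user_input.lower().startswith(("http://", "https://")):
        user_input = "http://" + user_input
    return user_input

def discovery_host(user_input):
    """Return the host that would be contacted for discovery."""
    return urlsplit(normalize_identifier(user_input)).hostname

if __name__ == "__main__":
    url = normalize_identifier("example@yahoo.com")
    parts = urlsplit(url)
    # The part before the "@" is parsed as URL user info, not as the host.
    print(url, parts.username, parts.hostname)
```

The local part of the email address ends up in the user info portion of the URL, which is why discovery goes to the bare domain.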

I wonder if the Yahoo developers realised that this would happen and set things up accordingly? If not, then this is a happy accident. It isn’t quite the same as having support for email addresses in OpenID since the user may end up having to enter their email address a second time in the OP if they don’t already have a session cookie.

It is certainly better than the RP presenting an error if the user accidentally enters an email address into the identity field. It seems like something that any OP offering email addresses to its users should implement.

Syndicated 2008-04-02 08:25:46 from James Henstridge

Looms Rock

While doing a bit of work on Storm, I decided to try out the loom plugin for Bazaar. The loom plugin is designed to help maintain a stack of changes to a base branch (similar to quilt). Some use cases where this sort of tool is useful include:

  1. Maintaining a long-running diff to a base branch. Distribution packaging is one such example.
  2. While developing a new feature, the underlying code may require some refactoring. A loom could be used to keep the refactoring separate from the feature work so that it can be merged ahead of the feature.
  3. For complex features, code reviewers often prefer changes to be broken down into a sequence of simpler changes. A loom can help maintain the stack of changes in a coherent fashion.

A loom branch helps to manage these different threads in a coherent manner. Each thread in the loom contains all the changes from the threads below it, so the revision graph ends up looking something like this:

[Figure: Sample Loom Timeline]

Once the plugin has been installed, a normal branch can be converted to a loom with the “bzr loomify” command. The “bzr create-thread” command can be used to create a new thread above the current one.

The “bzr down-thread” and “bzr up-thread” commands can be used to switch between threads. When going up a thread, a merge will be performed if there are new changes from the lower thread. The “bzr show-loom” command shows the current state of the loom, and which thread is currently selected.

The “bzr export-loom” command can be used to explode the loom, creating a standard branch for each thread. The included HOWTO document gives a more detailed tutorial.

There are a few warts in the UI that I’ve encountered though:

  1. The “bzr combine-thread” command sounds like it should actually merge two threads. Instead it is an advisory command that can be used to remove a thread once its contents have been merged.
  2. After pulling new changes in from upstream on the bottom thread, it gets a bit tedious bubbling the changes up with “bzr up-thread” and “bzr commit”.
  3. As well as committing revisions to individual threads, the “bzr record” command can be used to commit the state of the loom as a whole. I haven’t really worked out when I should be using the command.
  4. No indication is given if there are changes in the loom that haven’t been recorded with “bzr record”. I’d expect some indication from “bzr status” to this effect.
  5. When using looms to break a larger feature down into smaller chunks, it’d be nice to have a command that generated a sequence of merge requests that built on top of each other. This would be the form needed to submit them for review on a mailing list.

Despite the quirks in the interface, it does make the relevant work flows easier.  It will be interesting to see how the plugin develops.

Syndicated 2008-04-01 08:52:03 from James Henstridge
