Older blog entries for jamesh (starting at number 197)

Bazaar (continued)

I got a few responses from Elijah, Mikael and David to the comparison between the CVS, Subversion and Bazaar command line interfaces I posted earlier. As I stated in that post, I was looking at areas where the three systems could be compared. Of course, most people would choose Arch because of the things you can do with it that you can't do with Subversion or CVS. Below I'll discuss two of those things: disconnected development and distributed development. I'll follow on from the examples in the previous post.

Disconnected Development

Disconnected development allows you to continue working on some code while not having access to the main repository. I hinted at how to do this in the previous post, but left out most of the details. The basic steps are:

  1. Create an archive on your machine
  2. Branch the module you want to work on into your local archive.
  3. Perform your development as normal
  4. When you connect again, switch back to the mainline, merge your local branch and commit the changes.

To create the local archive, you follow the same procedure as for creating the original archive. Something like this:

mkdir ~/archives
baz make-archive --signed joe@example.com ~/archives/joe@example.com

This creates an archive named joe@example.com (archive names are required to be an email address, optionally followed by some extra info) stored in the user's home directory.

Now we can create a branch in the local archive. From a working copy of the mainline branch, run the following command:

baz branch joe@example.com/modulename--devel--0

It was necessary to specify an archive name in this call to baz branch, because the branch was being created in a different archive to the one the working copy was pointing at. This leaves the working copy pointing at the new branch, so you can start working on it immediately.

You can commit as many revisions as you want, and compare to other revisions on the branch.

When you have access to the main repository again, it is trivial to merge your changes back into the mainline:

baz switch arch@example.org/modulename--devel--0
baz merge joe@example.com/modulename--devel--0
# fix conflicts, if any exist, and mark them resolved
baz commit -s 'merge changes from joe@example.com/modulename--devel--0'

You can then ignore the branch in the joe@example.com archive, or continue to use it. If you want to continue working on the branch in that module, it is a simple matter to merge from the arch@example.org archive first to pick up the changes made while you were disconnected.

Distributed Development

In a distributed development environment, there is no main branch. Instead, each developer maintains their own branch, and pulls changes from other developers' archives. A few things fall out from this model:

  • To start working on a distributed project, you need to branch off from another developer's archive. This can be achieved using the same instructions as found in the "disconnected development" section above.
  • In order for other developers to pull changes from your archive, they will need to be able to access it. This isn't possible if it only exists in your home directory.
  • If you never merge from a particular developer, you don't even need to know they exist.
  • Conversely, you don't need to ask for permission to work on a module (however, if you want your changes to appear in the other developers' archives, you'll need to ask them to merge from you).

Suppose you've branched off an existing developer's branch of a module, and want other developers to merge your changes. Assuming they can't access your local computer, you will need to create a mirror of the archive. To make the archive most widely available, you should mirror it to a directory that is published by a web server. The following command will create a mirror of the local archive:

baz make-archive --signed --listing --mirror joe@example.com sftp://hostname/home/joe/public_html/joe@example.com

Once the archive is created, you can mirror all the changes in the local archive to the remote one using the following command:

baz archive-mirror joe@example.com

If you always have access to the mirror host, it is possible to set up a hook script that mirrors after every commit. However, if you often make changes while offline you might decide to mirror manually.

Now that the archive has been mirrored, other developers can merge your changes into their working copy using the following command:

baz merge http://hostname/~joe/joe@example.com/modulename--devel--0

(after they've used your archive once, they can use the short name for the archive, and it will use the same location as last time).

Conclusion

While Arch allows full distributed development, most projects don't use it in a fully distributed manner. Often there will be a central archive that is the "official" one, which tarball releases are made from. The exact policies can differ from project to project. Some possible policies are:

  • A core of developers have commit access to an "official" archive, which releases are made from. Developers generally commit directly to this archive (this is the default CVS/Subversion model). External developers follow the distributed development model, and get core developers to merge their changes.
  • As above, but the core developers usually develop their changes on separate branches (usually in their own archives), and only merge them when ready. This is how some projects currently use CVS, but has the benefit of allowing disconnected development.
  • Control of the official archive is managed by arch-pqm. Authorized developers can send merge requests to PQM (using PGP for authentication). When a merge request is received, the branch is merged into the mainline. If there are no conflicts and the test suite runs successfully, the changes are committed.

I'm not sure which model would work best for Gnome. At least initially, one of the first two models would probably be a good choice. If you have good test coverage, PQM can help ensure that the mainline stays buildable, and changes don't destabilise things.

As has been mentioned elsewhere, regularly updated mirrors of various CVS repositories are being set up at arch.ubuntu.com. You can find out whether a mirror has been created for a module by looking it up on Launchpad. If a branch exists, you can check it out or branch it by prepending "http://arch.ubuntu.com/" to the full branch name (e.g. http://arch.ubuntu.com/imendio@projects.ubuntu.com/gossip--MAIN--0).

With the current discussion on gnome-hackers about whether to switch Gnome over to Subversion, it has been brought up a number of times that people can switch from CVS to Subversion without thinking about it (the implication being that this is not true for Arch). Given the improvements in Bazaar, it isn't clear that Subversion is the only system that can claim this benefit.

For the sake of comparison, I'm considering the case of a shared repository accessed by multiple developers over SSH. While this doesn't exploit all the benefits of Arch, it gives a better comparison of the usability of the different tools.

Setup

Before using any of CVS, Subversion or Arch, you'll need a repository. This can be done with the following commands (run on the repository server):

cvs init /cvsroot
svnadmin create --fs-type=fsfs /svnroot
baz make-archive --signed arch@example.org /archives/arch@example.org

(the --signed flag can be omitted if you don't want to cryptographically sign change sets)

Once the archive is created, you'd need to make sure that everyone has write access to the files, and new files will be created with the appropriate group ownership. This procedure is the same for each system.

Before users of the arch archive can start using it, they will need to tell baz what user ID to associate with their commits. Each user only needs to do this once. If you're using a signed archive, the email address should match the one on your PGP key.

baz my-id "Joe User <joe@example.com>"

Next you'll want to import some code into the repository. This will be done from one of the client machines, from the source directory:

cvs -d :ext:user@hostname:/cvsroot import modulename vendor-tag release-tag
svn import . svn+ssh://user@hostname/svnroot/modulename/trunk
baz import -a sftp://user@hostname/archives/arch@example.org/modulename--devel--0

In the subversion case, we're using the standard convention of putting the main branch in a trunk/ subdirectory. In the arch case, you need a three-level module name, so I picked a fairly generic one.

Working with the repository

The first thing a user will want to do is to create a working copy of the module:

cvs -d :ext:user@hostname:/cvsroot get modulename
svn checkout svn+ssh://user@hostname/svnroot/modulename/trunk modulename
baz get sftp://user@hostname/archives/arch@example.org/modulename--devel--0 modulename

The user can then make changes to the working copy, adding new files with the add sub-command, and removing files with the rm sub-command. For Subversion there are also mv and cp sub-commands. For Arch, the mv sub-command is supported.

To bring the working copy up to date with the repository, all three systems use the update sub-command. The main difference is that CVS and Subversion will only update the current directory and below, while Arch will update the entire working copy.

If there are any conflicts during the update, you'll get standard three-way merge conflict markers in all three systems. Unlike CVS, both Subversion and Arch require you to mark each conflict resolved using the resolved sub-command.

To see what changes you have in your working copy, all three systems support a diff command. Again, this works on the full tree in Arch, while only working against a subtree in CVS and Subversion. In all three systems, you can request diffs for individual files by passing the filenames as additional arguments. Unfortunately baz requires you to pass "--" as an argument before the filenames, but hopefully that'll get fixed in the future.

When it is time to commit the change, all three systems use the commit sub-command. This command also works on a full tree with Arch.

Branching and Merging

Creating a branch is relatively easy in all three systems:

cvs tag foo-anchor . ; cvs tag -b foo .
svn cp . svn+ssh://user@host/svnroot/modulename/branches/foo
baz branch modulename--foo--0

Unlike CVS and Subversion, the baz command will also switch the working copy over to the new branch. By default it will create a branch in the same repository, but can just as easily create a branch in another location.

To switch a working copy between branches, the following commands are used:

cvs update -r foo
svn switch svn+ssh://user@host/svnroot/modulename/branches/foo
baz switch modulename--foo--0

Once the working copy has been switched back to the trunk, you can merge the changes from the branch with the following commands:

cvs tag -r foo foo-DATE .; cvs update -j foo-anchor -j foo-DATE .
svn merge -r branch-rev:HEAD svn+ssh://user@host/svnroot/modulename/branches/foo
baz merge modulename--foo--0

This is where Arch's history sensitive merging starts to shine. Since the working copy retains a record of what changes it is composed of, the merge operation simply pulls over the changes that exist in the branch but not in the working copy -- there is no need to tell it what range of changes you want to apply.
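As a rough illustration of why no revision range is needed, here is a toy model in Python. It is not how Arch is implemented, and the function and change names are made up for this sketch. Each tree carries a log of the changes it contains, and a merge applies whatever the branch has that the working copy lacks:

```python
def merge(working_log, branch_log):
    """Apply to working_log every change in branch_log that it lacks.

    Both arguments are lists of change names (hypothetical identifiers).
    Returns the list of changes that were pulled over, in branch order.
    """
    already_applied = set(working_log)
    missing = [change for change in branch_log if change not in already_applied]
    working_log.extend(missing)  # the working copy now records these changes
    return missing


# Trunk and branch share a common history; the branch has two extra changes.
trunk = ["base-0", "patch-1"]
branch = ["base-0", "patch-1", "foo-1", "foo-2"]

first_merge = merge(trunk, branch)   # pulls foo-1 and foo-2

# More work lands on the branch; the same call picks up only the new change.
branch.append("foo-3")
second_merge = merge(trunk, branch)  # pulls foo-3 only
```

With CVS or Subversion, the equivalent of already_applied has to be tracked by hand, using tags or revision numbers.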

To merge more changes from the branch, the CVS and Subversion commands change, while the Arch one remains constant:

cvs tag -r foo foo-DATE .; cvs update -j foo-LAST-DATE -j foo-DATE .
svn merge -r last-merge-rev:HEAD svn+ssh://user@host/svnroot/modulename/branches/foo
baz merge modulename--foo--0

Conclusion

The current Bazaar command line interface isn't that different from CVS and Subversion (it's definitely worth a second look if tla scared you off). The main difference is that some of the operations work on the whole working copy rather than a subset by default. In practice, this doesn't seem to be much of a problem.

The history sensitive merge capabilities would probably be quite useful for Gnome. For example, it would make it trivial to merge bug fixes made on the stable branch to the head branch.

Disconnected development is a natural extension to the branching and merging support mentioned earlier. The main difference is that you'd have to make a local archive, and then create your branch of the code in that archive instead of the main one. The rest is handled the same.

Something's wrong with the Immigration Department

Shortly after the scandal over Cornelia Rau (a mentally ill Australian who was in detention for 10 months), another case gets some media attention: Vivian Young/Alvarez/Solon.

She is an Australian citizen born in the Philippines, who also suffers from a mental illness. From the news reports, the sequence of events seems to be:

  1. In 1984, Vivian moved to Australia to live with her new husband.
  2. In 2001, she was involved in a car accident in NSW. While being treated at Lismore Hospital for her injuries, she lodged a citizenship application and the staff contacted the immigration officials. She gave her name as "Vivian Alvarez".
  3. On July 17, 2001, the Queensland Department of Families finally notified police that "Vivian Young" was missing.
  4. Days later, she was deported to the Philippines -- neither the NSW nor Qld police noticing that she was on the missing persons list. Apparently she was pushed onto the plane in a wheelchair, still suffering from head injuries.
  5. In 2003, an immigration official discovered the mistake while looking through the missing persons list. It doesn't seem that any action was taken at this time.
  6. This month, the mistaken deportation becomes public. This is the first time that the family is notified -- four years after the deportation, and two years after the mistake had been discovered. The government says they don't know her location, but are doing everything in their power to find her.

Among the family she left behind in Australia is a son who is still in foster care.

Rather than being an isolated case, it is quite likely that there have been other questionable deportations -- this one getting more attention because the person in question is an Australian. This case has racial overtones too, since it is unlikely that a white Australian would have been deported under the same circumstances. Despite all this, the Minister for Immigration does not feel that a Royal Commission would be appropriate.

bgchannel:// Considered Harmful?

Recently Bryan posted about background channels -- a system for automatically updating desktop wallpaper. One of the features of the design is a new URI scheme based on the same ideas as webcal://, which I think is a bad idea (as dobey has also pointed out).

The usual reasoning for creating a URI scheme like this goes something like this:

  1. You want to be able to perform some action when a link in a web page is clicked.
  2. The action requires that you know the URI of the link (usually to allow contacting the original server again).
  3. When the web browser activates a helper application bound to a MIME type, you just get the path to a saved copy of the resource, which doesn't satisfy (2).
  4. Helper applications for URI types get passed the full URI.

So the solution taken with Apple's iCal and Bryan's background channels is to strip the http: off the start of the resource's URI, and replace it with a custom scheme name. This works pretty well for the general case, but causes problems for a few simple use cases that'll probably turn out to be more common than you think:

  • Serving a background channel (or calendar, or whatever) via a protocol other than http. The first alternative protocol you'll probably run into is https, but there may be other protocols you want to support in the future.
  • Any links to a background channel will need to be fully qualified since they use a different scheme. If you move your site, you'll need to update every page that links to the background channel. If you could use relative URIs in the links, this wouldn't be the case.
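Both problems can be seen in a small Python sketch using the standard urllib.parse module. The URLs and the two helper functions are made up for illustration, and the to_http step is only my guess at what a helper application would have to do:

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def to_custom_scheme(url, scheme="bgchannel"):
    """Swap the scheme of url for a custom one, keeping everything else."""
    parts = urlsplit(url)
    return urlunsplit((scheme, parts.netloc, parts.path, parts.query, parts.fragment))


def to_http(url):
    """What a helper application must do: assume the real scheme was http."""
    parts = urlsplit(url)
    return urlunsplit(("http", parts.netloc, parts.path, parts.query, parts.fragment))


# A channel served over https loses that fact once the scheme is replaced.
original = "https://example.com/wallpapers/channel.xml"
advertised = to_custom_scheme(original)
recovered = to_http(advertised)

# A plain http link can be relative, so it survives a site move unchanged.
page = "http://example.com/wallpapers/index.html"
resolved = urljoin(page, "channel.xml")
```

Here recovered differs from original because the https information is gone, while the relative link "channel.xml" resolves against whatever page it appears on.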

One alternative to the "new URI scheme" solution that doesn't suffer from the above problems is to serve a "locator file" from the web server, containing the information needed to request the real data. Even though the helper application will only get the path of a temporary file, the content of the file lets the app connect to the server. This is the approach taken by BitTorrent, and various media players like RealPlayer.

The separate "locator file" can even be omitted by placing the background channel location inside the background channel itself. This is the approach taken for Atom, via a <link rel="self"/> link.

Ubuntu Down Under

I have been in Sydney for the past week for UDU, which wraps up tomorrow. It has been great meeting up with everyone again, but has also been exhausting.

Some of the stuff on the horizon will be quite ground breaking. For instance, I don't think anyone has attempted something like Grumpy Groundhog (which will hopefully be very useful to both the distro team, and upstream projects like Gnome).

Python

Experimented with using the new ELF visibility attribute support in GCC 4 in Python, and came up with this patch. It restricts the list of exported symbols to just the ones marked with the PyAPI_FUNC and PyAPI_DATA markers, which omits all the private symbols that /usr/bin/python or libpythonX.Y.so export.

In addition, it uses "protected" visibility for all the exported symbols, which means that internal calls to the public Python API don't have to go through the PLT (which they do if Python is compiled as a shared library).

In the shared libpython case, this speeds things up by about 5% (according to pystone and parrotbench), which isn't too bad for a small patch. In the static libpython case, it seems to slow things down slightly -- by < 1% in my tests so far.

Of course, the shared libpython case is still slower than the static version (which is why /usr/bin/python doesn't use a shared libpython on Ubuntu), but it does make it less slow than it was before :)

Solaris

Glynn: If Solaris feels like a second class citizen, it is probably because hardly any hackers have access to Solaris machines (the same seems to be true of architectures other than i386). A fair number of developers would probably be interested in fixing Solaris build failures if they knew that they existed.

I realise that Sun doesn't want to provide external access to a build machine (at least, that's what I was told last time I asked some Sun/Gnome hackers), but maybe running a tinderbox style system and publishing the build logs would help. As well as telling me if my package is broken, it'd give me a way to tell whether the fixes I check in actually solve the problem.

pkg-config

One of the changes in the recent pkg-config releases is that the --libs output no longer prints out the entire list of libraries expanded from the requested set of packages. As an example, here is the output of pkg-config --libs gtk+-2.0 with version 0.15:

-lgtk-x11-2.0 -lgdk-x11-2.0 -latk-1.0 -lgdk_pixbuf-2.0 -lm -lpangoxft-1.0 -lpangox-1.0 -lpango-1.0 -lgobject-2.0 -lgmodule-2.0 -ldl -lglib-2.0

And with 0.17.1:

-lgtk-x11-2.0

If an application is compiled with the first set of -l flags, it will include DT_NEEDED tag for each of those libraries. With the second set, it will only have a DT_NEEDED tag for libgtk-x11-2.0.so.0. When run, the application will still pull in all the other libraries via shared library dependencies.

The rationale for this change seems to boil down to:

  • Some programs link to more libraries than they need to.
  • Sometimes programs link to libraries that they don't use directly -- they're encapsulated by some other library they use.
  • The application will need to be recompiled if one of the libraries it is linked against breaks ABI, even if the library is not used directly.

At first this seems sensible. However, in a lot of cases applications actually use libraries that are only pulled in through dependencies. For instance, almost every GTK application is going to be using some glib APIs as well.

With the new pkg-config output, the fact that the application depends on the ABI of "libglib-2.0.so.0" is no longer recorded. The application is making use of those APIs, so it should declare that. Without the glib DT_NEEDED tag, the application is relying on the fact that GTK isn't likely to stop depending on glib ...
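To make the difference concrete, this small Python sketch compares the two --libs outputs quoted above and lists the libraries that no longer get a direct DT_NEEDED entry. The helper name is made up; the flag strings are the ones shown earlier:

```python
def linked_libraries(flags):
    """Return the library names given by -l options in a linker flag string."""
    return [flag[2:] for flag in flags.split() if flag.startswith("-l")]


libs_0_15 = (
    "-lgtk-x11-2.0 -lgdk-x11-2.0 -latk-1.0 -lgdk_pixbuf-2.0 -lm "
    "-lpangoxft-1.0 -lpangox-1.0 -lpango-1.0 -lgobject-2.0 -lgmodule-2.0 "
    "-ldl -lglib-2.0"
)
libs_0_17_1 = "-lgtk-x11-2.0"

still_listed = linked_libraries(libs_0_17_1)
no_longer_listed = [
    lib for lib in linked_libraries(libs_0_15) if lib not in still_listed
]
```

Eleven of the twelve libraries drop out, including glib-2.0, which almost every GTK application calls directly.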

Furthermore, this causes breakage if you link your application with the libtool -no-undefined flag. On platforms that support it, this generates an error if you don't list all the libraries the application depends on. This allows for some optimisations on some platforms (e.g. Solaris), and is required on others (e.g. Win32).

(interestingly, this problem doesn't exhibit itself on Linux. The -no-undefined flag expands to nothing, even though the linker supports the feature through the -zdefs flag)

For these reasons, I've disabled the feature in jhbuild's bootstrap, using the --enable-indirect-deps configure flag. If the aim is just to get rid of unnecessary library dependencies, the GNU linker's --as-needed flag seems to be a better choice. It will omit a DT_NEEDED tag if none of the symbols from the library are used by the application.

The Colour Purple

If you look at the bottom of Cadbury's website in the footer of the page, you find the following text:

..., and the colour purple are Cadbury Group trade marks in Australia.

Apparently Cadbury believes they can trade mark a colour, and according to a story on the radio they've been sending out cease and desist letters to other small chocolate makers in Australia.

It turns out that even though they are claiming it as a trade mark, they only have a pending application. The details can be found by going here, choosing "enter as guest", and entering "902086" into the search box at the bottom (it doesn't seem like you can bookmark a particular application).

It seems that the application has been pending since February 2002, and was deferred at the applicant's request 5 months later. So it seems weird that they've started trying to assert it now. The 17 categories the application mentions include soaps and perfumes, jewellery, kitchen utensils, clothing and leathergoods (it also includes classes that you'd expect a chocolate company to claim).

It seems like a clear abuse of the trade mark system, and I'm surprised it isn't getting more news coverage.

New pkg-config

I recently pointed jhbuild's bootstrap module-set at the new releases of pkg-config, which seems to have triggered some problems for some people.

In some ways, it isn't too surprising that some problems appeared, since there were two years between the 0.15 and 0.16 releases. When you go that long without testing from packages that depend on you, some incompatibilities are bound to turn up. However, Tollef has been doing a good job fixing the bugs and 0.17.1 fixes most of the problems I've found.

So far, I've run into the following problems (some reported to me as jhbuild bug reports):

  • PKG_CHECK_MODULES() stopped evaluating its second argument in 0.16.0. This caused problems for modules like gtk+ [#300232, fd.o bug #2987 -- fixed in pkg-config-0.17].
  • The pkg.m4 autoconf macros now blacklist words matching PKG_* or _PKG_* in the resulting configure script (with the exception of PKG_CONFIG and PKG_CONFIG_PATH). This is to try and detect unexpanded macros, but managed to trip up ORBit2 (ORBit2 has since been fixed in CVS though). [#300151]
  • The PKG_CHECK_MODULES() macro now uses the autoconf result caching mechanism, based on the variable prefix passed as the first argument. This means that multiple PKG_CHECK_MODULES() calls using the same variable prefix will give the same result, even if they specify a different list of modules [#300435, #300436, #300449]
  • The pkg-config script can go into an infinite loop when expanding the link flags if a package name is repeated [fd.o bug #3006, workarounds for some Gnome modules: #300450, #300452]

(note that only the last problem is likely to affect people building existing packages from tarballs)
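The caching problem in the third item is easy to reproduce in miniature. The following Python toy model is not the pkg.m4 implementation; the function names, prefixes and module names are invented. It only shows what happens when a result cache is keyed on the variable prefix alone:

```python
def make_checker(installed):
    """Return a check(prefix, modules) function whose cache is keyed on prefix only."""
    cache = {}

    def check(prefix, modules):
        if prefix not in cache:  # the module list is not part of the cache key
            cache[prefix] = all(module in installed for module in modules)
        return cache[prefix]

    return check


check = make_checker(installed={"glib-2.0", "gtk+-2.0"})

first = check("DEPS", ["glib-2.0"])            # True: glib-2.0 is installed
second = check("DEPS", ["not-installed-1.0"])  # also True: answered from the cache
third = check("OTHER", ["not-installed-1.0"])  # False: a new prefix is checked afresh
```

Using a distinct variable prefix for each distinct module list avoids the stale answer.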

Apart from these problems, there are some new features that people might find useful:

  • Unknown headers are ignored in .pc files. This will make future extensions possible. Previously, if you wanted to make use of a feature in a newer version of pkg-config in your .pc, you'd probably end up making the file incompatible with older versions. This essentially meant that a new feature could not be used until the entire userbase upgraded, even if the feature was non-critical.
  • A new Url header can be used in a .pc file. If pkg-config finds a version of a required package, but it is too old, then the old .pc file can print a URL telling people where to find a newer version. Unfortunately, if you use this feature your .pc file won't work with pkg-config <= 0.15.
  • A virtual "pkg-config" package is provided by pkg-config. It doesn't provide any cflags or libs, but does include the version number. So the following are equivalent:
    pkg-config --atleast-pkgconfig-version=0.17.1
    pkg-config --exists pkg-config '>=' 0.17.1
    This may not sound useful at first, but you can also list the module in the Requires line of another .pc file. As an example, if you used some weird link flags that pkg-config used to mangle but has since been fixed, you can encode that requirement in the .pc file. Of course, this is only useful for checking for pkg-config versions greater than 0.16.

Tracing Python Programs

I was asked recently whether there was an equivalent of sh -x for Python (ie. print out each statement before it is run), to help with debugging a script. It turns out that there is a module in the Python standard library to do so, but it isn't listed in the standard library reference for some reason.

To use it, simply run the program like this:

/usr/lib/python2.4/trace.py -t program.py

This'll print out the filename, line number and contents of that line before executing the code. If you want to skip the output for the standard library (ie. only show statements from your own code), simply pass --ignore-dir=/usr/lib/python2.4 (or similar) as an option.
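The same module can also be driven from inside a program. The sketch below assumes a current Python 3 rather than the 2.4 install shown above. It uses trace.Trace with ignoredirs as the counterpart of --ignore-dir; the demo function is just a placeholder:

```python
import io
import sys
import trace
from contextlib import redirect_stdout


def demo():
    total = 0
    for i in range(3):
        total += i
    return total


def run_traced(func):
    """Run func under the tracer; return (result, captured trace output)."""
    tracer = trace.Trace(
        count=False,  # no coverage counting, just tracing
        trace=True,   # report each line as it executes, like the -t option
        ignoredirs=[sys.prefix, sys.exec_prefix],  # skip the standard library
    )
    captured = io.StringIO()
    with redirect_stdout(captured):
        result = tracer.runfunc(func)
    return result, captured.getvalue()


result, output = run_traced(demo)
```

Printing output shows the trace lines for demo when the code is run from a file. If your own script lives under sys.prefix, drop the ignoredirs argument or nothing will be shown.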

BitKeeper

So the free (no-cost) version of BitKeeper has been discontinued, leaving just the commercial version and the limited open source version (which is essentially limited to checking out the head revision of a particular tree).

It seems a bit weird that one of the stated reasons for discontinuing the free version is a dispute with OSDL, where some employees were using BitKeeper (eg. Linus), while another unrelated employee was reverse engineering it as a personal project. This is a bit surprising, since it seems that a scenario almost the same as this was brought up last year and Larry said his concern was a licensed BitKeeper user helping someone else reverse engineer the code. Of course, there are probably other issues involved here.

This does bring up an interesting issue of what users of the free version are going to do with their repositories. While they can use the open source edition to easily check out the head revision and continue development, it isn't clear that it can be used to extract all the information stored in a repository. And since BitMover has refused to sell the commercial version to some people, it is conceivable that some projects could find themselves unable to access their revision history with BitKeeper.

I doubt this situation is acceptable to many users (they are using a version control system, so probably want to keep their revision history), so there will probably be some programs written to extract all the information from a BitKeeper repository. Ironically, this could add some value to BitKeeper for BitMover's commercial customers -- insurance for their data in case BitMover disappears or something else makes BitKeeper unusable to them.

Airports

If you are coming to Australia for the first time, make sure you pack your camel suit and other valuables in your cabin luggage, rather than the checked luggage. It will save you trouble in the long run.

roozbeh: the Fedora EULA probably isn't a GPL violation (I'm sure Red Hat has legal advice that it is okay). Section 1 says "This agreement does not limit User's rights under, or grant User rights that supersede, the license terms of any particular component". So the EULA explicitly says that it doesn't limit any rights you received under the GPL. Section 2 goes on to say that your rights to copy or modify individual components of the distro are covered by the respective license.

What the EULA does cover is the particular compilation of the individual components making up the distribution. This is similar to the way a book publisher can claim copyright on a particular selection/ordering of poems that are in the public domain — while you can copy the individual poems, it would be a violation to copy the anthology as a whole.

The export controls section is just a restatement of the U.S. export regulations for cryptography, so it wouldn't affect the non-cryptographic portions. I'm not sure how this section would interact with the first section in the case of GPL'd/LGPL'd cryptography software though.
