Why upstreams should do distribution packaging

Software comes in many shapes and styles. One of the problems the author of software faces is distributing it to their users.

As distributors we should not discourage upstreams that wish to generate binary packages themselves; rather, we should cooperate with them, and ideally they will end up maintaining their stable release packages in our distributions. Currently the Debian and Ubuntu communities have a tendency to actively discourage this by objecting when an upstream software author includes a debian/ directory in their shipped code. I don’t know if Red Hat or SUSE have similar concerns, but for the dpkg toolchain, the presence of an upstream debian directory can cause toolchain issues.

In this blog post, I hope to make a case that we should consider the toolchain issues bugs rather than just-the-way-it-is, or even features.

To start at the beginning, consider the difficulty of installing software: the harder a piece of software is to install, the more a user has to want it before they will jump through the hoops needed to install it.

Thus projects which care about users will make it easy to install – and there is a spectrum of ease. At one end,

checkout from version control, install various build dependencies like autoconf, gcc and so on

through to

download and run this installer

Now, where some software authors get lucky is when someone else makes it easy to install their software by making binary packages, so that users can simply do

apt-get install product

Now some platforms like Mac OS X and Microsoft Windows really do need an installer, but in the Unix world we generally have packaging systems that can track interdependencies between libraries, download needed dependencies automatically, perform uninstalls and so on. Binary packaging in a Linux distribution has numerous benefits, including better management of security updates (because a binary package can sensibly use shared libraries that are not part of the LSB).

So given the above, it’s no surprise to me to see the following sort of discussion on #ubuntu-motu:

  1. upstream> Hi, I want to package product.
  2. developer> Hi, you should start by reading the packaging guide
  3. (upstream is understandably daunted – the packaging guide is a substantial amount of information, but only a small fraction is needed to package any one product.)

or (less usefully)

  1. upstream> Hi, I want to package product.
  2. developer> If you want to contribute, you should start with existing bugs
  3. upstream> But I want to package product.

Another conversation, which I think is very closely related is

  1. developer> Argh, product has a debian dir, why do they do this to me?!

The reasons for this should be pretty obvious at this point:

  • Folk want to make their product easy to install and are not themselves DDs, DMs or MOTUs.
  • So they package it privately – such as in a PPA, or their own archive.
  • When they package it, they naturally put the packaging rules in their source tree.

Now, why should we encourage this, rather than ask the upstream to delete their debian directory?

Because it lets us, distributors, share the packaging effort with the upstream.

Upstreams that are making packages will likely be doing this for betas, or even daily builds. As such they will find issues related to new binaries, libraries and so on well in advance of their actual release. And if we are building on their efforts, rather than discarding them, we can spend less time repeating what they did and more packaging other things.

We can also encourage the upstream to become a maintainer in the distro and do their own uploads: many upstreams will come to this on their own, but by working with them as they take their early steps we can make this more likely and an easier path.

Government data – please do it right

The Australian government 2.0 taskforce has an initiative to make data available for public remixing and use: after all, it’s public property anyway, right? They have even run a mashup competition.

Notably missing from the excellent collection of data that has been opened is the NSW Transport and Infrastructure dataset for public transport in NSW. There is a similar dataset for the Northern Territory in the mashup transport section.

The NT dataset is under the fantastic cc-by licence. You can write an iPhone app with this, a journey planner that you can cart with you while disconnected; a ‘find the closest bus I can walk to’ tool, or – well let the imagination run wild.

The NSW dataset is under a heavily restrictive license. It’s so restrictive I’m not sure it’s feasible to write an open source tool using its data.

The meta-issue is that NSW T&I department wants control over the applications built with this data. This adds a tremendous chilling effect on potential uses of the data: the department will have to approve, with a long lead time, every use of the data, and get to tell the ‘application developer’ what changes to make to their application.

I strongly doubt that a simple remixing of the data (e.g. with weather reports to prefer buses on very wet days) would be permitted, as it would allow other users to just read the remix and get the original data /without entering into a license agreement/.

I’m sure there is some unstated risk of openness, or benefit of control, that is shaping this problematic approach. Whatever the cause, it’s not open at all.

Given that the overall approach is fundamentally flawed, a blow by blow analysis of the custom license isn’t particularly useful; however, I thought I would pick some highlights out to save folk the trouble ;)

  1. The dataset is behind a username/password wall [that you cannot share with others].
  2. Licensees may not be private – everyone must know you’re using the data.
  3. You must link to the 131500.com.au website
  4. You may not charge users for an app that has to be redeveloped if the dataset changes shape
  5. Any application written to use the dataset must be given to the department 30 days before release to the public.
  6. The department gets to ‘suggest changes’ to any announcement related to the developer’s app, the license agreement or the dataset.
  7. The dataset is embargoed – you cannot share it with others.
  8. The use of the dataset has to be logged and reported.
  9. There is a restraint of use in there as well – related to Inappropriate and Offensive Material. It wouldn’t affect me, but sheesh, given all the other restraints it’s hardly needed.

There are more gems in the details, but in short:

The department will control what, where, when and how (the data is accessed, the application’s functionality/appearance, how it was used). Hell, the 30 day requirement alone makes for slow delivery of whatever someone wants to build.

I really hope this can be improved on.

Subunit 0.0.3 should be a great little release. It’s not ready yet, but some key things have been done.

Firstly, it’s been relicensed under BSD/Apache version 2. This makes using Subunit with other test frameworks much easier, as those frameworks tend to be under permissive licenses such as the LGPL, BSD or Apache. Thanks go out to the contributors to Subunit who made this process very painless.

Secondly, the C client code is getting a few small touch ups, probably not enough to reach complete feature parity with the Python reporter.

Thirdly, the CPPUnit patch that Subunit has carried for ages has been turned into a small library built by Subunit, so you’ll be able to just install that into an existing CPPUnit environment without rebuilding CPPUnit.

Lastly, but most importantly, it will hopefully have the last major protocol change (still backwards compatible!) needed for 1.0 – the ability to attach fairly arbitrary debug data in an outcome (things like ‘stdout’, ‘stderr’, ‘a log file X’ and so forth). This will be used via an experimental object protocol – the one I proposed on the Testing In Python list.

I should get the protocol changes done on the flight to Montreal tomorrow, which would be a great way for me to get my mind fully focused on testing for the sprint next week.

Python unittest API : Time to fix it

So, for ages now I’ve been saying that unittest is, at its core, pretty sound. I incited a talk to this effect.

I have a vision; I dream of a python testing library that:

  1. Is in the python core
  2. Is simple
  3. Is extensible
  4. Has tests take care of testing
  5. Has results take care of reporting
  6. Aids communication from test to test reader

Hopefully those are pretty modest and agreeable things to want.

However we don’t have this: nose is lovely but not in the core [and is a moderately complex API]. py.test is also not in the core, and has previously tripped my too-much-magic alerts. I must admit to not having checked if this is fixed yet. unittest itself is in the core but has some cruft which we should clean up, but more importantly is not extensible enough, which leads to extensions such as the zope testrunner having to muddy the waters between testing and reporting.

The point “Aids communication from test to test reader” is worth expanding on: automated testing is something that doesn’t need observation…until the unexpected happens. At that point some poor schmuck such as you or I ends up trying to guess what went wrong. The more data that we gather and communicate about the event, the greater the chance it can be corrected without needing a repeat run under a debugger, or worse, single stepping through the code.

There is a problem with ‘assertFoo’ methods in unittest, something that I’m not going to cram into this blog post. I will say, if you find the tendency of such methods to crawl to the base class frustrating, that you should look at hamcrest – it and similar things have been very successful in the Java unit testing world; we can learn from them.

Going back to my vision, we need to make unittest more powerfully extensible to allow projects like nose to do all the cool things they want to while still being unittest compatible. I don’t mean that nose can’t run unittest tests; I mean that unittest can’t run nose tests: nose has had to expand the contract, not simply add implementations that do more.

To that end I have a number of bugs which I need to file. Solving them piecemeal will create a fractured API – particularly if this is done over more than one release. So I am planning on prototyping in other projects, discussing like mad on the testing-in-python list, and when it all starts to come together writing up a PEP.

The bugs I have are:

  1. streams nicely: countTestCases must die/be made optional. This function is inherently incompatible with generative tests or anything beyond the simplest lightweight environments
  2. no way to wrap code around a single test. This would permit profiling, debugging, tracing, and I’m sure other things more cleanly. (At the moment, one must ‘turn on’ the profiler in startTest, and turn it off in stopTest. This is much more awkward than simply being in the call stack). Some care will be needed here, particularly for generative tests.
  3. code that isn’t part of the implementation in the core needs to be able to work with the reporting code; allowing an optionally wider API permits extensions to be debuggable. This needs thought: do we allow direct access to TestResults? Do we come up with some added level of indirection and ‘events’? I don’t know.
  4. More data than just the backtrace needs to be included when an outcome is reported. I’ve started a discussion on the testing in python list about this. I’m proposing that we use a dict of named content objects, and use the HTTP content-type abstraction to make the content objects introspectable and reliably handleable without tying the unittest object protocol to any given wire format – loose coupling is good!
  5. The way we signal outcomes between TestCase and TestResult (the addFailure etc. methods) is concerning: there are many grades of outcome that users of the framework may usefully wish to represent; in fact there are more than we probably want to put in the core. Finding a way to decouple the intent of a particular outcome from how it’s signalled would allow users more control while still being able to use the core framework. One particular issue in this area is that it’s possible with the current API to have a single test object succeed multiple times, or fail (addFailure) then succeed (addSuccess). This causes no end of confusion, as test counts can mismatch failure counts, and so on.
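
To make the last bug concrete, here is a minimal sketch that drives a TestResult by hand, the way a misbehaving extension might. CountingResult, Example and report_twice are names made up for this illustration; only the TestResult methods themselves come from unittest.

```python
import sys
import unittest


class CountingResult(unittest.TestResult):
    """Illustrative subclass: also counts addSuccess calls."""

    def __init__(self):
        super().__init__()
        self.successes = 0

    def addSuccess(self, test):
        super().addSuccess(test)
        self.successes += 1


class Example(unittest.TestCase):
    def test_nothing(self):
        pass


def report_twice():
    """Report two contradictory outcomes for one test object."""
    test = Example("test_nothing")
    result = CountingResult()
    result.startTest(test)
    try:
        raise AssertionError("first verdict: failed")
    except AssertionError:
        result.addFailure(test, sys.exc_info())
    # Nothing in the API stops a second, contradictory verdict.
    result.addSuccess(test)
    result.stopTest(test)
    return result


result = report_twice()
print(result.testsRun, len(result.failures), result.successes)
```

One test was started, yet the result holds both a failure and a success for it, so any reporter that adds up outcomes will disagree with testsRun.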

I’ve got some ideas about these bugs, but I’m approaching a kiloword already, and I hope this post has enough to provoke some serious thought about how we can fix these 5 bugs, compatibly, and end up with a significantly better unittest module. We’ll have been successful if projects like Trial, nose and the zope testrunner are able to remove all their code that duplicates standard library functionality or otherwise works around these bugs, and can instead focus on adding the specific test support needed by their environments (in the Trial and zope cases), or on UI and plug-n-play (for nose).
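
As a rough illustration of bug 4, here is one way a dict of named content objects could look. This is only a sketch under my own assumptions: the Content class, its iter_bytes method and the render function are hypothetical names, not an agreed API.

```python
class Content:
    """Hypothetical content object: a content type plus lazily read bytes."""

    def __init__(self, content_type, get_bytes):
        self.content_type = content_type  # e.g. "text/plain"
        self._get_bytes = get_bytes       # callable returning byte chunks

    def iter_bytes(self):
        return iter(self._get_bytes())


def render(details):
    """Turn a dict of named Content objects into report lines.

    The reporter only inspects the content type, so it can handle
    content it has never seen before without knowing the wire format.
    """
    lines = []
    for name in sorted(details):
        content = details[name]
        data = b"".join(content.iter_bytes())
        main_type = content.content_type.split("/", 1)[0]
        if main_type == "text":
            lines.append(f"{name}: {data.decode('utf-8', 'replace')}")
        else:
            lines.append(f"{name}: <{len(data)} bytes of {content.content_type}>")
    return lines


details = {
    "stdout": Content("text/plain", lambda: [b"hello"]),
    "core": Content("application/octet-stream", lambda: [b"\x00\x01\x02"]),
}
report = render(details)
print("\n".join(report))
```

The test only has to name its data and say what kind of thing it is; whether that ends up on a console, in a log, or on the wire is left to the result object.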

Packaging backlog

Got some of my packaging backlog sorted out:

  • bicyclerepairman updated for the vim policy (which means it works again!)
  • python-testtools (a simple migration of the package to Debian)
  • subunit 0.0.2 released upstream and packaged for Debian.
  • testresources 0.2 ->  Debian.

And a small memo-to-self: On all new machines, echo "filetype plugin on" >> ~/.vimrc

Back from hiatus

Well, the new blog seems to be up and running – and gathering modest numbers of comments already. Woo.

I’ve a bunch of mail about test suite performance to gather and refine into a follow up post, but that can wait a day or two.

In bzr we suffer from a long test suite, which we let grow while we had some other very pressing performance concerns. 2.0 fixes these concerns, and we’re finding the odd moment to address our development environment a little now.

One of the things I want to do is to radically reduce the cost of testing inside bzr; code coverage is a great way to get a broad picture of what is tested. Rather than reinvent the wheel (and I’ve written one to throw away, so far) – are there tools out there that can:

  • build a per-test coverage map
  • do it quickly
  • don’t include setUp/tearDown/cleanUp code paths in the map
  • report on the difference between two such maps (at the suite level)

The third point is possibly contentious, so I should expand on it. While code that is executed by code within the test’s run() method is – from the outside – all part-of-the-test, it’s not (by definition) the focus of the test. And I find focused tests substantially easier to analyse failures in, because they tend to check preconditions, poke at object state etc.

As I want this coverage map to help preserve coverage as we refactor the test suite, I don’t want to include accidental coverage in the map.
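
To show the shape of what I am after, here is a small sketch built on sys.settrace. It is not an existing tool: FocusedCoverageCase and lost_coverage are hypothetical names, and plain tracing like this is certainly not the quick option. Tracing is switched on only around the test method, so setUp and tearDown never reach the map.

```python
import functools
import sys
import unittest


class FocusedCoverageCase(unittest.TestCase):
    """Hypothetical base class: records lines run by the test method only."""

    coverage_map = {}  # test id -> set of (filename, function, line number)

    def __init__(self, methodName="runTest"):
        super().__init__(methodName)
        original = getattr(self, methodName, None)
        if original is not None:
            # Shadow the method on the instance so run() picks up the wrapper.
            setattr(self, methodName, self._traced(original))

    def _traced(self, original):
        @functools.wraps(original)
        def wrapper(*args, **kwargs):
            lines = self.coverage_map.setdefault(self.id(), set())

            def tracer(frame, event, arg):
                if event == "line":
                    code = frame.f_code
                    lines.add((code.co_filename, code.co_name, frame.f_lineno))
                return tracer

            previous = sys.gettrace()
            sys.settrace(tracer)
            try:
                return original(*args, **kwargs)
            finally:
                sys.settrace(previous)

        return wrapper


def lost_coverage(before, after):
    """Lines covered by the suite in `before` but not in `after`."""
    def flatten(mapping):
        return set().union(*mapping.values())
    return flatten(before) - flatten(after)


def double(x):
    return x * 2


def prepare():
    return [1, 2, 3]


class ExampleTest(FocusedCoverageCase):
    def setUp(self):
        self.values = prepare()  # runs before tracing starts

    def test_double(self):
        self.assertEqual(double(self.values[0]), 2)


outcome = unittest.TestResult()
ExampleTest("test_double").run(outcome)
covered = FocusedCoverageCase.coverage_map[ExampleTest("test_double").id()]
functions = {name for (_, name, _) in covered}
```

Here functions contains double and test_double but not prepare or setUp. It also picks up unittest’s own assertion helpers, so a real tool would want to filter by filename.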

New blog location

My blog has moved: http://rbtcollins.wordpress.com/. If you’re syndicating me, please update to this location; if you don’t that’s fine – advogato will be syndicating the blog indefinitely, but doesn’t support comments. Mega thanks to Jeff for doing an export from advogato for me :)

31 Aug 2009

Hi Rich! Re hour+long unit tests

I agree that you need a comprehensive test suite, and that it should test all the dark and hidden corners of your code base.

But time is not free! A long test suite inhibits:

  • cycle time – the fastest you can release a hot fix to a customer
  • developer productivity – you can’t forget about a patch till it’s passed the regression test suite
  • community involvement – if it takes an hour to run the test suite, an opportunistic developer that wanted to tweak something in your code will have walked away long ago

    Note that these points are orthogonal to whether the developers’ edit-test cycle runs some or all tests, or whether you use a CI tool, or a test-commit tool, or some other workflow.

    All that said though, I’m extremely interested in *why* any given test suite takes hours: does it need to? What is it doing? Can you decrease the time by 90% and coverage by 2%?

    I got another response back, which talks about keeping the working set of tests @ about 5 minutes long and splitting the rest off (via declared metadata on each test) into ‘run after commit or during CI’. This has merits for reducing the burden on a developer in their test-commit cycle, but as I claim above, I believe there is still an overhead from those other tests that are pending execution at some later time.
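
For the curious, declared metadata of that kind could look something like the sketch below. The deferred marker and split_suite are names I have made up for illustration; they are not from that response or from any particular framework.

```python
import unittest


def deferred(test_method):
    """Hypothetical marker: run this test after commit or during CI."""
    test_method.deferred = True
    return test_method


def iter_tests(suite):
    """Yield individual tests from an arbitrarily nested suite."""
    for item in suite:
        if isinstance(item, unittest.TestSuite):
            yield from iter_tests(item)
        else:
            yield item


def split_suite(suite):
    """Split a suite into (working set, run-later set) using the marker."""
    working, later = unittest.TestSuite(), unittest.TestSuite()
    for test in iter_tests(suite):
        method = getattr(test, test.id().rsplit(".", 1)[-1], None)
        target = later if getattr(method, "deferred", False) else working
        target.addTest(test)
    return working, later


class ExampleTests(unittest.TestCase):
    def test_quick(self):
        self.assertEqual(1 + 1, 2)

    @deferred
    def test_exhaustive(self):
        self.assertEqual(sum(range(1000)), 499500)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)
working, later = split_suite(suite)
working_names = [t.id().rsplit(".", 1)[-1] for t in iter_tests(working)]
later_names = [t.id().rsplit(".", 1)[-1] for t in iter_tests(later)]
```

The developer runs working before committing and leaves later to the machine. That is the handoff I am wary of.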

    From a LEAN perspective, the cycle time is very important. Another important thing is handoffs. Each time we hand over something (e.g. a code change that I *think* works because it passed my local tests), there is a cost. Handing over to a machine to do CI is just as expensive as handing to a colleague. Add that contributors sending in patches from the internet may not hang around to find out that their patch *fails* in your CI build, and you can see why I think CI tools are an adjunct to keeping a clean trunk, rather than a key tool. The key tool is to not commit regressions :)

    Oh, and I certainly accept that test suites should be comprehensive… I just don’t accept that more time == more coverage, or that there isn’t a trade off between comprehensive and timeliness.

    30 Aug 2009

    Made some time to hack… the results:

    config-manager 0.4 released, re-uploaded to debian (it was removed due to some confusion a while back). This most notably drops the hard dependency on pybaz and adds specific-revision support for bzr.

    subunit snapshot packaging sorted out to work better with subunit from Ubuntu/Debian. This latest snapshot has nested progress and subunit2gtk included.

    PQM got a bit of a cleanup:

  • The status region shown during merges is ~ twice as tall now.
  • if the precommit_hook outputs subunit it will be picked up automatically and shown in the status region.
  • all deprecation warnings in python2.6 are cleaned up
  • Pending bugfixes were merged from Tim Cole and Daniel Watkins – thanks guys!

    27 Aug 2009

    Does your test suite take too long (e.g. 5 minutes)? Or did it and you solved it? Or it doesn’t but it’s getting worse?

    Tell me more, I’d like to know :-)

