9 May 2010 robertc

Maintainable pyunit test suites

There’s a test code maintenance issue I’ve been grappling with, and watching others grapple with, for a while now. I’ve blogged about some infrastructural things related to it before, but now I think it’s time to talk about the problem itself. The problem shows up as soon as you start writing setUp functions, or custom assertThing functions. And the problem is – where do you put this code?

If you have a single TestCase, it’s easy. But as soon as you have two test classes it becomes more difficult. If you choose either class, the other class cannot use your setUp or assertion code. If you create a base class for your tests and put the code there, you end up with a huge base class, with every test paying the total overhead of the whole suite’s needs rather than just the overhead needed to test the particular system it tests – or with a large and growing list of assertions, most of which are irrelevant for most tests.
The reason the choices are hard is that test code is just code, and all the normal issues there – separation of concerns, composition often being better than inheritance, do-one-thing-well – apply to our test code too. These issues are exacerbated by pyunit (that is, the Python ‘unittest’ module included with the standard library and extended by various projects).
Let’s look at some of the concerns involved in a test environment: test execution, fixture management, and outcome decision making. I’m using slightly abstract terms here because I don’t want to bind the discussion to an existing implementation; the downside is that I need to define these terms a little.
Test execution – by this I mean the basic machinery of running a single test: the test framework calling into user code and receiving back an outcome with details. E.g. in pyunit your test_method() code is called, success is determined by it returning successfully, and other outcomes by raising specific exceptions. Other languages without exceptions might do this by returning an outcome object, or by passing some object into the user code to be called by the test.
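As a concrete (if trivial) sketch of that contract in pyunit – each method below produces a different outcome purely through how it returns or raises:

import unittest


class TestOutcomes(unittest.TestCase):

    def test_success(self):
        # Returning normally signals success to the framework.
        pass

    def test_failure(self):
        # Raising self.failureException (what assertions raise) signals failure.
        raise self.failureException('expected and actual differed')

    def test_error(self):
        # Any other exception is reported as an error, not a failure.
        raise ValueError('something unexpected broke')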
Fixture management – the non-trivial code that prepares a situation where you can make assertions. On the small side, creating a few object instances and gluing them together; on the large end, loading data into a database (and creating the database instance at the same time). Isolation issues such as masking out environment variables and creating temp directories are, in my opinion, included in this category.
Outcome decision making – possibly the most obtuse label I’ve ever given; I’m referring to the process of deciding *what* outcome you wish to have happen. This takes different forms depending on your testing framework. For instance, in Python’s doctest:
>>> x
45
provides a specification – the test framework evaluates x, takes the repr() of the result, and compares that to the string '45'. In pyunit assertions are typically used:
self.assertEqual(45, x)
This will evaluate 45 == x and, if the result is not true, raise an exception indicating that a failure has occurred. Unexpected exceptions cause errors, and in the most recent pyunit, and some extensions, other exceptions can signal that a test should not be run, or should have failed.
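To make that signalling contract concrete, here is a hand-rolled assertion in the classic pyunit style (the method and test names are hypothetical): the decision is ordinary Python, and the signal is self.fail(), which raises self.failureException for the framework to report.

import unittest


class TestDivisibility(unittest.TestCase):

    def assertDivisibleBy(self, divisor, value):
        # Decide the outcome with ordinary code...
        if value % divisor != 0:
            # ...and signal it via self.fail(), which raises
            # self.failureException - reported as a failure, not an error.
            self.fail('%r is not divisible by %r' % (value, divisor))

    def test_twelve(self):
        self.assertDivisibleBy(3, 12)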
So, those are the three concerns we have when testing; where should each be expressed (in pyunit)? Pragmatically, the test execution code is the hardest to separate out: it’s partly outside of ‘user control’, in that the contract is with the test framework. So let’s start by saying that this core facility, which we should very rarely need to change, should be in TestCase.
That leaves fixture management and outcome decision making. Let’s tackle decision making… if you consider the earlier doctest and assertion examples, I think it’s fairly clear that there are multiple discrete components at play. Two in particular I’d like to highlight are matching and signalling. In the doctest example the matching is done by string matching – the reference object(s) are stringified and compared to an example the test writer provides. In the pyunit example the matching is done by the __eq__ protocol. The signalling in the doctest example is done inside the test framework (so we see no evidence of it at all). In the pyunit example the signalling is done by the assertion method calling self.fail(), that being the defined contract for causing a failure. Now for a more complex example: testing a float. In doctest:
>>> "%0.3f" % x
'0.123'
In pyunit:
self.assertAlmostEqual(0.123, x, places=3)
This very simple check – that a floating point number is effectively 0.123 – immediately exposes two problems. The first, in doctest, is that literal string comparisons are extremely limited. A regex or other language would be much more powerful (and there are some extensions to doctest; the point remains, though – the … operator is not enough). The second problem is in pyunit: the contracts of assertEqual and assertAlmostEqual are different, so you cannot substitute one in where the other was expected without partial function application – something that, while powerful, is not the most obvious thing to reach for, or to read in code.
The JUnit folk came up with a nice way to address this: they decoupled /matching/ and /deciding/ with a new assertion called ‘assertThat’ and a language for matching, expressed as classes. The initial matcher library, hamcrest, is pretty ugly in Python; I don’t use it because it tries too hard to be ‘english like’ rather than being honest about being code. (Aside: what would ‘is_()’ in a Python library mean to you? Unless you’ve read the hamcrest code, or are not a Python programmer, you’ll probably get it wrong.) However, the concept is totally sound. So ‘outcome decision making’ should be done by using a matching language totally separate from testing, plus a small bit of glue for your test framework. In ‘testtools’ that glue is ‘assertThat’, and the matching language is a narrow Matcher contract (in testtools.matchers), which I’m going to describe here in case you cannot or don’t want to use the testtools one.
class Matcher:

    def __str__(self):
        """Describe this matcher."""

    def match(self, something):
        """Determine if something is matched.

        :param something: Something to match.
        :return: None if something matched, or a Mismatch object otherwise.
        """


class Mismatch:

    def describe(self):
        """Describe a mismatch that has occurred."""
This permits composition and inheritance within your matching code in a pretty clean way. Using == only permits this if you can simultaneously define an __eq__ for your objects that matches with arbitrary sensitivity (e.g. you might not want to examine the process_id value for a process a test ran, but do want to check other fields).
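To make the contract concrete, here is a minimal sketch: a hypothetical CloseTo matcher that does the job of assertAlmostEqual, plus the few lines of glue a TestCase needs. The names are illustrative – testtools ships its own assertThat and matchers – but the shapes follow the contract above.

import unittest


class CloseTo:
    """Matches a float within a given tolerance of an expected value."""

    def __init__(self, expected, tolerance=0.0005):
        self.expected = expected
        self.tolerance = tolerance

    def __str__(self):
        return "CloseTo(%r, tolerance=%r)" % (self.expected, self.tolerance)

    def match(self, something):
        # Return None on a match, a Mismatch otherwise - per the contract.
        if abs(something - self.expected) <= self.tolerance:
            return None
        return CloseToMismatch(self, something)


class CloseToMismatch:
    """Describes how a value failed to match a CloseTo matcher."""

    def __init__(self, matcher, something):
        self.matcher = matcher
        self.something = something

    def describe(self):
        return "%r is not within %r of %r" % (
            self.something, self.matcher.tolerance, self.matcher.expected)


class MatcherTestCase(unittest.TestCase):
    """A base TestCase carrying only the assertThat glue."""

    def assertThat(self, something, matcher):
        mismatch = matcher.match(something)
        if mismatch is not None:
            self.fail(mismatch.describe())


class TestFloats(MatcherTestCase):

    def test_x(self):
        x = 0.1229
        self.assertThat(x, CloseTo(0.123))

Because CloseTo obeys the same contract as every other matcher, anything that accepts a matcher can take it wherever an equality matcher was expected – the substitution problem that assertAlmostEqual created simply disappears.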
Now for fixture management. This one is pretty simple really: stop using setUp (and other similar on-TestCase methods). If you use them, you will end up with a hierarchy like this:
BaseTestCase1
 +TestCase1
 +TestCase2
 +BaseTestCase2
   +TestCase3
   +TestCase4
   +BaseTestCase3
     +TestCase5
     ...
That is, you’ll have a tree of base classes, and hanging off them actual test cases. Instead, write on your base TestCase a single glue method – e.g.
def useFixture(self, fixture):
    fixture.setUp()
    self.addCleanup(fixture.tearDown)
    return fixture
And then, rather than having a setUp function which performs complex operations, define a ‘fixture’ – an object with a setUp and a tearDown method – and use it in the tests that need that code:
def test_foo(self):
    server = self.useFixture(NewServerWithUsers())
    self.assertThat(server, HasUser('fred'))
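NewServerWithUsers and HasUser above are stand-ins for whatever your system needs. Any object with setUp and tearDown satisfies the convention; for instance, a minimal temporary-directory fixture might look like this (a sketch, not any particular library’s API):

import shutil
import tempfile


class TempDir:
    """A fixture supplying a temporary directory, removed on tearDown."""

    def setUp(self):
        self.path = tempfile.mkdtemp()

    def tearDown(self):
        shutil.rmtree(self.path, ignore_errors=True)

A test then does tempdir = self.useFixture(TempDir()) and works inside tempdir.path, with cleanup handled automatically.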
Note that there are some things around that offer this sort of convention already: that’s all it is – convention. Pick one and run with it. But please don’t use setUp; it was a conflated idea in the first place and is a concrete problem now. Something like testresources or testscenarios may fit your needs – if it does, great! However, they are not the last word – they aren’t convenient enough to replace simply calling a small helper like the one I’ve presented here.
To conclude, the short story is:
  • use assertThat and have a separate hierarchy of composable matchers
  • use or create a fixture/resource framework rather than setUp/tearDown
  • any old TestCase that has the outcomes you want should do at this point (but I love testtools).

Syndicated 2010-05-09 16:21:35 from Code happens
