Older blog entries for malcolm (starting at number 107)

Odd Python Fact

Using the long weekend to get some intensive hacking done, I'm converting Django's internals to be more transparently unicode aware. All this character encoding twiddling has me thinking about performance, so I've been writing lots of little test programs to time features.

One unusual result that popped up this afternoon concerned reading a UTF-8-encoded file. Contrary to my intuition, this version:

  data = open(filename).read()
  data.decode('utf-8')

was consistently a little bit faster than this version:

  data = codecs.open(filename, 'r', 'utf-8').read()

Admittedly the differences were generally (much) less than 5%, in favour of the first version, but I was a little surprised there was any real difference at all. I'm not worried by this result, but I would have guessed incorrectly.

In both cases, I'm reading in the data and converting it to a unicode string. I was running it against some examples I had lying around from Markus Kuhn. The results were consistent if I changed the order of the tests or intermixed them. Aliasing codecs.open to a global variable sped up the second method very slightly, but not enough to catch up. I was careful to pre-fill the disk buffer cache and run each test enough times in a loop for any noise on a single run to be absorbed.

Turns out, the results are closest (essentially identical speed) for files that have mostly one byte per character (pure ASCII files being the fastest) and diverge the most for more complex characters. The runic poem, with lots of three-byte characters, and the Greek text, which is entirely two-byte characters, were the most divergent.
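For anybody wanting to repeat the comparison, a minimal sketch of such a timing harness follows. It is not the original test code. It is written for Python 3, where the first version has to open the file in binary mode before decoding, and newer Python releases may flag codecs.open as deprecated. The helper names, the sample-file helper and the repeat counts are all assumptions made for illustration.

```python
import codecs
import os
import tempfile
import timeit


def read_then_decode(filename):
    # First version: read the raw bytes, then decode them in one call.
    with open(filename, 'rb') as f:
        return f.read().decode('utf-8')


def read_via_codecs(filename):
    # Second version: let codecs.open decode while reading.
    with codecs.open(filename, 'r', 'utf-8') as f:
        return f.read()


def make_sample(text, repeat=2000):
    # Illustrative helper: write a UTF-8 test file and return its path.
    fd, path = tempfile.mkstemp(suffix='.txt')
    with os.fdopen(fd, 'wb') as f:
        f.write((text * repeat).encode('utf-8'))
    return path


def compare(path, number=200):
    # Time both readers in a loop over the same file.
    first = timeit.timeit(lambda: read_then_decode(path), number=number)
    second = timeit.timeit(lambda: read_via_codecs(path), number=number)
    return first, second
```

Calling compare() on files built from ASCII, Greek and runic sample text gives one pair of timings per file. The actual numbers depend on the Python version and the machine, so none are quoted here.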

Syndicated 2007-04-07 19:26:47 from Malcolm Tredinnick

Software Design Responsibility

Being a public holiday today, I spent the morning lazing in bed and ended up re-reading most of Chris Crawford's book, On Game Design. It's a fun read from somebody who has both experience and opinions.

It also contains a passage that stuck with me the first time I read it and has guided a lot of my design thinking since then (from chapter 18):

"Some people think that open mindedness requires a designer to hear out every idea, to give every suggestion its day in court. This isn't noble; it's stupid. Seriously considering every idea that drifts by isn't a sign of open mindedness; it's an indicator of indecisiveness. A good designer has already thought through all the basics of the design and so should be able to reject a great many ideas without much consideration. To put it another way, you should already have considered most of the ideas that are put to you; if somebody surprises you with an idea you didn't think of, you should consider it a warning sign that you haven't thought through the design carefully enough. If the great majority of ideas that are offered you have already gone through your mill, you should have no problem rejecting them without much consideration. Other people will consider you a narrow-minded prima donna for doing so; let them. Your job is to build a great design, not gratify your co-worker. Be courteous, but concentrate on doing your job."

I'm not in 100% agreement with this as it applies to Open Source development. There are many areas in a project where the primary developers and designers mentally or explicitly mark an area as "we'll work that out later" or "to be revisited when we have time". Design discussions in those areas are often fruitful and enlightening to everybody.

In many areas, though, the main designers usually have thought through a lot of the decisions and have developed an intuition about what works, what won't and what direction they want to head in. I do think this sort of approach to open mindedness is only applicable for reasonably well established designs by developers with a certain track record of success. Something in the early stages of development, or undertaken as a first project, should welcome holistic reassessment.

It is not incumbent upon a software package maintainer, even in Open Source circles, to continually revisit and rejustify every decision whenever somebody new wants to challenge a particular case. That is ineffective.

When reading through source code, you can often work out, without too much difficulty, the design thread running through various decisions. Initially, accept that as gospel. Later, as you become more familiar, look for areas of inconsistency. Does the inconsistency matter? Can it be reconciled? Is it actually a cause for problems? True sharp edges or ill-fitting pieces exist in most applications. Sometimes it's best to just acknowledge they are there and move on. Sometimes, you can smooth them off with a bit of a rewrite, although this may force some changes on the users, too. Often, the number of inconsistencies is not as great as people might think. Any number of alternative solutions may not be implemented or even explored in great detail, because they do not need to be. Once you have one solution that fits your design approach successfully, it's usually pragmatic to move on. Iterative improvements will still be made, but radical design changes aren't as necessary as some people would like to believe.

Finally, the closing sentence in the above quote is an important responsibility for the designer: being polite never killed anybody. Soliciting suggestions, even though you may have implicitly considered many of them already, is a tricky task. You need the haystack, for it contains the one needle that fixes an important problem. Encourage contributors, but ask them to have realistic expectations, too.

Postscript: I probably didn't make it clear why this quote from Crawford has influenced my thinking. It caused me to think. Because I don't completely agree with it, yet can see some elements of wisdom there, it has made me think a lot about when design discussions are useful and when a project design leader should cut off the discussions and pick a path.

Second postscript: Turns out I've blogged about this quote before (see October 25 and 27 entries on that page).

Syndicated 2007-04-06 21:32:24 from Malcolm Tredinnick

Australian Open Content Licensing

The proceedings of a 2005 conference, "Open Content Licensing: Cultivating The Creative Commons", held at Queensland University of Technology are now available.

I've only read a few articles so far, but the contents look interesting and read well. I originally noticed this via Glyn Moody's "open..." blog and he made a throwaway remark wondering why it took so long to produce. These are well-edited articles. Given the caliber of the speakers (some law professors, judges, lawyers, ...), it's not surprising they are busy people, so a bit of time for the back-and-forth is not unexpected. The results seem worth the wait.

One thing I did note about the list of speakers, though: nobody that I recognised as representing the Australian Open Source community. Yes, Open Source development is not the same as Creative Commons production, but Open Source contributors do generate both a lot of Creative Commons content and tools for working with said content. They aren't completely disjoint fields.

Syndicated 2007-04-03 20:04:29 from Malcolm Tredinnick

Things You Don't Realise Ahead Of Time...

Just over a month ago, I was in a car accident and broke a couple of ribs (ironic, since I don't drive — I was a passenger). Fortunately, that was the worst of the injuries anybody suffered, so it ended reasonably well for all involved. I'm not going to be permanently incapacitated or anything like that. I still look as ugly as I always did. No bad facial injuries that required a miraculous plastic surgery reconstruction or anything like that. Good luck all around.

However, the ribs still hurt... and quite a lot. Doctor tells me today it could be another four weeks or so before I stop feeling twinges when I move anything attached to my left side. I haven't hurt my ribs badly before, at least not that I'm aware of. You hear about how annoying it is when somebody else does it. Believe them! Laughing, coughing, moving in your chair or bed... even breathing. Ouch. :-(

Turns out the human body is a very inter-connected piece of machinery. You have an arm connected to each side, the shoulder muscles run across the front and sides of the rib cage. Your lungs expand inside the rib cage. There's a (hopefully) beating heart in there. All these moving parts!

I'm just saying, you know?! (Actually, I'm just bitching. You can all go back to sleep now, the whining has stopped.)

Syndicated 2007-04-03 17:44:00 from Malcolm Tredinnick

An Amusing Quote

This let me start the day with a smile...

"One thing that Django does very well is its debug mode error pages. I saw more of those then I probably should have. ;-)"

Django: helping users maintain good humour in the face of periodic failures since 2005.

(From Speno's Python Avocado via Planet Python)

Syndicated 2007-04-02 10:33:35 from Malcolm Tredinnick

Hugo Assigned Reading

The Hugo award nominees for 2007 were published yesterday-ish. I was looking through the list of novels and realised I had only read one of them and had only peripherally heard of two of the other authors. Normally I have a slightly better batting average than that. Must have been a bad year. More likely, I'm getting old and unadventurous in my reading choices — I have noticed I've bought a lot less science fiction this past year.

This called for a spontaneous visit to the local bookshop so that I can mix with the cool people at cocktail parties and talk about the plot twists of each novel. For future reference, if you're going to go to a bookshop to buy the various nominated books, it's a good idea to know what they are! This may sound obvious, in hindsight, but it was an important first step that I did not make. So I'll be going back tomorrow.

For future reference, too, Naomi Novik's book is published in Australia under the UK title of Temeraire, not the way cooler Her Majesty's Dragon title. I'd actually only heard of this book a few days ago because of some randomness on Justine Larbalestier's blog. So Novik was one of the two authors I'd heard of but not read. The other is Charlie Stross, which is an oversight on my part, because I keep hearing good things about his books.

Syndicated 2007-03-30 22:55:38 from Malcolm Tredinnick

Qantas Messes Up Request For Help

Qantas — the Australian international airline — sent out a mass mailing earlier this evening asking for people to volunteer for their Customer Advisory Panel. This option is available to Gold and Platinum level frequent fliers.

I don't mind participating in things like this from time to time and I try to support OneWorld airlines when I travel so that I can have some benefits from being a decent frequent flier.

Except it's been nothing but disappointment so far. The email contained a link to the initial entry survey, which I guess is part of the sign-up process, there being no other obvious way to sign up. The link goes to the right website, but returns an error. It's an opaque link, so there's no way to work out if there's just a typo. A second opaque link, to the Terms and Conditions for the marketing drive (which is what this is), does work, though, and it mentions that the whole process is outsourced to a Canadian firm. So much for "buy Australian", but not a huge problem for me: I'm a citizen of the planet, not just one particular country.

Okay, so back to my error page... look in the HTML for any obvious problems. Nothing in the comments. Although somebody needs to run the generated page through a validator or at least something that checks to make sure all tags are closed. The trailing '>' character is not optional in HTML tags.

Let's try emailing their support address, which is linked on the page. Click on the mailto: link. Compose the email containing all the information, including my frequent flier number. Off she goes... hmm... back she comes... :-( Turns out the mail cannot be delivered. The target server did not accept the RCPT TO line and returned a 550. So the initial link is broken, the generated HTML is broken (minor point, but indicative) and the support email address is broken. It's just roses all around.

Now, why am I worried that the current Qantas takeover bid is going to screw around their so-called valued customers? I may have to join my bunker-mentality friends and start cashing in my frequent flier points if this sort of behaviour keeps up.

Syndicated 2007-03-30 19:44:35 from Malcolm Tredinnick

My List(s) Of Working Programmer's Books

A couple of weeks ago, Bill de hÓra published his updated list of ten books for the working programmer. There are other such lists around, of course. Bill's list was the one that most recently crossed my eyesight and reminded me I've been meaning to publish my own list.

It's not completely clear what the rules of this game are, so I had to invent my own a little bit. I've stuck to Bill's choice to be language and platform neutral, as much as possible. Rather than try to pick ten books I think everybody should read, I wrote a list of books I currently use regularly and find useful in my day-to-day work. I've added a few near-finalists and some other interesting books at the end, too.

Any list like this is going to be swayed by the experiences and interests of the author. I realised my list is skewed further towards process, rather than practice, than it might have been five years ago. This is partially a reflection of the fact that I've done a lot more project and team management in the last few years than at any previous point. All the books on this list are ones I either consult regularly as a reference, or try to re-read every now and again to keep the thoughts moving around in my head.

In no particular order:

First obvious difference with other "top 10" lists: mine only has eight items. Life's like that sometimes.

I'm surprised that Introduction to Algorithms doesn't get more words written about it. Sure, it's a pretty fundamental book. However, it includes a lot of the basic theoretical underpinnings of the algorithms, and of the implementation differences between them, that just aren't written down in other places. I don't use this book every day, but when I need the details of many algorithms, it's the first book I'll reach for. Still, this really is a fundamentals book, whilst the Algorithm Design Manual is more educational and thought provoking for real world, large information set problems.

I often work around problems that require security of various levels. Practical Cryptography is a bit like Introduction to Algorithms, albeit at a slightly more mathematical level, in that it gives a very solid theoretical grounding in the fundamentals of hashing and encrypting. It is futile to enter a discussion about the security of one approach over the other if you don't know this information and can't back it up with a reference to a book like this. Cryptography being a fast moving area of research, a four year old book is going to show some dating by now, but it's still something I use regularly to back up my hunches or as a citation source.

Most of the others should be self-explanatory if you've read them. There appears to be some genuine controversy about whether Scott Berkun's book on project management is great or gross (see the comments on Bill's post, for example). I was surprised to see that my take was almost identical to the thoughts Bill expressed in a comment: the Berkun book is very practical.

I'll just mention, too, that most lists like this include Steve McConnell's Code Complete (usually meaning the 2nd edition). I'm not a great fan of that book. It's a nice read and I have no argument with the content or approach. It's just not a book that I've found helped me a great deal. The McConnell book in my list above, Rapid Development, is one I get more use out of as a way of translating between my brain and a more professional, standard way of presenting ideas. Using McConnell's approach and terminology eases the presentation to more formal project managers and decision makers.

There are some near misses. Mostly books that I have gotten a lot of education from, but no longer use on a regular basis because I feel I have absorbed their lessons. All of these books still sit on my shelves, though, and I would give them to versions of myself that were five or ten or 15 years less experienced (not all at once, some require more experience than others to be useful):

  • Mastering Regular Expressions (Friedl)
  • Herding Cats: A Primer For Programmers Who Lead Programmers (Rainwater)
  • Career Programmer: Guerilla Tactics for an Imperfect World (Duncan)

The Friedl book on regular expressions makes a lot of people's lists, but I've never really struggled with regular expressions, so once I'd absorbed the lessons on optimisation and testing in different engine types, I found I wasn't going back to it too often. I recently re-read the latest edition and didn't feel I'd forgotten much. I may be weird in this way, though: I enjoy regular expression munging and use it a fair bit, so it stays fresh in my brain.

The other two books are of a much more practical, professional nature. As I worked in different organisations (or even the same organisation with revolving reporting charts), I needed to work a lot more on my pragmatism. I didn't (still don't to some extent) handle bad working conditions well when I'm trying to produce technical product, or manage other people to do the same. So this was an area I needed to put a lot of learning into over the past five years. These books would have been useless to a ten-year-younger version of me, but came along at the right time when I needed them.

Finally, some books that, whilst not indispensable, have been a great inspiration for learning more and thinking in different ways about my areas of expertise:

  • The Deadline (DeMarco)
  • Game Programming Gems, Graphics Programming Gems
  • Mathematical Writing (Knuth)

DeMarco's book is a great presentation, via fiction, about why project management is hard in the real world. Knuth's book on writing is special because it covers specifically technical writing about theoretical, logical work and focuses on presentation and differing approaches. Although about mathematics (obviously), which was how I first discovered it, a lot of the lessons transfer well to theoretical computer science presentations as well. Maybe not useful to the intensely practical programmer, but more than once I've had to prove that a program or approach worked and document that. The ... Programming Gems books are just a good source of short algorithm fragments and can make learning fun. If you can't have fun in this industry, you're just not reading the right books.

Syndicated 2007-03-26 21:29:38 from Malcolm Tredinnick

Django Tips: Variable Choice Lists

Been a while since I added to this series. I've come across a couple of repeated questions lately, so it's time to give back to the knowledge pool again.

This time: using iterators to customise the options presented via the choices attribute on a model field.

Background

Before launching into the solution, let's consider the problem we are trying to solve. If you have a model field that is intended to hold only one of a number of limited values, Django provides the choices attribute. You can use it like so:

  class Document(models.Model):
    CHOICES = [(0, 'private'), (1, 'public')]
    ...
    status = models.IntegerField(choices=CHOICES)

When you use this in a form, only the two choices private and public will be presented and the database will store either 0 or 1, depending on the choice you made.

Aside: People often forget that when you retrieve such a model from the database, although the status field contains 0 or 1, you can get back the string version of the choice using the get_status_display() method of the model. Replace status with the name of the field for your own use. This is explained under get_FOO_display() in the Django documentation.
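Conceptually, the display method is just a lookup from the stored value to its label. Here is a plain-Python sketch of that idea. The display_for helper is made up for illustration and is not Django's actual implementation.

```python
CHOICES = [(0, 'private'), (1, 'public')]


def display_for(value, choices=CHOICES):
    # Map the stored value to its label; fall back to the raw value
    # if it is not one of the declared choices.
    return dict(choices).get(value, value)
```

So a stored 1 comes back as 'public', which is the kind of answer get_status_display() gives you on a model instance.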

When Isn't This Enough?

There are two cases where the previous example falls a bit short.

The first case is when the list of choices is being updated regularly via changes to the database, or in some other way. In this situation, choices isn't the right approach to the problem. You are really talking about a dynamic relation to another data set. So model it that way: use a ForeignKey field to a table containing the list of choices and the values to store.

The second case is more subtle. Suppose you have a document presentation system. Documents on the production site are either public or private (more or less, this is the above example). However, the same code runs on a staging system as well, where documents are initially uploaded, reviewed and edited. On this system, the choices can include something along the lines of "ready for review" and "needs editing". This is a slight variation on similar systems I've implemented for a couple of clients recently, so it's not too unrealistic (although I've simplified a bunch of details).

In the second scenario, above, the list of choices is essentially static. So we are justified in using the choices attribute. However, the initial values vary depending upon the system type, which we might reasonably control using a settings variable.

Now, it's generally a good idea to avoid referring to settings.* in the definition of fields and methods in Django. This way you can safely import the code without needing to have configured the settings module, which usually feels like neater code organisation (import everything, then configure, if you're using manual configuration). To my eye, using settings.FOO in declarations also looks a little awkward (intuitively, it feels like a leaky abstraction, since we're delving into the depths of a module at the top-level).

For whatever reasons, whether you agree with me or not, I'm going to avoid using settings in my field declaration. Instead, I'm going to use a little-known (and not usually required) feature of the choices attribute: you can pass it a Python iterator instead of a sequence. So I can rewrite my example as follows:

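  from django.db import models
  from django.conf import settings

  def status_choices():
      # The static choices that are offered on every system.
      choice_list = [
              ('private', 'private'),
              ('public', 'public')]
      # Settings are only consulted here, inside the generator body.
      if hasattr(settings, 'STAGING') and settings.STAGING:
          choice_list.extend([
                  ('review', 'ready for review'),
                  ('edit', 'needs editing')])
      for choice in choice_list:
          yield choice

  class Document(models.Model):
      ...
      # maxlength is the spelling used by Django at the time of writing;
      # later releases renamed the argument to max_length.
      status = models.CharField(maxlength=10, choices=status_choices())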

You can see here that all the dependency on settings is inside the iterator function. So it isn't evaluated until Django needs to actually display the choices, which should be long after configuration has taken place. This relies partly on the fact that the Python compiler knows this is a generator function (because of the keyword yield) and consequently executes none of the code until the first value is retrieved from the generator.
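The lazy behaviour of generator functions is easy to check without Django. In this small sketch, the log list and the lazy_choices name are invented for illustration. It records when the function body actually starts running.

```python
log = []


def lazy_choices():
    # Nothing in this body runs until the first value is requested.
    log.append('body started')
    for choice in [('private', 'private'), ('public', 'public')]:
        yield choice


choices = lazy_choices()    # creates the generator object only
ran_on_call = list(log)     # snapshot: the body has not started yet
first = next(choices)       # now the body runs up to the first yield
ran_on_next = list(log)     # snapshot: the body has started
```

Calling lazy_choices() only builds the generator object. The first line of the body does not execute until next() asks for a value.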

I would also draw attention to a couple of other implementation decisions I made in this code:

  1. The extra options only appear if the (optional) settings.STAGING setting is set to True. Note that this "fails safe", in the sense that if you forget to include the STAGING setting, it won't inadvertently expose the extra options and documents to the wider public. I made the setting optional, because I'm just a nice guy, and so had to first check that it existed using hasattr() before I tried to access it. You may or may not wish to be that flexible.
  2. I switched from storing integers, as in my first example, to storing short, readable strings in the database. I prefer this method, because it avoids the problems associated with having magic numbers in the database column. If you see the number '2' in the database, what does it mean? If you see the string 'review', things are a little more mnemonic. I've noticed a tendency for people to use integer values with the choices attribute; perhaps they are forgetting it works on pretty much any field and CharField fields are often a good choice?
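The "fails safe" behaviour from the first point can be exercised in isolation. This sketch passes a settings object in as a parameter so that it runs without Django. FakeSettings is a hypothetical stand-in for the real settings module, and the explicit parameter is an assumption made purely for testing.

```python
class FakeSettings(object):
    # Hypothetical stand-in for a configured settings module.
    pass


def status_choices(settings):
    choice_list = [('private', 'private'), ('public', 'public')]
    # A missing STAGING and a false STAGING both leave the extras out.
    if hasattr(settings, 'STAGING') and settings.STAGING:
        choice_list.extend([
            ('review', 'ready for review'),
            ('edit', 'needs editing')])
    for choice in choice_list:
        yield choice


production = FakeSettings()     # STAGING was never set
staging = FakeSettings()
staging.STAGING = True          # only staging opts in to the extras
```

Iterating status_choices(production) yields only the two public-facing options, while status_choices(staging) also yields the review and edit states.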

Caveat

If you are very familiar with Django, or tried to experiment a little with this example, you'll realise I have not told the entire truth here. The whole argument about using an iterator to avoid accessing the settings module too early is pointless. You cannot currently import django.db.models without configuring the settings module, so there's a chicken-and-egg problem there. However, I consider that to be a (very small) bug in Django and it's something I want to fix in the near future. You should be able to import modules without having done any configuration.

You probably won't need to use this technique very often at all. Every now and again, though, you will run across a configuration where being able to construct an intelligent choices list will help the code layout flow more smoothly.

Syndicated 2007-03-26 14:22:51 from Malcolm Tredinnick

After a long hiatus, I've started blogging again. Because I want to try out a few different things and not all of it is Open Source related, I've moved to my own site. Henceforth, all the real non-events will be over at the pointy stick.
