Older blog entries for titus (starting at number 171)

Michigan State University, Software Engineering, and Python

Just got back from MSU, where I gave two talks, one on my computational work and one on my biology research. (I'm applying for a joint CS/biology position there.)

One of the topics that came up frequently was whether or not I was interested in (or capable of ;) teaching CS undergrad courses, given that I have little formal CS training on my resume. My recent activities in agile testing sparked some interest, as did the notion of teaching a software engineering course based on agile methodologies. I also mentioned Greg Wilson's Software Carpentry course as a possible cross-over course for computational scientists.

I also proposed using the vast base of available OSS software as a starting point for an advanced software engineering course. The idea would be to demonstrate problems and solutions on an already-available hunk o' code; things like setting up (or extending) testing, stabilizing APIs, etc. It could actually be combined with a survey course on different languages. Hmm.

Interestingly, people in the department were already investigating the idea of switching to a scripting language -- Python was explicitly named -- for part of an intro-level programming course. (One professor mentioned ALICE, too.) Needless to say I'd be pretty enthusiastic about the opportunity to introduce Python at that level.

I also spent some time proselytizing about agile development techniques to various friends. Yep, I drank the Kool-Aid, it seems.


From lesscode:

" The only architecture that matters is the simplest one you can get to solve the problem at hand. "


DNS & mailman, oh my

Spent several hours today wrestling with DNS and mailman. The goal was to consolidate my DNS onto one machine which would then serve to my hosting company's name servers; this has been years in coming, and I finally had all my pins lined up. All went well, except for a weird glitch in the public-facing name servers which ended up disagreeing on some domains. Boiiiinnggg went some mail...

Then it was mailman's turn. I decided to use Debian's mailman install, which worked fine except for the standardized yet esoteric places where it puts config files. /etc/mailman/mm_cfg.py, anyone?

Next up: exim4 and virtual domain hosting.

I'm getting too old for this sh*t.


John, of JohnCompanies.com (my hosting provider), is starting a new service; I betcha can guess what it does from the name ;). Go check it out.


More buildbot

In addition to our buildbot automation hacks, we spent a fair amount of time twiddling our buildbot configuration for the PyCon project. I wrote up some of the configuration file stuff last night. If you're interested in a private "force build" status page, locking master & slave resources, driving builds from svn checkin, or using the @reboot crontab extension to start buildbot on boot, you might be interested in reading it.

More dnspython

I sent the little dns_check module on to Bob Halley, the author of dnspython, and he sent me back a couple of patches for the code. Good stuff. All checked-in and documented now.

Sysadmin tools

This slashdot article infuriated me more than the usual half-troll slashdot article. An MP3 player and a terminal program are sysadmin tools? Bah.

Here are a few of my favorites.

  1. screen -- run multiple programs in a single terminal window, flip between the output, detach and re-attach. Life is good. My all-time favorite; I've been using it ever since Mark Galassi introduced me to it back in the late 80s/early 90s.

  2. VNC -- like screen, but for X11 programs. Bonus: VNC to Flash recorder for screencasting, although that's not really a sysadmin application.

  3. bash 'for' loops at the command line. Silly, I know, but combined with 'cut', 'sort', 'uniq', 'tail', and 'head', I can do tons of things in one long complicated line. I don't actually know how to program in bash beyond this -- I use Python for anything more complicated.

  4. 'find'. Its command line options have gotta be nearly Turing complete. Powerful beyond belief.

  5. supervisor. It's hard to explain how happy I was when I found this Python-based system for starting and restarting persistent processes...

  6. twill. Really. Having a command-line tool to script Web site aliveness tests & (now) DNS checks is pretty handy. I'll probably add some ping-is-the-machine-alive code, too.

I'm sure there's more that I'll remember as soon as I post this.


26 Mar 2006 (updated 26 Mar 2006 at 20:14 UTC) »
Using 'supervisor' to keep your Web sites (and other processes) running

I've been using supervisor to manage some of my Web stuff -- especially my Trac/SCGI sites. I wrote up an intro article on setting it up here. Comments welcome.

Using twill for more than Web testing

Grig recently pointed me towards systir, a DSL for doing system-level testing. It looks oddly like twill code (which in turn looks very similar to PBP, which looks like WWW::Mechanize::Shell).

Anyway, based on this similarity to systir, Grig suggested that twill could be used as an all-around acceptance testing framework, much like systir. (See my last post for a specific example of extending twill with arbitrary python code.) Interesting idea... I'd been thinking about writing some extensions to deal with DNS monitoring (e.g. "does this DNS server report that there is an A entry for this host pointing to this IP?"), and so I took the bit in my teeth, so to speak, and went ahead with it.

Ta-da! A mere 30 minutes later, the 'dns_check' extension module for twill was born.

Below is the output from a test script for this module. 'dns_check' uses the fantastic dnspython Python library to allow simple assertions about DNS entries; stuff like 'make sure these two names resolve to the same address' is pretty easy with it. (Note that I turned on command debugging output so that commands could be seen as they were executed; normally these commands are silent.)

% ./twill-sh test_dns.twill

>> EXECUTING FILE test_dns.twill
twill: executing cmd 'extend_with dns_check'
Imported extension module 'dns_check'.

Extension functions to help query/assert name service information.

  * dns_resolves -- assert that a host resolves to a specific IP address.
  * dns_a -- assert that a host directly resolves to a specific IP address
  * dns_cname -- assert that a host is an alias for another hostname.
  * dnx_mx -- assert that a given host is a mail exchanger for the given name.
  * dns_ns -- assert that a given hostname is a name server for the given name.

twill: executing cmd 'dns_resolves amazon.com # amazon.com ==> any of 3'
twill: executing cmd 'dns_resolves amazon.com'
twill: executing cmd 'dns_resolves amazon.com'
twill: executing cmd 'dns_resolves idyll.org.'
twill: executing cmd 'dns_resolves idyll.org. joiry.net. # same IP addr?'

twill: executing cmd 'dns_mx idyll.org. mail.idyll.org. # '.'s are handled'
twill: executing cmd 'dns_mx idyll.org mail.idyll.org.'
twill: executing cmd 'dns_mx idyll.org. mail.idyll.org'

twill: executing cmd 'dns_a amazon.com # explicit 'A' records.'
twill: executing cmd 'dns_a amazon.com'
twill: executing cmd 'dns_a amazon.com'

twill: executing cmd 'dns_cname www.idyll.org. idyll.org. # explicit 'CNAME' records.'

twill: executing cmd 'dns_ns idyll.org nsa.idyll.org'
--
1 of 1 files SUCCEEDED.

The script itself is pretty simple, of course:

debug commands 1
extend_with dns_check

dns_resolves amazon.com        # amazon.com ==> any of 3
dns_resolves amazon.com
dns_resolves amazon.com

dns_resolves idyll.org.
dns_resolves idyll.org. joiry.net.        # same IP addr?

dns_mx idyll.org. mail.idyll.org.        # '.'s are handled
dns_mx idyll.org mail.idyll.org.
dns_mx idyll.org. mail.idyll.org

dns_a amazon.com        # explicit 'A' records.
dns_a amazon.com
dns_a amazon.com

dns_cname www.idyll.org. idyll.org.        # explicit 'CNAME' records.

dns_ns idyll.org nsa.idyll.org

All in all a very short & satisfying bit of hacking. I'm going to send the underlying code to Bob Halley, the author of dnspython, to see what I did wrong ;). If anyone has further suggestions re the functionality, please drop me a line. You can grab the very latest twill version for yourself here if you want to play; the 'dns_check' module is in twill/extensions/dns_check.py.
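For readers who want the flavor of a "does this name resolve to that address?" assertion without installing anything, here is a minimal sketch using only the standard library. It is not the dns_check code and it does not use dnspython: it asks the system resolver via socket.gethostbyname_ex, so it cannot query a specific name server or distinguish A from CNAME records. The function names (resolve_ips, is_ip, assert_resolves) and the injectable lookup parameter are assumptions made for illustration.

```python
import ipaddress
import socket


def resolve_ips(host, lookup=socket.gethostbyname_ex):
    """Return the set of IPv4 addresses that `lookup` reports for `host`."""
    # gethostbyname_ex returns (canonical_name, aliases, addresses).
    # A trailing '.' (fully qualified form) is stripped before the lookup.
    _name, _aliases, addresses = lookup(host.rstrip("."))
    return set(addresses)


def is_ip(text):
    """True if `text` is an IP address literal rather than a hostname."""
    try:
        ipaddress.ip_address(text)
    except ValueError:
        return False
    return True


def assert_resolves(host, expected, lookup=socket.gethostbyname_ex):
    """Assert that `host` shares at least one address with `expected`.

    `expected` may be an IP literal or a second hostname; in the latter
    case both names are resolved and their address sets are compared.
    """
    found = resolve_ips(host, lookup)
    wanted = {expected} if is_ip(expected) else resolve_ips(expected, lookup)
    if not found & wanted:
        raise AssertionError(
            "%s resolves to %s, expected one of %s"
            % (host, sorted(found), sorted(wanted))
        )
```

Passing the lookup function in as a parameter keeps the assertion testable offline: a fake resolver can stand in for the network.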

Incidentally, the very latest twill code also contains some nice additions to the shell for dealing with extension modules. Specifically, TAB completion now works for extension module commands, as does the 'help' command. Huzzah!



Linux; Apache; Most of our cool scripting languages start with P; and PostgreSQL. Yep, LAMP ;). (ref)

Miscellaneous twill

The next release of twill is taking a bit longer than I'd planned. I think this is mainly because of the increase in the number of people using it -- they keep finding bugs, damn them! However, bug fixing is proceeding apace, and there have been one or two amusing incidents along the way. At this rate I think it's fair to say that most of the minor burrs in twill will be ironed out by the time that the beta, 0.9, arrives. I've come to the realization that a "1.0" release should be predicated on as much actual use of the software as possible -- that way you really do get not only a well-tested piece of software, but one that genuinely meets a variety of needs.

On that front, Grig pointed me towards a challenge on the agile-testing list. Brian Marick presented a Watir solution, and it seemed incumbent upon me to immediately drop all other work and whip out an example using twill. A solution in twill form is here:

extend_with table_picker
go http://issola.caltech.edu/~t/transfer/jjj.html

showforms
check_cells "name1" 0
showforms

Of course, this merely punts the solution into the 'table_picker' extension module, which follows at the bottom.

A few quick notes about this solution:

  • You'll need the latest version of twill, 0.8.4a11, either in egg version from my alpha dist directory or from the .tar.gz resting home.

  • With that, Python, and setuptools, you're set; no other software is needed.

  • Both the Watir solution and the twill solution rely on using the Real Programming Language (be it Ruby or Python) to do the logic processing necessary to find the right cell(s).

  • My solution could be shorter, but it works fine as-is ;).

  • The real advantage to twill in this particular situation is that it's completely and trivially automatable. Of course, it also doesn't handle JavaScript...



import re
from BeautifulSoup import BeautifulSoup

def check_cells(label_regexp, label_column):
    """
    >> check_cells <label_regexp> <label_column>

    Check all checkboxes in rows where column #'label_column' matches
    'label_regexp'.
    """
    # deal with input parameters
    label_regexp = re.compile(label_regexp)
    label_column = int(label_column)

    # first, get the browser obj
    from twill import get_browser, commands
    b = get_browser()

    # grab the HTML from the current page
    html = b.get_html()

    # parse the HTML.
    soup = BeautifulSoup()
    soup.feed(html)

    # grab the table
    table = soup.first('table')

    # grab the rows & iterate
    rows = table('tr')

    names = []
    for i, row in enumerate(rows):
        cols = row('td')

        # get the label column
        col = cols[label_column]

        # if it's a match, run through & record all checkboxes
        if label_regexp.search(str(col)):
            for col in cols:
                for element in col('input'):
                    if element.get('type') == 'checkbox':
                        names.append(element['name'])

    for name in names:
        name = str(name)
        commands.formvalue('1', name, '1')

MSU East Lansing

Looks like I'll be in Michigan in early April, giving a talk or two at MSU. They'll be science talks, but at least one talk could be of interest to people likely to be reading this blog. Drop me a line if you're interested in coming.

It's not about the tools, dummy.

While reading A Conversation with Victoria Livschitz, I started to feel very unhappy with her answers. Then Grig sent me this nightmare, too. And I realized that I'm very confused about something.

As you may know, I'm a fairly recent convert to "agile testing". Not XP or test-driven development, per se; for me, "agile testing" just connotes applying many small tests to all levels of your app to test it holistically, combined with automated continuous integration to make sure that stuff stays working. I'm into "agile development", too, although more by accident than by design. I've been a huge fan of rapid prototyping since the first year I started programming -- I was producing crappy but functional TCP/IP code within 3 mo of access to a UNIX system, back in 1990. I've also been using revision control since then, too: I started with RCS, learned to use CVS in the late nineties, and now split my time between Subversion and Darcs. This is how I work, nowadays: I write some code, get it working, test it at some or many levels, write some more tests, check it in, repeat.

From reading these two articles -- Livschitz and the Nightmare article -- I conclude that many people do not develop the way I do. Moreover, people seem to be focusing on tools -- language features and revision control tools, in this case -- rather than on process. I don't understand why.

What am I missing? Do people truly, really believe that the computer is ever going to do a better job of ascertaining or interpreting their intentions than they themselves are doing? It seems like this is what the fans of static typing, new & better revision control systems, and UML all believe. I just don't grok how specifying behavior in ever more ontologically controlled ways is going to get you past the central social issues of software engineering, which lie in first figuring out what your software should do, and then communicating those intentions to other developers. These are social, not technological, issues, and they cannot be solved technologically. We can be aided by specific tools - personally, I find that using a fairly high level interpreted language helps with many of my problems -- but I was rapid prototyping in C long before I found out about Tcl, Perl, and Python. I struggled with RCS revision control in the early '90s, but I never thought that the reason I was developing bad software was that I had to lock individual files and manually merge contributions. I wrote tests before I had unit test frameworks. But I never confused the technological limitations of my tools with my failures of process, and I never stopped trying to learn better processes.

It is perplexing to me that such simple statements as "write code that incrementally approaches your goals" and "make your best effort to test that code" -- the two cornerstones of Agile Development and Agile Testing -- are apparently so difficult to comprehend or effect for so many people.

My best guess (and this is where my lack of experience may handicap me) is that people like Livschitz are trying to solve communication problems inherent in large team projects. It is necessary to work with multiple people to accomplish big things, and that's scarily difficult -- big game development scares the willikins out of me for that reason -- and I can imagine projects where it's impossible for any one person to grasp more than the basic elements of the overall design. It may well be that people are trying to substitute for proper testing with static typing and interface enforcement, and the purpose may be to use these tools to communicate between developers and teams across time and space. If this is the case, then I think they're being foolish. Unit, functional, and acceptance tests work just fine for communication at all levels of your application.

This general problem may be what Jon Udell is pointing out in his argument against standards, too. Don't try to communicate outside the code: make the functioning code itself your exemplar.


P.S. I am in no way blaming the Nightmare author for his problems. (It's clear he's never actually used Subversion, which lets you solve merge issues in your own working copy prior to check-in.) I am questioning the way his company develops, however.


A cool post on combining KCacheGrind with cProfile. This is something I need to look into...

Web testing in other languages

On the off chance that you want to actually develop Web sites in a non-Ruby/non-Python environment, it's useful to know that you can also test them in the same languages. (I do think twill is a perfectly fine way to test non-Python Web sites; more on that anon.) Here's a PHP framework called Web Tester, and it looks like Perl's WWW::Mechanize::Shell is very similar to twill.

One entertaining thing is how enthusiastic people get about Web testing when they discover things like Web Tester (and, dare I say, twill ;). (I wuz there myself, back when.) It's really a moment of enlightenment for people when they realize that it's in fact relatively easy to write functional Web tests.

NSA Eavesdropping

I sent Bruce Schneier's article on false positives/false negatives on to several Bush supporter friends of mine. (Such people are rare at Caltech, admittedly!) One friend responded with an argument that (as far as I could tell) boiled down to (a) we don't have any better way to look for terrorist plots, and (b) we're all doomed anyway. I guess that's a variant of the Christian fatalist attitude that "Armageddon will be here soon anyway, so who cares what we do"? Not sure. Very odd.

Buying books online

Between Baen and Fictionwise, there are some good options out there for the kind of trashy books I like to read...

Potential twill coolness

One of the most rewarding things about working on a medium-term project like twill is the ability to periodically indulge in a nifty hack that people might actually use.

Here are two such minor hacks.

First, last night I whipped a quick link checker into shape. With the latest (0.8.4a5) version of twill, you can do this:

>> go http://www.idyll.org/
>> extend_with check_links
>> check_links
Gathered URL http://pywx.idyll.org.
Gathered URL http://alife7.alife.org/.
Gathered URL http://www.wrightflyer.org/.
Gathered URL http://www.idyll.org/~t/.
Trying http://www.wrightflyer.org/ ==> at http://www.wrightflyer.org/ ...success!
Trying http://www.idyll.org/~t/ ==> at http://www.idyll.org/~t/ ...success!
Trying http://pywx.idyll.org ==> at http://pywx.idyll.org ...success!
Trying http://alife7.alife.org/ ==> at http://alife7.alife.org/ ...success!

Particularly nifty things about this link checker:

  • since the link checker is built on top of twill, you can navigate to pages, log in, get/set cookies, set auth headers, etc. before link checking.

  • likewise, because the links are actually followed, referrer headers are set correctly.

  • as an extension, it doesn't get in the way of normal twill use, and it doubles as a simple example of how to write more complex link checkers.

(I plan to also add a little site crawler along the same lines as the link checker. Stay tuned.)
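The "Gathered URL" step in the output above can be approximated with nothing but the standard library. This is a minimal sketch, not twill's check_links extension: the LinkGatherer class and gather_links function are names invented here, it only collects http/https links from <a href> attributes, and it does no fetching, cookie handling, or referrer setting.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkGatherer(HTMLParser):
    """Collect absolute http(s) URLs from <a href="..."> tags, in page order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links against the page URL and drop '#fragment'.
        url, _fragment = urldefrag(urljoin(self.base_url, href))
        # Skip mailto:, javascript:, etc., and anything already seen.
        if url.startswith(("http://", "https://")) and url not in self.urls:
            self.urls.append(url)


def gather_links(html, base_url):
    """Return the de-duplicated list of links found in `html`."""
    parser = LinkGatherer(base_url)
    parser.feed(html)
    return parser.urls
```

A checker would then request each gathered URL in turn and report success or failure, which is the part that benefits from twill's browser state.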

Second hack: tracebacks now give you the line number in twill scripts. Suppose you have an error (in this case, a contrived error asserting that a page failed with 'code 400') in a script. There are other ways of figuring what line it's at, but here's one that fits right into your traceback:

  File "/usr/local/lib/python2.3/site-packages/twill-0.8.4a5-py2.3.egg/EGG-INFO/scripts/twill-sh", line 20, in ?
  File "/usr/local/lib/python2.3/site-packages/twill-0.8.4a5-py2.3.egg/twill/shell.py", line 248, in main
    execute_file(filename, initial_url=options.url)
  File "/usr/local/lib/python2.3/site-packages/twill-0.8.4a5-py2.3.egg/twill/parse.py", line 155, in execute_file
    _execute_script(inp, **kw)
  File "/usr/local/lib/python2.3/site-packages/twill-0.8.4a5-py2.3.egg/twill/parse.py", line 196, in _execute_script
    execute_command(cmd, args, globals_dict, locals_dict, cmdinfo)
  File "/usr/local/lib/python2.3/site-packages/twill-0.8.4a5-py2.3.egg/twill/parse.py", line 106, in execute_command
    result = eval(codeobj, globals_dict, locals_dict)
  File "tst3:2", line 0, in ?
  File "/usr/local/lib/python2.3/site-packages/twill-0.8.4a5-py2.3.egg/twill/commands.py", line 119, in code
    raise TwillAssertionError("code is %s != %s" % (browser.get_code(),
twill.errors.TwillAssertionError: code is 200 != 400

Yep, the 'File "tst3:2"' line is the actual line in the script file that I ran. Heh.

(I actually whipped this together for some unit tests where only the traceback was being printed. Very handy in that particular instance.)

One very amusing side effect is that coverage analysis records that line in that file, despite the fact that it's not actually Python code. Heh**2.
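The traceback above suggests how the trick works: each script command is compiled with a fake filename of the form "script:lineno" and then passed to eval, so the traceback machinery reports that name as if it were a source file. Here is a minimal sketch of that idea under that assumption; run_script_line, the stand-in code() command, and failing_traceback are all hypothetical names, and twill's real parse.py does considerably more.

```python
import traceback


def run_script_line(source, script_name, lineno, namespace):
    """Evaluate one command, tagging it with 'script_name:lineno' as filename."""
    fake_filename = "%s:%d" % (script_name, lineno)
    codeobj = compile(source, fake_filename, "eval")
    return eval(codeobj, namespace)


def code(expected, actual=200):
    """Stand-in for a 'code' assertion command; `actual` is hard-wired here."""
    if actual != expected:
        raise AssertionError("code is %s != %s" % (actual, expected))


def failing_traceback():
    """Run a failing command from line 2 of 'tst3' and return the traceback."""
    namespace = {"code": code}
    try:
        run_script_line("code(400)", "tst3", 2, namespace)
    except AssertionError:
        return traceback.format_exc()
    return ""
```

Because the fake filename ends up in the code object, anything that keys on filenames sees it too, which would explain the coverage side effect.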

Automating your tests to run in buildbot

I spent many hours writing up a fairly shallow summary of how we automated the heck out of every damn type of test we used for our PyCon tutorial. (Grig even claims that it's entertaining. OK, so I am your monkey. ;)

This is the thing to read if you want to know how we automated Selenium tests, FitNesse tests, twill tests, texttest tests, and egg build/install tests. It's structured as a narrative with explicit links to source and config files.

Let me say that I am immensely proud of this stuff and I think it could be very valuable to people setting up any kind of continuous integration framework. It took us many moons to work out the details of how to get this stuff to actually run automatically in buildbot, and it wasn't easy. But it all works!

Also, please comment on it.


Two great ways to waste^H^H^H^H^Hspend your time

Uncle Bob's blog on testing & other stuff: link.

Stevey's Drunken Blog Rants: link.

Python ABI complaints

Only got one reply to my Python binary interface post -- thanks, zooko ;). Basically, "distribute your own Python with your app" is his answer. This is what Java program distributors often do to control the precise version of Java they're using, so at least Python is in good company...

Tomcat/JSP and httplib.py suckiness

I ran into two extreme examples of suckiness tonight, while trying to debug (what I thought were) related issues. (They weren't.)

First, it turns out that JSP cannot deal with multipart/form-data POSTs natively. Even worse, it just fails, giving you null values for form components. This violates the rule "fail in an obvious manner" and it also violates the rule "deal with data properly". What am I missing here? It must be something: this is 2006, folks.

Second, httplib lets you pass in HTTP headers with trailing newlines. Not such a problem, until you forget to put that strip() on the end of your base64-encoded HTTP AUTH header, and then all of your POSTs break because the Tomcat server on the other side of the connection ... neither handles nor complains accurately about the bad HTTP headers. Seriously, folks, why does base64.encodestring append a newline to the output?!?!? And why does httplib let clearly non-spec headers through?
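As for the newline: the MIME-style encoder emits its output as newline-terminated lines, so even a short token comes back with a trailing "\n". A minimal sketch of the difference, written for present-day Python 3, where base64.encodestring has been replaced by base64.encodebytes; the helper names basic_auth_header and legacy_token are invented for this example.

```python
import base64


def legacy_token(username, password):
    """Encode credentials the MIME way; the result ends with a newline."""
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    return base64.encodebytes(credentials).decode("ascii")


def basic_auth_header(username, password):
    """Build an HTTP Basic 'Authorization' value with no stray newline."""
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    # b64encode, unlike encodebytes, does not append a trailing newline.
    token = base64.b64encode(credentials).decode("ascii")
    return "Basic " + token
```

Using b64encode sidesteps the need to remember the strip() at all.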


The Rumors of Quixote's Death are Greatly Exaggerated

...and in 3 iterations, it will be Quixote.

But seriously, Mike Orr sparked an interesting discussion on the Quixote mailing list. Mike detailed a number of ideas for "modernizing" Quixote, and I'm strongly in favor of many of them. (Alas, I'm bogged down with other stuff (as always) and will be limiting myself to maintaining Quixote/WSGI and Quixote/sessions2 stuff.) Anyway, the thread is worth reading.

The main point that came out of the "Is quixote dead?" thread is this: Quixote works very, very well for a certain class of people, and it really is one of the few Web frameworks addressing that niche. Unfortunately the niche -- bare-bones object publishing -- is more utilitarian than sexy, so I don't think that Quixote will ever be a good candidate for the Bikini-Clad Python Web Framework Of the Month, but for the same reason I don't think it will be going away. Reassuring -- not that I have any problem supporting Quixote sites myself, but it's always nice to engage with other people on these things, too.

An amusing aside that came out of the thread: at least one version of Quixote contains exactly 42 source files. Come on, folks, it's clearly the Answer!!


SQL testing

Some interesting links from Neil Conway on random generation of SQL queries.


With Andy Wingo's permission, I've plopped down an updated version of statprof, his statistical profiler for Python, here. You can check it out of Subversion thus:

svn co http://svn.idyll.org/repos/vallista/software/statprof/trunk statprof

This update to statprof makes it Python-2.3 compatible and adds a few options to the output.

SCGI and Quixote software

I've put scgiserver.py (a SCGI-to-WSGI server-side adapter) and session2 (a nicer persistent sessions module for Quixote) up on the same site.

Further consolidation: Cartwheel and FamilyJewels projects

I've also moved my two big bioinformatics projects over to my new dev server, at cartwheel.idyll.org. This way I can sync releases of different packages more easily; plus, I can use Trac. Huzzah!

Punk-ass bitching about Python's ABI

OK, so these people are obnoxious and full of themselves. But are they right?

To quote,

Python is unusable for software that cares about distribution, beyond very simple uses. The libpython ABI is not only horrifically unstable (every minor release changes it), but it also varies between distributions thanks to things like the brain-dead unicode ABI changing according to how it's configured. EG Python upstream uses UCS2 but Fedora uses UCS4. That means no using Python for application scripting and no shipping mixed C/Python programs.

If so, is there any official cognizance of these issues? Mostly I'm just curious: I don't think I've seen this much vitriol directed at Python in a while ;).


OO Types

An interesting series of lectures on OO types. (via LtU)

Buy me

Anyone know if this book, "Beyond Software Architecture", is any good? It looks interesting.

Writing articles, publishing screencasts, and breaking even.

Both Grig and I have a ton of articles and screencasts that we'd like to gin up, based on various bits of our tutorial efforts. I've also been contemplating a series of articles on "twill and ... [ insert language of choice here ]". Now, it would be nice to start recouping the cost of our Web hosting, especially since we're not promoting a company or pimping a T-shirt line.

I have friendly contacts at O'Reilly and IBM DeveloperWorks, both of whom publish articles on related issues and presumably pay for them. I/we could also go the Google Ads route, and/or the Google Video route (for screencasts, which are likely to eat up bandwidth). Thoughts? Comments? I think we'd like to retain the articles on our own sites, too, although I'd be amenable to various contractual limitations on that front.

(Obviously, if you're reading this and you're a publisher of online developer articles, get in touch ;).)

More PyCon notes.

Yeah, it's been over for 5 days... but I wanted to write about some other lightning talks I saw (in the 2nd lightning talk session), and also jot down some ideas that I'd had while at the meeting.

Lightning talks:

  • Martin Blais gave a short lightning talk on rst.el, offering reStructuredText support for emacs. Good stuff.

  • Someone -- didn't catch the name, or really anything beyond the name of the software -- presented software called "Alchemist", which looked like a great GUI for examining sales information across the US. Very impressive, very simple.

  • Ian Bicking presented on something called SQL-API, which looks like a very nice SQL library. He billed it as the next generation of the DB API, and pointed out that writing database APIs was boring and we should really get on with writing interesting code instead. It sounded pretty nice; I will almost certainly start using it myself, if it turns out to be EITHER well documented OR functional. ;)

  • Chad Whitacre's talk on testosterone and httpy was cute, although I found the "30 second wiki" demo unconvincing because he pasted in about 10 lines of source code ;).

(I missed a bunch of the lightning talks because I was chatting with someone about BioPython in the hallway -- my apologies to the speakers.)

Random Ideas

In the hopes that someone else gets there before me, here are some things I'd like to implement.

  • driving twill from doctests.

  • modifying nose to parse out doctests into individual tests.

  • syspath mangler class for use in saving, modifying, and restoring past sys.paths.

  • WSGI recorder/playback mechanism.

  • WSGI proxy (prolly based on TCPWatch code).

  • A simple Web crawler based on twill.

If someone has gotten there before me on any of these, please drop me a line & I'll plug your code to the high heavens[0]. Thanks ;).


[0] and I'll probably fold, spindle, and mutilate your code, too.
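To make the "syspath mangler" idea from the list above concrete, here is one minimal sketch of what such a class might look like, written as a context manager. The name SysPathSaver and its interface are assumptions for illustration only; nothing here comes from an existing package.

```python
import sys


class SysPathSaver:
    """Save sys.path on entry, prepend extra directories, restore on exit."""

    def __init__(self, *extra_dirs):
        self.extra_dirs = list(extra_dirs)
        self._saved = None

    def __enter__(self):
        # Keep a copy so later in-place edits can't corrupt the snapshot.
        self._saved = list(sys.path)
        sys.path[:0] = self.extra_dirs
        return self

    def __exit__(self, exc_type, exc_value, tb):
        # Restore in place so other references to sys.path stay valid.
        sys.path[:] = self._saved
        return False  # never swallow exceptions
```

Usage would be along the lines of `with SysPathSaver("/some/test/dir"): import something`, with sys.path back to its previous contents once the block ends.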
