Older blog entries for titus (starting at number 33)

Fun distutils factoid of the day:

python setup.py clean
doesn't remove your build/lib.* directories, so C++ extensions don't get recompiled. You have to do a
python setup.py clean --all
to force recompilation.

I think this is a documentation bug, since --help-commands says "clean - clean up output of 'build' command" rather than "clean - clean up temp files from 'build' command".

As long as I'm complaining, why does

python setup.py --help
return so little useful information for package installers? You have to run
python setup.py --help-commands
to get a list of actual commands! This is a fine example of behavior built around programmers rather than users, I think ;).

I ran across the 'clean' issue because I have some C++ extension files that depend on a C++ library. I don't know how to make my setup.py care about the modification date of that library file, so my extension files are perennially out of date with respect to my actual library code.
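One way to picture the missing check is a plain timestamp comparison. What follows is only a sketch under my own assumptions: needs_rebuild, demo, and the file names are made up for illustration, and nothing here is part of distutils. (I believe the distutils Extension class also accepts a depends=[...] list that build_ext consults when deciding whether to rebuild, which may be the simpler route.)

```python
import os
import tempfile


def needs_rebuild(target, dependencies):
    """Return True if target is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)


def demo():
    """Exercise needs_rebuild with throwaway files and fixed timestamps."""
    with tempfile.TemporaryDirectory() as tmp:
        library = os.path.join(tmp, "libexample.a")    # hypothetical C++ library
        extension = os.path.join(tmp, "_example.so")   # hypothetical built extension

        open(library, "w").close()
        missing = needs_rebuild(extension, [library])  # extension not built yet

        open(extension, "w").close()
        os.utime(library, (1000, 1000))
        os.utime(extension, (2000, 2000))
        fresh = needs_rebuild(extension, [library])    # extension newer than library

        os.utime(library, (3000, 3000))
        stale = needs_rebuild(extension, [library])    # library changed afterwards
    return missing, fresh, stale
```

A setup.py could call something like needs_rebuild before building and force recompilation only when it returns True.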

The help-commands issue is something I run across every time I try to understand the distutils command line options.

...but enough whining. Here's something useful, instead ;). I ran across this cool OS X software today: my friend Nathan blogged about appscript, which together with Platypus makes it easy to build & release simple Python apps for OS X. Very neat!

Two gems from The New Yorker:

Regarding Crichton's new book on the climate change "conspiracy":

What "State of Fear" demonstrates is how hard it is to construct a narrative that would actually justify current American policy. In this way, albeit unintentionally, Crichton has written a book that deserves to be taken seriously.

Regarding Bernie Kerik (Homeland Security ex-nominee):

"Officials have gotten into trouble for sexual misconduct, abusing their authority, personal bankruptcy, failure to file documents, waste of public funds, receiving substantial unrecorded gifts, and association with organized crime figures. It is rare for anyone to be under fire on all seven of the above issues." (Henry Stern)

In other news, haruspex (accurately) characterizes me as "someone-who-doesn't-get-Perl-and-probably-never-will". To be fair, I *did* "get it" back in the mid-'90s... Musta been all those drugs I took in '99 that turned me off of it. I do like this quote from Larry Wall:

Perl isn't really about safety. It's about getting where you're going, and enjoying the trip. It's more important to be a good driver than to have seven feet of sponge rubber all around your car.

I do need to do some Perl work here and there, and the question I have for someone knowledgeable (haruspex?) is this: are there any good guidelines for designing an OO interface in Perl? I've browsed around on the 'net and while it seems possible to do pretty much anything, I don't use Perl enough to know which package(s) are held up as good examples of OO Perl. Any pointers would be much appreciated (& acknowledged)...


On Web testing

Grinder isn't on many of the lists of Web testing tools I've seen, but it seems to be quite mature & gets some good press. Let me know if you try it and like it.

Charlie Stross & Perl

One of my favorite new sci-fi authors is a guy named Charles Stross. He rivals Iain Banks for plots that are turned 2 degrees to normality, and is a hacker/sysadmin by trade. It was therefore distressing to read his take on Perl:

... then along comes Randal or Tom or one of the other Perl Gods, and they deliver a half-line-long command that resembles line noise, is three times as efficient as the other solutions, and leaves you scratching your head.

Apparently this is a desirable feature of the language!?

Anyway, I have to admit he wrote the best description of Zope I've ever seen...


p.s. The Atrocity Archives is an amusing blend of a Cthulhu-like mythos and UNIX sysadminning. Let's just say that LARTing takes on a whole new meaning...

11 Jan 2005 (updated 11 Jan 2005 at 08:49 UTC) »
paircomp 0.9 (rc)

hooray. docs, tar.gz. It's only a small & simple comparative sequence analysis library for DNA, but it's been broken for about a month. (Asinine data structure, refactored yo' ass.)

Summary: complete reimplementation, now with regression testing. C++ library completely rewritten using the STL. 95% of the code is now tested via the Python API in one big mongo test script.

One more brick in the wall...

Ryan Tomayko comments on GPL vs the Python community. My work-related libraries are LGPLed, and my GUIs and Web interfaces are GPLed. Why? I'm an academic programmer, and my code is owned by Caltech. Neither I nor Caltech depend on income from these programs. However, I do intend to take them from job to job, and the GPL protects that. The L/GPL also protects Caltech. win/win.


9 Jan 2005 (updated 9 Jan 2005 at 18:38 UTC) »
Testing is addictive

After my many travails with PBP/maxq/sgmllib/HTMLParser/htmllib I finally sat down to work on my actual Web application, Cartwheel.

Cartwheel is a bioinformatics system that lets biologists upload sequences, analyze them, and export their analyses to a GUI, FamilyRelations II. Cartwheel itself is entirely written in Python, and FRII is written in C++ using FLTK -- a fine combination so far. I use a simple XML-RPC API to export the data & most of the internal communication between Cartwheel components is done in PostgreSQL. The system as a whole has been used by a few hundred people to do bioinformatics work and in general it's fairly robust. It's been around for several years, and I'm pretty much the only steady developer.

Normally I test Cartwheel's Web interface by roaming around in it with a Web browser and paying special attention to things I've changed. Until recently, I had no automated way to test it, and so in general I've been assuming it's mostly ok if no users yell at me after I post an update. (Known as the "Microsoft test method"... ;)

The mass of little bug reports recently reached a critical point, and so I started to fix them today. One of the problems had to do with some naively implemented search code in the Web interface, and so I set out to define the problem by changing the names of a bunch of the form variables. (This also fixed the bug, which tells you something about the code...) I had to edit files all over the place & quickly lost track of what code still needed to be patched.

So, I backed out all of my changes & used maxq to record a Web session that ran through all of the places where the search functionality was used. I saved the resulting PBP scripts and broke them into setup, test, and teardown scripts. I then went through the code base, made my changes, and re-ran the tests and fixed bugs caused by oversight until the tests all succeeded.

Another cool thing is that with the scripts separated into setup, tests, and teardown categories, I can also test my database export and import code quite easily:


export-db clear-db import-db


With only a few assumptions about what the setup script does to the DB (basically, that it's complete), this will tell me if my import/export scripts are catching everything.

Overall I probably spent about 3x the amount of time necessary to fix the bug on generating the tests in this manner, but now that I have a (simple) framework set up to do it, it should go faster... One thing is for sure: writing PBP tests without maxq would be painful!

A while back I asked about other Web testing tools, and John J. Lee recently responded with this link: http://wwwsearch.sourceforge.net/bits/GeneralFAQ.html. The Zope link is buggered, but overall I get the impression that there simply aren't many general Web testing tools for Python.

A few other Web testing link collections are on java-source.net and c2.com. JJL also pointed me towards opensourcetesting.org.

I'm interested in finding out about others; please let me know if you find any.


8 Jan 2005 (updated 9 Jan 2005 at 02:19 UTC) »
"Batteries included" may be a bad term

After some of the recent types kerfuffle, I decided I didn't like the "batteries included" description of Python. Let me explain.

Python-the-language is great, and fits most of my needs. Heck, it's done that since before 2.0 came out. It's always nice to discover that some nifty new feature (like list comprehensions) has been added, and often I can take advantage of them to write tighter/neater/more understandable code. Nonetheless, I'm fundamentally a systems & application programmer, and I don't generally write language frameworks, toolkits, etc., so these new features aren't usually all that critical to my work.

What is critical is the fantastic standard library that comes with Python. It has been steadily expanded in the years that I've been a Python programmer, to the point that when I'm looking around on the 'net for some functionality it's already in the Python lib more often than not. This saves me a lot of time & trouble installing new products. (I've also said that I think that the state of the included Python code, as well as the quality of the documentation & example code, plays a big part in educating new Pythonistas about acceptable coding practices with Python. The Python cookbook is a particularly nice collection of code, too. These both play a big role in the success of Python, IMO.)

So, batteries are included, if you think of the library as the "batteries" -- something essential to function, yet not terribly interesting in its own right. Looking at that description of batteries, though, it sounds more like it fits my conception of the Python language. After all, the language that things are written in is essential, yet not always all that interesting; doing things in the real world is more interesting. For that you usually need to interact with users or protocols, and that means some sort of interface, and ... you end up with a library of code that enables a lot of standard interactions. In Python, that code is distributed with the interpreter, which enables a stunningly wide range of functionality in your basic Python installation.

In other words, I think the included library & modules are more interesting than the word "batteries" might imply ;).

There may be a deeper schism lurking behind the recent decorator & types controversy: language vs tools. It's very sexy to work on extending the language: there are plenty of deep problems to be tackled, much thought must be expended, and only really really smart thoughtful people can do it well. As Iwan van der Kleyn pointed out, though, there are other things to do that will extend Python's reach, power, and utility. It might be worth the core team having a look at those, if their goal is to advance Python as a whole. If, instead, the goal is to advance Python-the-language, then I think they're doing a fantastic job of it & should keep it up. Personally I'm more worried about the future of the library.

UPDATE: AMK posted about this months ago, and got a lot of responses. I think that's what got me started...

And, before the naysayers get up & yell at me for my multitude of sins, I do have some specific work and proposals in mind. It's hard to grasp the library as a whole, though. (It's easier to write a criticism than it is to do the work, too, that's for sure.) Watch This Space.

As for the long-term future of Python, Guido's first priority clearly needs to be to grow a beard.


Python's future -- language, library, or tools?

There's been quite some kerfuffle about Guido van Rossum's proposal to add optional static typing to Python. In particular, Iwan van der Kleyn pointed out that Python seems to be caught between language extenders and application programmers, and that at the moment the language extenders are winning. Iwan points out that for application programmers, there are several library or tool improvements that are probably as, or more, important than any language addition. While I don't agree with all of his feature/app requests, I do agree with the general idea that Python would benefit hugely from investment in something other than optional static typing. Or decorators. Or (insert feature of the week).

There was an ACM Queue article a while back (if someone has the link, I'd appreciate it!) that discussed the divide between language power users & tool power users. As I recall, the article described the split between developers who invest time in learning new and more powerful languages, and developers who invest time in learning IDEs and language-specific tools. What seems to be happening in Python is that GvR & the core team consist of very smart people who believe the language should be extended -- and are busy doing that. IDEs and tool extension are left to others & their efforts are patched into the library system where appropriate.

The real question about language extensions like optional type checking and decorators is how contaminatory they are. Istvan Albert believes that "optional" type checking will soon become effectively non-optional; while he mentions only the social (workplace) pressures, I'm more worried about whether or not the language will require it in places where I don't care to use it. If a library module uses type checking, will any code that uses that library module be required to adhere to the type behavior specified?

I can see three scenarios.

The first scenario is that any code that uses type-checked code will be required to adhere to the type spec. In the medium term, type checking will become mandatory throughout the codebase because people will start using it in the standard library.

The second scenario is that "duck typing" will magically solve the problem & allow untyped code to use typed code. This seems to me to be the best option, and I bet it's what GvR has in mind.

The third scenario is that there will simply be command-line arguments to Python (or internal configuration directives) that disable type checking completely, allowing schlubs like me to go about our day and ignore large portions of the language.

I'd be ok with either #2 or #3. #1 sounds like a disaster to me. Either way, I expect a large amount of energy to be expended on this set of features that, frankly, is not so important to me. (Maybe it should be, but I'm not betting on it... I like Bruce Eckel's take on static typing.)
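To make scenario #2 concrete, here is a sketch written in the annotation syntax that Python adopted well after this entry, so treat the syntax itself as an assumption relative to the proposal under discussion. The interpreter does not enforce these annotations at run time, which is what lets untyped, duck-typed code call the annotated function. The names total_length and SequenceSet are hypothetical.

```python
def total_length(sequences: list) -> int:
    """Annotated 'library' function: sum the lengths of the items."""
    return sum(len(s) for s in sequences)


class SequenceSet:
    """Untyped caller-side class: not a list, but iterable like one."""

    def __init__(self, *seqs):
        self._seqs = seqs

    def __iter__(self):
        return iter(self._seqs)


# The annotation says "list", but any iterable of sized items works,
# because nothing checks the annotation when the call is made.
result = total_length(SequenceSet("ACGT", "GG"))
```

Whether a separate checking tool would complain about this call is a different matter; the sketch only shows what the running interpreter tolerates.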

Regardless, there's the question of whether the core team is focused too much on language development. Is the relentless drive for new, immensely powerful language features detracting from the stability and usability of the language?

It may be time to consider how to "fork" Python development so that it works more like Linux. Why not keep a 2.x branch going that focuses on long-term stability & library expansion, while building a 3.x branch with new language features but fairly complete backwards compatibility? In my traipses through the library it doesn't seem like the library really takes advantage of the new language features in 2.4, which (if true) in itself is a telling trend that suggests that language development and library development are orthogonal. Even if they aren't orthogonal, maybe they could be made more so, and this could be a worthwhile endeavor.

The balance between language stability and new features seems like it's tough to maintain. GvR et al. have done a fantastic job so far. Here's hoping they can keep doing a fantastic job!


4 Jan 2005 (updated 5 Jan 2005 at 00:41 UTC) »
OCaml and string theory -- how cool is that?!

Open-source vs closed-source Web dev

On our trip to Mammoth over New Year's, we had an interesting discussion about Web development frameworks, specifically open-source vs closed-source. One of the things that made it especially interesting was that apart from me & my wife (your standard OSS advocates) there was an MBA student & software developer as well as an engineer working in management at a construction company. So rather than discussing things with the usual group of my academic friends who buy into OSS by default, I had to actually come up with business arguments for OSS Web frameworks ;).

There were two requirements that came up:

  • The engineer wanted accountability: if the company they hired to build a Web site or CMS defaulted or screwed up, he wanted to be able to sue them and/or recover their assets. Beyond that, he didn't really care whether or not they used OSS.

  • The MBA/programmer wanted better overall documentation. His experience with the Java-based ArsDigita CMS that RedHat bought was extremely frustrating because the documentation was essentially nonexistent. I and others have had similar experiences with Python CMSs.

Beyond these two considerations, I don't think either one had a problem with open-source vs closed source, which was nice to hear. In particular, there was no mention of OSS lacking features, security, or stability -- it seems like FUD on this account isn't reaching these two people.

My responses were basically this:

  • Re accountability, it seems to be difficult to recover assets no matter what. Large companies (e.g. Oracle and MS) that sell software tend to have EULAs that nix accountability at this level. Consulting companies are either heavyweights that have lots of lawyers (e.g. IBM) or small businesses that have no money anyway.

    They both more-or-less agreed to this point & redefined their requirements to be that there was a company, with a reputation to manage, behind the software. When I pointed out that there were now many companies that were building on OSS, they shrugged & said that would be fine.

  • On the better documentation issue I agreed but pointed out that (a) a lot of closed-source software doesn't have great documentation either, and (b) most companies aren't going to care what documentation there is for their CMS if they have a consulting firm running it. Since it's kinda silly for a small non-IT firm to do more than manage their content, this point doesn't really matter for the end-user.

I added that one strong reason to go with an OSS Web CMS supported by multiple companies is company failure. A continual problem (exemplified by ArsDigita) is that companies go out of business & leave their customers high & dry. Maybe the base CMS is stable enough to work without modification, but any future customization or extension will be impossible with a closed CMS if the owners go out of business. With Zope or Plone, other companies (or even individual consultants) can pick up the pieces, if the company you had hired fails.

This seemed to resonate with both of our friends.

Anyway... 'nuff for now.


3 Jan 2005 (updated 3 Jan 2005 at 20:35 UTC) »
Is development of new software tools "moribund"?

I've always enjoyed Philip Greenspun's take on things; about 50% of what he says goes straight to the point. Unfortunately the other 50% seems to be complete crud. (I'm never sure how seriously he's taking himself, so it's hard to know if he's just trying to be thought-provoking. Even if not, separating out the gems from the crap is enjoyably difficult and hence worthwhile.)

His most recent weird post talked about a fairly simple Perl script to spam friends with invitations. In the comments, someone tweaked Philip about being a "Perl whore" despite his lauding of AOLserver, a Tcl-only (well, mostly) Web server. Philip responded that he had good reasons for liking AOLserver (connection pooling &c.), and since it happened to use Tcl, he used Tcl himself. Philip then felt compelled to say that he thought software tools were moribund, because his friends coding in .NET, his students working with Java, and people working with PHP weren't as productive as people who used AOLserver -- a technology designed over 10 years ago.

In the comments, I remarked that whinging about how Web software development tools haven't moved on since 1992 by citing Tcl, .NET, Java, and PHP was silly. Why not look at languages that support (and support well) a variety of techniques? Python (and by reputation Perl and Ruby) tools have come a loooong way in the last few years. In Python you can now choose between at least 5 different frameworks, all of them at least moderately mature. All of the popular Web programming techniques are represented between the various frameworks: the object publishing paradigm (Zope/Quixote), the object-kit templating paradigm (WebWare/CherryPy), and of course the utility belt that is Twisted.

The real question is, why not use Tcl, PHP, .NET, and Java for Web programming? It comes down to two conflicting considerations:

  1. the "each page is a class method" model encouraged by Java is overwrought for the simple pages that make up 20%-80% of any Web site (depending on the site, of course); it seems to be great for more complicated sites, though.

  2. the "each page is a string of spaghetti code" model encouraged by Tcl and PHP is great for the simple pages (20%-80% of any site) but disastrous for more complicated sites.

Or, to put it another way, straight scripting languages are great for constructing simple pages, while higher-level abstractions are needed for more complex pages. Most Web sites require both kinds of pages, but many languages do not support both kinds of programming well.

Python offers a great intermediate: it is a scripting language that deals well with outputting strings, but which offers nice higher-level ways of building out a framework. (This is likely true of other scripting languages that support object-oriented programming, such as Perl, [incr Tcl], Ruby, OCaml, ... what am I missing? VBA?)
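A rough sketch of the two styles side by side, in plain Python with no framework involved. Everything named here (about_page, Page, SearchResultsPage) is made up for illustration and is not taken from Zope, Quixote, or any other toolkit mentioned above.

```python
from html import escape


def about_page():
    """Simple page: straight string output, nothing more needed."""
    return "<html><body><h1>About</h1><p>Contact us.</p></body></html>"


class Page:
    """Shared layout for the more complicated pages."""
    title = "untitled"

    def body(self):
        raise NotImplementedError

    def render(self):
        return ("<html><head><title>%s</title></head><body>%s</body></html>"
                % (escape(self.title), self.body()))


class SearchResultsPage(Page):
    """Complex page: carries state and reuses the common layout."""
    title = "Search results"

    def __init__(self, query, hits):
        self.query = query
        self.hits = hits

    def body(self):
        items = "".join("<li>%s</li>" % escape(hit) for hit in self.hits)
        return "<p>Results for %s</p><ul>%s</ul>" % (escape(self.query), items)
```

Both kinds of page live comfortably in the same module: about_page() for the simple cases, and SearchResultsPage("cis", ["hit one"]).render() where the extra structure earns its keep.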

Bas Scheffers pointed out (in the comments, again) that "it's not the tool, it's the programmer". Well, yes, this is the usual last resort of any language anti-advocate: heck, they're all identical at some level of abstraction, right? And you can write spaghetti code in any language, right? So how dare you say that language A is better than language B? Well, my experience (and that of other people you should trust more) tells me that Python is better than both Tcl and Java for many things. And so the question is, why? I'm sure there are many considerations that make sense to people, but one that I haven't seen mentioned before is that of how examples are coded.

Since most programmers liberally "appropriate" code -- from books, from open-source programs, and especially from cookbooks and examples -- the quality of that code has a lot to do with the way they write programs. It also has a huge effect on the way they learn to code in the language. And I think this effect is grossly underestimated.

For Python, there's a nice tutorial, a variety of books (caveat: most of which I haven't read), and many, many examples from both the comp.lang.python newsgroup and the Python cookbook. By and large, the code contained in examples is clean, simple, short, and documented. It also lives at a level of abstraction that fits: classes/objects used when appropriate, and not used when not appropriate. Yes, there's plenty of gunk in Python, but it's not what you first encounter & it's easy to find pretty examples.

For other languages, example code is often much nastier. Spaghetti-code style may not be required by Tcl and PHP, but most of the example code I've seen can't be classed any other way. Overwrought object-oriented gunk may not be required by Java, but most of the example code I've seen certainly fits the description.

It could be that simple: if you write uncomplicated yet useful code examples, and people learn your language from them, then in the end your language will be used to write prettier code. And there will be less unmaintainable spaghetti code, or ugly overcomplex OOgunk, written in that language. And, if you're lucky & you've figured out how to grow your language over time with the help of the community, your language will continually move towards supporting that kind of coding.

Why Python examples are pretty is a different question, and the answer may be more sociological than technical. Perhaps it's as simple as Python fitting my brain better than other languages do, and it's not true for everyone else. Or, perhaps it's more than that -- for example, our beloved BDFL may be particularly good at designing a certain type of language.

In the end, supporting good mechanisms of abstraction may be necessary for good programming, but it is obviously not sufficient. It doesn't do much good to have a language that supports a bunch of mechanisms that don't cleanly fit into example code. Nor will it do the language any good in the long run if the example code is poorly constructed.

So, I don't think that the development of software tools is moribund -- but it may be time to move on from Tcl for doing Web programming.

Philip Greenspun's company died a horrible death several years ago partly because they were trying to transition a hideously complex mess o' Tcl into Java. I wonder what would have happened if they'd chosen to use something more scriptable than Java but a little more supportive of abstraction than Tcl?


p.s. I'm pretty supportive of using Java for other things, such as GUIs. I'm just not smart enough to make it do data reduction fast... so I switched to C++ / FLTK.

28 Dec 2004 (updated 28 Dec 2004 at 23:31 UTC) »

Odds and ends today...

PBP & SF hackage

SourceForge announce lists full of spam? Try this PBP script, with the rematch.py extension:

pyload rematch.py

go http://lists.sourceforge.net/lists/admindb/listname
fv 1 adminpw pass
submit 1

do set_match --form 1 "\\\\d+$" 3
submit 1

'twill flush out all messages in your queue...

Decorators in Quixote

Kevin Dangoor queried about decorators in Quixote, and here are implementations of his two suggested decorators. They seem to make sense syntactically.

First, one to restrict access to the decorated function to logged-in users:

from quixote.errors import AccessError

def require_login(func):
    """
    decorator: require login to run decorated function.
    """
    def wrapper(request):
        if not request.session.user:
            raise AccessError("you must be logged in!")
        return func(request)

    return wrapper

Use like so:

@require_login
def func(request):
    ...
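To try the login decorator without Quixote installed, here is a self-contained sketch. The stand-ins are my own assumptions: AccessError is a local exception instead of quixote.errors.AccessError, and make_request, secret_page, and is_denied are hypothetical helpers that fake just enough of a request object.

```python
from types import SimpleNamespace


class AccessError(Exception):
    """Local stand-in for quixote.errors.AccessError."""


def require_login(func):
    """decorator: require login to run decorated function."""
    def wrapper(request):
        if not request.session.user:
            raise AccessError("you must be logged in!")
        return func(request)
    return wrapper


def make_request(user):
    """Fake request carrying only a session with a user attribute."""
    return SimpleNamespace(session=SimpleNamespace(user=user))


@require_login
def secret_page(request):
    return "hello, %s" % request.session.user


def is_denied(request):
    """Return True if the decorated page refuses this request."""
    try:
        secret_page(request)
    except AccessError:
        return True
    return False
```

A logged-in request gets the page back; a request whose session has no user hits AccessError before the page function runs at all.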

A slightly more complex case follows: here are two functions to export names for publication by Quixote.

def export(func):
    """
    decorator; export decorated function under its __name__.
    """
    _q_exports = func.func_globals['_q_exports']
    _q_exports.append(func.__name__)

    return func

def export_names(*names):
    """
    decorator; export decorated function under all given names.
    """

    # build a new function to return; this is what will be called on
    # the following function.
    def export_func(func, names=names):
        _q_exports = func.func_globals['_q_exports']
        for name in names:
            _q_exports.append((name, func.__name__,))

        return func

    return export_func

Use these like so:

@export
def func(request):
    ...

@export_names("name1", "name2")
def func2(request):
    ...

It's a little bit irritating that you have to grab _q_exports from func_globals but *shrug* that's scoping for ya! (If you don't do this, then you can't import the decorators from another module.)
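For anyone on a current interpreter, here is a self-contained sketch of the same two export decorators. The differences are assumptions of mine and not part of the original code: func_globals is spelled __globals__ (Python 2.6 and later), setdefault creates _q_exports if the module has not defined it so the snippet runs without Quixote, and the page functions index and func2 are invented for the demonstration.

```python
def export(func):
    """decorator; export decorated function under its __name__."""
    # Look up (or create) _q_exports in the module where func was defined,
    # not in the module where this decorator lives.
    exports = func.__globals__.setdefault('_q_exports', [])
    exports.append(func.__name__)
    return func


def export_names(*names):
    """decorator; export decorated function under all given names."""
    def export_func(func):
        exports = func.__globals__.setdefault('_q_exports', [])
        for name in names:
            # (external name, internal name) pairs
            exports.append((name, func.__name__))
        return func
    return export_func


@export
def index(request):
    return "index page"


@export_names("name1", "name2")
def func2(request):
    return "func2 page"


# The export list as seen from the module that defined the page functions.
exports_seen = index.__globals__['_q_exports']
```

Because the lookup goes through the decorated function's own globals, the decorators keep working when they are imported from a different module than the pages they decorate.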

