Older blog entries for titus (starting at number 38)

26 Jan 2005 (updated 26 Jan 2005 at 18:11 UTC)
Poor man's advogato pull script

People seem to be losing diary entries occasionally, so I instituted a backup policy for my advogato diary. Here's the script.

(Kudos to advogato for having an XMLRPC API...)

#! /usr/bin/env python
import xmlrpclib, os, time, stat

server = xmlrpclib.Server("http://www.advogato.org/XMLRPC")
n = server.diary.len('titus')

for i in range(n):
    print '... entry', i,

    # getDates() returns xmlrpclib.DateTime objects; the raw
    # timestamp string lives in their .value attribute.
    created, modified = server.diary.getDates('titus', i)
    created = time.mktime(time.strptime(created.value, "%Y%m%dT%H:%M:%S"))
    modified = time.mktime(time.strptime(modified.value, "%Y%m%dT%H:%M:%S"))

    filename = '%d.txt' % (i,)

    found = 0
    try:
        info = os.stat(filename)
        found = 1
    except OSError:
        pass

    # Re-fetch if there's no local copy, or if its mtime doesn't match
    # the entry's modification date (the file's mtime is set below).
    if not found or info[stat.ST_MTIME] != int(modified):
        print '--> changed/DNE'
        entry = server.diary.get('titus', i)
        open(filename, 'w').write(entry)
        os.utime(filename, (created, modified,))
    else:
        print 'unchanged.'

The only tricky part was figuring out how to extract the created/modified values from the xmlrpc module, which required some digging...
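For anyone else digging: getDates() hands back DateTime objects, and the raw timestamp string sits in their `value` attribute. Here's a small sketch of the conversion in Python 3, where `xmlrpclib` became `xmlrpc.client`. It uses calendar.timegm() on the assumption that the server's times are UTC (the script above uses time.mktime(), which treats them as local time instead).

```python
import calendar
import time
import xmlrpc.client

# Compact ISO 8601 form used by XML-RPC dateTime.iso8601 values.
XMLRPC_DATE_FORMAT = "%Y%m%dT%H:%M:%S"

def to_timestamp(dt):
    """Turn an xmlrpc.client.DateTime into a Unix timestamp.

    Assumes the string is UTC; use time.mktime() to treat it as local time.
    """
    parsed = time.strptime(dt.value, XMLRPC_DATE_FORMAT)
    return calendar.timegm(parsed)

# Built by hand here; diary.getDates() would return objects like this one.
modified = xmlrpc.client.DateTime("20050126T18:11:00")
print(modified.value)          # 20050126T18:11:00
print(to_timestamp(modified))  # 1106763060
```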

--titus

p.s. apologies for the indentation; the actual HTML is properly indented, but it looks like advogato does funny things to it.

25 Jan 2005 (updated 25 Jan 2005 at 19:25 UTC)

Hey, salmoni, major congrats! I hope to defend RSN, myself...

Rails:Ruby :: Zope:Python

(just five years later ;)

I love hype; back when, I remember seeing people post "I want to write a Web page in Python, what do I use?" and getting "Zope is the answer!" as the response. Given Zope's size, complexity, and overwhelming amount of functionality, I didn't think this was the right direction to point Python newbies in. It all seems to have worked out in the end, though: there's now a diverse and healthy population of Web frameworks available for Python, and you can choose what you want. It may be more confusing for newbies than the old "use Zope" advice, but the flip side is that there are now several good options and some healthy competition in the Python Web space: you can have it your way.

Now Ruby on Rails shows up & people freak out, with the Ruby people claiming that "this can't be done in Python" and the Python people saying a variety of polite and impolite things in response. ...and then there are the users complaining about lack of documentation, which was my beef with Zope back in the Dark Ages.

I have yet to see any evidence that Python and Ruby are substantively different, but I haven't looked much beyond c2.com. I tend to like Ian's take on things, and he thinks it's less a language thing than an app/integration thing. It sounds like rewinding the clock by 5 or 10 years and building a leaner, meaner Zope with Python 2.2 would result in similar advantages for Python: One True App Framework. (It is an interesting (if odd) take on things to say that we would be better off with less choice! Smacks of B&D to me.)

Instead, we're stuck with a multitude of Web/app frameworks in Python, which people seem to think is a problem. Hey, folks, maybe (as with GUIs) there's more than One Right Way to do this? Just a thought...

It is clear that some people just can't take a joke ;).

PostgreSQL 8.0 & ORMs

PostgreSQL 8.0 now supports savepoints. Back when, I argued that any standard persistence framework for Python shouldn't require functionality not available in at least one good open-source SQL database; my main complaint here was that savepoints somehow became an issue in the Persistence-SIG discussions.
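In case savepoints are unfamiliar: a savepoint marks a spot inside an open transaction so you can roll back just the work done after it, without abandoning the whole transaction. A minimal sketch, using Python's bundled sqlite3 module as a stand-in for PostgreSQL (SQLite accepts the same SAVEPOINT, ROLLBACK TO SAVEPOINT and RELEASE SAVEPOINT statements); the `entries` table is made up for illustration:

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# BEGIN/SAVEPOINT/COMMIT statements below are sent exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, title TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO entries (title) VALUES ('kept')")

# Mark a point inside the open transaction...
conn.execute("SAVEPOINT before_risky_step")
conn.execute("INSERT INTO entries (title) VALUES ('discarded')")
# ...and undo only the work done since that mark.
conn.execute("ROLLBACK TO SAVEPOINT before_risky_step")
conn.execute("RELEASE SAVEPOINT before_risky_step")

conn.execute("COMMIT")  # the first insert survives

titles = [row[0] for row in conn.execute("SELECT title FROM entries ORDER BY id")]
print(titles)  # ['kept']
```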

It may be time to dust off cucumber, which was developed before Python 2.2, and update it. cucumber is my ORM linking Python to PostgreSQL with class inheritance based on PostgreSQL's table inheritance. I've been using it for 3 years, and one or two other people discovered it & used it for a bit. The only two complaints I've received are that it's slow (well, OK, yeah...) and that it defeats introspection. I think the former is insoluble and the latter could probably be remedied with metaclasses, although I'm not sure.

I'm still quite happy with cucumber, and it's saved me hundreds of hours of programming. I'm still wedded to the idea of having an SQL interface for my data, and when you add in the great feature of table inheritance (which seriously reduces ORM impedance mismatch at the expense of tying you to PostgreSQL) I don't think the idea can be beat. (My code is another matter...)
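To illustrate the table-inheritance idea: a PostgreSQL child table declared with `INHERITS (parent)` gets all of the parent's columns, and its rows also show up in queries against the parent, which lines up nicely with a Python subclass. The sketch below is a hypothetical helper, not cucumber's code or API; it assumes each class declares only its own new columns in a `columns` dict and emits matching DDL.

```python
class Mapped:
    """Marker base class: subclasses declare only their *own* columns."""
    columns = {}

# Hypothetical schema for illustration only.
class Feature(Mapped):
    columns = {"id": "serial PRIMARY KEY", "start_pos": "integer", "end_pos": "integer"}

class Gene(Feature):
    columns = {"symbol": "text"}

def create_table_sql(cls):
    """Build a CREATE TABLE statement whose INHERITS clause mirrors cls's bases."""
    own = vars(cls).get("columns", {})
    col_sql = ", ".join(f"{name} {sqltype}" for name, sqltype in own.items())
    parents = [base.__name__.lower() for base in cls.__bases__
               if issubclass(base, Mapped) and base is not Mapped]
    sql = f"CREATE TABLE {cls.__name__.lower()} ({col_sql})"
    if parents:
        sql += f" INHERITS ({', '.join(parents)})"
    return sql + ";"

for cls in (Feature, Gene):
    print(create_table_sql(cls))
# CREATE TABLE feature (id serial PRIMARY KEY, start_pos integer, end_pos integer);
# CREATE TABLE gene (symbol text) INHERITS (feature);
```

Querying `feature` then returns genes too, while `SELECT ... FROM ONLY feature` skips them. One caveat: PostgreSQL copies columns into child tables but not PRIMARY KEY or UNIQUE constraints.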

...or I could just switch to using SQLObject, which is becoming a frequently observed keyword in Python fora.

--titus

CleverCS posts a cute article about combatting Web spam with TrustRank. Reminds me of Advogato's trust metric... I've been thinking that something similar would work for genome annotation for some time.
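The core of TrustRank is easy to state: it's PageRank where the random jump always lands on a hand-checked seed set of trusted pages, so trust flows outward from the seeds along links and thins out with each hop, and pages the seeds can't reach end up with none. A minimal power-iteration sketch; the toy graph, damping factor and iteration count are illustrative choices, not values from the article.

```python
def trustrank(links, seeds, damping=0.85, iterations=50):
    """Propagate trust from hand-picked seed pages along outgoing links.

    links: dict mapping each page to the list of pages it links to.
    seeds: set of pages judged trustworthy by a human.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    # Jump vector: uniform over the seeds, zero everywhere else.
    jump = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(jump)
    for _ in range(iterations):
        new = {p: (1 - damping) * jump[p] for p in pages}
        for page, targets in links.items():
            # Each page splits its damped trust evenly among its out-links;
            # pages with no out-links pass nothing on.
            for q in targets:
                new[q] += damping * trust[page] / len(targets)
        trust = new
    return trust

web = {
    "seed": ["a", "b"],
    "a": ["b"],
    "b": ["seed"],
    # A little link farm: it points at good pages, but nothing good points back.
    "spam1": ["spam2", "seed"],
    "spam2": ["spam1"],
}
scores = trustrank(web, seeds={"seed"})
for page in sorted(scores, key=scores.get, reverse=True):
    print(f"{page:6s} {scores[page]:.3f}")
```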

Code coverage of C/C++ extension modules for Python: Defeat

I've been (temporarily ;) defeated in my attempts to use gcov to do a coverage analysis of my paircomp tests. All of my tests are written in Python and use my extension API for the C++ library to exercise the C++ code; thus the C++ code exists in shared libraries. Unfortunately, gcov does not natively support shared libraries. Bummer.

First, I tried extracting the __bb_init_func code from libgcc.a and linking that into the shared libraries explicitly; that got rid of the error but didn't seem to actually enable coverage analysis.

My second attempt was to write a short C++ program that embedded the Python interpreter and had my extension modules compiled directly into it. That worked up to a point -- I got everything to compile, and coverage analysis was started -- but I couldn't import any of the Python extension modules without an error.

I'll sleep on it and see what I come up with... does anyone know of any other C code coverage analysis programs?

--titus

Spent most of the day doing biology things, but managed to help clean up a small CGIHTTPServer problem and also contributed a doc patch to distutils to fix my earlier complaints.

A couple of short responses to advogato users:

gnutizen asks about learning good C programming style. My suggestions are:

  • Read other people's code, a lot. Back in the early '90s when I first dove into C, I spent some time getting GNU utilities to compile on both an SGI and a weird BSD/SYSV crossover machine we had. I learned a helluva lot about C programming from that, especially with respect to portability.

  • Fix other people's code, a lot. Ditto above.

  • Work on small parts of some open source project or another. I worked on a conquer-like game called dominion, with a group of pretty good hackers. In the end I think the overall design was lacking, but the nitty gritty of each individual code file was crafted by very experienced C hackers.

  • Read a lot. For large-scale program design, Lakos's C++ book is fantastic; Stevens' book on UNIX Network Programming was a prime source of material for me before that. Books like Pragmatic Programmer and so on offer a lot of advice that seems too obvious to be useful, but is in fact quite useful.

Anyway, that's my 2 cents (FWIW, IMO ;). These days I find myself writing relatively little C++ code, and even less straight C code, but it's incredibly useful for hacking on other people's code.

etrepum says that "Platypus is not what you want for packaging Python applications". He didn't give a reason, and having never used Platypus myself, I don't know why. However, the page he points me to contains not only py2app, which looks pretty cool, but also a variety of other very nifty-looking Python tools for interacting with OS X.

--titus

Sysadminning is annoying & time consuming

I do contract sysadminning for a small lab that only really needs someone to keep an eye on a Linux box with a Web server and an e-mail server. I charge them relatively little, and in turn can tell them that I'm too busy to fix something if necessary. A good trade for a grad student...

Since I switched the system from RH to Debian my life has been much easier, but hardware has a way of stepping in and reminding you who is boss.

Today's doings:

1. reboot to test on-boot install of new USB disk. reboot fails.

2. discover that the problem is in MBR. further discover disk MBR is unfixable, although the data is 99% accessible. (weird...)

3. spend 2-3 hours doing things like wiping the MBR on *all* of the disks and then having to fix partition tables, etc.

4. finally get to the point where the MBR on a separate SCSI disk is booting the right kernel, then running init etc. off of the original disk. system finally fully functional in a rather hacked kind of way.

5. dinner.

6. returning from dinner, back up entire functioning system to two other disks, plus a remote system. (take that, hardware!)

Now I just have to figure out how to best transfer the functioning system off of the occasionally malfunctioning drive and onto a separate Debian install on another drive. I hope it will be as simple as find+diff to locate changed files; I didn't have to change *that* much to begin with...
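The find+diff plan boils down to walking both trees and comparing them file by file. A rough Python equivalent, assuming both installs are mounted side by side on one machine (the mount points in the usage comment are hypothetical):

```python
import filecmp
import os

def changed_files(old_root, new_root):
    """Return (added, removed, modified) relative paths between two trees."""
    def listing(root):
        found = set()
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                found.add(os.path.relpath(os.path.join(dirpath, name), root))
        return found

    old_files, new_files = listing(old_root), listing(new_root)
    added = sorted(new_files - old_files)
    removed = sorted(old_files - new_files)
    # shallow=False forces a byte-by-byte comparison instead of trusting
    # size/mtime, which copying or restoring files may have changed.
    modified = sorted(
        rel for rel in old_files & new_files
        if not filecmp.cmp(os.path.join(old_root, rel),
                           os.path.join(new_root, rel), shallow=False)
    )
    return added, removed, modified

# e.g. changed_files("/mnt/fresh-debian/etc", "/mnt/old-disk/etc")
```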

On the (only) bright side, I get to charge for all of this.

Did anyone else notice how !#%!# cheap those really convenient LaCie USB and Firewire drives are? Wow -- $200/250 portable gb.

--titus

Fun distutils factoid of the day:

python setup.py clean
doesn't remove your build/lib.* directories, so C++ extensions don't get recompiled. You have to do a
python setup.py clean --all
to force recompilation.

I think this is a documentation bug, since --help-commands describes clean as "clean up output of 'build' command" rather than "clean up temp files from 'build' command".

As long as I'm complaining, why does

python setup.py --help
return so little useful information for package installers? You have to run
python setup.py --help-commands
to get a list of actual commands! This is a fine example of behavior built around programmers rather than users, I think ;).

I ran across the 'clean' issue because I have some C++ extension files that depend on a C++ library. I don't know how to make my setup.py care about the modification date of that library file, so my extension files are perennially out of date with respect to my actual library code.
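For what it's worth, distutils' `Extension` accepts a `depends` list of extra files, and `build_ext` rebuilds an extension when any of its sources or `depends` files is newer than the built module. The check itself is just a modification-time comparison; here's a standalone sketch of it (the extension and library names in the comment are hypothetical):

```python
import os

def needs_rebuild(target, sources, depends=()):
    """Return True if target is missing or older than any source/dependency.

    Rough equivalent of the newer-than test build_ext applies to an
    Extension's sources plus its `depends` list.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    for path in list(sources) + list(depends):
        # As in build_ext, a missing input counts as "newer" and forces a rebuild.
        if not os.path.exists(path) or os.path.getmtime(path) > target_mtime:
            return True
    return False

# In setup.py the same effect comes from listing the library, e.g.:
#   Extension("paircomp._paircomp", sources=["paircomp_wrap.cc"],
#             depends=["../lib/libpaircomp.a"])
```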

The help-commands issue is something I run across every time I try to understand the distutils command line options.

...but enough whining. Here's something useful, instead ;). I ran across this cool OS X software today: my friend Nathan blogged about appscript, which together with Platypus makes it easy to build & release simple Python apps for OS X. Very neat!

Two gems from The New Yorker:

Regarding Crichton's new book on the climate change "conspiracy":

What "State of Fear" demonstrates is how hard it is to construct a narrative that would actually justify current American policy. In this way, albeit unintentionally, Crichton has written a book that deserves to be taken seriously.

Regarding Bernie Kerik (Homeland Security ex-nominee):

"Officials have gotten into trouble for sexual misconduct, abusing their authority, personal bankruptcy, failure to file documents, waste of public funds, receiving substantial unrecorded gifts, and association with organized crime figures. It is rare for anyone to be under fire on all seven of the above issues." (Henry Stern)

In other news, haruspex (accurately) characterizes me as "someone-who-doesn't-get-Perl-and-probably-never-will". To be fair, I *did* "get it" back in the mid-'90s... Musta been all those drugs I took in '99 that turned me off of it. I do like this quote from Larry Wall:

Perl isn't really about safety. It's about getting where you're going, and enjoying the trip. It's more important to be a good driver than to have seven feet of sponge rubber all around your car.

I do need to do some Perl work here and there, and the question I have for someone knowledgeable (haruspex?) is this: are there any good guidelines for designing an OO interface in Perl? I've browsed around on the 'net and while it seems possible to do pretty much anything, I don't use Perl enough to know which package(s) are held up as good examples of OO Perl. Any pointers would be much appreciated (& acknowledged)...

--titus

On Web testing

Grinder isn't on many of the lists of Web testing tools I've seen, but it seems to be quite mature & gets some good press. Let me know if you try it and like it.

Charlie Stross & Perl

One of my favorite new sci-fi authors is a guy named Charles Stross. He rivals Iain Banks for plots that are turned 2 degrees to normality, and is a hacker/sysadmin by trade. It was therefore distressing to read his take on Perl:

"""
... then along comes Randal or Tom or one of the other Perl Gods, and they deliver a half-line-long command that resembles line noise, is three times as efficient as the other solutions, and leaves you scratching your head.
"""

Apparently this is a desirable feature of the language!?

Anyway, I have to admit he wrote the best description of Zope I've ever seen...

--titus

p.s. The Atrocity Archives is an amusing blend of a Cthulhu-like mythos and UNIX sysadminning. Let's just say that LARTing takes on a whole new meaning...

11 Jan 2005 (updated 11 Jan 2005 at 08:49 UTC)
paircomp 0.9 (rc)

hooray. docs, tar.gz. It's only a small & simple comparative sequence analysis library for DNA, but it's been broken for about a month. (Asinine data structure, refactored yo' ass.)

Summary: complete reimplementation, now with regression testing. C++ library completely rewritten using the STL. 95% of the code is now tested via the Python API in one big mongo test script.
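In case "comparative sequence analysis" sounds mysterious: one of the simplest approaches is a windowed comparison, where every length-W window of one DNA sequence is compared against every length-W window of the other, and the pairs sharing at least some threshold of identical bases are reported. The sketch below is a naive pure-Python illustration of that general idea (O(n·m·W), no reverse complement), not paircomp's implementation or API; the window size and threshold are arbitrary.

```python
def windowed_matches(top, bottom, window=4, threshold=3):
    """Yield (i, j, identities) for every pair of windows top[i:i+window] and
    bottom[j:j+window] sharing at least `threshold` identical bases."""
    for i in range(len(top) - window + 1):
        a = top[i:i + window]
        for j in range(len(bottom) - window + 1):
            b = bottom[j:j + window]
            identities = sum(1 for x, y in zip(a, b) if x == y)
            if identities >= threshold:
                yield i, j, identities

print(list(windowed_matches("ACGTACGT", "TTACGTTT", window=4, threshold=4)))
# [(0, 2, 4), (3, 1, 4), (4, 2, 4)]
```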

One more brick in the wall...

Ryan Tomayko comments on GPL vs the Python community. My work-related libraries are LGPLed, and my GUIs and Web interfaces are GPLed. Why? I'm an academic programmer, and my code is owned by Caltech. Neither I nor Caltech depend on income from these programs. However, I do intend to take them from job to job, and the GPL protects that. The L/GPL also protects Caltech. win/win.

--titus

9 Jan 2005 (updated 9 Jan 2005 at 18:38 UTC)
Testing is addictive

After my many travails with PBP/maxq/sgmllib/HTMLParser/htmllib I finally sat down to work on my actual Web application, Cartwheel.

Cartwheel is a bioinformatics system that lets biologists upload sequences, analyze them, and export their analyses to a GUI, FamilyRelations II. Cartwheel itself is entirely written in Python, and FRII is written in C++ using FLTK -- a fine combination so far. I use a simple XML-RPC API to export the data, and most of the internal communication between Cartwheel components goes through PostgreSQL. The system as a whole has been used by a few hundred people to do bioinformatics work and in general it's fairly robust. It's been around for several years, and I'm pretty much the only steady developer.

Normally I test Cartwheel's Web interface by roaming around in it with a Web browser and paying special attention to things I've changed. Until recently, I had no automated way to test it, and so in general I've been assuming it's mostly ok if no users yell at me after I post an update. (Known as the "Microsoft test method"... ;)

The mass of little bug reports recently reached a critical point, and so I started to fix them today. One of the problems had to do with some naively implemented search code in the Web interface, and so I set out to define the problem by changing the names of a bunch of the form variables. (This also fixed the bug, which tells you something about the code...) I had to edit files all over the place & quickly lost track of what code still needed to be patched.

So, I backed out all of my changes & used maxq to record a Web session that ran through all of the places where the search functionality was used. I saved the resulting PBP scripts and broke them into setup, test, and teardown scripts. I then went through the code base, made my changes, and re-ran the tests and fixed bugs caused by oversight until the tests all succeeded.
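The payoff of splitting the recorded scripts is that the pieces can be chained in a fixed order, with teardown running even when a test fails. A minimal sketch of such a driver; it assumes each piece is a command that signals failure with a nonzero exit status (an assumption about how the recorded scripts report errors), and the test names are hypothetical.

```python
import subprocess
import sys

def run_suite(setup, tests, teardown):
    """Run setup, each named test, then teardown; return the failed test names.

    Commands are argument lists; a nonzero exit status counts as failure.
    """
    subprocess.run(setup, check=True)  # no point running tests on a bad setup
    failed = []
    try:
        for name, cmd in tests:
            if subprocess.run(cmd).returncode != 0:
                failed.append(name)
    finally:
        subprocess.run(teardown, check=True)  # clean up even if something blew up
    return failed

# Stand-in commands; the real ones would invoke the saved PBP scripts.
PASS = [sys.executable, "-c", "pass"]
FAIL = [sys.executable, "-c", "raise SystemExit(1)"]
print(run_suite(PASS, [("search-by-name", PASS), ("search-by-id", FAIL)], PASS))
# ['search-by-id']
```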

Another cool thing is that with the scripts separated into setup, tests, and teardown categories, I can also test my database export and import code quite easily:

setup-test-db
run-all-tests

export-db clear-db import-db

export-db clear-db import-db

run-all-tests

With only a few assumptions about what the setup script does to the DB (basically, that it's complete), this will tell me if my import/export scripts are catching everything.
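The same round-trip check can be sketched outside the Web tests: snapshot the database, export it, wipe it, import the export, and compare. This self-contained version uses sqlite3 and a JSON dump in place of PostgreSQL and Cartwheel's real export/import scripts; all table names and functions are hypothetical.

```python
import json
import sqlite3

def table_names(conn):
    return [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

def snapshot(conn):
    """Every row of every table, sorted, for before/after comparison."""
    return {t: sorted(conn.execute(f"SELECT * FROM {t}").fetchall())
            for t in table_names(conn)}

def export_db(conn):
    """Stand-in for export-db: serialize all tables to a JSON string."""
    return json.dumps({t: conn.execute(f"SELECT * FROM {t}").fetchall()
                       for t in table_names(conn)})

def clear_db(conn):
    """Stand-in for clear-db: empty every table but keep the schema."""
    for t in table_names(conn):
        conn.execute(f"DELETE FROM {t}")

def import_db(conn, dump):
    """Stand-in for import-db: reinsert every row from the JSON dump."""
    for table, rows in json.loads(dump).items():
        for row in rows:
            marks = ", ".join("?" * len(row))
            conn.execute(f"INSERT INTO {table} VALUES ({marks})", row)

def round_trip_ok(conn):
    before = snapshot(conn)
    dump = export_db(conn)
    clear_db(conn)
    import_db(conn, dump)
    return snapshot(conn) == before

# Hypothetical schema and rows standing in for setup-test-db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sequences (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE analyses (id INTEGER PRIMARY KEY, seq_id INTEGER, kind TEXT)")
conn.executemany("INSERT INTO sequences VALUES (?, ?)", [(1, "seq-a"), (2, "seq-b")])
conn.execute("INSERT INTO analyses VALUES (1, 1, 'blast')")
print(round_trip_ok(conn))  # True
```

An import step that silently skipped a table would make round_trip_ok() return False, which is the same signal the Web-test version gives.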

Overall I probably spent about three times as long generating the tests this way as I would have spent just fixing the bug, but now that I have a (simple) framework set up, it should go faster... One thing is for sure: writing PBP tests without maxq would be painful!

A while back I asked about other Web testing tools, and John J. Lee recently responded with this link: http://wwwsearch.sourceforge.net/bits/GeneralFAQ.html. The Zope link is buggered, but overall I get the impression that there simply aren't many general Web testing tools for Python.

A few other Web testing link collections are on java-source.net and c2.com. JJL also pointed me towards opensourcetesting.org.

I'm interested in finding out about others, so please let me know if you find any.

--titus
