Older blog entries for titus (starting at number 36)

23 Jan 2005 »

CleverCS posts a cute article about combatting Web spam with TrustRank. Reminds me of Advogato's trust metric... I've been thinking that something similar would work for genome annotation for some time.

Code coverage of C/C++ extension modules for Python: Defeat

I've been (temporarily ;) defeated in my attempts to use gcov to do a coverage analysis of my paircomp tests. All of my tests are written in Python and use my extension API for the C++ library to exercise the C++ code; thus the C++ code exists in shared libraries. Unfortunately, gcov does not natively support shared libraries. Bummer.

First, I tried extracting the __bb_init_func code from libgcc.a and linking that into the shared libraries explicitly; that got rid of the error but didn't seem to actually enable coverage analysis.

My second attempt was to write a short program that embedded the Python interpreter and had my extension modules compiled directly into it. That worked up to a point -- I got everything to compile, and coverage analysis was started -- but I couldn't import any of the Python extension modules without an error.

I'll sleep on it and see what I come up with... does anyone know of any other C code coverage analysis programs?

--titus

18 Jan 2005 »

Spent most of the day doing biology things, but managed to help clean up a small CGIHTTPServer problem and also contributed a doc patch to distutils to fix my earlier complaints.

A couple of short responses to advogato users:

gnutizen asks about learning good C programming style. My suggestions are:

  • Read other people's code, a lot. Back in the early '90s when I first dove into C, I spent some time getting GNU utilities to compile on both an SGI and a weird BSD/SYSV crossover machine we had. I learned a helluva lot about C programming from that, especially with respect to portability.

  • Fix other people's code, a lot. Ditto above.

  • Work on small parts of some open source project or another. I worked on a conquer-like game called dominion, with a group of pretty good hackers. In the end I think the overall design was lacking, but the nitty gritty of each individual code file was crafted by very experienced C hackers.

  • Read a lot. For large-scale program design, Lakos's C++ book is fantastic; Stevens' book on UNIX Network Programming was a prime source of material for me before that. Books like Pragmatic Programmer and so on offer a lot of advice that seems too obvious to be useful, but is in fact quite useful.

Anyway, that's my 2 cents (FWIW, IMO ;). These days I find myself writing relatively little C++ code, and even less straight C code, but it's incredibly useful for hacking on other people's code.

etrepum says that "Platypus is not what you want for packaging Python applications". Without more of a reason, and never having used Platypus myself, I don't know why. However the page he points me towards contains not only py2app, which looks pretty cool, but also a variety of other very nifty looking Python tools for interaction with OS X.

--titus

16 Jan 2005 »

Sysadminning is annoying & time consuming

I do contract sysadminning for a small lab that only really needs someone to keep an eye on a Linux box with a Web server and an e-mail server. I charge them relatively little, and in turn can tell them that I'm too busy to fix something if necessary. A good trade for a grad student...

Since I switched the system from RH to Debian my life has been much easier, but hardware has a way of stepping in and reminding you who is boss.

Today's doings:

1. reboot to test on-boot install of new USB disk. reboot fails.

2. discover that the problem is in MBR. further discover disk MBR is unfixable, although the data is 99% entirely accessible. (weird...)

3. spend 2-3 hours doing things like wiping the MBR on *all* of the disks and then having to fix partition tables, etc.

4. finally get to the point where the MBR on a separate SCSI disk is booting the right kernel, then running init etc. off of the original disk. system finally fully functional in a rather hacked kind of way.

5. dinner.

6. returning from dinner, back up entire functioning system to two other disks, plus a remote system. (take that, hardware!)

Now I just have to figure out how to best transfer the functioning system off of the occasionally malfunctioning drive and onto a separate Debian install on another drive. I hope it will be as simple as find+diff to locate changed files; I didn't have to change *that* much to begin with...
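
The find+diff idea could be sketched roughly as follows. This is only a minimal illustration, assuming a pristine reference tree (say, from a fresh Debian install) and the live tree are both mounted somewhere readable; the function name and the paths in the usage note are hypothetical, and the sketch ignores symlinks, device files, and permissions.

```python
import filecmp
import os


def changed_files(reference_root, live_root):
    """Return relative paths under live_root that are new or differ from reference_root."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(live_root):
        for name in filenames:
            live_path = os.path.join(dirpath, name)
            rel = os.path.relpath(live_path, live_root)
            ref_path = os.path.join(reference_root, rel)
            # A file counts as changed if the reference copy is missing
            # or its contents differ byte-for-byte (shallow=False).
            if not os.path.isfile(ref_path) or not filecmp.cmp(
                ref_path, live_path, shallow=False
            ):
                changed.append(rel)
    return sorted(changed)
```

Something like `changed_files("/mnt/fresh/etc", "/mnt/old/etc")` would then list the configuration files worth carrying over by hand.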

On the (only) bright side, I get to charge for all of this.

Did anyone else notice how !#%!# cheap those really convenient LaCie USB and Firewire drives are? Wow -- $200 for 250 portable GB.

--titus

14 Jan 2005 »

Fun distutils factoid of the day:
python setup.py clean
doesn't remove your build/lib.* directories, so C++ extensions don't get recompiled. You have to do a
python setup.py clean --all
to force recompilation.

I think this is a documentation bug, since --help-commands says "clean - clean up output of 'build' command" rather than "clean - clean up temp files from 'build' command".

As long as I'm complaining, why does

python setup.py --help
return so little useful information for package installers? You have to run
python setup.py --help-commands
to get a list of actual commands! This is a fine example of behavior built around programmers rather than users, I think ;).

I ran across the 'clean' issue because I have some C++ extension files that depend on a C++ library. I don't know how to make my setup.py care about the modification date of that library file, so my extension files are perennially out of date with respect to my actual library code.
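
The check being asked for here is a plain modification-time comparison. Below is a minimal sketch of that logic using only the standard library; the function name is hypothetical and this is not how distutils itself is implemented. (As far as I know, the distutils Extension class also accepts a depends list of extra files to consider when deciding whether to rebuild, which may be the more direct route.)

```python
import os


def needs_rebuild(target, dependencies):
    """Return True if target is missing or older than any of its dependencies."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # Rebuild if any dependency was modified after the target was built.
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)
```

A setup.py could call something like this with the built extension as the target and the library file as a dependency, and force a rebuild when it returns True.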

The help-commands issue is something I run across every time I try to understand the distutils command line options.

...but enough whining. Here's something useful, instead ;). I ran across this cool OS X software today: my friend Nathan blogged about appscript, which together with Platypus make it easy to build & release simple Python apps for OS X. Very neat!

13 Jan 2005 »

Two gems from The New Yorker:

Regarding Crichton's new book on the climate change "conspiracy":

What "State of Fear" demonstrates is how hard it is to construct a narrative that would actually justify current American policy. In this way, albeit unintentionally, Crichton has written a book that deserves to be taken seriously.

Regarding Bernie Kerik (Homeland Security ex-nominee):

"Officials have gotten into trouble for sexual misconduct, abusing their authority, personal bankruptcy, failure to file documents, waste of public funds, receiving substantial unrecorded gifts, and association with organized crime figures. It is rare for anyone to be under fire on all seven of the above issues." (Henry Stern)

In other news, haruspex (accurately) characterizes me as "someone-who-doesn't-get-Perl-and-probably-never-will". To be fair, I *did* "get it" back in the mid-'90s... Musta been all those drugs I took in '99 that turned me off of it. I do like this quote from Larry Wall:

Perl isn't really about safety. It's about getting where you're going, and enjoying the trip. It's more important to be a good driver than to have seven feet of sponge rubber all around your car.

I do need to do some Perl work here and there, and the question I have for someone knowledgeable (haruspex?) is this: are there any good guidelines for designing an OO interface in Perl? I've browsed around on the 'net and while it seems possible to do pretty much anything, I don't use Perl enough to know which package(s) are held up as good examples of OO Perl. Any pointers would be much appreciated (& acknowledged)...

--titus

12 Jan 2005 »

On Web testing

Grinder isn't on many of the lists of Web testing tools I've seen, but it seems to be quite mature & gets some good press. Let me know if you try it and like it.

Charlie Stross & Perl

One of my favorite new sci-fi authors is a guy named Charles Stross. He rivals Iain Banks for plots that are turned 2 degrees to normality, and is a hacker/sysadmin by trade. It was therefore distressing to read his take on Perl:

"""
... then along comes Randal or Tom or one of the other Perl Gods, and they deliver a half-line-long command that resembles line noise, is three times as efficient as the other solutions, and leaves you scratching your head.
"""

Apparently this is a desirable feature of the language!?

Anyway, I have to admit he wrote the best description of Zope I've ever seen...

--titus

p.s. The Atrocity Archives is an amusing blend of a Cthulhu-like mythos and UNIX sysadminning. Let's just say that LARTing takes on a whole new meaning...

11 Jan 2005 (updated 11 Jan 2005 at 08:49 UTC) »

paircomp 0.9 (rc)

hooray. docs, tar.gz. It's only a small & simple comparative sequence analysis library for DNA, but it's been broken for about a month. (Asinine data structure, refactored yo' ass.)

Summary: complete reimplementation, now with regression testing. C++ library completely rewritten using the STL. 95% of the code is now tested via the Python API in one big mongo test script.

One more brick in the wall...

Ryan Tomayko comments on GPL vs the Python community. My work-related libraries are LGPLed, and my GUIs and Web interfaces are GPLed. Why? I'm an academic programmer, and my code is owned by Caltech. Neither I nor Caltech depend on income from these programs. However, I do intend to take them from job to job, and the GPL protects that. The L/GPL also protects Caltech. win/win.

--titus

9 Jan 2005 (updated 9 Jan 2005 at 18:38 UTC) »

Testing is addictive

After my many travails with PBP/maxq/sgmllib/HTMLParser/htmllib I finally sat down to work on my actual Web application, Cartwheel.

Cartwheel is a bioinformatics system that lets biologists upload sequences, analyze them, and export their analyses to a GUI, FamilyRelations II. Cartwheel itself is entirely written in Python, and FRII is written in C++ using FLTK -- a fine combination so far. I use a simple XML-RPC API to export the data & most of the internal communication between Cartwheel components is done in PostgreSQL. The system as a whole has been used by a few hundred people to do bioinformatics work and in general it's fairly robust. It's been around for several years, and I'm pretty much the only steady developer.

Normally I test Cartwheel's Web interface by roaming around in it with a Web browser and paying special attention to things I've changed. Until recently, I had no automated way to test it, and so in general I've been assuming it's mostly ok if no users yell at me after I post an update. (Known as the "Microsoft test method"... ;)

The mass of little bug reports recently reached a critical point, and so I started to fix them today. One of the problems had to do with some naively implemented search code in the Web interface, and so I set out to define the problem by changing the names of a bunch of the form variables. (This also fixed the bug, which tells you something about the code...) I had to edit files all over the place & quickly lost track of what code still needed to be patched.

So, I backed out all of my changes & used maxq to record a Web session that ran through all of the places where the search functionality was used. I saved the resulting PBP scripts and broke them into setup, test, and teardown scripts. I then went through the code base, made my changes, and re-ran the tests and fixed bugs caused by oversight until the tests all succeeded.

Another cool thing is that with the scripts separated into setup, tests, and teardown categories, I can also test my database export and import code quite easily:

setup-test-db
run-all-tests

export-db clear-db import-db

run-all-tests

With only a few assumptions about what the setup script does to the DB (basically, that it's complete), this will tell me if my import/export scripts are catching everything.
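
The round trip above can be sketched in a few lines. This is an illustration only: it uses an in-memory SQLite database as a stand-in for PostgreSQL, and the table, data, and function names are all made up for the example rather than taken from Cartwheel.

```python
import sqlite3


def setup_test_db(conn):
    # Hypothetical schema and data standing in for the real setup script.
    conn.execute("CREATE TABLE sequences (name TEXT PRIMARY KEY, dna TEXT)")
    conn.executemany(
        "INSERT INTO sequences VALUES (?, ?)",
        [("a", "ACGT"), ("b", "GGCC")],
    )
    conn.commit()


def run_all_tests(conn):
    rows = conn.execute("SELECT name, dna FROM sequences ORDER BY name").fetchall()
    return rows == [("a", "ACGT"), ("b", "GGCC")]


def export_db(conn):
    # iterdump() yields the SQL statements needed to recreate the database.
    return "\n".join(conn.iterdump())


def clear_db(conn):
    conn.execute("DROP TABLE sequences")
    conn.commit()


def import_db(conn, dump):
    conn.executescript(dump)


def round_trip_ok():
    conn = sqlite3.connect(":memory:")
    setup_test_db(conn)
    before = run_all_tests(conn)
    dump = export_db(conn)
    clear_db(conn)
    import_db(conn, dump)
    after = run_all_tests(conn)
    conn.close()
    return before and after
```

If the tests pass before the export and again after the import, the export/import pair has preserved everything the tests can see, which is exactly as strong as the test suite is complete.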

Overall, generating the tests in this manner probably took about 3x the amount of time needed to fix the bug itself, but now that I have a (simple) framework set up to do it, it should go faster... One thing is for sure: writing PBP tests without maxq would be painful!

A while back I asked about other Web testing tools, and John J. Lee recently responded with this link: http://wwwsearch.sourceforge.net/bits/GeneralFAQ.html. The Zope link is buggered, but overall I get the impression that there simply aren't many general Web testing tools for Python.

A few other Web testing link collections are on java-source.net and c2.com. JJL also pointed me towards opensourcetesting.org.

I'm interested in finding out about others, please let me know if you find any.

--titus

8 Jan 2005 (updated 9 Jan 2005 at 02:19 UTC) »

"Batteries included" may be a bad term

After some of the recent types kerfuffle, I decided I didn't like the "batteries included" description of Python. Let me explain.

Python-the-language is great, and fits most of my needs. Heck, it's done that since before 2.0 came out. It's always nice to discover that some nifty new feature (like list comprehensions) has been added, and often I can take advantage of them to write tighter/neater/more understandable code. Nonetheless, I'm fundamentally a systems & application programmer, and I don't generally write language frameworks, toolkits, etc., so these new features aren't usually all that critical to my work.

What is critical is the fantastic standard library that comes with Python. It has been steadily expanded in the years that I've been a Python programmer, to the point that when I'm looking around on the 'net for some functionality it's already in the Python lib more often than not. This saves me a lot of time & trouble installing new products. (I've also said that I think that the state of the included Python code, as well as the quality of the documentation & example code, plays a big part in educating new Pythonistas about acceptable coding practices with Python. The Python cookbook is a particularly nice collection of code, too. These both play a big role in the success of Python, IMO.)

So, batteries are included, if you think of the library as the "batteries" -- something essential to function, yet not terribly interesting in its own right. Looking at that description of batteries, though, it sounds more like it fits my conception of the Python language. After all, the language that things are written in is essential, yet not always all that interesting; doing things in the real world is more interesting. For that you usually need to interact with users or protocols, and that means some sort of interface, and ... you end up with a library of code that enables a lot of standard interactions. In Python, that code is distributed with the interpreter, which enables a stunningly wide range of functionality in your basic Python installation.

In other words, I think the included library & modules are more interesting than the word "batteries" might imply ;).

There may be a deeper schism lurking behind the recent decorator & types controversy: language vs tools. It's very sexy to work on extending the language: there are plenty of deep problems to be tackled, much thought must be expended, and only really really smart thoughtful people can do it well. As Iwan van der Kleyn pointed out, though, there are other things to do that will extend Python's reach, power, and utility. It might be worth the core team having a look at those, if their goal is to advance Python as a whole. If, instead, the goal is to advance Python-the-language, then I think they're doing a fantastic job of it & should keep it up. Personally I'm more worried about the future of the library.

UPDATE: AMK posted about this months ago, and got a lot of responses. I think that's what got me started...

And, before the naysayers get up & yell at me for my multitude of sins, I do have some specific work and proposals in mind. It's hard to grasp the library as a whole, though. (It's easier to write a criticism than it is to do the work, too, that's for sure.) Watch This Space.

As for the long-term future of Python, Guido's first priority clearly needs to be to grow a beard.

--titus

5 Jan 2005 »

Python's future -- language, library, or tools?

There's been quite some kerfuffle about Guido van Rossum's proposal to add optional static typing to Python. In particular, Iwan van der Kleyn pointed out that Python seems to be caught between language extenders and application programmers, and that at the moment the language extenders are winning. Iwan points out that for application programmers, there are several library or tool improvements that are probably as, or more, important than any language addition. While I don't agree with all of his feature/app requests, I do agree with the general idea that Python would benefit hugely from investment in something other than optional static typing. Or decorators. Or (insert feature of the week).

There was an ACM Queue article a while back (if someone has the link, I'd appreciate it!) that discussed the divide between language power users & tool power users. As I recall, the article described the split between developers who invest time in learning new and more powerful languages, and developers who invest time in learning IDEs and language-specific tools. What seems to be happening in Python is that GvR & the core team consist of very smart people who believe the language should be extended -- and are busy doing that. IDE and tool extensions are left to others & their efforts are patched into the library system where appropriate.

The real question about language extensions like optional type checking and decorators is how contaminatory they are. Istvan Albert believes that "optional" type checking will soon become effectively non-optional; while he mentions only the social (workplace) pressures, I'm more worried about whether or not the language will require it in places where I don't care to use it. If a library module uses type checking, will any code that uses that library module be required to adhere to the type behavior specified?

I can see three scenarios.

The first scenario is that any code that uses type-checked code will be required to adhere to the type spec. In the medium term, type checking will become mandatory throughout the codebase because people will start using it in the standard library.

The second scenario is that "duck typing" will magically solve the problem & allow untyped code to use typed code. This seems to me to be the best option, and I bet it's what GvR has in mind.

The third scenario is that there will simply be command-line arguments to Python (or internal configuration directives) that disable type checking completely, allowing schlubs like me to go about our day and ignore large portions of the language.

I'd be ok with either #2 or #3. #1 sounds like a disaster to me. Either way, I expect a large amount of energy to be expended on this set of features that, frankly, is not so important to me. (Maybe it should be, but I'm not betting on it... I like Bruce Eckel's take on static typing.)
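
For what it's worth, scenario #2 can be illustrated with the annotation syntax that present-day Python 3 ended up with, where annotations are not enforced at runtime. The sketch below is only an illustration of the idea, not a description of the 2005 proposal; the function and class names are invented for the example.

```python
from typing import Iterable


def total_length(sequences: Iterable[str]) -> int:
    """A 'library' function that declares types for its argument and result."""
    return sum(len(s) for s in sequences)


class FakeSequence:
    """Not a str at all, but it supports len(), which is all the library uses."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n


def untyped_caller():
    # Untyped code passes a duck-typed object; the interpreter does not
    # check the annotation, so the call works as long as len() does.
    return total_length([FakeSequence(3), "ACGT"])
```

Here the untyped caller never has to adopt the library's type declarations; a separate static checker could still complain, but the interpreter itself does not.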

Regardless, there's the question of whether the core team is focused too much on language development. Is the relentless drive for new, immensely powerful language features detracting from the stability and usability of the language?

It may be time to consider how to "fork" Python development so that it works more like Linux. Why not keep a 2.x branch going that focuses on long-term stability & library expansion, while building a 3.x branch with new language features but fairly complete backwards compatibility? In my traipses through the library it doesn't seem like the library really takes advantage of the new language features in 2.4, which (if true) in itself is a telling trend that suggests that language development and library development are orthogonal. Even if they aren't orthogonal, maybe they could be made more so, and this could be a worthwhile endeavor.

The balance between language stability and new features seems like it's tough to maintain. GvR et al. have done a fantastic job so far. Here's hoping they can keep doing a fantastic job!

--titus
