Older blog entries for titus (starting at number 18)

16 Dec 2004 (updated 16 Dec 2004 at 09:20 UTC) »
dcoombs -- have you tried NJAMD? I've had moderately good luck with it...

Testing Web sites

Revisited Cory Dodt's Python Browser Poseur (PBP) today. This is one of those projects that frequently pops into my head as something worth investigating, but I've never actually looked at it seriously. (And the last time I looked, there were still some fairly obvious broken bits that prevented me from making use of it -- but there's a new version...)

PBP is the best (simplest + easiest) way I've seen to test dynamic Web sites. It's based on mechanize, a Python version of WWW::Mechanize, and it provides a simple scripting ability to automate Web site browsing. Even someone without extensive Python experience can write scripts for it, which is an advantage for groups that aren't all programmers. I haven't tried extending it but I doubt it's that difficult; the package code looks clean & is relatively short.

PBP is relatively simple, at least on the surface: here's the example script from their site.

go http://mailinator.com
code 200
find "property of Outsc.*me"
showform
formvalue search email pbp.berlios.be
submit search
code 200
find "NO MESSAGES"

When executed by pbpscript, this script goes to mailinator.com, searches for the regexp "Outsc.*me" (which matches "...is property of Outscheme, Inc"), and then checks e-mail for the pbp.berlios.be@mailinator.com e-mail address. If there are any messages, the script fails. (Try changing '.be' to '.de' if you want to see -- sorry, I screwed up the example on the Web page by stupidly sending e-mail to pbp.berlios.de@mailinator.com.)

This is cool.

I'm puzzled that neither mechanize nor PBP is better known (as in, I haven't seen them mentioned anywhere but in c.l.p.announce on the occasion of a new release). I don't monitor freshmeat, which is another place it's been posted. Apart from that, a Google search doesn't turn up much mention. Am I missing a wealth of similar software that is better? What do people use to test Web sites, anyway? A currently-defunct list archive has a reference to HttpUnit, which is a nice-looking Java framework. Unfortunately I doubt it's as Python-extensible as PBP ;). John Lee of mechanize also points out webunit, by Richard Jones (also author of Roundup). I may have to take a look at that. Anything else?

In the interests of exercising PBP a bit, I wrote a simple PBP script (note: transient link) to run through my WSGI adapter interface for the Quixote demo. You can try it out if you want; the site just runs quixote.demo through a CGI-->WSGI-->QWIP bridge. And yes, it's veeeeeeery slow.

I ran into only one real problem with PBP: HTML-encoded form values. In the Quixote widget demo, there's a select widget that takes pizza sizes with inch units, e.g. 'Medium (10")'. The mechanize ClientForm is returning this in HTML-encoded form, 'Medium (10&quot;)', and PBP demands that it be set to this value. However, Quixote barfs on this because it is expecting 'Medium (10")' -- which is in fact what Quixote sees from browsers. There may be some invisible layers of encoding/decoding going on; Quixote uses cgi.FieldStorage which presumably decodes a properly-encoded string from the browser. I think the appropriate thing to do here is to change mechanize's behavior, but I will ask Cory what he thinks first; I haven't dealt with this aspect of HTML forms before, having been spoiled by nice libraries ;).
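The two layers of encoding can be separated with a few lines of Python. This is only a sketch using the modern standard library (html and urllib.parse), not the ClientForm or cgi.FieldStorage code paths discussed above, and the field name 'size' is a made-up placeholder. It shows the HTML entity '&quot;' being decoded once when the page is parsed, after which the value survives a URL-encode/decode round trip unchanged:

```python
import html
from urllib.parse import urlencode, parse_qs

# The option value as it appears in the HTML source of the select widget.
raw_option = 'Medium (10&quot;)'

# What a browser treats as the option's real value after parsing the HTML.
decoded = html.unescape(raw_option)

# A browser URL-encodes the decoded value when it submits the form...
# ('size' is a hypothetical field name, used for illustration only.)
query = urlencode({'size': decoded})

# ...and the server-side form layer URL-decodes it again.
received = parse_qs(query)['size'][0]

print(decoded)
print(query)
print(received)
```

Under these assumptions the server never sees the HTML entity at all, which is consistent with Quixote expecting the plain double quote.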

Next I'll have to try extending PBP from Python & vice versa. Anon...

--titus

"Consider the situation of two trauma surgeons arriving at an accident scene. The patient is bleeding profusely. If surgeons were like programmers, they'd leave the patient to bleed out in order to have a really satisfying argument over the merits of two different kinds of tourniquet." -- Philip Greenspun.

16 Dec 2004 (updated 5 Jan 2005 at 00:43 UTC) »

a whole entry vanished!

14 Dec 2004 (updated 14 Dec 2004 at 18:49 UTC) »
haruspex... the problem is that the right-wing nutsos are saying "let's burn all the oil before we do anything else" and the left-wing nutsos are saying "nothing but non-nuclear renewable energy will do", leaving me with the currently unimplementable centrist view of "let's switch over to nukes, while exploring alternative energy options and weaning ourselves from fossil fuels".

Unfortunately for Americans (I'm in the US) where we have these things called "elections", we do often have to choose between only two real options. In this case I'm not even sure what Kerry's standpoint was on the environment, but I didn't like his views on the Patriot Act (he voted for it) or his views on the Iraq invasion (he voted for giving Bush the power to do it & then made a U-turn for political reasons -- which inconsistency I despise). I still voted for him because I despised Bush more ;).

So while I agree with you in theory, in the real world it's different. I can't stand Bush, but I also find most of his real political opposition to be anti-reason. Who do I support? Kucinich? Or Dean? Or Al Sharpton, who makes an awful lot of sense? I would have voted for McCain just based on consistency, but Bush managed to torpedo him in 2000... so I'm left with whomever the Democrats support. Which unfortunately was Kerry.

All of which is beside the point, eh? I think Crichton is dangerous but not necessarily wrong. And I really like your face-hugger image!

tk, I'd go even further and say only people following the scientific methodology are scientists, whatever the others may call themselves.

On a side note, I wish I didn't have to post a full diary entry to respond personally to you folks. Is Advogato undergoing much development these days? It would be nice to add a comment ability.

--titus

14 Dec 2004 (updated 14 Dec 2004 at 07:23 UTC) »
haruspex and tk are getting personal, o my!

I hadn't seen Crichton's latest before today, but the Caltech talk I mentioned in my own little screed is available online.

Honestly, I'm not sure what to make of it all. I stand by what I said before: science -- not religion, nor "public policy debates", nor extreme left-wing environmentalism -- is going to give us facts. Crichton's attitude is that scientists are polarized towards the left and biasing their results & discussion in that direction, and this needs to be corrected. Unfortunately he fails to note that, historically, only scientists actually correct science; it may take a while, but the truth will out. This omission, combined with the prevailing political climate in our federal gov't, means that he is simply playing into the hands of people that are at least as illogical as (and much less interested in objective truth than) any scientist.

There's an interesting article in Science Magazine (the top journal in scientific research) that might be useful reading. E-mail me if you can't get access.

I have to agree with tk that the Greenpeace style of environmentalists are somewhat idiotic. There's a reason why China may establish the first really large-scale use of nuclear reactors -- which is probably the only medium-term hope for decreasing fossil-fuel use. It's sad that we have to choose between right-wing nutsos and left-wing nutsos on issues like this. (Side anecdote: a few years back, my advisor was in Germany. He saw a political protest against GM foods where the German Green Party was chanting "Food without genes!". Hmmm...)

You guys should both read Neal Stephenson's Zodiac. Fantastic book that spares no one -- and a crackin' good read, much like Snow Crash but without the long-winded ending ;).

Peace out,

--titus

Hey, berend, your "Mars global warming" reference was pulled by the paper that published it -- and o look, it's not been published anywhere else! Looks like the Denver Post got taken... but that's beside the point: it's not exactly hard to measure the Sun's energy output, and I'm pretty sure we'd have noticed if it was going up substantially. It <ahem> hasn't.

Gravity's just a theory, too.

Creationists and those who firmly believe climate change isn't driven by humans miss the point: science isn't about providing certainty. It's about providing uncertainty.

Take gravity. Gravity is something that we can observe pretty easily just by dropping an apple. We can note correlations (massier planets seem to have larger gravitational fields, for example). We can guess that, since the flux per unit area through the surface of a sphere decreases as the inverse square of the sphere's radius, gravity is subject to the inverse square law. We can even posit underlying mechanisms linking gravity to a specific particle, like the Higgs boson. What we can't do is prove that we understand how gravity works, except in terms of other theories (like particle theory and general relativity). We also can't guarantee that gravity functions the same way (or at all!) in places out of our direct experimental reach -- we can just show that the cosmological motions we see match our expectations were gravity to work the same.

These are the same objections that people bring to evolution and climatology: we don't understand much about the underlying mechanisms in either area. We can't show that the same rules that we see operating today are the rules that operated 2,000 or 4,000 or 500,000 years ago. We can say that what we see in the fossil record and among living organisms today strongly suggests a single common ancestor for all life on earth; but we can't rule out the theory that God created the earth 6,000 years ago, because we don't have any objective observers from that time. We certainly can't demonstrate that human activity has caused climate warming, although there do seem to be significant correlations between human activity and climate change. (Note that correlation does not imply causation, though.)

So, why is gravity undisputed (except by Flat Earth people)? And why are climate change and evolution such hot topics? I'm not sure, but I can suggest a few reasons.

Gravity is undisputed today partly because no religion has made the precise mechanism a point of recent dispute. It used to be in dispute, though; remember Galileo? That, ultimately, was a dispute about gravity on the scale of our solar system. Yet no edicts about the Higgs boson, or general relativity, have emanated from the Catholic Church, and Bush doesn't seem to care about gravity.

Another reason that people don't argue much about gravity is that the theory of gravity is predictive. Given a comet's position and momentum, we can tell you pretty much where it's going to go. It's a little harder in atmosphere, but we do it very well -- think ballistic missiles, for example. This predictive power goes a long way towards quieting dissent with the theory, because if you can predict something people will generally believe you understand it pretty well. (We'll come back to this.)

Evolution, for better or for worse, is not in the same position. It's a major point of dispute in at least a few places, and it's not predictive in the least. Even worse, it can't be very specific in predictions, because it's a stochastic theory that is subject to historical contingency. We will never be able to predict what mutations will arise randomly, and we will probably never be able to predict what effect those mutations will have on ecosystems. We might be able to predict general trends, but that is still far away from being an exact science.

Climatology is a much younger science than either the physics of gravity or the study of evolution. Like evolution, and unlike gravity, it seems to be very sensitive to certain kinds of perturbations -- that is, it's "chaotic". Very small changes may have large effects elsewhere. Moreover we don't understand many of the basic processes very well, and we don't have good ways to measure even relatively simple things like energy input from the sun, much less complicated things like CO2 consumption. Climatology is certainly not a predictive science in general, although some things can be predicted, just like in evolution: if you know where a hurricane is today, you can guess pretty well where it's going to be tomorrow.

Climatology is also a big point of contention for economic reasons: global warming, in particular. Corporations don't want to reduce the emissions of greenhouse gasses because they believe that it will have a negative economic impact on them. Therefore they (or their proxies) attack global warming as an unproven theory, in order to undermine its impact on public policy. As with the religiously motivated attacks on evolution, this is definitely bad for science.

If we could predict climate, or predict the effects of evolution, presumably people would regard these theories as being more credible than they are now. Unfortunately it's impossible to turn evolution into a predictive theory, and it's going to be a while before we get a predictive handle on climatology. So both theories are amenable to attack on the charges of being "unproven".

And here we come to the nut: the scientific method can't prove anything, in general. It is much, much better at disproving theories than it is at confirming them; any working scientist will agree with that! All that an honest scientist can say about gravity, or evolution, or global warming, is that they haven't been disproven yet. There are reasons to believe that gravity and evolution are pretty good theories, scientifically speaking, because they've withstood the test of time. I'm not very knowledgeable about climatology but I do know it's quite a bit shakier in its underpinnings. But attacking any of these theories for not having provided proof is missing the whole point of science, which is to disprove as much as possible.

People -- even many intelligent people who should know better -- frequently get this wrong. Michael Crichton, the prolific author of (among other books) Jurassic Park, gave an interesting lecture at Caltech where he talked about scientists' involvement in political debates on public policy. Nuclear winter and global warming were two examples where a strongly biased view has been pushed strongly and publicly by a relatively small cadre of scientists. Crichton's view seemed to be that scientists were no less fallible than anyone else, which is undeniable (though unpopular among scientists ;). What he missed, and what I think many scientists fail to emphasize, is that thus far the scientific method -- with objective measurements and peer review, in particular -- is the only proven method of discovery known to mankind. We ignore it at our peril.

Scientists can do their part by proudly admitting ignorance. It's not pleasant, but it's undeniable: did you know, for example, that the underlying mechanism by which evolutionary novelty arises is still in dispute? Yep! We still don't really understand how new traits arise! And did you know that the precise reflectivity of the earth -- which is a major determinant of energy input into our climate, and is directly linked to the "greenhouse effect" -- is still not easily measurable? Yep! No long-term trends available! And these are just two things I've worked on -- I'm sure there's an ocean of ignorance out there, just waiting to be publicized. That's science!

The flip side of the coin is that those who critically examine scientific theories should apply the same level of critical analysis to their own beliefs. This applies to postmodern lit-crit as much as it applies to religious believers -- and I think it's as important as science is, as a method for making public policy.

Note to readers: I've been thinking about writing something like this for a while. It's an ongoing project, so please e-mail me at titus@caltech.edu if you have thoughts, criticisms, or suggestions.

The only problem with troubleshooting is that trouble sometimes shoots back. -- Joe Zeff.

I've been noticing a fair amount of commentary on Python and Java lately: I particularly enjoyed Bruce Eckel's take on Static vs Dynamic typing, and Phillip Eby's Python Is Not Java (and Java Is Not Python, either). Phillip Eby makes the point that the Python and Java mindsets are quite different when it comes to frameworks: Python programmers tend to develop the structure out as they need it, while Java designers try to specify the frameworks' structure first & then fill in with specific implementations. Isn't this antithetical to the agile programming paradigm that's been gaining popularity lately?

Jython does a nice job of mingling Java libraries with Python coding; I think many of the Python-native extension modules can be loaded directly by Jython, too. Is this a possible solution to the question of static vs dynamic typing -- build your software in a language like Jython, and then slowly solidify it into Java?

I primarily do research programming, in which the specific goals of the software are largely undefined & the flexibility of the code should be one of the proximal design considerations, so I definitely prefer the Python(/Perl/Ruby) mindset in day-to-day work. There is a question in my mind, though, about where future bioinformatics software efforts will aim: I doubt that the current loosely-coupled/badly-specified project-specific protocols for genome databases and service frameworks will last, so where next? We could either start developing specifications (e.g. the distributed annotation system (DAS) or MAGE) or implementations (e.g. GMOD). If the former, there will be a significant barrier to entry for new projects, as they will need to spend time developing to the standard and confirming adherence. (This is the primary reason why DAS is a failure, I think.) If the latter, I predict a general tendency towards complexity of internal design as different projects try to cram all their needs into a single system. Either situation would be bad.

My preference is for what I think is a middle ground: the development of APIs around common tasks, in a variety of languages. The idea would be to take protocols like DAS and provide fairly simple library implementations that give you 90% of the needed functionality with 10% of the code complexity (based on the well known 90%/10% rule ;). The key is to make sure the implementations work well enough to do something useful & are in enough languages that e.g. the lone maverick Python/OCaml/Ruby programmer in the sea of Perl & Java programmers wants to play as well (just as one example!).

At the moment there are few tasks generic enough to be encapsulated by such an approach: the two that I can think of are annotation & microarray data presentation. Annotation suffers from a general lack of interoperability: not only does everyone have their own standards, but features don't transfer well between standards. I hear microarray data is the same, although I don't work with it much. It'd be interesting to try to work around the ontology problems (do you *really* want to define an ontology before getting your work done!?) to produce a genuinely useful annotation UI that interoperates. I don't see one out there that's usable by "mere" biologists, and I think that's the right target audience...

Why not use, say, XML? Well, properly grokking XML is burdensome and the whole process is pretty legalistic (lots of people yakking etc.). Since the goal is to lower the barrier to entry I think it's important to have some functioning libraries as soon as possible -- that way people can get the thrill of having the code actually work. When the library moves towards a standard, projects that are already functioning will at least have some reason to move with that library...

Hats off to the Chinook folks, who are developing a P2P bioinformatics system; you can access the code via CVS, finally.

--titus

Bugs bugs bugs bugs bugs...

Apparently this week is "let's find bugs in Titus's software" week. Didn't know it was formally defined... but three different people have poked holes in three different-but-related projects. The holes range from already-fixed-but-not-in-the-build (FRII) to important-but-easy-to-fix (Cartwheel) to important-and-bloody-difficult-to-fix (paircomp). I have to say my users are really great: finding two of these bugs required great attention to detail. Thanks, guys!

The trickiest bug to fix involves finding transitive connections between three two-way comparisons (find all paths A-->B-->C such that A-->B, B-->C, and A-->C are all matches). I came up with a clever solution that was easy to understand and easy to implement in simple code; unfortunately, it falls apart in the face of reverse complementing. (As you may know, DNA is readable in two directions: AATTGGCC is equivalent to its reverse complement, GGCCAATT (complement: A <--> T, G <--> C).) This problem is compounded by the asinine data structure that I use to represent the matches. Looks like it's time for a serious refactoring...
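To make the two ideas concrete, here is a small Python sketch. It is not paircomp's algorithm or data structure: the match sets are hypothetical (position, position) pairs invented for illustration, and the transitive filter deliberately ignores strand orientation, which is exactly the complication described above.

```python
# Complement table for DNA bases (A <-> T, G <-> C).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def transitive_paths(ab, bc, ac):
    """Yield (a, b, c) such that a-b, b-c and a-c are all matches.

    Each argument is a set of (position, position) pairs. This is a
    simplified, hypothetical stand-in for real match data, and it
    ignores strand orientation entirely.
    """
    # Index the B-->C matches by their B position for quick lookup.
    by_b = {}
    for b, c in bc:
        by_b.setdefault(b, []).append(c)

    for a, b in sorted(ab):
        for c in sorted(by_b.get(b, [])):
            # Keep the path only if the direct A-->C match agrees.
            if (a, c) in ac:
                yield (a, b, c)


print(reverse_complement("AATTGGCC"))
print(list(transitive_paths({(1, 10), (2, 20)},
                            {(10, 100), (20, 200)},
                            {(1, 100)})))
```

Once matches can lie on either strand, each pair would also need an orientation flag, and the A-->C check would have to agree with the combined orientation of A-->B and B-->C. That bookkeeping is left out here.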

All of these bugs remind me of this great quote from an interview with Damian Conway:

"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." -- Brian Kernighan, via Damian Conway

I really enjoyed reading this Damian Conway interview on builderau.com. This is a man who has done it all, and has sound advice based on experience. He also gives an excellent reason for using Perl: it's an immensely powerful language that lets you do pretty much whatever you want. (I don't think it's a good idea for inexperienced programmers to use Perl for anything more than short scripts, but -- like Python -- I suspect "short scripts" describes 95% of what is done with Perl ;).

In other news, my OCaml adventures proceed apace. I just finished my very first OCaml program (temp link). dd2.ml implements a simple recursive global-alignment algorithm that finds the optimum gapped alignment between two sequences. Dog slow, but functional (ha ha...)! Now to see if I can add some heuristics into the algorithm to make it speedier.
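For readers who want to see the shape of such an algorithm, here is a rough Python sketch of a recursive global alignment score. It is not dd2.ml: the scoring values (match=1, mismatch=-1, gap=-1) are assumptions, it returns only the best score rather than the alignment itself, and it adds memoization via functools.lru_cache, which is a swapped-in speedup rather than anything described above. Without the cache the recursion is exponential.

```python
from functools import lru_cache


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best score for a gapped, end-to-end alignment of a and b."""

    @lru_cache(maxsize=None)
    def best(i, j):
        # Score of the best alignment of a[i:] with b[j:].
        if i == len(a):
            return gap * (len(b) - j)   # rest of b is aligned to gaps
        if j == len(b):
            return gap * (len(a) - i)   # rest of a is aligned to gaps
        pair = match if a[i] == b[j] else mismatch
        return max(
            best(i + 1, j + 1) + pair,  # align a[i] with b[j]
            best(i + 1, j) + gap,       # a[i] against a gap
            best(i, j + 1) + gap,       # b[j] against a gap
        )

    return best(0, 0)


print(global_alignment_score("ACGT", "AGT"))
```

Because it recurses once per position, this version is only suitable for short sequences; longer inputs would hit Python's recursion limit and call for the usual iterative dynamic-programming table.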

OCaml is a lot of fun, I must say. At some point I look forward to making use of OCaml's ability to ship cross-platform bytecode around to different machines. It'd be great to be able to add new alignment views and other analyses directly into FamilyRelationsII simply by downloading some new OCaml code! I've also been thinking about how to use OCaml in my tuple space/map-reduce implementation... seems like a good fit!

Last but not least: WSGI. There is now a Web site containing my Quixote and SCGI adapters for the Python WSGI standard. It also turns out I owe Ian Bicking an apology: when I asked why Webware didn't have an adapter, I'd missed Ian's WSGIKit implementation (SVN here, blog here). It's not an adapter so much as a reimplementation effort, as far as I can tell, so I still think there's room for a simple adapter that Just Works (tm). If experiments continue sucking maybe I'll work on that...

ta for now,
--titus

30 Nov 2004 (updated 30 Nov 2004 at 20:46 UTC) »
Stevey -- check out http://www.blogtorrent.com/, it might be what you're looking for. [UPDATE: no, it's not. Never mind.]

In other news, I just updated my QWIP/SWAP README with some simple usage examples, after trying them out with WSGI Utils. (They worked! (sort of)) Stupidly enough I previously posted a dated direct link to the qwip-swap .tar.gz, so I'm waiting 'til I can construct a Real Web Site for QWIP/SWAP to post the slightly updated distro.

--titus

p.s.

'Vegetarian' -- it's an old Indian word meaning 'lousy hunter'.
              -- Red Green
30 Nov 2004 (updated 30 Nov 2004 at 08:04 UTC) »

''' There is a joke about American engineers and French engineers. The American team brings a prototype to the French team. The French team's response is: "Well, it works fine in practice; but how will it hold up in theory?" ''' -- unknown, via Mike Vanier.

OCaml, Python/WSGI, and scalable programming:

Spent some time over the last few days "learning" OCaml, by which I mean reading first the C++/Java programmer's intro to OCaml and then an OCaml tutorial. This is all part of an effort to broaden my horizons: I enjoy using Python and C to solve problems on a daily basis, but I've never learned a functional programming language. Man, is it frustrating to pick up a new language -- I feel completely helpless to write even the simplest program. This is compounded by my complete inability to think recursively...

I'm looking into OCaml because several different computer-geek friends suggested I try it out. Since all of them profess a love of Python, yet are wiser and more experienced than I in the ways of programming languages (I guess a CS background is useful for something...) I decided to buckle down and study OCaml a bit. So far I've gained an appreciation for the cleverness of OCaml and OCaml programmers, marvelled at 'match', and realized how cool currying is. Not bad for two days ;).

In other news, David Warnock pointed out in his blog that my simple Thanksgiving Day WSGI wrapper for SCGI might be the best-performing WSGI server around, because it's built on top of mod_scgi/SCGI. mod_scgi/SCGI is already fully functional and used for "real" Web sites that run Quixote, and my leetle SWAP code effectively turns this into a full-blown WSGI server. Cool. It seemed too easy to implement, though, so I must be missing some aspect of the WSGI master plan -- why hasn't Webware done this yet, for example?

In connection with that, I've been thinking that an interesting project would be to implement an SCGI server in OCaml. I don't see anything like it out there on the projects page, and it wouldn't take that long to do...

Last but not least, as part of my OCaml adventure, I came across Mike Vanier's rant on the scalability of languages. In it he says, or implies, many things that I wish I could have said more clearly. Things like "The right way to use languages like C is to implement small, focused low-level components of applications written primarily in higher-level languages". Yeah, that.

Mike is one of the three people that suggested I learn OCaml, so I'm a bit saddened by his epilogue in which he turns a little bit away from OCaml (for good reasons, it sounds like, but nonetheless...)

--titus

QOTDE: Things Will Change -- Iain M. Banks, Against a Dark Background (the quote on Gorko's Tomb)

WSGI, Quixote, SCGI, QWIP, and SWAP

In a fit of depression over lousy experimental results, with a healthy serving of turkey on top, I decided to turn my hand to something I do better than experimental molecular biology: program in Python. (Trust me, whatever you think of my programming... my molecular biology is weaker. sigh.)

Pursuant to the general public prodding of various people on the Quixote list, I spent a few hours on the couch today and built two interfaces for WSGI, QWIP and SWAP. (README and source download.)

QWIP, the "Quixote-WSGI interface p(something)", wraps the Quixote publisher in a WSGI-compliant application object. This lets any WSGI-compliant servers out there (are there any?) publish Quixote objects.

SWAP, the "SCGI-WSGI application p(something)", allows the SCGI standalone server interface ('scgi server') to run WSGI-compliant applications. For example, this lets mod_scgi run WSGI applications via the SCGI server -- including QWIP-wrapped applications, which was my testing strategy ;).
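The two roles can be sketched in a few lines of Python. This is not the QWIP or SWAP source: hello_app and run_once are hypothetical names, and the code follows the current bytes-based form of the WSGI interface rather than whatever the 2004 draft required. The application callable stands in for what a QWIP-style wrapper hands to a server, and run_once plays the part of a server-side adapter by building an environ dictionary and calling the application.

```python
import io
import sys


def hello_app(environ, start_response):
    """A minimal WSGI application (hypothetical example)."""
    body = ("Hello from %s\n" % environ.get("PATH_INFO", "/")).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def run_once(app, path="/"):
    """Call a WSGI app the way a server-side adapter would."""
    environ = {
        "REQUEST_METHOD": "GET",
        "PATH_INFO": path,
        "SERVER_NAME": "localhost",
        "SERVER_PORT": "80",
        "wsgi.version": (1, 0),
        "wsgi.url_scheme": "http",
        "wsgi.input": io.BytesIO(b""),
        "wsgi.errors": sys.stderr,
        "wsgi.multithread": False,
        "wsgi.multiprocess": False,
        "wsgi.run_once": True,
    }
    captured = {}

    def start_response(status, headers, exc_info=None):
        # Record what the application wants sent before the body.
        captured["status"] = status
        captured["headers"] = headers

    chunks = app(environ, start_response)
    try:
        body = b"".join(chunks)
    finally:
        # The interface asks servers to call close() if it exists.
        if hasattr(chunks, "close"):
            chunks.close()
    return captured["status"], captured["headers"], body


status, headers, body = run_once(hello_app, "/demo")
print(status, body)
```

A real adapter would fill environ from the incoming request (SCGI headers, in SWAP's case) and write the status, headers, and body back over the connection instead of returning them.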

Overall, my modicum of experience with the internals of Web servers (mostly from PyWX and some minor hacking on Quixote) served me well; it took me about 1 hr to get QWIP working, and about 3 hours to get SWAP working. (Over half of those three hours was spent figuring out that (1) I was instantiating a new object rather than calling the superconstructor, because I'd left out __init__; and (2) that SCGI expected the input and output streams to be closed to signal that the connection was over. Sigh.) It was pretty satisfying to sit back and set up this set of modules:

Apache <--> mod_scgi <--> SCGI server <--> SWAP <--> QWIP <--> Quixote demo
and have it all work!

I'm now moderately more optimistic about the usefulness of WSGI. I hate (no, loathe) frameworks that attempt to solve the problems of mankind, if you'll just drink this Kool-Aid, sir... But, notwithstanding the philosophical debut in the WSGI PEP, it was pleasant to implement the adapters and I could see WSGI being of significant benefit to Web server authors. Or maybe by buying into the framework I've sold out and you can't trust my opinion ;).

So, kudos to Phillip Eby & I hope this stuff is useful to someone! Now, back to making my Quixote applications do more stuff!

--titus

p.s. Has anyone else noticed that advogato.com and www.advogato.com read cookies differently? Kind of amusing to go to one or the other and have different options available, one as logged-in member & the other as nobody...

