Older blog entries for titus (starting at number 69)

Safari --> Firefox --> Saf... no, Camino

Camino latest is blazingly fast, doesn't sit & spin, has no focus issues, and is very pwetty. Highly recommended.

Switching between three browsers is a bitch. Most of my bookmarks are on a private 'links' page, but I didn't realize how much I depended on saved user/pass info. Ever since I settled on my iBook as my portal to, well, everything -- an 80x40 terminal window running screen, Web browser, and X server satisfy roughly 99% of my needs -- I've been letting my browser remember my login info. Now I need to have sites send the passwords to me, and in some cases they randomize 'em. Argh. In one extreme case -- my local library -- I'm going to have to go visit the library to get a new password.

C++ sweetness

Proper use of C++ sure is nice & clean. I'd still prefer the cleanliness of try/finally, but yesterday's global interpreter lock class is useful enough that it should be put somewhere for other people to find. I wonder if I could convince GvR etc. to put it in the (currently very short) writing extensions in C++ docs? I couldn't find a place in the Python cookbook for what is, technically, a C++ recipe...

Thanks to Chris Frey, Peter Hart, and Max Caceres for their help on this!

My C++ code is beginning to resemble my Python code. (It's still uglier, of course ;). By and large I can do lots of stuff in small amounts of code, and any real ugliness can be hidden in short, easily-tested functions in the implementation file.

Anyway, I'm at a 1.0rc1 release for paircomp, now that I've got error reporting working. I've also made khmer 0.2 available. It's a simple, fast k-mer counting program for whole-genome k-mer statistics. I'm always surprised at how fast you can do the simple stuff: khmer can count all 12 bp words in a 5mb genome in less than a second. Now to try it out on human (600 times larger)... ;)
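
For anyone wondering what "counting all 12 bp words" amounts to, here is a minimal Python sketch of sliding-window k-mer counting. This is not khmer's code (the function name count_kmers is made up for illustration, and a real tool would presumably use a much more compact table); it only shows the idea:

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping k-letter word in a DNA sequence.

    Hypothetical helper for illustration; not part of khmer.
    """
    sequence = sequence.upper()
    counts = Counter()
    # Slide a window of width k along the sequence, one base at a time.
    for i in range(len(sequence) - k + 1):
        counts[sequence[i:i + k]] += 1
    return counts


if __name__ == "__main__":
    for word, n in sorted(count_kmers("ACGTACGT", 4).items()):
        print(word, n)
```

A sequence of length N has N - k + 1 overlapping words, so the work is linear in genome size, which is why the simple stuff can be so fast.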


13 Mar 2005 (updated 13 Mar 2005 at 22:58 UTC) »
Safari --> Firefox and back again

I tried out Firefox on my iBook, because Safari was spinning the little wheel too much. Firefox isn't much better, and has a number of focus problems. Plus it doesn't look nearly as pretty. YMMV.

I'll try out Camino when it hits 0.9...

Fun with Python segfaults

Can anyone spot what's wrong with this C++/Python wrapper code?

PyObject * ret;

try {
   long val = heavy_computation_stuff();
   ret = PyInt_FromLong(val);
} catch (program_exception & e) {
   PyErr_SetString(PyExc_Exception, "whoops, I got broke");
}

return ret;

(It segfaults.)

I'll give you a hint: it has to do with the global interpreter lock.

Oh, and there are actually two bugs in the code, but only one actually causes a crash.


Well, I'm sure you're on tenterhooks now, so I'll give you the answer to the guaranteed segfault: you need to wrap PyErr_SetString in Py_BLOCK_THREADS/Py_UNBLOCK_THREADS.


OK, well, the other bug is the same thing, it just doesn't cause problems in this particular instance: PyInt_FromLong also needs to be wrapped in Py_BLOCK_THREADS/Py_UNBLOCK_THREADS.

I forgot the cardinal rule of the GIL: any time you access Python code, you need to turn threads off.

ARRRGGGH, I swear this took me the better part of a day to figure out.

But now I'm stuck. Without try/finally in C++ I can't guarantee cleanup if I do an ALLOW_THREADS in the try block. And it'd be severely ugly (not to mention moderately error-prone) to set a flag when an exception is raised, e.g.

try {
} catch (...) {
   exception_raised = true;
}
if (exception_raised) { END_ALLOW_THREADS; }

(Yeah, I'd need to redefine the macros to make this work anyway.)


Chris Frey and Peter Hart pointed out that you can get the same functionality as try/finally by using classes. So my solution now looks like this:

try {
   { py_thread_saver save;
     val = long_computation();
   }
   ret = PyInt_FromLong(val);
} catch (...) {
   // handle the error; the GIL is held again here
}

Since the py_thread_saver object is an automatic variable, it gets destroyed at the end of the code block.

The py_thread_saver class is pretty simple:

// a class to automatically handle saving of thread state.
class py_thread_saver {
protected:
   PyThreadState * _tstate;
public:
   py_thread_saver() { _tstate = PyEval_SaveThread(); }
   ~py_thread_saver() { PyEval_RestoreThread(_tstate); }
};

There are a bunch of good variations on this that can fit more complicated scenarios, but this solves my problems perfectly. Thanks, guys!


Publications and open source

My paper on FamilyRelations etc. has finally been reviewed; both reviewers liked it, although one requested some clarifications before final acceptance. I'm fixing it today and will send it in within the week; assuming I don't screw up the revisions, it'll be out by Apr 1st. No mention of the "publicly available software isn't original any more" nonsense of BioTechniques.

Another paper, Anaerobic regulation by an atypical Arc system in Shewanella oneidensis, was accepted last week. This was work done along the lines of my earlier paper on finding binding sites in microbes. In this case, someone in a local lab tested the results of a different search and showed that my search was moderately predictive of function. (online materials here)

A multitude of motifs

Spent some time working out some simple math on motif finding. Will talk about it in more depth when I have the energy ;).

British books, again

Alastair Reynolds has a new book "Century Rain" (ref). Jon Courtenay Grimwood has a new book "Stamping Butterflies" (ref). Richard Morgan has a new book, "Woken Furies" (ref). Iain Banks, John Meaney, and Steven Erikson all have new books out, too. What do all of these people have in common? They're published in England, so I can't get them without paying $exorbitantly$ for shipping.

I may see if Munro's Books can special order any of these and then tranship them to me in the US. Hefty price tag, tho, to buy that many books at once. Sigh.


Meeting up early

Had our SoCal python interest group meeting last night; 7 people showed up. Very interesting! Meeting in person is higher bandwidth than talking online ;).

Grig gave his PyCon talk on Agile Testing methodologies. Very thorough presentation; lots of good software out there. Someone should give his PyUnitPerf software a try already! I won't give you the punchline of his talk, you should go to PyCon (or read his blog)...

I gave a short demo of my supercalifragilisticexpialidotious side project on annotating URIs. Natch, my laptop died just as I was setting up. 1st time in months. What is it about demos!? Anyway, good thing that it was mostly a Web demo, so I could swipe someone else's computer, and the crash prevented me from showing my PowerPoint. Probably a good thing there, too. People were very nice and enthusiastic about the possibilities.

The pizza was good, too.

Resolved: we will advertise more widely. We will do some intro talks (people wanted to hear about our experiences with Quixote, I wanted to hear about Greg McClure's experiences with CherryPy, Daniel Arbuckle was queried re metaclasses). We may meet at the dank & deserted marine lab again. We discussed a couple of ideas for community participation in Python stuff, too. More anon.


RIP: Hans Bethe

July 2nd, 1906 to March 6th, 2005.

Hans Bethe died last night at dinner, at the age of 98. He was one of the 20th century's greatest physicists; among his other accomplishments he received the Nobel Prize for describing the H --> He conversion that fueled our sun. Most physicists were probably surprised to learn that he was still alive; he was literally responsible for laying much of the groundwork in atomic and nuclear physics in the 1930s, and contributed immensely to many different areas of physics throughout the century.

He also collaborated closely with my father for almost 30 years. Some of their work is still moderately controversial (e.g. low mass black holes). He was hoping to live to see LIGO confirm some of their latest theories on neutron-star binary mergers, but that was not to be.

For many years (~1985-2000) he and my father travelled out to California to work at Caltech for a month each January. I got to know him a bit during those months, because he and often his wife Rose would stay with my father in the same apartment. He was always very mentally active, even as his physical abilities declined over the years. It was always tricky doing things like picking him up at the airport, because you wanted to be careful with this living legend! I knew that if I had an accident with him in the car, I'd be infamous throughout physics...

My friend Chris Adami has written a book called "Three Weeks with Hans Bethe and Gerry Brown", describing a short period in 1992 that Chris spent with Hans and my father. It captures Hans' intellectual depth and conversational style perfectly. I hope it will be published soon.

Hans is one of two or three people directly responsible for my entry into biology. He told me that when young scientists asked him what field he would go into were he starting in science now, he would emphatically respond "Biology!" He believed that biology would be the field with the next big achievements, and -- as always -- he was right.

I will miss him.


p.s. Wikipedia, as usual, is up to date...


Wrote another toy WSGI application tonight: wsgiFeedSuck.py. It wraps RSS feeds with a simple WSGI app that displays the titles and summaries.

You can try it out, for the nonce, in CGI mode: here.

I wrote wsgiFeedSuck to learn how to use Mark Pilgrim's excellent feedparser module. Astute examiners of the code will note that I use 'etag', 'modified', AND only check the feed every hour. Yay me ;).

The only real problem I have with the code is the lack of file locking around the shelving. O well. Suggestions welcome.
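
One possible way to get locking around the shelf, sketched with nothing but the standard library. This is not what wsgiFeedSuck.py does; the name locked_shelf and the sidecar ".lock" file are assumptions made up for this example, and fcntl.flock is Unix-only and advisory (it only helps if every process that touches the shelf goes through the same lock):

```python
import fcntl
import os
import shelve
import tempfile
from contextlib import contextmanager


@contextmanager
def locked_shelf(path, flag="c"):
    """Open a shelf while holding an exclusive lock on a sidecar lock file.

    Hypothetical helper: the lock lives in path + ".lock" rather than on the
    shelf itself, because the dbm backend may spread the shelf over several files.
    """
    lock_file = open(path + ".lock", "w")
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until other holders release
        shelf = shelve.open(path, flag)
        try:
            yield shelf
        finally:
            shelf.close()  # flush to disk before the lock is released
    finally:
        fcntl.flock(lock_file, fcntl.LOCK_UN)
        lock_file.close()


if __name__ == "__main__":
    cache_path = os.path.join(tempfile.mkdtemp(), "feedcache")
    with locked_shelf(cache_path) as shelf:
        shelf["some-feed"] = {"etag": "abc", "last_checked": 0}
    with locked_shelf(cache_path) as shelf:
        print(shelf["some-feed"]["etag"])
```

The shelf is opened only after the lock is held and closed before it is released, so two CGI processes should never have it open at the same time.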


OK, I'm posting it. Happy? Damned voices, yammering away in my head...

(I do like the fact that Web-based darcs repositories are also Web sites in their own right. Very convenient when you don't want to do any work to "release" something.)


oubiwann, congrats on remembering to X out your password. I didn't, the first time I posted such a script ;).

chromatic, nice enthusiastic article on PostgreSQL. (I especially like the Oracle user's comment at the bottom: "but in our nice shiny expensive database, we've been using this for eons...")

avrietta, I couldn't agree more. At least about the steak. And maybe the single malts. But seriously, these jokers are running Wikipedia on non-ACID databases!? Whoo. You should beat up on me anyway, though, I like my tri-tip marinated. (I can't afford better cuts of meat.) But I do sear it. A few BBQs ago, I let a German cook the meat -- he kept on telling me it wasn't done, until finally I realized he was "searing" it all the way through. He'd already ruined it by then, but luckily he tasted good with the cajun BBQ sauce I like, so I didn't go hungry for long.

robocoder, your wedding site is down. You should use a co-loc. ;) Ummmm and you should also keep it updated...


Generating all L-tuples from a given alphabet

I had a muuuuuch longer diary entry written, but then I realized it was all shite and needed rewriting. Meanwhile, here's something that I thought was cute:

alphabet = ('A', 'C', 'G', 'T')

def rN(L, *args):
    """
    Generate all L-tuples from the given alphabet.
    """
    if L == 0:
        print(args)
        return

    for letter in alphabet:
        rN(L-1, letter, *args)

For example, rN(2) gives this:

('A', 'A')
('C', 'A')
('G', 'A')
('T', 'A')
('A', 'C')
('C', 'C')
('G', 'C')
('T', 'C')
('A', 'G')
('C', 'G')
('G', 'G')
('T', 'G')
('A', 'T')
('C', 'T')
('G', 'T')
('T', 'T')

There are just *all sorts* of tricks you can do with Python's function passing semantics, ehh?
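
For comparison, a sketch of the same enumeration as a generator built on the standard library's itertools.product. The name l_tuples is made up here, and each tuple is reversed only so that the order matches the listing above (first position varying fastest):

```python
from itertools import product

ALPHABET = ('A', 'C', 'G', 'T')


def l_tuples(L, alphabet=ALPHABET):
    """Yield all L-tuples over the alphabet, first position varying fastest."""
    for combo in product(alphabet, repeat=L):
        # product() varies the last position fastest; reverse each tuple
        # to reproduce the order printed by the recursive version.
        yield combo[::-1]


if __name__ == "__main__":
    for t in l_tuples(2):
        print(t)
```

Yielding the tuples instead of printing them means the caller can count, filter, or stop early without touching the generator.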

Rant of the Day

Why must people post sub-200k binaries via SourceForge's incredibly crufty load-balancing download system?!? It makes them difficult to download, IMO...


Selenium redux

When I asked about the Selenium web testing framework, the only reply I got was from one of the authors, Jason Huggins. He says the software works really well -- now I have to decide whether or not to believe him ;). He says it's getting some use in Plone-world, though, and Grig is going to try it out & (hopefully) let me know.

Bret Pettichord wrote a nice entry on Selenium, too, so maybe I should believe Jason...

Guido takes names

Rewriting Python from scratch: bad.

Language adoption by Apple

Paul Snively writes a longish e-mail to the OCaml mailing list about Apple's history with language research. Interesting stuff.

The Dial-Up Divide

Adam Rifkin writes on weblications in his Deeply Intertwingled blog. One thing that caught my eye was his description of Google Suggest's bandwidth usage: every time you hit a key, approx 1 kb of data is transmitted from the Google server to you. I think this nicely illustrates the divide between dial-up and broadband: how well do you think this will work on dialup? ;)

Titus Is Testing Unicode Scriptmanagement

ObGoogle: found this. In the German, "Thesaurus Indogermanischer Text- und Sprachmaterialien". (No, that's not a real translation; it means "Thesaurus of Indogermanic Texts and Language Materials", or somesuch.)

Today's Resolution

I will not get spittingly furious at the incompetent, obtuse, inane, idiotic, dopey, lamebrained, thick, asinine, boorish, witless, half-baked, feeble-minded or doltish managerial practices of professors. At least not more than once a day.



Grig Gheorghiu pointed out Selenium, a Web testing framework. Does anyone have any experience with this? Please let me know... - thanks!


Ankh, I'm not sure what to make of your comment about turning on heels... ;) I've read parts of the Gormenghast trilogy, and I love the atmosphere that Mervyn Peake creates. What can I say, titus is my real name!

As far as whither XML... My Cartwheel/FamilyRelations software can communicate between client & server using an XMLish data format, and I recently extended it to save/load this format from disk. This way people can load analyses offline. The only real problem I ran into was that the files can contain large analyses & I ended up doing a Bad Thing and encapsulating the analyses as blobs within the XML data. Not sure what else I could have done; using XML properly would have made the files 50 times larger.

I spent some time writing Java, Python, and C++ parsers for the format, but then realized that until I made the system more generally useful no one was going to care but me. So, I adhered to Rule Y and just built the libraries without writing a detailed spec. So no DTD, no spec, just an internal feature format that could be regularized were anyone interested. Which they're not ;).

I've spent more time on the RPC route in recent years, and am now stuck on the fence about the future. My next set of features could depend on XML-RPC to communicate data, but then the data can't be accessed without a server connection. This seems like a bad idea, but the overhead of writing the XML I/O functions also seems unnecessary at the moment.

But I'm sure that's more than you wanted to know...


Random Miscellany

Undefeatable spam. (Well, at least without real AI.)

Public posting of genome shotgun data can lead to the discovery of new species, sequenced unintentionally. Heh.

CatchUp! Record the refactoring of your API, then "play back" the refactoring on dependent applications. Very cool!

WaterBot. I don't get it.

Corollaries and Laws

My first rule of thumb (a.k.a. "law that I will force into every technical discussion, irrespective of relevance") is this:

Rule Z: A marshalling library isn't complete unless A = load(marshal(A))

A corollary is this:

Corollary Z_1: An analysis program isn't usable by other people unless it is possible to build a complete marshalling library for the output.

This is one of my pet bioinformatics peeves: people (or large institutions) who build otherwise useful programs that have an utterly nonregular output format, rendering these programs largely useless for pipelines.

Corollary Z_2: A list of tab-delimited lists is not a real file format.

'nuff said there. One day I hope to be inspired enough to write a rant about this; my title is already planned out: "You wouldn't use Excel as your database, so why are you using GFF?" (GFF is a simple tab-delimited format that everyone uses in bioinformatics.)
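
Rule Z is easy to turn into a test. A minimal sketch, assuming a deliberately naive tab-delimited reader and writer (dump_tsv and load_tsv are strawmen invented for this example, not any real GFF library), with JSON from the standard library as the comparison:

```python
import json


def round_trips(value, dump, load):
    """Rule Z as a check: does load(dump(value)) give back value?"""
    return load(dump(value)) == value


def dump_tsv(rows):
    """Naive tab-delimited writer with no quoting or escaping (a strawman)."""
    return "\n".join("\t".join(row) for row in rows)


def load_tsv(text):
    """Naive tab-delimited reader matching dump_tsv."""
    return [line.split("\t") for line in text.split("\n")]


# Made-up records; the second one has a tab inside a field.
records = [["gene1", "binding site"], ["gene2", "note with a\ttab"]]

if __name__ == "__main__":
    print(round_trips(records, json.dumps, json.loads))
    print(round_trips(records, dump_tsv, load_tsv))
```

The JSON pair passes the check; the naive tab-delimited pair fails as soon as a field contains a tab, because nothing in the format says how to escape one.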

Another rule of thumb:

Rule Y: Premature standardization is the root of much evil.

Writing data export specs before you've talked about use cases with at least three other groups is one flagrant contravention. Solidifying APIs without actually developing a real application that uses them is another.

Prevayling Stochasticity

Via Max Ischenko, the Pyrasun rant about Prevayler is pretty wild. Mike Spille seems to have some serious objections to Prevayler, and after reading the interview with Klaus Wuestefeld I get a bad feeling about it, too. It's not like this guy Klaus needs any more publicity -- even (or especially?) bad publicity must feed his ego -- but there are some choice quotes in the interview:

Q. "Who are you?"
A. [ ... ] "That is, I'm an nonconformist. [sic]"
Q. "What would you say to those people who own this kind of application [SQL app], and want to migrate to OO using Prevayler?"
A. "Initially, they will feel like Neo, floating on that water container, completely atrophic, for living their entire life inside a database bubble."
Q. "Have you ever been confined in an asylum?"
A. "In reality, I'm just a prophet."

Wow. Well, if attitude and arrogance equalled credibility, this guy would be even more credible than me.
