The Fragile Light
Just finished the book The Fragile Light, by David Nurenberg. Good stuff; independent author. Worth reading.
Briefly, it's a SF&F novel about a world where mutants are sometimes heroes, and more often feared; where there are Herotown ghettoes full of supers; and where only licensed heroes can join in the game. It reminds me a bit of the early Wild Cards books, but better written. A fun read.
--titus
zounds, for running lots of BLASTs
I finally got sick of manually schlepping BLAST files around, so I wrote something to do it for me. 'zounds' is a very simple server/client system for coordinating a bunch of 'worker' nodes through a central server; it does everything in Python with objects and pickling, so it's easy to do extra Python-based processing on the worker nodes. See 'filters' for more info.
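To illustrate just the pickling idea, here is a sketch. It is not zounds' actual code or API. The BlastJob class and the keep_strong_hits filter are hypothetical, and the filter assumes tabular BLAST output, where column 11 is the e-value. The server pickles a whole job, filters included. The worker unpickles it and runs the filters on its local output before anything goes back over the wire.

```python
import pickle
from dataclasses import dataclass, field
from typing import Callable, List


def keep_strong_hits(lines):
    """Hypothetical filter: keep tabular BLAST lines whose e-value (column 11) is below 1e-5."""
    kept = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 11 and float(cols[10]) < 1e-5:
            kept.append(line)
    return kept


@dataclass
class BlastJob:
    """Hypothetical job object (not zounds' real class)."""
    query_name: str
    filters: List[Callable] = field(default_factory=list)

    def post_process(self, output_lines):
        # Run each filter in order, on the worker, before results are returned.
        for f in self.filters:
            output_lines = f(output_lines)
        return output_lines


def server_send(job):
    # Server side: serialize the whole job, filters included.
    return pickle.dumps(job)


def worker_receive(payload, blast_output_lines):
    # Worker side: rebuild the job and apply its filters to the local BLAST output.
    job = pickle.loads(payload)
    return job.query_name, job.post_process(blast_output_lines)
```

One caveat with this design: pickle stores functions by reference (module plus name), so a filter like keep_strong_hits must also be importable on the worker. And since unpickling can execute arbitrary code, this only makes sense between machines you trust.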
You can read a bit more about zounds here:
http://iorich.caltech.edu/~t/zounds/README.html
It's freely available, open-source, etc. etc.
Comments and thoughts welcome; send them to the bip list.
--titus
Serving XML-RPC over HTTPS with Python
We've been talking about how to manage pygr resources remotely via the existing XML-RPC interface, and for that HTTPS is a requirement. I offered to track down the code necessary for running an XML-RPC server over HTTPS. Here's what I found:
It turns out that while the Python stdlib supports HTTPS client connections (connecting to https:// URLs), it does not directly support HTTPS serving. To do that, you need to use pyOpenSSL. However, once that's installed it's a breeze: it's as simple as this:
server = SecureXMLRPCServer(server_address, KEYFILE, CERTFILE)
You can download the SecureXMLRPCServer code and an example here:
http://iorich.caltech.edu/~t/transfer/xmlrpc-https.tar.gz
To run it, just install pyOpenSSL ('python-openssl' under Debian), and then execute 'python serve-ssl.py' in one shell and 'python test-conn-ssl.py' in another.
Thanks to Laszlo Nagy for his Python Cookbook recipe which only needed a bit of fixing (for Python 2.5) and refactoring (for reusability).
The example .tar.gz above contains a private key and certificate so that the code Just Works.
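For what it's worth, on Python 3.6 or later the stdlib ssl module can handle the server side without pyOpenSSL. Below is a minimal sketch of a SecureXMLRPCServer-style class built that way. It is not the code in the tarball above, which uses pyOpenSSL. The constructor argument order simply mirrors the one-liner above, and it assumes you already have a PEM key and certificate.

```python
import ssl
from xmlrpc.server import SimpleXMLRPCServer


class SecureXMLRPCServer(SimpleXMLRPCServer):
    """XML-RPC over HTTPS using the stdlib ssl module (Python 3.6+), not pyOpenSSL."""

    def __init__(self, server_address, keyfile, certfile, **kwargs):
        # Bind and listen on an ordinary TCP socket first.
        super().__init__(server_address, **kwargs)
        context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
        context.load_cert_chain(certfile=certfile, keyfile=keyfile)
        # Wrap the listening socket; each accept() now hands back a TLS socket.
        self.socket = context.wrap_socket(self.socket, server_side=True)
```

Usage would look like server = SecureXMLRPCServer(("localhost", 8443), "server.key", "server.crt"), then server.register_function(pow) and server.serve_forever(). The file names and port here are placeholders. A client can connect with xmlrpc.client.ServerProxy("https://localhost:8443/", context=ssl.create_default_context(cafile="server.crt")), assuming the certificate was issued for "localhost".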
--titus
p.s. Ping me at titus@idyll.org if the .tar.gz file isn't accessible and I'll repost it.
Off to MSU - Woo hoo!
On Thursday, May 15th, I finished my post-doc position at Caltech.
On Friday, May 16th, I officially started as an Assistant Professor split between Computer Science & Engineering and Microbiology & Molecular Genetics at Michigan State University.
On Friday evening and Saturday, we hung out down at the Caltech Marine Lab and partied.
As I type, I'm on a plane flying from California to Michigan, where my cat and I will spend our first night in our new house.
My wife and daughter will join me on Wednesday.
All of our stuff is en route and will arrive later in the week.
On Monday, I will start "directing" my new lab: I already have several local summer students, as well as a part-time research assistant. Two graduate students and a postdoctoral fellow will be starting with me later in the summer.
Hooray!
--titus
E-mail is getting *really* unreliable
I've been hit by a few different e-mail-related problems over the last few months, and it's becoming intensely frustrating.
Some servers seem to drop messages from me at random, for no obvious reason; at least, people fail to receive one message and then do receive another sent a day later. Gmail may be at the center of this, but it's not clear. (Changing my public From: address to ctb@msu.edu may have contributed to the lossage rate, but cannot be the only cause.)
Yahoo at first decided to mark messages as spam, then drop messages, and now likes me again.
As I write, caltech.edu seems to be bitch-slapping a select subset of my messages. Either they're grey-listing and waiting to forward some messages on, or they've dropped some messages entirely.
My daily e-mail load has grown to the point where I appear to be hand-deleting real messages from my inbox when my eye gets them confused with spam or automated notices. I know of at least two messages that I've deleted in the last month -- I found them in my received-not-spam folder (where I save all incoming messages) but have no recollection of having read them. Before this month, I can't recall having accidentally deleted two e-mails in the last 5 years. And it's not like my spam filter has a high false negative rate: I barely get any spam at all in my inbox.
Now I'm trying to start sending messages to people in my lab, and some of them are not responding or acknowledging the messages. Maybe they're not getting them. Maybe they're out to lunch. Maybe they don't like me, or authority, or something. I don't want to hassle people who don't want to respond, but I also want to make sure they got the $!%#$#$ message! Ah well, I will be physically there soon...
All of this goes to say that for a variety of reasons -- increasing amounts of e-mail, increasing amounts of SPAM, increasingly random and annoying anti-spam measures implemented by the big inbox providers -- e-mail has become unreliable enough that I have to think about it. Even worse, the number and variety of anti-spam measures in play mean that neither I nor the receiver may have any ability to affect the spam filter that is dropping e-mail (translation: I don't know what to do to make things more reliable).
Grr.
I think the time is coming when a reputable SMTP forwarder could make some $$; I'd be willing to pay a $ or $$ for a bonded SMTP provider! Anything like that out there that actually works?
--titus
p.s. In a largely unrelated side note, the number of blog comment spammers attempting to post to my blog continues to hit record highs on a daily basis. I've never approved a spam comment -- yet they continue to try. It's mind-bogglingly stupid and it just goes to show that stupid behavior will continue indefinitely if it's approximately zero-cost to the commit-ee. Grr x 2.
So, err, drop me a line if you wrote a witty comment that didn't get posted, and if I don't accidentally delete your e-mail I will approve your comment.
pygr gets some summer love
(pygr is a neat bioinformatics framework in Python.)
After some commenters on my last post seemed happy to hear that pygr was the focus of some summer work, I realized I had only discussed the pygr summer work in a post to the biology-in-python list.
Whoops.
So, here's the scoop: not only is pygr the focus of Rachel McCreary's Google Summer of Code project, but Jenny Qian will be using pygr to build an ENSEMBL interface, also as part of the Google Summer of Code.
That's not all!
In addition to Rachel and Jenny (under the sterling mentorship of Chris Lee, Robert Kirkpatrick, Namshin Kim, and myself) I have two MSU students working with me over the summer, Alex Nolley and Marie Buckner. They'll both be working with pygr-related things, although like Jenny their efforts may end up being more on ways to use pygr than on pygr's code itself.
I also have a grad student or two that may drop in on pygr, if only to use it for something research-y.
So all in all, pygr will get a lot of love this summer. Hopefully we can polish the code and documentation and tutorials to the point where the learning curve is as minimal as it can get, and this fabulous package will become readily available to many others...
Why am I personally putting so much effort into pygr? Well, I've been using it more and more over the last few months, and (somewhat like scipy) it's transformed my work by turning annoyingly difficult data organization problems into trivial Python transformations. I can literally throw together a custom genome browser in a matter of hours -- I've implemented two or three already, for different projects -- and it has enabled several new research programs. pygr seems to be one of those rare packages (kind of like Python itself) that is not only functional and effective but presents a unified and coherent intellectual interface. pygr is the only good middleware layer I've seen for sequence intertwingling in bioinformatics. It's not that mature yet, but it has serious promise, and I'm hoping to get in on the ground floor, so to speak :).
cheers,
--titus
Dear Lazyweb: JavaScript "imagemaps" and/or image subselection?
Dear Lazyweb, help!
I'm embarking on a number of summer projects in my new lab at MSU, and several of them focus on using pygr to do cool genomic stuff. In particular, I'm planning to build a personal genome annotation system that will let people run their own full genome Web sites and annotate the genomes with private information such as Solexa data, cDNA/EST projects, ChIP-seq, cis-regulatory reporter constructs, ncRNA predictions, etc. etc. (If you're interested in this sort of thing, get in touch -- it will, of course, be open source and open development, albeit in Python :)
As I've been thinking more about how to do the display side of things, I've been running headfirst into a serious lack of knowledge. I would like to make an interface that looks somewhat like your standard genome browser/GMOD/UCSC interface, such as this UCSC view of the chicken genome. I already have the basics of that view working; for example, see this simple example and a group-feature example. But I'd like to add more -- a LOT more -- interactivity.
Ideally I'd like to be able to draw simple objects (squares, rectangles, lines) on some sort of canvas and then use JavaScript and AJAX to pop up windows and display bits of information. But I don't really know this space of functionality very well.
So I'm turning to the lazyweb.
Are JavaScript+image maps the right way to go (for example, this, this, and this)? Do they work well with multiple browsers? Or are there good JS libraries for drawing images on the fly in the browser? Is SVG a good thing to look at? Were you stuck with this task, what would you use?
The most important things for this project are, in order of importance:
- basic functionality (JS image maps seem fine for this)
- cross-browser functionality
- selection (e.g. GMOD RubberBandSelection)
- flexibility: reordering and redrawing of images.
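One low-tech way to get "click a box, get a popup" behavior is to generate the image map on the server, alongside the PNG, from the same coordinates used to draw it. Here is a hedged Python sketch. It assumes each feature is drawn as one rectangle per (label, start, end, row) in fixed-height tracks. The function names and that layout convention are hypothetical, not pygr's.

```python
from html import escape


def genome_to_x(pos, view_start, view_end, width_px):
    """Linearly map a genomic coordinate onto an x pixel in an image width_px wide."""
    return int(round((pos - view_start) * width_px / (view_end - view_start)))


def image_map_html(map_name, image_src, features, view_start, view_end,
                   width_px, track_height=20):
    """Build an <img> plus a <map> whose rectangles line up with the drawn features.

    features: iterable of (label, start, end, row) tuples, where row is the
    track the feature was drawn in (a hypothetical layout convention).
    """
    areas = []
    for label, start, end, row in features:
        x1 = genome_to_x(start, view_start, view_end, width_px)
        x2 = genome_to_x(end, view_start, view_end, width_px)
        y1 = row * track_height
        y2 = y1 + track_height
        areas.append('<area shape="rect" coords="%d,%d,%d,%d" alt="%s" title="%s">'
                     % (x1, y1, x2, y2, escape(label), escape(label)))
    return ('<img src="%s" usemap="#%s">\n<map name="%s">\n%s\n</map>'
            % (escape(image_src), escape(map_name), escape(map_name),
               "\n".join(areas)))
```

From there, JavaScript can attach click or mouseover handlers to the <area> elements to open popups or fire AJAX requests. Because the map is regenerated whenever the image is redrawn, reordering tracks only means redrawing both.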
Your thoughts are much appreciated! Please drop me a line or comment, whichever is most convenient. I'll summarize the options.
thanks,
--titus
p.s. I'm perfectly fine with "Google this, dummy!" I just don't have much in the way of google keyword knowledge in this area...
Eating your own dogfood (but only eating half the bowl)
So I'm pretty bullish on testing for maintenance reasons. It was nice to see how well it worked out for me when a user recently reported a problem with Cartwheel.
This is what happened: a third-party package (LAGAN) that the user was running through the Web interface depended on certain command-line behavior from 'sort'. Now, I wasn't aware that the command-line arguments to sort were still evolving, but apparently they are -- my latest Debian upgrade removed some options (the obsolete '+1' key syntax) in favor of '-k' (where '+1' becomes '-k 2'). In any case, I did this big upgrade of many packages, and didn't realize that this third-party program was now broken. (More on that later.)
The user reported weird results, so I went and verified that he'd set everything up properly and that this was in fact a real problem. Then I ran the Cartwheel automated test suite. Voila! Problem was instantly pinpointed in a reproducible manner.
I fixed the program (editing Perl, ick), re-ran the tests, and then re-ran the user's analyses. Tada, done.
OK, so, great, the tests pinpointed the error for me after the user had found it.
Why did I have to wait for a user to report it?
Because I wasn't running the tests under continuous integration on my compute server.
Why not?
Can't think of why.
What would you have done differently?
I would have made sure all my tests were passing on my compute server after I upgraded the thing, i.e. not been a schmuck.
What have we learned?
Tests are only useful if (first) you write them -- that's half the battle -- and (second) you run them. Oops.
More generally, it was fun to note that by putting a fairly high-level functional test on the batch-processing backend, I discovered a bug several levels down in my software stack -- a problem lying between a third-party package and a system utility. Unit tests wouldn't have found this bug, unless the third-party package had them (don't think so) and I was running the third-party package unit tests (good grief...)
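A tiny environment smoke test in the same spirit checks the system 'sort' directly, so an OS upgrade that changes its behavior fails a test instead of a user's analysis. This is a hypothetical sketch, not part of the Cartwheel test suite, and it assumes a POSIX 'sort' on the PATH.

```python
import os
import subprocess


def sort_on_second_field(text):
    """Sort lines with the system 'sort', keyed on field 2 only ('-k 2,2')."""
    # Force the C locale so collation, and therefore the expected output, is stable.
    env = dict(os.environ, LC_ALL="C")
    result = subprocess.run(["sort", "-k", "2,2"], input=text,
                            capture_output=True, text=True, env=env, check=True)
    return result.stdout


def test_sort_key_syntax():
    # Field 1 and field 2 disagree on ordering, so sorting on the wrong field
    # (or an option that silently changed meaning) produces a different answer.
    data = "a 3\nb 1\nc 2\n"
    assert sort_on_second_field(data) == "b 1\nc 2\na 3\n"
```

Run under a test runner like pytest (which collects test_* functions) on the compute server after every upgrade, this would have flagged the problem before any user did.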
OK, back to work.
--titus
John Ringo is a caricature of a wingnut
I read a lot of total crap, and one of my recurring crap authors has been John Ringo. He's a total nutjob politically, but he writes good battle scenes and is an enjoyable read once you cut through the nonsense. Still, I'm having a tough time getting through the opening chapters of The Last Centurion. In this book, Ringo constructs a near-future world where Hillary Clinton is president, global cooling is the problem, and the chemicals from processed food and big farming are lifesaving.
Let's take those one at a time.
One of Ringo's favorite tropes is that the left, and the Clintons especially, are what's wrong with America. It's hard to convey the dripping scorn with which he discusses these topics, but it involves a lot of naughty words. In this book, Hillary Clinton (or a straw woman facsimile thereof) is president through the Big Chill and the simultaneous deadly bird flu outbreak, and she makes every mistake possible. While Hillary Clinton is not my favorite politician, it's worth noting that our current president (who can do little wrong in Ringo's eyes) has actually made almost every mistake possible, and this makes Ringo's text unbearably difficult to read. If Ringo is hoping to even tell a good story, much less sway anyone's opinion, he'd be better off with less in the way of textual histrionics.
Another one of Ringo's tropes is that the global warming hypothesis is nonsense. Not only does he mention this frequently, but he literally pauses in the middle of his books to deliver four-page diatribes on the subject. In this latest book, Ringo makes the next big climate change event a major solar COOLING, which has predictable effects on the food supply. Now, I'm a scientist and a lefty, and I've even worked on science relevant to climate change, so presumably (by Ringo's criteria) I am unfit to comment, being moderately knowledgeable. But when your social commentary depends entirely on fiction, it loses any relevance and becomes a distraction.
The most interesting novelty in this book (which presumably will become another abortive series, to join the ranks of his other five unfinished series?) is the device where American lives are saved by having eaten so many processed foods. As far as I can tell, the idea is that eating processed foods conveys resistance to chicken flu, and this leads to a dramatically greater survival rate in America. I'm not sure why this device is in the book, unless it's another imaginary nail in Ringo's imaginary coffin of liberalism. Whyever it's there, it's entertainingly stupid -- there's plenty of evidence that weird, random chemicals do weird, random things to your DNA, and that's one reason why cancer is so prevalent. There's no reason at all to believe that these chemicals would somehow "cancel out" bird flu. But what do I know? I'm just a molecular freakin' biologist...
Combine all that with Ringo's inimitable writing style in which no breasts are too big, no hero goes unfucked by multiple (large-breasted) women, and no terrorist goes unpunished, and these books are truly a piece of work. I do not, however, mean "of art". In fact, this last book is so outlandish that I'm actually becoming a bit suspicious of Ringo's sincerity. It's hard to read such complete and utter crap without thinking that perhaps the author is secretly making fun of the very viewpoints he is espousing. But it's been a consistent trend towards lunacy thus far, so I'm inclined to believe that he's actually somewhat sincere.
Anyway, here's my judgement: Ringo's latest book is masturbatory fodder for hard right wingers, and it's becoming increasingly difficult to enjoy his books if you're not actually lobotomized. Luckily that ensures him an 18% market.
--titus
Threading and subprocess
I'm having a long-running discussion with some people about threading and why using threads with simple subprocess calls is almost certainly an overcomplicated (== BAD) use of threads. Everyone seems to think I'm wrong (at least, there's either deafening silence or straight-out argument ;) and I think I finally figured out why.
The task at hand: use subprocess to run some command (say, 'ping') a bunch of times. Because the command is I/O bound, you want to run the commands in parallel. Should you use threads to do this? Is it necessary in order to achieve good performance?
Well, consider these two examples ('common.py' is down at the bottom; it just contains the list of IP addresses to ping, and a function to call subprocess.Popen).
nothread.py:
from common import IP_LIST, do_ping

z = []
for i in range(0, len(IP_LIST)):
    p = do_ping(IP_LIST[i])
    z.append(p)

for p in z:
    p.wait()
thread.py:
import threading
from common import IP_LIST, do_ping

def run_do_ping(addr):
    p = do_ping(addr)
    p.wait()

###

# start all threads
z = []
for i in range(0, len(IP_LIST)):
    t = threading.Thread(target=run_do_ping, args=(IP_LIST[i],))
    t.start()
    z.append(t)

# wait for all threads to finish
for t in z:
    t.join()
Both of these work fine, and in both cases are easily modifiable to retrieve the output, exit status, etc. of the ping command. (In the threaded example you have to keep track of 'p' in 'run_do_ping' to retrieve this kind of info, and I wanted to keep things as simple as possible.)
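For the output-gathering case, here's a minimal non-threaded sketch. The do_cmd helper is hypothetical: instead of ping, it runs a trivial Python child that echoes its argument, so the example runs without a network. It uses communicate() rather than wait(), which matters once stdout=PIPE is involved.

```python
import subprocess
import sys


def do_cmd(arg):
    """Hypothetical stand-in for do_ping(): a child that just prints its argument."""
    return subprocess.Popen(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", arg],
        stdout=subprocess.PIPE, text=True)


def run_all(args):
    # Start every child first, so they all run concurrently...
    procs = [do_cmd(a) for a in args]
    results = []
    # ...then collect output and exit status one at a time. communicate() reads
    # stdout to EOF before waiting, so a chatty child can't fill the pipe buffer
    # and block forever, which can happen with wait() plus stdout=PIPE.
    for p in procs:
        out, _ = p.communicate()
        results.append((out.strip(), p.returncode))
    return results
```

Collecting results in order is fine here: a child that finishes early simply sits until its turn, and one that fills its pipe just pauses until communicate() drains it.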
They also run in about the same amount of time, although the non-threaded one is quicker by a few milliseconds for me. I think this is because thread starts & joins are extra overhead.
The key misunderstanding in the discussion seems to have been that the examples at hand were using subprocess.call, which blocks waiting for the subprocess to exit, i.e. equivalent to using this code in nothread.py:
for i in range(0, len(IP_LIST)):
    p = do_ping(IP_LIST[i])
    p.wait()
Here the pings would execute serially rather than in parallel, with the obvious performance problem :). However, you can bypass this effect of subprocess.call by using subprocess.Popen, which creates a new process that executes in parallel with the calling process.
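A quick way to see the call-versus-Popen difference without network access is to swap ping for a child Python process that sleeps 0.5 s, an assumed stand-in for an I/O-bound command. With n children, the subprocess.call loop takes at least n * 0.5 s. The Popen loop takes roughly 0.5 s plus process start-up.

```python
import subprocess
import sys
import time

# Child that sleeps 0.5 s; stands in for an I/O-bound command like ping.
SLEEPER = [sys.executable, "-c", "import time; time.sleep(0.5)"]


def run_serial(n):
    start = time.monotonic()
    for _ in range(n):
        subprocess.call(SLEEPER)  # blocks until this child exits
    return time.monotonic() - start


def run_parallel(n):
    start = time.monotonic()
    procs = [subprocess.Popen(SLEEPER) for _ in range(n)]  # each returns immediately
    for p in procs:
        p.wait()  # the children are already running side by side
    return time.monotonic() - start
```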
So, for this simple use of subprocess -- running a shell command and gathering the output -- which is "better"? I think 'nothread.py' is better because it is simpler, shorter, and clearer. Of course, as soon as you start doing more complicated stuff like reading the streams of information coming out of the subprocess commands, the threaded version may well have its advantages. But that's not the case here, I think.
Comments welcome.
--titus
common.py:
import subprocess
IP_LIST = [ '131.215.17.3',
'131.215.17.4',
'131.215.17.5',
'131.215.17.16',
'131.215.17.17',
'131.215.17.18',
'131.215.17.19',
'131.215.17.24',
'131.215.17.25',
'131.215.17.31']
cmd_stub = 'ping -c 5 %s'
def do_ping(addr):
    cmd = cmd_stub % (addr,)
    return subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE)