Older blog entries for titus (starting at number 237)

30 Jan 2007 »

Read This Book

I'm midway through Scott Rosenberg's Dreaming in Code, and I can unabashedly recommend it to anyone who likes a good yarn. Yes, it's about software development, and you'll need a fair bit of technical exposure -- not experience, just exposure -- to navigate the references. But anyone who is reading this, including my not-so-technical friends, should be able to understand it, enjoy it, and even learn from it.

The book covers the start of the Chandler project, and does a fantastic job of describing both the social and technical aspects of a large, (over)ambitious software development effort. Rosenberg has done an excellent job of setting up the problems inherent in software projects, and his description of the contesting software development paradigms -- cathedrals, bazaars, mosh pits, etc. -- is well written and quite comprehensible. While I haven't finished the book yet, I know where the project is today, and I know that I will be disappointed at the ending of the book -- yet after reading the first half of this book, I would be shocked were it to turn out any other way: the whiff of doom is already quite palpable.

My background in software goes back a fair bit. I started really programming in high school in the late '80s, when I learned C and roughed out a simple Nethack clone on a PC. Soon thereafter I got a UNIX account from Mark Galassi, wrote a 'talk' clone called 'ring', and participated in 'dominion' development. This led inexorably to a variety of projects, including some of the first Avida digital life software, several conference organization systems that took me through Perl/CGI/hashes, Tcl/Oracle, and ultimately got me into Python/PostgreSQL around 2001. After that I bounced around, working for a company or two, and slowly getting into better and more serious development practices. Now I develop and maintain a dismaying variety of projects: Cartwheel/FamilyRelationsII for bioinformatics, twill/scotch/figleaf for (Web) testing, and a ton of cute little one-off projects for research and testing and general open-source mayhem.

These experiences with software development are why I enthusiastically and wholeheartedly recommend this book. This world -- the world of Python, Open Source, desktop and Web programming -- is a world I visit on a regular basis. Rosenberg's descriptions of the projects, the people, the technical decisions, the thought processes, and above all the social component of software development are spot on.

Unless the book's quality takes a dramatic downturn, I'm seriously thinking of trying to use it as the cornerstone of a software testing course at MSU. The problems encountered by the developers of Chandler, and the narrative that Rosenberg builds around them, could be used to neatly demonstrate step by step just how much a "test-driven" development technique can buy you. In fact, if there's one thing that I'm puzzled by, it's just how the Chandler team could screw up so badly in 2002. These are intelligent, educated people who are up on the latest software development practices; where were the acceptance tests, for example? While I acknowledge that this could be a blind spot in the author's book, or simply blind faith in my test-infected opinions, I have a hard time understanding a process that focused on designing for abstract feature sets without any hard customer-facing tests. Why were they building prototypes without encoding the lessons learned in acceptance tests? Heck, why has the word "test" yet to be mentioned seriously?

Yes, we all see the books we read through the lens of our own biases, our own experiences, and our own inclinations. And yes, my enjoyment of this book is probably partly motivated by the belief that I could have done better -- and, honestly, that's a pretty hubristic belief. But regardless of whether you have the same reaction, it's a ripping good tale, and it's definitely one of the best books on software engineering that I've read lately.

Read it. You'll enjoy it.

--titus

p.s. Yes, Grig, you can borrow my copy.

Syndicated 2007-01-30 18:52:09 from Titus Brown

30 Jan 2007 »

Quixote Update

I spent an hour or so coding on Quixote, in response to Neil Schemenauer's call for community help. Result: automated twill tests, WSGI, and qpy "integration". It helped that this was all stuff Mike Orr and I already had sitting on the shelf ;).

--titus

Syndicated 2007-01-30 09:03:07 from Titus Brown

29 Jan 2007 »

The Google Hiring Filter?

While discussing yet another Google interview blog post with a friend, I formalized a suspicion I've had about Google's interview process. You see, it's puzzled me a bit that Google insists on a fairly reasonable knowledge of algorithms, because, in my own experience, that is a fairly small part of actual software development. I think "writes maintainable code" and "writes automated tests" and "understands basic software architecture principles" are all far more important criteria for someone who will actually be developing. Since (again in my own experience) people who focus on algorithmic considerations are often rather inexperienced at producing actual functioning software, it struck me as weird that Google would go for this kind of person.

So what is going on?

I think Google is using this as an over-selective filter for intelligence. Sure, they can't get all the smart people this way, but they're virtually guaranteed to get smart people if they insist that they grok algorithms fairly well.

Anyway, I've never interviewed with Google, and I don't plan on it in the future, but for those of you who do want to work there, I'd suggest brushing up on your big-O stuff. It's the one constant that I've seen emerge from Google Interview Stories.

--titus

Syndicated 2007-01-29 03:03:07 from Titus Brown

26 Jan 2007 »

MSU Position

Just a short note to say that I've taken a faculty position at Michigan State University, in East Lansing, Michigan. The position is split 65%/35% between the Computer Science and Microbiology & Molecular Genetics departments, and I expect to be working on a fairly wide range of problems. My computational "focus" (such as it is) will be on applying effective computational techniques to biological data; I also will be doing experimental (wet-bench) research in vertebrate developmental gene regulatory networks.

One of the big changes that the MSU Computer Science department is contemplating is changing their intro CSE classes over to some mix of Python and C++. I'll probably be helping with that. Also, I think one of the reasons they hired me is to introduce more open source and "agile" testing technology into the CSE curriculum there, so expect to see some posts about that soon.

I'll be starting at MSU in August 2008, a year and a half from now.

If you're interested in getting a Masters or PhD in computer science, and like the look of MSU's CSE department, please contact me. I'd very much like to attract open source/Python/Web people to the department, and you wouldn't necessarily have to work with me -- there are plenty of other people there. I'm not so familiar with other people yet, but my friend Charles Ofria runs the Digital Evolution Lab, which does fantastic research.

(And, obviously, if you're interested in gene regulatory networks, developmental biology, regulatory genomics, and bioinformatics in general then we should talk...)

--professor titus

Syndicated 2007-01-26 19:03:07 from Titus Brown

22 Jan 2007 »

FLTK on Windows, and using 'CMake'

I spent part of the last few days figuring out how to get my cross-platform GUI, FamilyRelations II (a.k.a. FRII), to build on Windows. It used to compile on my old desktop machine, which we then converted over to MythTV with a Windows partition for compilation, but that hard drive died during my thesis struggle. The trick was (and is) to get the software built with the proper combination of libraries/library dependencies so that in the end the linking will work.

The big struggle this time turned out to be FLTK. FLTK is the cross-platform GUI system I use; it's a fast, light ("FL") toolkit for windowing, and it works very nicely on OS X, Windows, and Linux/X11. I could compile it just fine, but it turned out to be compiling with the 'mingw' compiler instead of the standard cygwin gcc compiler, and this meant that I couldn't link it with the rest of the stuff I was compiling -- in particular, the Xerces-C XML parser. In the end I had to go hack on FLTK post-configure/pre-compile, which was ... fun.

The bright spot in all of this mess was that I converted my entire C++ source tree -- 24k lines of C++ code -- over to using CMake. CMake is a build configuration system that I hadn't noticed before last month, but it's actually used fairly widely; the KDE project uses it, as do VTK and ITK. A local developer working with my code (hi Diane!) had actually spent some time converting a subdirectory over to use cmake, too.

CMake is a complete replacement for configure, as far as I can tell. I don't know if it covers the weird corner cases in less-used OSes, but it seems to work pretty well for my simple need to build binaries on Linux, OS X, and Windows.

The configuration and make steps are now as simple as this:

cmake .
make

Here, CMake configures the entire source tree, locating libraries & other dependencies, and then creates makefiles suitable for 'make'. (It can also create Visual Studio build files; so far no indication of XCode/ProjectWhatever stuff for OS X.)

Most impressively, the configuration files I need to write are actually quite small and easy to write, with essentially no boilerplate; I converted the entire source tree over to CMake in about an hour. Your standard 'CMakeLists.txt' file (ugly name!) looks something like this:

Project(hello)

include_directories(./include)
link_directories(./lib)
link_libraries(mylib)

add_executable(hello hello1.cc hello2.cc)

This will generate essentially this command line on Linux:

g++ -I./include -o hello hello1.cc hello2.cc -L./lib -lmylib

(simplified for dramatic purposes).

If there's one frustration I have with CMake, it's that the documentation is not well rounded. It took me forever to find the correct incantation to add '-mwindows' to all builds on Windows:

if(WIN32)
   set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mwindows")
endif(WIN32)

Mind you, it's pretty terse & to the point when you do figure out how to do things, but getting there can require a lot of grepping.

Now for the somewhat odd bit: I actually met a (the?) maintainer of CMake, Andy, at a recent image analysis conference, the NA-MIC AHM in Utah. We were there to talk with Kitware about some image analysis software development, and it turns out that they maintain CMake and use it in a lot of their software. Since my local friend had been pushing it on me already, I tried it out while at the conference and found it to be pretty easy to use, and that spurred the rest of my conversion.

It seems odd to have met these people in person before corresponding, when there are so many developers with whom I correspond but have never met!

Anyway, if you're looking for a replacement for 'configure', check out CMake. I really like it so far.

--titus

Syndicated 2007-01-22 10:03:06 from Titus Brown

17 Jan 2007 »

Testing Cartwheel, and How Testing Improves Code

Over the last few days, I put a bunch of time into my ongoing refit of Cartwheel.

Cartwheel is a sizeable toolkit for doing bioinformatic sequence analysis; it's got a Web interface, a batching and queuing system built on PostgreSQL, and a client library for manipulating things over XML-RPC. So, it's got a lot of moving parts, which means that testing it is important, but it's also hard.

(I wrote earlier about how automated testing is hard, even when you wrote all of the software involved, and I also wrote a bit about how to enable in-process testing of XML-RPC.)

Cartwheel is actually one of the main motivating factors behind twill, my framework for functional Web testing. I noticed that over the years, Cartwheel had gotten more and more brittle: as I'd add features, old ones would break or give me trouble. PBP kept popping up on my radar, and finally I took the time to go try it out & eventually rewrote it into twill.

So anyway, I've been adding to the functional tests for the Cartwheel Web and XML-RPC interfaces (what would that be, the HTTP-level stuff? yeah, that), and I ran into a whole slew of problems connected to releasing the database handles. It turns out that even though I had some finalization code for the database interface, it wasn't enough, and even the stuff that was there didn't work. Since the finalization code was never actually run in deployment -- Linux was actually doing the cleanup when the process died, of course -- I hadn't understood just how badly I'd written things. Database connections weren't getting closed, and this caused various operations to hang because I was trying to drop tables while connections were open.

Several hours of head-scratching, code examination, and 'print' statements later, I now have a large set of tests for both HTTP-level interfaces, and I'm much more confident that my cleanup routines are working. Moreover, I understand the consequences of my original architecture decisions much better.

This brings me to the meta-lesson: before this experience, I knew that good finalization routines were important, both because they make the code more usable and because writing them is simply the Right Way to program. Since I never actually had to re-use this code in other contexts, this was largely a theoretical consideration. However, once I had to do setup and teardown for my functional tests, which effectively meant I was reusing the code, I ran smack into the wall of reality: yes, it is important, and if I'd been smart enough to write things properly Back When, my last two days would have been much more pleasant.
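To make the setup/teardown point concrete, here is a minimal sketch in Python. It is not Cartwheel's actual code: 'JobQueue' is a hypothetical class invented for illustration, and it uses the standard-library 'sqlite3' module as a stand-in for PostgreSQL so that it runs anywhere. The idea is only that the test's teardown calls the same explicit 'close()' that real callers would need, so broken finalization shows up as a test failure instead of being papered over when the process exits.

```python
import sqlite3
import unittest


class JobQueue:
    """Hypothetical database-backed component (stand-in, not Cartwheel code)."""

    def __init__(self, path=":memory:"):
        # sqlite3 is used here in place of PostgreSQL purely for portability.
        self.conn = sqlite3.connect(path)
        self.conn.execute("CREATE TABLE IF NOT EXISTS jobs (name TEXT)")

    def add(self, name):
        self.conn.execute("INSERT INTO jobs (name) VALUES (?)", (name,))
        self.conn.commit()

    def count(self):
        return self.conn.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]

    def close(self):
        # Explicit finalization; safe to call more than once.
        if self.conn is not None:
            self.conn.close()
            self.conn = None


class JobQueueTest(unittest.TestCase):
    def setUp(self):
        # Every test gets a fresh queue, so the setup code is "reused" often.
        self.queue = JobQueue()

    def tearDown(self):
        # Teardown exercises the finalization path on every single test.
        self.queue.close()
        self.assertIsNone(self.queue.conn)

    def test_add(self):
        self.queue.add("example-job")
        self.assertEqual(self.queue.count(), 1)


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(JobQueueTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

With a server-backed database, a connection that teardown fails to close can block a later attempt to drop tables, which is the kind of hang described above.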

Next up: CMake, KWWidgets, and the Kitware experience.

--titus

Syndicated 2007-01-17 03:03:05 from Titus Brown

5 Jan 2007 »

This toy helicopter is *great*

I bought my wife a toy plane for Christmas, and we promptly broke it. While scouting for ways to fix it, I came across this helicopter, which has to be the single coolest toy I've ever had. My brother and I are absolutely addicted to flying this thing; sadly, its charge only lasts a few minutes.

HIGHLY HIGHLY recommended, especially while it's only $33 ($100 off!).

--titus

Syndicated 2007-01-05 07:03:05 from Titus Brown

3 Jan 2007 »

Privacy Invasion is Good?

The Slashdot forum managers have approved yet another troll message: in "Cameras Help Cops Catch a Killer", they ask whether or not the use of cameras to help catch a killer justifies the increasing use of surveillance in public spaces.

The answer is "no".

There are a couple of ways to think about this, and at least one of them points to a big flaw in the way we carry out public discourse.

First, let's consider the social implications. Suppose you say that surveillance is good, and we should install cameras everywhere, even in living rooms, bedrooms, and bathrooms. Wouldn't this prevent all crimes?! (No, it probably wouldn't; criminals adapt too, you know.) And, even if it did, privacy and solitude are prized commodities: not something I want to yield on the off chance that it will solve a crime.

The more compelling argument IMO is to run the numbers. These arguments tend to focus on the "true positive" rate -- the number of crimes solved (sometimes spectacularly) by the use of the nifty tech -- but that's only one of four numbers you should consider. We also need to look at "true negatives" (how many innocent people are you surveilling?), "false negatives" (how many crimes are committed out of the reach of public surveillance?), and "false positives" (how many people are falsely accused based on misuse of the technology?). If most people don't commit crimes, then even if you catch all of the criminals, your false positives will dominate and you will have lousy predictive ability. Likewise, if you have a high sensitivity to crimes committed in public, but most crimes are committed out of reach of public surveillance, you will have poor overall sensitivity and your false negatives will dominate.

This is where I think the flaw in our public discourse lies. When thinking about surveillance, SWAT teams, FAA regulations, DHS screening, etc., you have to ask yourself what signal you're looking for. If you're looking for one person in a million (a conservative estimate for passenger screening!) then you need a really sensitive filter if you're going to avoid hitting lots of false negatives, and you're going to have to have a really specific filter if you're going to avoid inconveniencing innocent people.
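As a rough worked example of running the numbers, here is a small Python sketch. The specific figures are assumptions chosen for illustration, not data: a population of one million containing one real target, screened by a filter assumed to be 99% sensitive and 99% specific. The function names are made up for this sketch.

```python
def screening_outcomes(population, targets, sensitivity, specificity):
    """Return expected (tp, fp, fn, tn) counts for a screening filter.

    sensitivity: fraction of real targets that get flagged.
    specificity: fraction of innocent people that are correctly not flagged.
    """
    innocent = population - targets
    tp = targets * sensitivity   # real targets that are flagged
    fn = targets - tp            # real targets that are missed
    tn = innocent * specificity  # innocent people left alone
    fp = innocent - tn           # innocent people flagged anyway
    return tp, fp, fn, tn


def precision(tp, fp):
    """Fraction of flagged people who are real targets."""
    flagged = tp + fp
    return tp / flagged if flagged else 0.0


if __name__ == "__main__":
    # Illustrative assumptions only: 1 target in 1,000,000 people,
    # and a filter that is 99% sensitive and 99% specific.
    tp, fp, fn, tn = screening_outcomes(1_000_000, 1, 0.99, 0.99)
    print(f"true positives:  {tp:.2f}")
    print(f"false positives: {fp:.2f}")
    print(f"false negatives: {fn:.2f}")
    print(f"true negatives:  {tn:.2f}")
    print(f"precision:       {precision(tp, fp):.6f}")
```

Under these assumed numbers, roughly ten thousand innocent people are flagged for every real target, so only about one flag in ten thousand points at the person you were looking for, even though the filter sounds very accurate.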

And, of course, there's always Bruce Schneier to point out that the Bad Guys adapt.

The only moderately intriguing (but also chilling) pro-surveillance argument that I have found is in David Brin's The Transparent Society, where he argues that the only way forward is to make the surveillance cameras publicly accessible. This would not only prevent crime but would also prevent police abuses of the technology.

A compelling counterargument to this can be found in Vernor Vinge's book A Deepness in the Sky. Vinge makes the essential point that in a transparent society, tyranny becomes an algorithmic challenge: he who extracts the most information from the surveillance technologies dominates. Not reassuring to me, thank you very much.

--titus

p.s. I am increasingly irritated at the adolescent phrasing of the posts that slashdot approves. One more reason to go readdit.

Syndicated 2007-01-03 19:03:31 from Titus Brown

2 Jan 2007 »

GalCon Strategy Guide

I just wrote a small GalCon strategy guide, available here. Comments welcome.

--titus

Syndicated 2007-01-02 02:03:10 from Titus Brown

1 Jan 2007 »

Happy New Year, everyone!

I hope everyone has (or had!) a good New Year's Eve, and let's hope that 2007 is even better!

--titus

Syndicated 2007-01-01 03:03:07 from Titus Brown
