Doing testing and need to decrypt captured SSL traffic?
full moon names
Storing Research Data -- SQLite vs. The Other Guys
StoringResearchData Created Monday 27 February 2012
I generally have a real aversion to using a full-on SQL implementation like MySQL or Postgres when it's not really necessary... and I typically think it's unnecessary until something forces me to change my mind. When I started writing small projects in Perl, I used to use Berkeley DB key-value pairs — lately I've used Python Pickles for similar purposes. It's simple and quick, which is nice, but it basically forces you to roll your own code.
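For a sense of what "roll your own" means with pickles, here is a minimal sketch. The run names and values are made up for illustration, and the temporary file path is just a stand-in for wherever you would keep the data.

```python
import os
import pickle
import tempfile

# Hypothetical results: run name -> measured value (illustrative only).
results = {"run-001": 0.82, "run-002": 0.79}

path = os.path.join(tempfile.mkdtemp(), "results.pickle")

# Save: the whole dictionary is serialized in one shot.
with open(path, "wb") as fh:
    pickle.dump(results, fh)

# Load: everything comes back at once.
with open(path, "rb") as fh:
    loaded = pickle.load(fh)

# Any "query" is code you write yourself.
above_080 = {name: value for name, value in loaded.items() if value > 0.80}
```

Saving and loading are trivial, but every filter, join, or export is something you have to write by hand, and the file is only readable from Python.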
Lately, everyone's been getting on the SQLite bandwagon, and it's pretty awesome. I've moved to using SQLite as my first choice when storing anything beyond flat text data. It has nice portability characteristics (unlike your homemade solution) and simple backup and export formats. Being able to run queries on the data is also great for me, since these days most of my data are experimental results and the like.
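As a rough sketch of what that looks like in practice, here is the same kind of thing using Python's built-in sqlite3 module. The table name, columns, and numbers are all invented for the example; an in-memory database is used only to keep it self-contained.

```python
import sqlite3

# In-memory database for illustration; pass a file path to persist the data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (experiment TEXT, trial INTEGER, latency_ms REAL)"
)

# Hypothetical experimental results.
rows = [
    ("baseline", 1, 12.5),
    ("baseline", 2, 13.5),
    ("tuned", 1, 9.0),
    ("tuned", 2, 11.0),
]
conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)
conn.commit()

# The query does the aggregation that would otherwise be hand-written code.
summary = conn.execute(
    "SELECT experiment, AVG(latency_ms) FROM results "
    "GROUP BY experiment ORDER BY experiment"
).fetchall()
conn.close()
```

With a file path instead of ":memory:", the resulting database is a single file that other tools can open directly.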
But another good reason for using an external, portable data sink became obvious when I started to visualize and analyze my data using R. R has the RSQLite package, which made importing data into R for plotting and analysis a breeze. (Or as much of a breeze as anything is in R.) The thing is, roll-your-own formats may be perfectly "good enough" for isolated projects, but the minute you start wanting to use a different tool to view the data, having it already be in a format that is easily accessible is a major win. And if you're like me, you won't always know that in two months you're going to want to use the data in some completely different way. So I feel like I got that capability for "free" just because I decided to store my data in a more structured and portable way.
But SQLite is targeted at a specific niche — projects that would benefit from SQL behaviors but don't need all the robustness and consistency guarantees of "enterprise" databases. While you get good performance (because SQLite works inside your code, rather than through RPCs), if you start wishing for read/write concurrency (say, importing new data and plotting other unrelated data at the same time) you may find yourself frustrated with SQLite's limitations. That's what happened to me. As I started to generate large numbers of plots, I wanted to be able to run multiple scripts (some of which generate plots, some of which update other tables) at the same time — SQLite can balk at this.
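The balking can be reproduced in a few lines. This is a minimal sketch, assuming SQLite's default rollback-journal mode and two connections standing in for two separate scripts; the table names are hypothetical. Setting timeout=0 makes the second writer fail immediately instead of waiting for the lock.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None: we issue BEGIN/COMMIT ourselves.
# timeout=0: fail right away instead of waiting for a lock to clear.
script_a = sqlite3.connect(path, timeout=0, isolation_level=None)
script_b = sqlite3.connect(path, timeout=0, isolation_level=None)

script_a.execute("CREATE TABLE results (value REAL)")
script_a.execute("CREATE TABLE plots (name TEXT)")

# Script A starts an import and holds the write lock until it commits.
script_a.execute("BEGIN IMMEDIATE")
script_a.execute("INSERT INTO results VALUES (1.0)")

# Script B tries to update an unrelated table in the same database file.
blocked = False
message = ""
try:
    script_b.execute("INSERT INTO plots VALUES ('fig1')")
except sqlite3.OperationalError as exc:
    blocked = True
    message = str(exc)

# Once A commits, B's write goes through.
script_a.execute("COMMIT")
script_b.execute("INSERT INTO plots VALUES ('fig1')")
plot_count = script_b.execute("SELECT COUNT(*) FROM plots").fetchone()[0]

script_a.close()
script_b.close()
```

The lock covers the whole database file, so it does not matter that the two scripts touch different tables. A longer timeout or SQLite's write-ahead-log mode can soften this, but there is still only one writer at a time.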
So, I switched to a more traditional SQL database, which itself was relatively painless because of the underlying SQL standard. (R has libraries for it as well — another good reason not to roll your own even if you don't need all that SQL provides.) And in turn, this highlighted another unexpected value for the more "enterprise" systems: caching. Re-running long queries just to tweak a plot is considerably quicker with the big server as opposed to SQLite.
None of this is new information; in fact, it's a pretty textbook tour of the hows and whys of data storage. But for me, it inspired a change of heart, thanks to the convenience factor. In the future, I'll probably start with a Big Dog SQL server for research projects (running on my personal laptop) because it avoids the papercuts encountered when my projects get too big for SQLite. But I'll stick to SQLite for simpler things, to avoid the dependencies created by the more enterprise approach.
The Process #1 -- The Symbolism Survey
Lately I've been really interested in the process that a craftsman goes through, intentionally or unintentionally, in the process of creation. I use the word craftsman because I'm interested not just in "art" per se, but also (not coincidentally) in things like academic writing, problem solving and engineering. This is a fascinating treasure trove, not just for the answers provided, but for who provided them...
In the Boy Scouts, there is a thing called a "Totin' Chip". It is "both an award and contract in Boy Scouts of America that shows Scouts understand and agree to certain principles of using different tools with blades" (WP). To get the Totin' Chip, which is a paper card (similar to a library card), scouts must demonstrate a certain amount of knowledge and responsibility. The Wikipedia page has more on it, of course. The main thing (besides the rules) is that violations of the Totin' Chip code result in one or more corners of the card being removed; when all the corners are gone, you lose your right to tote a blade.
Anyway, I think there should be a "Codin' Chip" -- maybe it's a card, maybe it's an actual chip. If it's a card you lose corners; if it's a chip, you lose pins. Either way, when you lose 'em all, you're done.
Violations can be large or small. For example, not commenting code meant for others to read counts as one, as does inappropriately testing floating-point numbers for equality. Using strcpy and the like is definitely in there.
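The floating-point violation is easy to demonstrate. A minimal sketch in Python, using the standard library's math.isclose with its default tolerance (whether that tolerance is appropriate depends on the application):

```python
import math

total = 0.1 + 0.2

# Exact equality fails because 0.1 and 0.2 have no exact binary representation.
exact_match = (total == 0.3)

# Comparing within a tolerance is usually what was meant.
close_match = math.isclose(total, 0.3)
```

Here exact_match is False and close_match is True.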
What else should cause you to forfeit a pin off your Codin' Chip?
Changing references from parens to brackets...
In Word 2007 (at least), changing references from parentheses to square brackets requires editing XML! Boooo.
Almost as soon as we moved into this apartment in Pacific Palisades, CA, I put up a hummingbird feeder. There are tons of hummingbirds in this neighborhood, and I've always enjoyed them. Then, a couple of years ago, I took cuttings from the ivy growing in our hedge and planted them on our porch. The ivy grew pretty well and soon started making its way across the ceiling of our little porch cubicle.
Well, a few weeks ago, I noticed something weird on a branch but didn't pay any attention to it. Until I noticed that a hummingbird was very often on the branch -- and zoomed away every time I opened the door. Then, one day I was working on the porch, watering the plants, and the hummer zoomed up with a mouth full of fluff and spiderwebs and it all clicked -- the bird was building a nest in our ivy.
Now, what else is there for a computer nerd to do than to webcast the whole thing?
Vital statistics: I think it's a female Allen's Hummingbird, but I'm not exactly sure. Yes, that is an egg in the nest. There is one small egg, about the size of a large jelly bean. I've read online that they usually lay two eggs, but there's definitely just one in the nest. The nest itself is probably 6cm long, about an inch and a half. The bird was working on the nest less than a week ago but the egg is visible in the earliest pictures I took with the webcam on 5/4.
The weirdest thing is that the bird is only on the nest maybe 20 minutes out of the hour, or less, and it's not (usually) because of interruptions from us. I don't know if that means the egg really isn't going to hatch, or if that's acceptable in a warmer climate, or what. I'm hoping that if it doesn't hatch, she might "double brood" -- have another round. There are h-birds year round here so that's not impossible.
Images are updated between 6AM and 8PM every day, although I may shrink that window some; as you can see, the cameras don't handle low light very well. I will probably document my setup in a few days for those out there who are interested, but in short, it's basic webcams being driven by a Linksys NSLU2 running Debian Linux.
Anyway, I hope you enjoy!
If I ever open a Beatles-themed day-old bread store, it will be called "Yeasterday"! Bada-boom!
Unintentionally funny DVR episode synopses #243
"Property Virgins. 'A Woman Who Survived a Lightning Strike Is Shocked by Real Estate Prices' Home prices shock a lightning-strike survivor."
Hardware is just software you can't change.