Ruby ActiveRecord and lazy instantiation
Let's say I have a data set in a SQLite database. And, let's say I need to iterate over every row in that database, performing some action based on it. There's a few ways I can do this. First, the old, direct-to-driver version:
require 'sqlite3' db = SQLite3::Database.new("test.db") db.execute("select * from test_entries") do |row| ... endSQLite example
That works, but of course, it's bound forever to SQLite, and we'd like to abstract that a bit. The "new hotness" appears to be ActiveRecord. So, let's reproduce that using AR:
require 'active_record' class TestEntry < ActiveRecord::Base; end ActiveRecord::Base.establish_connection({ :adapter => 'sqlite3', :dbfile => 'test.db' }) TestEntry.find(:all) do |entry| ... endActiveRecord example
Here's a tip: don't do that. That find call actually instantiates an ActiveRecord::Base for every single row in the database, then returns the result set, which you're now iterating over. So, for anything but a trivially-small result set, you're going to use a ton of memory, and create an enormous workload for very little good reason.
So, I suppose this blog entry is really just a whine: why isn't there a lazy-instantiation iterator for result sets in ActiveRecord? In this (common, I'd think) case where you want to do some sort of trivial filter or action across the entire table on a per-row basis, but don't care about the row after you've worked with it, that would be a huge win. The interesting thing is that this is exactly how the auto-generated association methods work, and you actually have to go out of your way to do eager loading of associations (since just automatically doing joins under the hood is an obviously bad idea).
My specific use case is an application log stored in a table that I'd like to iterate over for some basic statistical analysis, and a single day of logs is a 600M SQLite database (approximately 3.6M rows). This won't fly for doing a single day of processing, and I'd like to be able to do something for monthly and quarterly reporting.
I'll probably resort to DBI
to accomplish this in a database-agnostic manner, but that feels like an
incredible step backwards. I just can't understand why the find*
methods weren't written as iterators that do lazy instantiation of the
result set; the interface would be effectively the same, but the potential
performance gains could be tremendous in certain cases.