18 Apr 2011 cbbrowne   » (Master)

NoSQL’s next step – stored procedures

The latest discovery is that the “bad old stored procedures” of SQL… are what NoSQL needs… http://highscalability.com/blog/2010/11/1/hot-trend-move-behavior-to-data-for-a-new-interactive-applic.html

They’re calling them coprocessors or plugins, and it’s truly not terribly surprising. The High Scalability article makes a Battlestar Galactica joke about http://en.wikipedia.org/wiki/Eternal_return. The BSG line that kept coming back over and over was: “All this has happened before, and all this will happen again.” There’s a rather depressing possibility that people will consider coprocessors to be the greatest thing ever, not realizing that a substantial chunk of the issues that hold true (for better and worse) for SQL stored procedures will also hold true for coprocessors, lessons they may have to learn (or fail to learn!) from scratch.

The notion is that you colocate, along with your database, some kind of “coprocessor engine” that can run code locally, which solves a number of problems, some not new, but some somewhat unique to key/value stores:

Connectivity

You’re running your application in the cloud and have somewhat spotty connectivity between the place where your application logic runs and the database where the data is stored. A coprocessor brings logic right near the database, resolving this problem.

Bulk data transfer

A difference between SQL and key/value stores is that SQL is quite happy shovelling sets of data back and forth, whereas key/value stores are all about singular key/value pairs. An SQL request readily “scales” by transferring data in bulk, whereas key/value can get bogged down by there being a zillion network round trips. A coprocessor can keep a bunch of those “round trips” inside the database layer, which will be a big win.
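A minimal sketch of the round-trip argument, assuming a toy in-memory store standing in for a remote key/value server. The names ToyKVStore, run_coprocessor, client_side_total, and server_side_total are hypothetical, not any real product’s API; the counter merely simulates network requests.

```python
class ToyKVStore:
    """In-memory stand-in for a remote key/value server (hypothetical)."""

    def __init__(self):
        self._data = {}
        self.round_trips = 0  # counts simulated network requests

    def put(self, key, value):
        self.round_trips += 1
        self._data[key] = value

    def get(self, key):
        self.round_trips += 1
        return self._data[key]

    def run_coprocessor(self, func, keys):
        """One request: ship the function to the data, not the data to the function."""
        self.round_trips += 1
        return func(self._data[k] for k in keys)


def client_side_total(store, keys):
    # One simulated round trip per key.
    return sum(store.get(k) for k in keys)


def server_side_total(store, keys):
    # A single simulated round trip, however many keys are involved.
    return store.run_coprocessor(sum, keys)
```

Summing N values on the client costs N simulated requests here, while the coprocessor version costs one regardless of N.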

Goodbye, foreign keys, hello, um, ???

You may be able to shove some combination of integrity checking, data maintenance logic, and such into the coprocessor area, thereby gaining back some of the things lost when NoSQL eschewed SQL foreign key references and triggers.

Data normalization analysis returns

One of the typical things to do with NoSQL is to “shard” the database so each database server only has part of the data, and may operate independently of other database servers.

Coprocessor use will require that all the data to be used is on the local server; otherwise you are back to the problem of shovelling tuples back and forth between DB servers, with zillions of network round trips.

To guard against that, the data needs to be normalized in such a way that the data relevant to the coprocessors is available locally. (Perhaps not exclusively, but generally so. A few round trips may be OK, but not zillions.)
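One way to picture “available locally” is to pick the shard from only part of the key, so that related records land together. This is a sketch under assumptions: the key layout ('customer:42/order:7'), the NUM_SHARDS value, and the function names are all made up for illustration.

```python
import hashlib

NUM_SHARDS = 4  # assumed cluster size, purely illustrative


def shard_for(partition_key: str) -> int:
    """Map a partition key to a shard number with a stable hash."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def partition_key(record_key: str) -> str:
    """Assumed key layout 'customer:42/order:7': the part before '/' picks the shard."""
    return record_key.split("/", 1)[0]


def shards_touched(record_keys):
    """Set of shards a coprocessor would need to reach for these records."""
    return {shard_for(partition_key(k)) for k in record_keys}
```

If everything a coprocessor needs shares one partition key, shards_touched returns a single shard; a design where it routinely returns many is the “zillions of round trips” case again.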

It seems to me that people have been excited by NoSQL in part because they could get away from all that irritating SQL normalization rules stuff. But this bit implies that this benefit was something of a mirage. Perhaps the precise rules of Boyce-Codd Normal Form are no longer crucial, but you’ll still need to have some kind of calculus to ascertain which divisions work and which don’t.

Things still not clear about this…

Managing the coprocessors

One of the challenges faced in SQL systems that use a lot of stored procedures is that of managing these procedures, complete with versioning (because what goes into production on day #1 isn’t what will be there forever, right?).

Windows always used to suffer (may still suffer, for all I know) from dependency hell, where different applications may need competing versions of libraries. (Entertainment of the week was seeing that the Haskell folks http://www.haskell.org/pipermail/haskell-cafe/2010-April/076164.html are, of late, running into this. Not intended as insult; it’s a problem that is nontrivial to avoid.)

It’s surely needful to have some kind of coprocessor dictionary to keep this sort of thing under some control. It’s never been trivial for any system, so there’s room for:

* Repeating yesteryear’s errors

* Learning from other systems’ mistakes

* Discovering brand new kinds of mistakes
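For what it’s worth, a “coprocessor dictionary” might look roughly like the following. It is a sketch only: CoprocessorRegistry and its methods are hypothetical, and a real system would also need persistence, permissions, and rollout across servers.

```python
class CoprocessorRegistry:
    """Toy catalogue of named, versioned coprocessors (hypothetical design)."""

    def __init__(self):
        self._versions = {}  # name -> {version: callable}
        self._active = {}    # name -> version used when the caller does not pin one

    def register(self, name, version, func):
        versions = self._versions.setdefault(name, {})
        if version in versions:
            # Refuse silent replacement; a changed procedure gets a new version.
            raise ValueError(f"{name} v{version} is already registered")
        versions[version] = func

    def activate(self, name, version):
        if version not in self._versions.get(name, {}):
            raise KeyError(f"{name} v{version} is not registered")
        self._active[name] = version

    def call(self, name, *args, version=None):
        # Callers may pin a version; otherwise the active one is used.
        if version is None:
            version = self._active[name]
        return self._versions[name][version](*args)
```

Letting a caller pin a version is one way to let two applications with competing needs coexist; whether that tames dependency hell or just relocates it is an open question.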

How rich should the coprocessor environment be?

On the powerful side, http://nodejs.org surely is neat, but having the ability to run arbitrary code there is risky…

How auditable will these systems be?

On the positive side, it’s presumably plausible to add auditing coprocessors to capture interesting information for regulatory purposes.

On the other hand, arbitrarily powerful things like node.js might make it arbitrarily easy to evade regulation.

There aren’t necessarily easy answers to that.
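The auditing idea can at least be sketched: a store that records who changed what on every write. AuditedStore and its fields are assumptions for illustration, and nothing here stops sufficiently powerful code from bypassing it, which is exactly the worry.

```python
import time


class AuditedStore:
    """Toy key/value store that logs every write (hypothetical)."""

    def __init__(self, clock=time.time):
        self._data = {}
        self.audit_log = []  # only ever appended to in this sketch
        self._clock = clock  # injectable so tests can fix the timestamp

    def put(self, user, key, value):
        old = self._data.get(key)  # None if the key did not exist yet
        self._data[key] = value
        self.audit_log.append(
            {"when": self._clock(), "who": user, "key": key, "old": old, "new": value}
        )

    def get(self, key):
        return self._data[key]
```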

Aside: org2blog mode is pretty nifty…  Made it pretty easy to build this without much tagging effort…

Syndicated 2011-01-27 22:40:00 from linuxdatabases.info
