29 May 2011 mhausenblas   » (Journeyer)

Solving tomorrow’s problems with yesterday’s tools

Q: What is the difference between efficiency and effectiveness?
A: 42.

Why? Well, as we all know, 42 is the answer to the ultimate question of life, the universe, and everything. But did you know that in 2012 it will be 42 years since Codd introduced 'A Relational Model of Data for Large Shared Data Banks'?

OK, now a more serious attempt to answer the above question:

Efficiency is doing things right, effectiveness is doing the right thing.

This gem of wisdom was originally coined by the marvelous Peter Drucker (in his book The Effective Executive – read it, it's worth every page) and nicely explains, IMO, what is going on: relational database systems are efficient. They are well suited for a certain type of problem: dealing with clearly defined data in a rather static way. But are they effective at helping us deal with big, messy data? I doubt it.

How come?

Pat Helland’s recent ACM Queue article If You Have Too Much Data, then “Good Enough” Is Good Enough offers some very digestible and enlightening insights into why SQL struggles with big data:

We can no longer pretend to live in a clean world. SQL and its Data Definition Language (DDL) assume a crisp and clear definition of the data, but that is a subset of the business examples we see in the world around us. It’s OK if we have lossy answers—that’s frequently what business needs.
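To make Pat's point concrete, here is a minimal sketch in Python of the "lossy answers are good enough" idea. The records and the `lossy_total` helper are my own hypothetical illustration, not from the paper: the data is messy in ways a crisp DDL schema would reject, yet a tolerant aggregate still yields a usable business answer.

```python
# Messy, heterogeneous records: fields may be missing or inconsistently typed.
# A rigid relational schema would reject some of these outright.
orders = [
    {"id": 1, "amount": 19.99, "country": "IE"},
    {"id": 2, "amount": "24.50"},                # amount arrived as a string
    {"id": 3, "country": "DE"},                  # amount missing entirely
]

def lossy_total(records):
    """Sum what we can parse and silently skip the rest ("good enough")."""
    total = 0.0
    for record in records:
        try:
            total += float(record["amount"])
        except (KeyError, ValueError, TypeError):
            continue  # lossy: drop records we cannot interpret
    return total

print(lossy_total(orders))
```

The answer is knowingly incomplete (order 3 is dropped), but for many business questions an approximate total delivered now beats an exact one that a strict schema would never let you compute.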

… and also …

All data on the Internet is from the “past.” By the time you see it, the truthful state of any changing values may be different. [...] In loosely coupled systems, each system has a “now” inside and a “past” arriving in messages.

… and on he goes …

I observed that data that is locked (and inside a database) is seminally different from data that is unlocked. Unlocked data comes in clumps that have identity and versioning. When data is contained inside a database, it may be normalized and subjected to DDL schema transformations. When data is unlocked, it must be immutable (or have immutable versions).
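The "identity plus versioning, never mutate" shape of unlocked data can be sketched in a few lines of Python. This is my own illustrative rendering of the idea, not code from Pat's paper; the `Record` type and `update` helper are hypothetical names:

```python
from dataclasses import dataclass, replace

# A "clump" of unlocked data: a stable identity, a version number,
# and immutable contents.
@dataclass(frozen=True)
class Record:
    identity: str
    version: int
    payload: tuple  # tuples keep the contents immutable too

def update(record, new_payload):
    """Never mutate in place: produce a new immutable version instead."""
    return replace(record, version=record.version + 1, payload=new_payload)

v1 = Record("customer-42", 1, (("name", "Ada"),))
v2 = update(v1, (("name", "Ada"), ("city", "Galway")))
# v1 is untouched; both versions can circulate safely between
# loosely coupled systems, each version forever describing its own "past".
```

Because every version is frozen, a message carrying `v1` stays truthful even after `v2` exists elsewhere, which is exactly why immutability matters once data leaves the database.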

These were just some quotes from Pat’s awesome paper. I really encourage you to read it yourself – you will likely discover even more insights.

Coming back to the initial question: I think NoSQL is effective for big, messy data. It has yet to prove that it is efficient in terms of usability, optimization, etc. Due to the large number of competing solutions, the respective communities in NoSQL-land are smaller and more fragmented, but I expect it will undergo a consolidation process over the next couple of years.

Summing up: let’s not try to solve tomorrow’s problems with yesterday’s tools.

Filed under: Big Data, Cloud Computing, FYI, NoSQL

Syndicated 2011-05-29 06:07:56 from Web of Data

