Denormalizing graph-shaped data
As nicely pointed out by Ilya Katsov:
Denormalization can be defined as the copying of the same data into multiple documents or tables in order to simplify/optimize query processing or to fit the user’s data into a particular data model.
So I was wondering: why, in Ilya's write-up, is denormalization not considered applicable to GraphDBs?
I suppose the main reason is that the relationships (or links, as we tend to call them in the Linked Data world) are typically not resolved or dereferenced. Traversing the graph is fast, but for a number of operations, such as range queries, denormalized data would serve better.
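To make the trade-off concrete, here is a minimal Python sketch on a toy, in-memory graph. The node names, the `worksFor` link and the `employerFounded` field are made up for illustration. The first function answers a range query by dereferencing each link at query time; the second answers the same query from person records into which the linked value was copied (denormalized) beforehand.

```python
# Hypothetical toy graph: company nodes with a "founded" property,
# person nodes with a single "worksFor" link to a company node.
companies = {"acme": 1950, "globex": 1989, "initech": 2001}
works_for = {"alice": "acme", "bob": "globex", "carol": "initech"}


def range_by_traversal(lo, hi):
    """Normalized: dereference each person's link at query time."""
    return sorted(p for p, c in works_for.items() if lo <= companies[c] <= hi)


def denormalize(links, targets):
    """Copy the linked company's 'founded' value into each person record."""
    return {
        p: {"worksFor": c, "employerFounded": targets[c]}
        for p, c in links.items()
    }


def range_denormalized(people, lo, hi):
    """Denormalized: the range predicate is checked on the person record alone."""
    return sorted(
        p for p, rec in people.items() if lo <= rec["employerFounded"] <= hi
    )


people = denormalize(works_for, companies)
print(range_by_traversal(1980, 2010))          # ['bob', 'carol']
print(range_denormalized(people, 1980, 2010))  # ['bob', 'carol']
```

In this toy both versions boil down to dictionary lookups; the point is that in a real store the copied value sits in the person record itself, so it can be indexed and range-scanned without following any link. The price is the usual one for denormalization: if a company's `founded` value changes, every copy has to be updated as well.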
Now, the question is: can we achieve this in GraphDBs, including RDF stores? I would hope so. Here are some design ideas:
- Up-front: when inserting new data items (nodes), immediately dereference their links and embed the linked data (embedded links).
- Query-time: apply database cracking, i.e., let each query incrementally reorganize the data it touches.
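Database cracking (Idreos, Kersten and Manegold) is a form of adaptive indexing: rather than building an index up front, each range query physically partitions a copy of the column around the query's bounds, so the column becomes progressively better organized for the workload it actually sees. Below is a minimal sketch of that basic idea, assuming a single in-memory column of integer property values; the class and method names are hypothetical, and how such a column would be extracted from a graph or RDF store is left open.

```python
from bisect import bisect_left


class CrackerColumn:
    """Hypothetical sketch of a cracked column of integers.

    Invariant: for every recorded (pivot, position) pair, all values in
    data[:position] are < pivot and all values in data[position:] are >= pivot.
    """

    def __init__(self, values):
        self.data = list(values)  # the cracker column: a copy reorganized by queries
        self.pivots = []          # sorted crack values seen so far
        self.positions = []       # positions[k] is the boundary for pivots[k]

    def _crack(self, pivot):
        """Partition the piece that contains `pivot`; return its boundary position."""
        k = bisect_left(self.pivots, pivot)
        if k < len(self.pivots) and self.pivots[k] == pivot:
            return self.positions[k]  # already cracked on this value
        lo = self.positions[k - 1] if k > 0 else 0
        hi = self.positions[k] if k < len(self.pivots) else len(self.data)
        i, j = lo, hi - 1
        while i <= j:  # in-place two-pointer partition of data[lo:hi]
            if self.data[i] < pivot:
                i += 1
            else:
                self.data[i], self.data[j] = self.data[j], self.data[i]
                j -= 1
        self.pivots.insert(k, pivot)
        self.positions.insert(k, i)
        return i

    def range_query(self, low, high):
        """Return the values v with low <= v < high, in no particular order."""
        if low >= high:
            return []
        start = self._crack(low)
        end = self._crack(high)
        return self.data[start:end]


col = CrackerColumn([13, 16, 4, 9, 2, 12, 7, 1, 19, 3, 14, 11, 8, 6])
print(sorted(col.range_query(5, 12)))  # [6, 7, 8, 9, 11]
print(sorted(col.range_query(2, 8)))   # [2, 3, 4, 6, 7]
print(col.pivots)                      # [2, 5, 8, 12]
```

After the second query the column is cut into five pieces at 2, 5, 8 and 12, and a later query whose bounds coincide with existing pivots needs no partitioning at all. A real implementation would more likely keep the pivots in a tree-based index than in two Python lists.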
Here is the question for you, dear reader: are you aware of anyone doing this? My Google skills have failed me so far, so I would be happy to learn about it in greater detail!
