Older blog entries for nconway (starting at number 51)

21 Feb 2008 (updated 21 Feb 2008 at 08:38 UTC)

Data Management for RDF

I was talking to a database researcher recently about why the artificial intelligence community and the database community haven't historically seen eye-to-eye. The researcher's opinion was that AI folks tend to regard databases as hopelessly limited in their expressive power, whereas DB folks tend to view AI data models as hopelessly difficult to implement efficiently. There is probably some truth to both views.

I was reminded of this when doing some reading about data management techniques for RDF (the proposed data model for the Semantic Web). Abadi et al.'s "Scalable Semantic Web Data Management Using Vertical Partitioning" is a nice paper from VLDB 2007, and appears to be one of a relatively small group of papers that approach the Semantic Web from a database systems perspective. The paper proposes a new model for storing RDF data that essentially applies the column-store ideas from the C-Store and Vertica projects: rather than keeping every (subject, property, object) triple in one large table, the data is split into one two-column table per property, holding the subjects that have that property and the corresponding objects, sorted by subject. Sam Madden and Daniel Abadi discuss their ideas further in a blog entry at The Database Column.

Planet PostgreSQL readers might be interested in this observation in the paper:

We chose Postgres as the row-store to experiment with because Beckmann et al. experimentally showed that it was by far more efficient dealing with sparse data than commercial database products. Postgres does not waste space storing NULL data: every tuple is preceded by a bit-string of cardinality equal to the number of attributes, with '1's at positions of the non-NULL values in the tuple. NULL data is thus not stored; this is unlike commercial products that waste space on NULL data. Beckmann et al. show that Postgres queries over sparse data operate about eight times faster than commercial systems

(A minor nitpick: Postgres will omit the per-tuple NULL bitmap when none of the attributes of a tuple are NULL, so it is not quite true that "every tuple is preceded by a bit-string".)

The cited Beckmann et al. paper is "Extending RDBMSs To Support Sparse Datasets Using An Interpreted Attribute Storage Format".

It's interesting that none of the leading commercial systems seem to use exactly the same NULL bitmap approach that Postgres does. The tradeoff appears to be storage space versus computation time: eliding NULL values from the on-disk tuple reduces storage requirements, but it makes it more expensive to find the offset at which an attribute begins if that attribute is preceded by one or more (elided) NULL values. If NULL values were stored in the on-disk tuple (and no variable-width attributes were used), each attribute's offset would be fixed and could be computed directly from its position in the tuple.

In practice, Postgres implements another optimization that mitigates this problem to some extent: as tuples are passed around the executor and attributes are "extracted" from the on-disk tuple representation, they are effectively cached using the TupleTableSlot mechanism. This means that the work of finding an attribute's offset in the presence of NULLs is typically done at most once per attribute of a tuple.

19 Feb 2008

Nice DBMS Internals Overview Paper

I noticed that Joe Hellerstein, Mike Stonebraker, and James Hamilton (DBMS luminaries all) have published a nice, reasonably high-level paper describing the architecture and design principles of a typical database management system: "Architecture of a Database System".

25 Jan 2008

PostgreSQL Mailing List Archives

MarkMail is now indexing all 630,000+ messages from the PostgreSQL mailing list archives. If, like me, you've been frustrated when trying to use the search engine and archives at archives.postgresql.org, I suggest checking out MarkMail. It's been working very well for me so far.

11 Jan 2008

Signed overflow in C

Ian Lance Taylor's blog has an interesting post on signed overflow behavior in C. According to the C standard, signed integer overflow results in undefined behavior, and modern versions of GCC take advantage of this to generate more efficient code. Tom raised this topic on -hackers a few years ago; at the time, GCC implemented only the -fwrapv flag. Now that GCC 4.2 provides -Wstrict-overflow, this might be worth investigating further.

The broader point here is that while this optimization is completely legal according to the C standard, it is inconsistent with traditional C semantics, and it risks breaking code that expects signed integer overflow to wrap around. At least GCC now provides a flag to emit warnings for potentially broken code, which IMHO is a prerequisite for doing aggressive optimizations of this type. Another interesting post on Ian Lance Taylor's blog discusses this situation in general (e.g. strict-aliasing optimizations are another case where the C standard contradicts the traditional expectations of C programmers).

6 Dec 2007

Filesystem Replication and Software Quality

A few years ago, I did a summer internship with a group at Microsoft that was building a multimaster filesystem replication product. This was a very rewarding experience for several reasons. Now that the replication product has shipped (in Windows Server 2003 R2, Vista, and Windows Live Messenger), I was happy to see that my mentor for that summer, Nikolaj Bjørner, has published a paper containing "lessons learned" from the project: "Models and Software Model Checking of a Distributed File Replication System". The paper is worth reading for a few reasons:

Why is filesystem replication such a hard problem, particularly in the asynchronous, multi-master case?
The paper talks about the basic problem and the approach the group took to solving it.
Perhaps more interestingly, how do you go about constructing a high-quality implementation of such a product?
I was impressed by the group's emphasis on correctness. Nikolaj and Dan (the technical lead for the group) both had backgrounds in CS theory, so this is perhaps not surprising, but it's interesting to see some of the practical techniques they used to ensure they built a correct replication system:
- A detailed specification (on the order of a few hundred pages)
- A prototype of the system in OCaml, written concurrently with the specification but before the real implementation work began
- A high-level, executable specification of the replication protocol in AsmL. This served both as a readable description of the protocol and as a way to automatically generate useful test cases.
- Model checking, used to verify the correctness of certain particularly complex aspects of the protocol (distributed garbage collection, conflict resolution).
- A "simulator" that walked a random tree of filesystem operations, pausing after each node to verify that the system had correctly replicated the resulting filesystem state. Once a leaf node in the tree was reached, the simulator then backtracked, exploring another branch of the tree. The simulator was also clearly inspired by model checking techniques. By replacing certain components of the real system with virtualized ones (e.g. using a toy in-memory database), this tool could be used to test large numbers of scenarios very quickly.
- Exhaustive testing. Using the simulator and a cluster of test machines, more than 500 billion test cases were examined.

16 Nov 2007

Jim Gray Tribute

On May 31, 2008, a tribute to honor the life and work of Jim Gray will be held at UC Berkeley. There's a technical session, for which registration is required, preceded by a general session that is open to the public. As the invitation email I received (thanks Elein!) states:

This is not a memorial, because Jim is still listed as missing, and will be so listed until about Jan 28, 2011. It is important that it is not referred to as a memorial, because it can't be a memorial until then. We believe that it is good to go ahead and recognize Jim's contributions, to honor him in a Tribute, before such a long time has passed.

9 Nov 2007 (updated 9 Nov 2007 at 07:42 UTC)

Stonebraker on Databases for "Big Science"

There's a new post by Stonebraker up at The Database Column. I don't have much to add to the post itself, although it's interesting to hear some information about the old Sequoia 2000 project. I notice that Google and Yahoo were invited to the workshop — at first glance, it seems to me that the data management problems faced by the big web companies are quite dissimilar to the challenges facing "big science", but perhaps that's not the case.

22 Oct 2007 (updated 22 Oct 2007 at 20:28 UTC)

PostgreSQL Conference Fall 2007

This weekend's conference in Portland was a great experience. Many thanks to Selena Deckelmann, Josh Drake, and all the other volunteers for organizing and running the conference. Everything ran amazingly smoothly!

I've posted the slides to my talk on "Query Execution Techniques in PostgreSQL". I thought the talk went fairly well, although unfortunately I didn't have enough time to get to everything I wanted to discuss.

In the talk, one of the algorithms I discussed was the "hybrid hash join", the standard hash join algorithm used by most modern DBMSs, including PostgreSQL. The night before, Jeff Davis tipped me off that one of the inventors of the hybrid hash join algorithm, Dr. Len Shapiro from PSU, was going to be in the audience! Thankfully I didn't get the details of the hybrid hash join wrong :) It was a pleasure to meet Dr. Shapiro, whose students are doing some interesting work improving hash index bulk build performance.

29 Jun 2007

Note To Self

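When running Postgres in EXEC_BACKEND mode on Linux, make sure to disable address-space layout randomization (as root):

echo 0 > /proc/sys/kernel/randomize_va_space

Preferably, before spending a while wondering about the non-deterministic process startup errors that will otherwise occur. (In EXEC_BACKEND mode each backend is started with fork() followed by exec(), and the new process must attach shared memory at the same address the postmaster used; with randomization enabled, that address may already be occupied.)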

22 May 2007

PgCon

I gave a revised version of the "Introduction to Hacking PostgreSQL" tutorial at PgCon earlier today. I've posted the slides, handouts, and example patch; this version uses a completely new example patch, and much of the introductory material has been revised. The slides are also available on the PgCon page for the talk, which includes a link for giving me feedback.
