A brief follow-up to my earlier post on random generation of SQL queries for testing purposes: Don Slutz from MSR has an interesting paper called "Massive Stochastic Testing of SQL" that describes a stochastic SQL query generator called RAGS. One of the problems with randomized testing that I noted earlier is that it is difficult to distinguish between successes and failures: if the backend crashes while executing a query, that is very likely a defect of some kind, but if no crash occurs, how do we know that the backend has produced the correct result set? Slutz writes:
If a SQL Select executes without errors, there is no easy method to validate the returned values by observing only the values, the query, and the database state. Our approach is to simply execute the same query on multiple vendors' DBMSs and then compare the results.
As the author notes, this isn't perfect:
The use of database systems from different vendors for output validation proved to be extremely useful for the SQL common to many vendors. The down side is that the common SQL subset is relatively small and changes with each release. We also found the differences in NULL and character string handling and numeric type coercion in expressions to be particularly problematic (these are also portability issues).
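The differential-validation idea is simple enough to sketch in a few lines. The snippet below is my own illustration, not code from the paper: it runs the same query against several database connections and checks that they all return the same rows. RAGS compared different vendors' DBMSs; here two in-memory SQLite databases stand in for two engines, so this only shows the structure of the comparison, and the table and data are invented.

```python
import sqlite3

def results(conn, query):
    # Sort rows so that row-ordering differences (which are legal
    # without an ORDER BY) don't count as mismatches.
    return sorted(conn.execute(query).fetchall())

def agree(connections, query):
    """True if every engine returns the same multiset of rows."""
    outputs = [results(c, query) for c in connections]
    return all(o == outputs[0] for o in outputs[1:])

if __name__ == "__main__":
    # Two independent SQLite databases stand in for two vendors' DBMSs.
    conns = [sqlite3.connect(":memory:") for _ in range(2)]
    for c in conns:
        c.execute("CREATE TABLE t (a INTEGER, b TEXT)")
        c.executemany("INSERT INTO t VALUES (?, ?)", [(1, "x"), (2, "y")])
    print(agree(conns, "SELECT a FROM t WHERE b = 'x'"))
```

The sorting step papers over one of the portability hazards Slutz mentions; the NULL-handling and type-coercion differences he notes would need engine-specific normalization on top of this.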
The paper's approach is to randomly generate SQL queries using an SQL grammar. The grammar defines the set of legal SQL statements, so it can be used to generate members of that set. For example, a trivial SELECT grammar might be:
SelectStmt: SELECT t_list FROM relation
The SQL generator fills in "t_list" and "relation" at random: a randomly sized list of random target-list expressions for "t_list", and a random relation name for "relation". The paper's SQL generator allows for a few simple configuration settings: generate between n and m joins, generate between x and y elements in the target list, and so on. William McKeeman's "Differential Testing for Software" explores some similar ideas for doing stochastic testing of C compilers. Both papers have been discussed by other bloggers ([1], [2]) in the past.
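To make the grammar-expansion idea concrete, here is a toy generator for the trivial SelectStmt production above. This is my own sketch, not the RAGS implementation: the schema, the column names, and the min/max knobs are all invented, but the shape mirrors the paper's tuning settings (e.g. "generate between x and y elements in the target list").

```python
import random

# Invented example schema: relation name -> its columns.
SCHEMA = {
    "orders": ["id", "customer_id", "total"],
    "customers": ["id", "name", "region"],
}

def gen_t_list(columns, min_len=1, max_len=3):
    """Expand the t_list nonterminal: a random-length list of
    randomly chosen columns (min/max mimic RAGS-style knobs)."""
    n = random.randint(min_len, max_len)
    return ", ".join(random.choice(columns) for _ in range(n))

def gen_select(schema):
    """Expand SelectStmt: SELECT t_list FROM relation."""
    relation = random.choice(sorted(schema))
    return f"SELECT {gen_t_list(schema[relation])} FROM {relation}"

if __name__ == "__main__":
    for _ in range(3):
        print(gen_select(SCHEMA))
```

Every string this emits is well-formed by construction, because each nonterminal is expanded according to the grammar; a real generator recurses through many more productions (joins, predicates, subqueries) the same way.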
One problem with this approach is that it often generates unrealistic queries. Using an SQL grammar ensures that the randomly generated queries will be well-formed, but it provides no guarantee that the queries are sensible, or that they bear any relation to the sorts of queries users actually run. The relatively coarse-grained tuning knobs provided by RAGS also don't allow test suites to be written that validate the database's behavior for specific classes of queries: you can't generate a specialized set of SQL queries to exercise a particular part of the query optimizer, for example. It would be nice to be able to take the query workload from a production database and write a high-level constraint that captures the notion of "queries that look like that." I'm still hopeful that a DSL could provide a nice solution to both problems, but I haven't had a chance to really think it through yet.