A Database Design

Posted 26 Feb 2001 at 21:30 UTC by apm Share This

This article describes the design of the database management system (DBMS) in the Suneido integrated application platform. As the designer, I am looking for comments, criticisms, comparisons, and suggestions.   I am also looking for feedback on this article.   Does it adequately describe the database?   Is it understandable?   Is there anything that could/should be taken out?  Is there anything that should be added?

Suneido has an integrated client-server relational database. The database is accessed via a language which includes administration requests and update operations as well as actual queries. The query language is based on the relational algebra language suggested by C.J.Date.

The database can store the following types of values:

  • booleans (true or false)
  • strings (including binary e.g. image data)
  • number
  • dates
  • Suneido objects (i.e. arrays or records)

Suneido's DBMS, like its language, is dynamically typed i.e. database columns (fields) do not have fixed types - they can hold any type of value. Validating data is left up to the application. All fields and records are variable length.

Suneido stores the entire database as a single operating system file. This includes the schema (table layout) information, data records, indexes, and concurrency & recovery information. The database is accessed as a memory mapped file.

The database can operate in single-user local mode, or in multi-user client-server mode. TCP/IP is used to communicate between the clients and the server.

System Tables

Information about the database (the schema) is stored in the following system tables:

tables - lists the tables that exist

columns - lists the columns for each table

indexes - lists the keys and indexes for each table

triggers - lists the summarize triggers for each table

views - lists the view definitions

These tables may only be altered by the system. However, they can be read from just like any other table.

Administration

The following administrative requests allow you to create, alter, rename, and destroy database tables.

create

Every table must specify at least one key - one or more columns that uniquely identify records. You are not required to identify a "primary key". You cannot add a record with a key that already exists in the table.

create customers (name, phone, fax, email) key(name)

If a table should only have at most one record you can specify an empty key:

create configuration (option1, option2, option3) key()

Tables can also have "indexes". These are not part of the "logical" design of the database; they serve simply to speed up queries. Suneido will automatically use indexes to execute queries faster. For example, if you often searched for customers by phone number:

create customers (name, phone, fax, email) key(name) index(phone)

ensure

Ensure has the same syntax as create, with "create" replaced by "ensure". If the table doesn't exist, it is the same as create. Otherwise. it compares the specification with the existing table and adds any columns, keys, indexes, or summarizes that don't exist. It does not remove any extra columns, keys, indexes, or summarizes. In other words, it ensures that the table has at least the specified schema. For example:

ensure customers (name, phone, fax, email) key(name)

alter

Alter can be used to add or remove columns, keys, indexes, or summarizes, with a syntax similar to create and ensure.

alter customers create (city) index(city)
alter customers delete (fax)

rename

Rename tables, for example:

rename calls to phone_calls

view

Define a view, for example:

view local_customers = customers where city = 'Saskatoon'

or:

view calls = phone_calls union fax_calls

Views permit an application's "logical" view of the database to differ from the "physical" design. For example, the application can refer to a view as if it is a simple table, regardless of whether it is actually implemented as a single table or a union, or a where.

Currently only some views are update-able, depending on their query. Update-able queries include WHERE, PROJECT that includes a key, JOIN that is one to one, RENAME, and EXTEND. Theoretically, according to C.J.Date, all queries are update-able.

destroy

Destroy a table. For example:

destroy customers

Destroy can also be used to un-define a view. This does not affect the actual data.

Foreign Keys

A foreign key consists of one or more columns in one table that uniquely identify a row in another table, i.e. are a key in the other table. Foreign keys place constraints both on the source table - foreign key fields must contain values that exist in the other table, and on the target table - you can't delete or update records if doing so would make source records invalid.

create calls (date, customer, comments, minutes) key(date)
    index(customer) in customers(name)

You can also specify cascading deletes and updates. For example, if you delete an invoice header record, you can have the invoice line records automatically deleted. Or if you update the invoice number on the header, you can have the invoice line items automatically updated. For example:

create invoice_lines ...
    index (invoice_num) in invoices cascade

summarize

Summarize creates and maintains stored summaries of a table.

For example:

alter calls create summarize(customer, count, total minutes)

would create a table called _summarize_calls_customer which would store the count of calls and total minutes of calls for each customer. This table would be automatically updated as records are added, updated, or deleted from the calls table.

See also: Queries - summarize.

Queries

where (select/restrict)

Produces only those records that match certain criteria.

For example:

customers where city = "Saskatoon"

would produce only the customer records where the city was Saskatoon.

Where expressions are a subset of the expressions allowed in the Suneido language, including arithmetic and string operations, and also allow calling user defined functions.

project

Extract specified columns and eliminates any resulting duplicates.

For example:

customers project city

would produce a list of cities where you have customers.

rename

Rename columns - useful for join, leftjoin, union, minus, and intersect.

For example:

(sales rename salesman to employee) join staff

extend

Add calculated columns. For example:

sales extend amount = quantity * price

join (natural eq-join)

The result of join is a table with all the columns from the input queries (without duplicates) and with the set of rows formed by combining each pair of rows with equal common columns. The input queries must have at least one column in common.

For example:

customers join sales

Join excludes rows from the first table that do not have a matching row in the second table.

leftjoin (left outer natural eq-join)

Similar to join except leftjoin includes rows from the first table that do not have a matching row in the second table. These rows will have empty ("") values for the columns of the second table.

For example:

customers leftjoin sales

will include all customers, whether they had sales or not.

times (product)

The result of times is a table with all the columns from both queries and with the set of rows formed by combining each possible pair of rows from the queries.

For example, to get all possible combinations of city and items:

city times items

summarize

Summarize a table by specified columns, with added columns for accumulated functions (count, total, max, min, average).

For example, to get the total sales for each city:

summarize sales city, total price

See also: Administration - summarize

union
minus (difference)
intersect

These are the standard set operations. Union eliminates any duplicates.

These operations require that their inputs have the same set of columns - project, extend, and rename are often useful to achieve this.

For example, to get a combined list of cities for your customers and suppliers:

(customers project city) union (suppliers project city)

Note: Although many of the example above show the query operations being applied to tables, the real power of the query language comes from being able to apply operations to any other query, not just tables.

Query Optimization

Query optimization has two main phases. The first phase applies some standard tranformations to the query that are almost always advantageous. For example, moving where's towards tables and combining adjacent operations. In the second phase, operations choose appropriate strategies, indexes, and temporary indexes based on estimated costs. Data sizes are estimated using the indexes. Some operations have multiple strategies (e.g. project) they can use, other operations (e.g. rename) have only a single strategy.

Updating

Database update requests can be used to modify the database.

For example, to insert a record into a table:

insert { name: 'Fred', salary: 37500 } into employees

or to insert records from a query into a table:

insert sales where city = "Fargo" into fargo_sales

or to update all the records from a query:

update parts where category = 2 set price = price * 1.1

or to delete all the records from a query:

delete sales where price < .10

Rules

Unlike many systems, which limit business rules to constraints, Suneido's business rules support a variety of uses including supplying default values to fields, performing calculations, and summarizing other data. Business rules have many advantages. They keep your business logic separate from your user interface and reports, enable code re-use, and allow your code to be written in smaller modules that are easier to test and maintain.

You can define rules for fields by defining functions called Rule_fieldname. When you access a field that the record does not contain, if there is a rule it will be called. If the rule returns a value, it will be stored in that field of the record. When rules are executed, Suneido automatically tracks their dependencies on other fields they access. If a dependency is changed, then the rule field is invalidated. This means that the next time the field is accessed, the rule will be executed again. Dependencies can be stored in the database (by creating a field called fieldname_deps) so that when old records are manipulated, rules will be triggered just as on new records. Invalidations also trigger record.Observer - this is used to update the user interface when records change. Invalidations do not affect non- rule values. i.e. if the user has overridden a derived value, then the rule on that field will no longer be triggered. Rules can be used without actually storing the values, or calculated columns can be stored in the database. Rules can also be used to adjust user interface controls.

Triggers

Whenever records are output, updated, or deleted from a table, the system checks for a user defined trigger named "Trigger_" followed by the table name. Triggers are only called after the operation succeeds. i.e. If the output, update, or delete fails, the trigger will not be called. Triggers can be used to maintain secondary tables such as summaries.

History

Creating a table automatically creates a coresponding table with "_history_" prefixed on the table name. History tables are read-only except for the ability to add indexes. Any records removed from a table by deleting or updating will automatically appear in the history table. The history table has an additional "_deleted" field that is the timestamp when that version of the record was deleted (or updated).

Note: Since Suneido needs old versions of records for concurrency and recovery, this history facility is virtually "free".

Concurrency

Suneido's DBMS can operate in one of two modes - single-user local mode, or multi-user client-server mode. In either case, the database file itself is only ever accessed by a single program exclusively, so no operating system file locking is required.

All access to the database must be done within transactions. Transactions can be either read-only or update. Transactions see the database as of their start time, as if they were viewing a "snapshot" of the database. Suneido uses a multi-version, optimistic, timestamp-based approach to concurrency, which provides full transaction isolation, i.e. is serializable. Because of this, read-only transactions (e.g. reports) always succeed - they will never conflict with other transactions. Update transactions check for conflicts with other transactions when they complete, and fail (rollback) if conflicts are found.

On-line backups are done using a single read-only transaction to get a "snap-shot" of the entire database without interfering with use of the database.

Recovery

Suneido uses the database itself as its "log" i.e. to keep track of updates, both to support concurrency (e.g. multi-version and rollback) and to support recovery. This is possible because data records are only ever added at the end of the database file, never updated in place. This means that recovery (e.g. from a crash) is largely a matter of determining what portion of the end of the file must be discarded in order to rollback to a consistent state. Since indexes are updated in-place for efficiency, they are ignored by the recovery process and rebuilt from the data. Checksums are used to determine the integrity of the database.

References

An Introduction to Database Systems, C.J.Date


Bindings for other languages?, posted 26 Feb 2001 at 23:08 UTC by Bram » (Master)

Do you have, or are you planning to implement, access to these databases from other languages?

re. Bindings for other languages, posted 26 Feb 2001 at 23:30 UTC by apm » (Journeyer)

This isn't a big priority for us since we're focusing on the advantages of the integrated language and database. But it's certainly do-able. One approach would be to simply access the server over TCP/IP using the same protocol that Suneido clients use. The protocol is pretty straightforward and is mostly text (e.g. you can exercise it with telnet) although full use requires binary transfers. Or you could make use of the client code in Suneido to manage the client-server communications. Or if you just needed single-user local mode, you could simply use the database code - either linking it into your application or turning it into a DLL or a COM component.

A few questions, posted 26 Feb 2001 at 23:35 UTC by AlanShutko » (Journeyer)

  • Why are you writing your own DBMS instead of using one of the existing ones?
  • Why are you writing your own query language instead of using SQL?
  • Why are you getting rid of staticly-typed fields? It seems to me that they're a big win: instead of shoving that knowledge to the application, you keep it in one place, and make sure a faulty application can't mess things up.

re. A few questions, posted 27 Feb 2001 at 00:15 UTC by apm » (Journeyer)

Why another database?

We didn't feel we could get the language integration, small size, simplicity, and full concurrency and recovery with a third party product. The entire Suneido executable including the database server, database client, language compiler, and language runtime is only 1.7mb. It requires NO installation or setup - command line arguments determine whether it runs standalone, as client, or as server. And it's all open source. And any other database would've probably used SQL ...

Why not SQL?

"... it cannot be denied that SQL in its present form leaves rather a lot to be desired ... the language is filled with numerous restrictions, ad hoc constructs, and annoying special rules. These factors in turn make the language hard to define, describe, teach, learn, remember, apply, and implement." C.J.Date, author of An Introduction to Database Systems

Why dynamic types?

For the same reasons that some languages (e.g. Smalltalk, Python, Ruby) are dynamically typed - ease of use, flexibility, rapid development. Static typing does prevent some errors e.g. putting a string into a number field, but you still need additional constraints e.g. the valid range of numbers, so why not do all the checking with the constraints? And a dynamically typed database fits well with a dynamically typed language. It wouldn't be hard to add static types to the database.

Andrew McKinlay
Suneido Software

structured data in cells, posted 27 Feb 2001 at 08:46 UTC by Ankh » (Master)

Have you thought about storing XML?

E.g., consider:

<step n="3">insert part <pn>2001</pn> into your left ear.</step>

I might want to say, find me all Procedure elements containing a step mentioning part 2001.

Just a thought - this is something SQL-based RDMS systems are generally very bad at, because SQL doesn't have containment as a native concept: cells don't contain other cells.

re. structured data in cells, posted 27 Feb 2001 at 14:47 UTC by apm » (Journeyer)

I've thought about how XML could fit in. One issue is that XML is tree structured whereas a relational database is "flat" tables.

You could certainly put XML strings into fields and define some functions to search on them. For example, if you had a table called "instructions" with a field called "stepinfo" that contained XML:

instructions where XML_contains(stepinfo, 2001)

But you couldn't index XML fields properly (just as one big string) so you couldn't efficiently search large tables.

There are other ways of storing tree structured data in relational databases, but that's not necessarily related to XML.

I can see using XML as an import/export format.

Andrew McKinlay
Suneido Software

NIH?, posted 27 Feb 2001 at 17:37 UTC by goran » (Journeyer)

Not being the one to rain on the parade, BUT
many of us has seen glorious projects take off and fail.

I should recommend using components that are out there

  • Language: perl, python, java, whatever
  • Database: PostgreSQL, MySQL or SQLite
  • Transport: CORBA, XML-RPC
Certainly not perfect but good enough to let the designers and coders concentrate on functionaly instead of build a new alternative infrastructure.

It is important to deliver something before a project looses steam. The most ambitious projects often fails to deliver anything, while quick'n'dirty hacker-fests often deliver unstructured but useful functionality.

Careful design make it fast to code, overambious design takes forever and selldom get implemented at all

Good luck,
Göran

re. NIH?, posted 27 Feb 2001 at 19:48 UTC by apm » (Journeyer)

Sure there's always a certain element of "Not Invented Here" in any project. And a certain "arrogance" in thinking you can do it "better". And certainly a lot of projects fail. But at the same time, *someone* has to (try to) create new products - at least until the "perfect" system is available (and open source).

One point that maybe wasn't clear in my article is that all of this is implemented and working, it's not just an "idea". We have, "delivered something" and we haven't "run out of steam".

The whole idea behind Suneido is to eliminate the hassles of trying to integrate language A and database B and transport C and gui D ... so that, as you say, designers and coders can concentrate on functionality.

Andrew McKinlay
Suneido Software

NIH, and a few more questions, posted 1 Mar 2001 at 06:45 UTC by tony » (Journeyer)

Individual developers have strong preferences-- not just *some* developers, but *all* developers. Me, I can't stand Python just because it uses indentation for scoping. This is too much like that ANSI MUMPS language for my liking. Kinda arbitrary, but it's a strong preference. So, as has been hinted at here, tieing your database to a specific language may keep a *lot* of programmers away.

Now some questions-- how long is history kept? Is there a method of purging old data? Or does the database require a complete history from time of conception?

Keeping transaction logs in the database itself is a Bad Idea, IMNSHO. The concept of a transaction log is for database recovery-- hard to do if the transaction logs are in the corrupt database. Also, as the transaction logs fill up all available space, there's no way to manage them if they are trapped in the database container files.

I am glad to see a database query language built on Mr. Date's suggested algebraic QL. It'll be nice to see how it stacks up against SQL, which has proven usefull, crufty as it is.

re. NIH, and a few more questions, posted 1 Mar 2001 at 15:19 UTC by apm » (Journeyer)

I agree that developers have strong preferences - I certainly do. But, like most(all) developers, I have limited time/resources. So, while I agree that multiple language interfaces would broaden the audience, right now I think I have to give priority to the database itself, and to the Suneido language. But if someone wanted to work on an interface to another language, I'm all for it, and would support it as much as I could.

There are a number of issues with language independence that I don't know the answers to. If you're using it with, say Java, does that mean that database functions, rules, and triggers should be in Java, or would they still be in the Suneido language? Another issue is that many(most) standards for database connectivity are SQL oriented. So do you have to write an SQL to Suneido query language translator? I'm not saying these are insurmountable problems - just issues that would have to be addressed.

The history doesn't need to be kept forever. At a minimum the database needs history as far back as the oldest outstanding(open) transaction. But for recovery purposes history should be kept at least as far back as the last shutdown ("checkpoint"). There are limited methods for purging history at the moment, but obviously this is a requirement. The history is also available for application usage so that can also affect what you want to keep. For example, our version control uses the database history.

As far as the wisdom of combining the transaction log with the database - all I can say is that it has worked well so far. Performance-wise it has some definite advantages - outputting a new version of a record is simply that, appending the new version to the database, leaving the old one intact for recovery/rollback. No separate log output, no flushing logs (or database). Obviously, it means you can't put the log on separate physical media for recovery purposes, but there are alternatives such as mirroring drives or using RAID. And yes, it's not "easy" to do recovery with this design, but it's do-able. IMHO it's better if normal operation is simpler/faster at the cost of infrequent recovery being a little harder. (As long as it's still possible.)

History/log space is an issue, and currently does require periodic purging/compacting, which could be a problem in some situations. But for the applications we envision, it's reasonable.

Thanks for the feedback, it's important to have ideas/designs examined and challenged, since developers all have their blind spots. (Especially regarding their own creations!)

Andrew McKinlay
Suneido Software

from the perspective of web development..., posted 6 Mar 2001 at 08:45 UTC by turing » (Journeyer)

I can't see using a product like your Suneido DBMS, simply because I would then be tied to it. That's scary.

Right now I develop apps that can talk to oracle, postgres, mySQL, sqlServer, whatever. While I agree that SQL is limited and in many cases plainly stupid, I don't view evolutionary language changes as the "future"

I view native, node & attribute aware data management systems that store data as XML as "next generation"

I've seen a few cool things that will present SQL results as XML natively, but I have not seen truly native XML (that's fast and actually works in a production environment) yet. I'm frothing at the bit for it, though :)

In any case, for small embedded applications where the above isn't a consideration, I can see the system being quite useful. For example, integrating it into a desktop application that requires a database.

alex black turing

re. from the perspective of web development, posted 6 Mar 2001 at 17:39 UTC by apm » (Journeyer)

Yeah, being tied to a product is certainly one of the issues in "non- standard" products. It requires a certain "leap of faith" that doesn't necessarily come easily. But if you are going to get tied to a product, at least it can be open source (not e.g. Visual Basic). I guess the question is whether a product has sufficient advantages over the alternatives to justify the risks.

What scares me with XML databases, is that they seem awfully similar to hierarchical databases, which were largely abandoned in favor of relational databases some time ago. Personally, I can see more potential in object-oriented databases, with XML more a means of data exchange. But that's just my two cents. I'm open to the future.

Thanks for the feedback,

Andrew McKinlay
Suneido Software

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!

X
Share this page