Home-grown management of 3rd Normalised Form Databases

Posted 11 Mar 2007 at 15:12 UTC (updated 11 Mar 2007 at 19:18 UTC) by lkcl

SQL - Structured Query Language - is an extremely powerful mechanism for the storage, retrieval and manipulation of data. I've been using it for years - but not on a particularly large scale (twenty tables, maximum).

This article seeks to emphatically draw the reader's attention to the stupidity of home-growing your own database access layer on top of a 3rd normalised form database design, even if it's a nice object-orientated one; to why you should avoid stored procedures at all costs; and even, to some extent, to why you should avoid Views, when you have particularly large 3rd normalised form databases.

What is the problem?

3rd normalised form database design is where every table has a unique id field, and where every foreign table reference is made by id and by nothing else. For example, in a 3rd normalised database design of address and people, you never refer to the people in the address table by name, but only by the id of their record.

The problem is this: adding, editing, updating and accessing the records is complex. You therefore have to write code that manipulates the records "one level up". Also, you can never be sure that what you are dealing with is entirely correct, because if you look at the tables separately, all you see is numbers. No names, no clues, no way to debug.

But, as I've found out, it's much worse than that - a lot worse. In fact, it was incredibly naive of me, an experienced programmer, to wander blithely into the task of dealing with a 3rd normalised form database and expect that hard-coded functions to manipulate data would be a "good idea".

So this article will outline some of the lessons that I've learned, and some of the tricks that are necessary. If you don't believe they are necessary then, putting it bluntly, you are quite simply a total failure at assessing projects, and you should quit your job and go and do something else, like flower arranging, because otherwise you will be a complete liability to the project you're working on.

Also, I'm sorry to have to say this to Free Software people, but the Microsoft .net data access framework provides the level of required functionality to correctly manipulate data, and free software doesn't. Anyone assessing free software for the manipulation of large data sets (100 or more 3rd normalised form tables, and gigabytes of data) should think twice, unless they have access to extremely competent programmers and/or at least a 48 man-month budget for design and coding.

Lastly, I should particularly point out: it doesn't matter what programming language you write your SQL access layer in: it's GOING to be complex. You should therefore design (or find) an access layer that is in the best possible programming language that you can find. And, I'm sorry to have to inform people who love SQL stored procedures of this, but stored procedures are a shit way to deal with the required level of complexity. Not only do you have to have seriously competent programmers in the stored procedure language - a language which is completely ill-equipped to deal with the required complexities - but you also have to coordinate their efforts very closely with those of the rest of your programmers. Forever.

Why is this so difficult?

There are many reasons - but here's one to illustrate. Let's take an initial table, which then has to be extended somehow. But, for various reasons (not least of which could be that the initial table is multi-purpose, i.e. it's like a Base Class, and therefore cannot be edited) you have to have an "extension" table (like a Derived Class) that refers to the initial table, with a one-to-one mapping.

Let's suppose that there are existing entries in the "Base Class" table, which you now need to "extend". Let's call the initial table "Person"; you have lots of "people" in the table already, and now you want to extend that to "Employee". In the "Employee" table, you want to put which company they work for, what date they started, etc.

So, the most sensible thing to do is to do a LEFT JOIN between the "Person" and the "Employee" tables. Also, let's assume that each person's name (or, better, social security number) is unique. So, from a usage perspective, you never look up a person by name - only by their SSN. Your security division swears blind that the Person table was legitimately procured, but the fact that it contains 200 million entries, and there are only 50,000 Employees in the company, leaves you a little suspicious.

Regardless: that's the setup. The Person table is pre-populated (mostly inaccurately...), and you have existing employees to manage, and new ones to add. Now we describe the problem. The issue is: when adding a new person across these two tables, what exactly is required? Well - the procedure goes something like this:

  • First, do a LEFT JOIN against the Person and Employee tables, looking for an existing record (remember, this is two tables: you can't just do a straight REPLACE INTO).
  • If a record was found, then do an UPDATE, rather than an INSERT, of those input fields that match the Person fields.
  • If, in the query, the Employee id field is NULL (as will be all of the other Employee fields, but the important one to test is the Employee id field) then you know that there was an existing Person record, but no Employee record. Under these circumstances, rather than doing an UPDATE (on a non-existent Employee record) you have to do an INSERT. Oh - but remember to grab the correct Person id from the query you just made!
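The steps above can be sketched in Python against SQLite. The table and column names here (ssn, company, started) are simplified illustrative assumptions, not a real schema:

```python
import sqlite3

def upsert_employee(db, ssn, name, company, started):
    """Insert-or-update a Person/Employee pair, following the steps above."""
    # Step 1: LEFT JOIN against Person and Employee, looking for existing records.
    row = db.execute(
        """SELECT Person.id, Employee.id
             FROM Person
             LEFT JOIN Employee ON Employee.person_id = Person.id
            WHERE Person.ssn = ?""", (ssn,)).fetchone()
    if row is None:
        # No Person at all: plain INSERTs into both tables.
        person_id = db.execute("INSERT INTO Person (ssn, name) VALUES (?, ?)",
                               (ssn, name)).lastrowid
        db.execute("INSERT INTO Employee (person_id, company, started) VALUES (?, ?, ?)",
                   (person_id, company, started))
        return
    person_id, employee_id = row
    # Step 2: the Person exists, so UPDATE its fields rather than INSERT.
    db.execute("UPDATE Person SET name = ? WHERE id = ?", (name, person_id))
    if employee_id is None:
        # Step 3: Employee.id came back NULL - Person exists but Employee doesn't,
        # so INSERT, using the Person id grabbed from the query above.
        db.execute("INSERT INTO Employee (person_id, company, started) VALUES (?, ?, ?)",
                   (person_id, company, started))
    else:
        db.execute("UPDATE Employee SET company = ?, started = ? WHERE id = ?",
                   (company, started, employee_id))
```

Note how even this two-table toy case already needs three distinct branches.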

And that's only with two tables. If there are three JOINed together with LEFT JOINs, then you have to repeat the procedure AGAIN, to check whether there is existing data which could now end up being successfully JOINed in (remember: queries don't just go against id fields, they can be against anything).

It gets worse, and here's why.

Imagine that there's a scenario where you cannot use Primary Key Indexes. More specifically, to make a record unique requires conditional indexes, that cannot be done using SQL: they can only be done in a programming language.

Conditionally Unique Indexes

Unique Indexes in SQL databases are typically the way to stop data being accidentally added when it is already supposed to be there. So you can't end up with the same "Manager" doing more than one "Job", for example. (Where reality meets naivety in database design you often get conflicts. Resolving the conflict in this example is left as an exercise for the reader - but here's a clue: you're not allowed to shoot the people who pay your wages.)

So, what happens when you want to keep, for example, history information about a Manager's career? Instead of having a "Job" table which has a single "Manager ID" field, you instead have a third table - MANAGER_JOB_REF - which stores not only the ID field of the "Manager" and the ID field of the "Job", but also a date stamp for when the "Manager" started the job, and a date stamp for when the "Manager" ends the job - which can be NULL to indicate "they're still in the job".

Now you've moved into a whole new ballgame of complexity in your 3rd Normalised Form database, where you can't even create Unique Indexes across the ID fields (Manager_ID and Job_ID) of the MANAGER_JOB_REF table. Why? Because a Manager might temporarily have two jobs as they transition between roles in the company. In fact, they might end up holding both roles for quite some time, and, realistically, if a company down-sizes, they might end up with two roles permanently (but we'll leave that issue aside as the exercise for the reader, referred to above, where you're not allowed to shoot your boss. Or go mad and claim sick leave.)
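Since the uniqueness rule is conditional, it has to live in application code. Here is a rough Python/SQLite sketch of what that looks like - the MAX_OPEN_JOBS rule and the column names are assumptions for illustration, precisely because the real business rule is the part SQL can't express:

```python
import sqlite3

MAX_OPEN_JOBS = 2  # assumed rule: one job normally, two while transitioning

def assign_job(db, manager_id, job_id, start_date):
    """Enforce, in application code, the conditional uniqueness that a plain
    UNIQUE INDEX on (manager_id, job_id) cannot express."""
    # Count only the "open" rows - those whose end_date is still NULL.
    open_jobs = db.execute(
        """SELECT COUNT(*) FROM MANAGER_JOB_REF
            WHERE manager_id = ? AND end_date IS NULL""",
        (manager_id,)).fetchone()[0]
    if open_jobs >= MAX_OPEN_JOBS:
        raise ValueError("manager already holds the maximum number of open jobs")
    db.execute(
        """INSERT INTO MANAGER_JOB_REF (manager_id, job_id, start_date, end_date)
           VALUES (?, ?, ?, NULL)""",
        (manager_id, job_id, start_date))
```

(In a concurrent system the check-then-insert would also need to run inside a transaction - yet another thing the abstraction layer has to remember.)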

So, we have a nice complex conditional index, where people are identified uniquely like this:


  SELECT * FROM
  Person
  LEFT JOIN Employee ON Person.id = Employee.person_id
  LEFT JOIN Manager ON Employee.id = Manager.employee_id
  LEFT JOIN Manager_Job_Ref ON Manager_Job_Ref.manager_id = Manager.id
  LEFT JOIN Jobs ON Manager_Job_Ref.job_id = Jobs.id
  WHERE
  (Person.SSN = '667051111') AND
  (Manager.Level = 'Numpty Timewasting Level') AND
  (Manager_Job_Ref.end_date IS NULL) AND
  (Manager_Job_Ref.start_date > '21 Jun 1954')

Anyone spot the deliberate mistake? Anyone? No - it's nothing to do with the Manager being a Numpty, nor with the fact that they've been in the job far too long. Here's a clue: it's to do with the LEFT JOINs. Even if the Person exists - in fact, even if the Employee record exists - this query will still return nothing if there is no Manager record, or, even if there is, if there's no Manager_Job_Ref record - because of the WHERE clause tests.

We missed out NULL checking, required due to the LEFT JOINs.

Here's the actual query required to do the job correctly:


  SELECT * FROM
  Person
  LEFT JOIN Employee ON Person.id = Employee.person_id
  LEFT JOIN Manager ON Employee.id = Manager.employee_id
  LEFT JOIN Manager_Job_Ref ON Manager_Job_Ref.manager_id = Manager.id
  LEFT JOIN Jobs ON Manager_Job_Ref.job_id = Jobs.id
  WHERE
  (Person.SSN = '667051111') AND
  (
     (Manager.Level IS NULL AND
      Manager.id IS NULL)
     OR
     (Manager.Level = 'Numpty Timewasting Level')
  ) AND
  (
     (Manager_Job_Ref.id IS NULL AND
      Manager_Job_Ref.end_date IS NULL AND
      Manager_Job_Ref.start_date IS NULL)
     OR
     ((Manager_Job_Ref.end_date IS NULL) AND
      (Manager_Job_Ref.start_date > '21 Jun 1954')
     )
  )

To explain clearly: where the LEFT JOINs find no match, Manager.id and all of the other Manager fields come back NULL. But ANSI SQL specifies that any comparison against NULL - equals, greater than, less than - evaluates to NULL ("unknown"), not true, and so the row fails the WHERE clause. You therefore have to explicitly code in a test for NULL in order to make the query return the rows whose LEFT JOINed Manager fields are NULL.

But, as you can see, the WHERE clause doesn't end there - the tests also include Manager_Job_Ref, and so you have to test for NULLs there as well! And it doesn't end there either: as the Manager_Job_Ref.end_date example shows, a NULL test could actually be a valid test in its own right (if these were INNER JOINs) - remember, in the example, we said that if the end_date was NULL, then the Manager was still in the job.

So we have to explicitly include a test which uniquely identifies whether the row being LEFT JOINed is empty / missing, and for that, we have to check that the primary index field is NULL as well (e.g. Manager_Job_Ref.id).
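The effect is easy to demonstrate with a reduced two-table version of the example, run from Python against an in-memory SQLite database (SQLite follows the same NULL-comparison rule):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, ssn TEXT)")
db.execute("CREATE TABLE Manager (id INTEGER PRIMARY KEY, person_id INT, level TEXT)")
db.execute("INSERT INTO Person (ssn) VALUES ('667051111')")  # a Person with no Manager row

# Naive version: comparing the NULL Manager.level yields "unknown",
# so the WHERE clause filters the row out entirely.
naive = db.execute(
    """SELECT Person.ssn FROM Person
       LEFT JOIN Manager ON Manager.person_id = Person.id
       WHERE Manager.level = 'Numpty Timewasting Level'""").fetchall()

# Corrected version: explicitly admit the all-NULL case via the primary key.
fixed = db.execute(
    """SELECT Person.ssn FROM Person
       LEFT JOIN Manager ON Manager.person_id = Person.id
       WHERE Manager.id IS NULL
          OR Manager.level = 'Numpty Timewasting Level'""").fetchall()

print(naive)  # []
print(fixed)  # [('667051111',)]
```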

This looks insane - but I kid you not, it's absolutely necessary - and we've only just scratched the surface. I forgot to mention: when you start doing LEFT JOINs, then anything that you would normally need to be INNER JOINed must be converted to LEFT JOINs, because otherwise, entries will be missed... but that you should already know about, as a good SQL designer.

Lookup Tables

One final little "trick" that's helpful for lookup tables. In 2nd normalised form, lookup tables are done by storing the name in the table - effectively treating the database as a glorified spreadsheet, and thus wasting enormous resources. In 3rd normalised form, you end up instead with the enormous inconvenience of storing id fields which you then can't debug properly - unless, of course, your SQL access layer performs the translation for you.

So, let's describe a perfectly reasonable scenario where lookup tables take on a whole new level of insane complexity underneath, in 3rd normalised form, that look perfectly acceptable on the outside, to ordinary sane people used to dealing with 2nd normalised form.

We'll extend the numpty Manager example with three further tables. The first table is the "Boss Hierarchy"; the second and third are the "Corporate Structure".

The "Boss Hierarchy" table is called EMPLOYEE_EMPLOYEE_REF and it contains fields "employee_id" and "my_boss_employee_id". The other table is... let's say that it contains which part of the company employees are in: e.g. the "Arizona Office", the "Numpty Division", the "Blue Cheese Moonies Office". (personally I think the Arizona Office is the funniest one, but my sense of humour is pretty sick). Finally, we need a CORPSTRUCT_EMPLOYEE_REF table to link employees to corporate divisions, with field names that are self-evident.

Also, we'll add that, for convenience, 'Employees' have a link to their division head stored in the Employee, by id of the 'Person'. This is a little contrived, and there are better ways to represent the relationship with a better design, but it allows us to illustrate the issue.

We now want to present a "view" of this data which looks like 2nd normalised form, where we can pass in data to be changed that says things like "this employee named 'fred' has moved to the Blue Cheese Office and the boss of that division is 'Mr Insanity'", instead of "this employee with id 21767 (that's the id in the Person table, not the Employee table) has moved to the corporate division with id 107, and corp division with id 107 is controlled by the Person who is known as 6, Number 6".

So - let's write those out, roughly as pseudo-SQL 'create' statements:


  CREATE TABLE Person (
     id int, # unique index
     name varchar,
     stolen_ssn_number varchar
     );


  # table that extends Person
  CREATE TABLE Employee (
     id int, # unique index
     person_id int,
     division_head_id int,
     started_date datetime,
     fired_from_cannon_date datetime # nullable
     );

  # table that extends Employee
  CREATE TABLE Manager (
     id int, # unique index
     employee_id int
     );

  CREATE TABLE CorpStruct (
     id int, # unique index
     name_of_office varchar
     );

  # table that links employees to their boss
  CREATE TABLE EMPLOYEE_EMPLOYEE_REF (
     id int, # unique index
     employee_id int,
     my_boss_employee_id int
     );

  # table that links employees to a corporate office
  CREATE TABLE CORPSTRUCT_EMPLOYEE_REF (
     id int, # unique index
     corpstruct_id int,
     employee_id int
     );

  CREATE FOREIGN KEY ce1 CORPSTRUCT_EMPLOYEE_REF.corpstruct_id REFERENCES CorpStruct.id;
  CREATE FOREIGN KEY ce2 CORPSTRUCT_EMPLOYEE_REF.employee_id REFERENCES Employee.id;
  CREATE FOREIGN KEY ee1 EMPLOYEE_EMPLOYEE_REF.employee_id REFERENCES Employee.id;
  CREATE FOREIGN KEY ee2 EMPLOYEE_EMPLOYEE_REF.my_boss_employee_id REFERENCES Employee.id;
  CREATE FOREIGN KEY me1 Manager.employee_id REFERENCES Employee.id;
  CREATE FOREIGN KEY pe1 Employee.person_id REFERENCES Person.id;
  CREATE FOREIGN KEY eb1 Employee.division_head_id REFERENCES Person.id;

You might already be getting a sinking feeling about this one - but don't worry: it really is as bad as it looks. I spent nearly four weeks thinking this through, making mistakes, so that you don't have to (hurrah!).

The issue is this: you need to translate the association between the "Corporate Structure" table and a "Person", in order to correctly display the boss of the Division. It's a horrendously complex lookup, but it is a lookup nonetheless. And all you want to do is show that "fred" answers to "Mr Insanity" in the "Blue Cheese Office"!

What I did was to do a recursive lookup, down the Foreign Key relationships, to translate the (contrived) division head Person id into the "Corporate Structure" office. You have to link from Person to Employee, from Employee to CORPSTRUCT_EMPLOYEE_REF, from CORPSTRUCT_EMPLOYEE_REF to "CorpStructure", and finally you get to "lookup" the name "Mr Insanity" as being in the "Blue Cheese Office".
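That chain of lookups can be sketched in Python/SQLite, with the schema cut down to just the columns involved (an assumption-laden illustration, not the code I actually wrote):

```python
import sqlite3

def office_of(db, person_name):
    """Walk the foreign-key chain Person -> Employee -> CORPSTRUCT_EMPLOYEE_REF
    -> CorpStruct to translate a person's name into their office's name."""
    row = db.execute(
        """SELECT CorpStruct.name_of_office
             FROM Person
             JOIN Employee ON Employee.person_id = Person.id
             JOIN CORPSTRUCT_EMPLOYEE_REF
                  ON CORPSTRUCT_EMPLOYEE_REF.employee_id = Employee.id
             JOIN CorpStruct
                  ON CorpStruct.id = CORPSTRUCT_EMPLOYEE_REF.corpstruct_id
            WHERE Person.name = ?""", (person_name,)).fetchone()
    return row[0] if row else None
```

Four tables, three hops - just to turn one id into one display string for a drop-down.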

Of course, you could just put the "Division Head Honcho" id field into the "CorpStruct" table, and this would make life much easier. However, reality never meets sanity as far as Database Design is concerned, and you may just have to live with the existing database design. Plus, neither suggestion takes into account the "Conditional Indexes" syndrome, where Division Heads come and go and you might want the database to record who to blame or sue (or take out a contract on, or even offer a nice fat bonus: if you work for Enron, you never know what's going to happen) - just in case something is discovered ten years down the line.

So you can't just expect "lookup tables" to be nice and straightforward - you can expect to have to jump through a couple of "anklebone connected to da.." hoops in order to put the right strings into that nice drop-down menu you were told to add to the edit page.

Insanity by Design

I haven't covered everything - all of the required tricks - but I hope I've impressed upon you by now quite how insane it would be to have to reimplement some of the logic above ten, fifteen - heck, even two - times, and I hope it illustrates why it is necessary to have a decent database abstraction layer. If you work for a big corporation, such as Dow Jones, then you already know what I am talking about, but your Non-Disclosure Agreement precludes you from telling anyone about it - plus, you make a hell of a lot of money from keeping your job as complex as it really is, and you'd prefer to keep receiving the nice bonuses.

As a "lazy" Free Software developer, I have difficulty in doing the same thing more than about twice, especially if I know that I might have to do it again, and so therefore want to go out of my way to write code that will make it easier the next few times. But even I had no idea just quite how insane 3rd normalised form database management really is.

The solutions effectively call for "Writeable Views". Every decent database (that means excluding MySQL, whose designers should have been put down at birth for not deciding to do ANSI SQL 92 compliance from the word GO) should have the concept of "Writeable Views".

Microsoft .net provides a work-around for this lack of SQL writeability, via its "recordset" abstraction layer, where you can merge two datasets together, and all the appropriate deletes, updates and inserts will be handled for you. Microsoft SQL 2000 endeavours to provide simplistic "write" access to views - but the restrictions on it are way below the required level of functionality as partly outlined in this article.

Also, you should by now have a clear idea why I am so against the use of Stored Procedures. To even attempt to implement an abstraction layer in a programming language that was designed in the 80s, which doesn't cope with object-orientated principles or even optional parameters correctly without horrendous workarounds, would not be insane, it would be stupid. At least if you're insane, you probably stand a good chance of succeeding, whereas if you're just stupid enough to try implementing a Database Abstraction layer in a SQL programming language, you're probably not intelligent enough to pull it off.

Either that, or you are deliberately trying to make yourself indispensable by pulling the wool over the eyes of your Management, which is much worse.

Whatever the reason: don't do it. Spend some time researching or writing a decent 3rd Normalised Form database abstraction layer, in a proper object-orientated modern programming language, and stay the hell away from hand-coded hard-coding of database access functions. Oh - and don't be fooled into thinking that Views will make your job easier, either, because Views are only half the story (read-only...) and now you have to maintain the Views forever, as well as your code...

In short: good luck...


yes, no, maybe, posted 11 Mar 2007 at 17:42 UTC by nixnut » (Journeyer)

I'm not quite sure what you're trying to do, since I find it hard to visualise from a written text like the above. Anyway, it seems you are not very comfortable with sql. I'm fairly sure you could handle some of your problems with subselects, or simply split the task into smaller tasks that each use a separate and less complicated sql statement. Generally, you want the database to do as much of the work as possible, but that does require a solid understanding of relational databases and sql. One of the most important principles is thinking in sets. Sql is all about selecting and manipulating sets of data. So sets, intersecting sets and nested sets are the things you would be looking for in a problem. Maybe if you can restate the task as manipulation of sets, you may get a clearer idea of the sql you can use to do it. Reading a good book on the subject may help some too. I found SQL for Smarties by Joe Celko pretty good. Of course, having an experienced db developer on your team is even better.

maybe erm maybe, posted 11 Mar 2007 at 18:19 UTC by lkcl » (Master)

nixnut, hi,

i can't explain yet why i wrote the above article - but i would consider myself to be a reasonably experienced database developer. i've specifically avoided sql sets, views and stored procedures, because using them makes the job of writing a generic, fully-featured and orthogonal database abstraction layer more complex, not less (by orthogonal, i mean an abstraction layer that effectively implements a fully functional sql "writeable view")

that having been said, i have encountered certain types of queries which cannot and i mean literally cannot be expressed in one statement, and you have to use sub-selects; and i have encountered situations where you absolutely must make multiple queries, too.

the first scenario is when you use HAVING and GROUP BY clauses, and then want to use WHERE clauses on the grouped results: it's simply not possible, and you have to contain the HAVING / GROUP BY in a sub-select and perform the WHERE on the outside.
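a concrete sketch of that shape, with a made-up table (run here through python's sqlite3 for convenience):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (office TEXT, amount INT)")
db.executemany("INSERT INTO sales VALUES (?, ?)",
               [("Arizona", 100), ("Arizona", 200), ("Numpty", 50)])

# the GROUP BY / HAVING lives inside the sub-select; the outer WHERE
# then filters the already-grouped totals - which a single flat
# SELECT ... GROUP BY ... HAVING cannot express
rows = db.execute(
    """SELECT office, total
         FROM (SELECT office, SUM(amount) AS total
                 FROM sales
                GROUP BY office
               HAVING COUNT(*) >= 1) AS t
        WHERE total > 100""").fetchall()
print(rows)  # [('Arizona', 300)]
```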

the second scenario is where you want to return lists of lists as your results. this may be due to limitation of my knowledge, but i would be very surprised if there was a way to do it (return lists of lists).

imagine that you want to return the options - all of the options - from a multi-select box. think HTML: your cgi script library parses the FORM POST, which includes the usual inputs, and one multi-select dropdown box. you end up with a set of inputs, by name, and one of those inputs is the list of selected options.

now try storing that lot, in one SQL INSERT, into a database. you can't: sql cannot, as far as i know, do an INSERT of a list of items: you have to do multiple INSERTs.
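in python's DB-API, the best you can do is batch those multiple INSERTs with executemany - a sketch, with made-up table and form names:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE form_option (form_id INT, option TEXT)")

# one multi-select box, several chosen options: there is no single-row
# way to store the list, so it's one INSERT per option - batched with
# executemany so at least the application code stays flat
selected = ["red", "green", "blue"]
db.executemany("INSERT INTO form_option (form_id, option) VALUES (?, ?)",
               [(42, opt) for opt in selected])
print(db.execute("SELECT COUNT(*) FROM form_option").fetchone()[0])  # 3
```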

equally, when you make a query returning a cursor, you can only return basic datatypes - string, integer, float, date etc. you certainly can't get back a list as one of the basic data types.

and that's another area which is missing from SQL, that your database abstraction layer has to cater for...

The Subject: when good programmers write bad SQL..., posted 12 Mar 2007 at 08:29 UTC by lkcl » (Master)

*lol* i got this, today - my first _useful_ message via my web site's new pyjamas.pyworks.org "email me" button HURRAH!




you were right about WHERE conditions messing up the LEFT OUTER JOINs, but you were not right about how to deal with the problem

here's the actual correct way to do the job correctly:

  SELECT please, stop, using, the, dreaded, evil, "select star"
  FROM Person
  LEFT JOIN Employee ON Employee.person_id = Person.id
  LEFT JOIN Manager ON Manager.employee_id = Employee.id
       AND Manager.Level = 'Numpty Timewasting Level'
  LEFT JOIN Manager_Job_REF ON Manager_Job_REF.manager_id = Manager.id
       AND Manager_Job_REF.start_date > '21 Jun 1954'
  LEFT JOIN Jobs ON Jobs.id = Manager_Job_REF.job_id
  WHERE Person.SSN = '667051111'

teehee. i find this to be very funny. ok. i used the dreaded * because i couldn't be bothered to type out a load of parameters that didn't exist :)

the above came from someone who supplied a name but not an email address so i would love to be able to get some more feedback but can't ask quickly for it - just have to go via here.

what i have used (with the extra additions to the WHERE clause) "works" (and i have the test cases to prove it). i believe that the two SQL statements are equivalent, but that my one could be a bit inefficient.

huh. why didn't it ever occur to me to extend the ON clauses?? :)

The Plan, posted 12 Mar 2007 at 22:27 UTC by lkcl » (Master)

just to clarify, nixnut: i am always ambitious - it gives me something to keep me from being bored. given enough time, and encouragement (where money equates well to encouragement but isn't the be-all or end-all) i tend to actually succeed.

The Plan (tm) is to create a database abstraction layer that can cope with 3rd normalised form databases and also 3.5th normalised form (i.e. not quite 4th normalised but pretty close!) which does everything in a single SQL statement, where possible. except for multi-drop-down selections (lists of lookup entries to be associated with an item) of course.

i never liked the idea of splitting things out into separate sql queries - it's one of the things that i'm not keen on about SQLobject, but other than that i absolutely love SQLobject.

especially i never liked the idea of splitting things into separate sql queries when it's possible to do things in one, if you get the coding right.

RE: The Plan, posted 14 Mar 2007 at 18:50 UTC by nixnut » (Journeyer)

Yeah, sqlobject is pretty nice. Single SQL statements are not a holy grail imho. There are cases where it's simply better for performance to split things up a bit. Or simply because separate sql statements are often just that much easier to understand and maintain. Then there's the risk that you may end up with a reimplementation of a dbms in the layer that's supposed to be an abstraction layer.

Maybe the Semantic Object Model can help you a bit. For object oriented stuff it's a more natural fit than ER modeling I think. Just another way of looking at the data model, but it may give you ideas about how to abstract data objects. Then again... maybe not :-)

semantic object model, posted 14 Mar 2007 at 21:53 UTC by lkcl » (Master)

just to check with you: does the "Zachman Framework", with its matrix of "who, what, where, when, why, how" and "Strategist.... Worker", constitute a type of semantic object model?

or is it simpler than that - and just a basic thing of "observer, observed, process of observation" (i.e. subject object predicate) ?

so you have items (rows in a table) and you have a predicate (subject "relates somehow to" object) which can be implemented as a two-way-reference table with a subject_id and an object_id

and then you can use that to relate "properties" to a row, in order to extend it, right?

so "Address Line 1" "is a property of" "The Person Named Fred" - subject predicate object.

is that basically it?

sql templates, posted 14 Mar 2007 at 23:45 UTC by lkcl » (Master)

ok.

regarding the "efficiency" side:

MSSQL 2000 has the concept of templates. templates are basically pre-compiled sql statements, where all you do is pass in the list of parameters.

templates make it impossible to perform sql injection attacks: any sql attack attempts will simply be passed verbatim through as parameters.

the other advantage of templates is the pre-compilation bit: many of the optimisations and certainly the parsing will have already been carried out.
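the same idea is what python's DB-API placeholders give you - a quick sketch (via sqlite3) showing an injection attempt going nowhere:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT)")
db.execute("INSERT INTO users VALUES ('fred')")

# the attack string travels as a parameter, never spliced into the sql
# text, so it is compared verbatim against the column - and matches nothing
hostile = "fred' OR '1'='1"
rows = db.execute("SELECT name FROM users WHERE name = ?", (hostile,)).fetchall()
print(rows)  # []
```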

Vietnam of Computer Science, posted 15 Mar 2007 at 07:03 UTC by slamb » (Journeyer)

My favorite sound bite on the subject: Object/Relational Mapping is the Vietnam of Computer Science.

Have you looked at SQLAlchemy? I'm not going to say I like it - I won't say that about any ORM - but I greatly prefer its design goals to SQLObject's. (I generally prefer hand-coded SQL, with some tools like my own Axamol SQL Library to avoid unpleasantness. In other words, in the terminology of the Vietnam article, my applications fall into "abandonment" or "manual mapping".)

I haven't done much with databases in a while, but in a previous job it was all I did. In fact, I implemented something rather like your example.

When creating/manipulating records, I never made complicated chained LEFT JOIN expressions like the one you presented for adding a person. I just built the person entry, then the employment entry, then whatever supplementary data it has. (Often all within the same transaction.) If you're worried about efficiency, you're wrong:

  • If you've just got one person to deal with, you don't have an efficiency problem. Do it however's easiest.
  • If you're dealing with many people, it's more important to handle many rows at once than many tables. (With n rows and m tables, typically n >>> m ~ 1. O(n) operations is not much better than O(m*n), and O(m) is not much worse than O(1).) Use bulk operations: See which of your people are already there all at once, add the missing ones all at once, update the present ones all at once, then move on to the next table. And where possible, don't even send data you're manipulating back to the application - just do it in place.

ORMs tempt you to do things like this:


# everyone gets a raise!
for person in Person.objects.all():
    person.salary += 500

but that probably generates 2n round trips to the database (commonly a minimum of ~10ms, plus processing time for intensive things. 10000 employees => 3 minutes). Besides, it's probably not atomic. The smart way is atomic, takes one round trip, and surely completes in under a second:


update person set salary = salary + 500

(There is one way ORMs can be more efficient - they can implement nice caching layers...so even if your queries are stupid, you're doing them infrequently.)
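Both versions can be sketched against SQLite (an illustrative table; the per-statement round-trip cost obviously doesn't show up when everything is in-process):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, salary INT)")
db.executemany("INSERT INTO person (salary) VALUES (?)", [(1000,)] * 5)

# ORM-style: read every row out, then write every row back - 2n statements
for pid, salary in db.execute("SELECT id, salary FROM person").fetchall():
    db.execute("UPDATE person SET salary = ? WHERE id = ?", (salary + 500, pid))

# set-based: one atomic statement does the same amount of work
db.execute("UPDATE person SET salary = salary + 500")
```

Over a network, the first shape pays a round trip per statement; the second pays one.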

I did often set up joins in updatable views like yours (they work well with RAD tools like GNUe Forms[*]), but the extra data would be read-only and carefully chosen to not increase the cardinality of the view. I.e., I might grab fields from an inner join against the 1 side of an N-1 join, a left join on a 1-0, or aggregates on a 1-N...all for display only. If I wanted to update the other tables, I wouldn't use some other table's view to do it.

When grabbing a bunch of data for processing/display/analysis/reporting, I used all sorts of complex views and joins and subselects and aggregates. Generally coded by hand, and often to generate a pretty report through something like Jasper Reports. This is where SQL really shines - doing the same manipulations in application code would have been horrendously less efficient for both the machine and for me.

I used stored procedures, but my reasons are largely irrelevant today. I would never attempt to create an ORM with them. They're beneficial as a way to enforce complex security constraints on users logging in directly. But nowadays most database logins are from pooled-connection, trusted web applications rather than end-user accounts, so you do all security application-side.

[*] - In the spirit of the site, I've linked to Free alternatives to the proprietary tools I used.

sqlalchemy, posted 15 Mar 2007 at 18:58 UTC by lkcl » (Master)

please ignore this: it is a _major_ rant.

sqlalchemy is being developed by a bunch of not-invented-here tossers who cannot be bothered to look at and understand sqlobject.

sqlalchemy can be replaced by about three weeks work of improving sqlobject, and you're done.

instead, the stupid fuckers are _reimplementing_ sqlobject, which, so far, has had about... i would guess... five man-years of development put in to it, and has had several man-decades of actual testing, and production use proving that it works.

the sole and exclusive difference between sqlobject and sqlalchemy, from a user's perspective, is that in complex WHERE clauses you have to use SQLbuilder: import SQLbuilder, and then SQLbuilder.OR(SQLobject1.field1 == 'fred', SQLobject2.field2 == 5) will result in a sql statement SELECT SQLobject1.field1, SQLobject2.field2 FROM ... SQLobject1, SQLobject2 ... daa da daaa ... WHERE (SQLobject1.field1 = 'fred') OR (SQLobject2.field2 = 5)

other than that, i'm pretty pissed off with the sqlalchemy dickheads because SQLobject provides transaction-journal-safe caching, transactions/commits, and understands SIX or more different database backends, one of which is ADO; sqlalchemy provides about three.

but please do ignore me for ranting.

axamol. YUKK!, posted 15 Mar 2007 at 21:18 UTC by lkcl » (Master)

slamb - please don't take this personally: i'm criticising axamol from the perspective of having worked nearly full-time for six months now on something so horrendous that my average lines-of-code-per-day rate is now down to below 15.

axamol's goals appear to be nothing more than what can be achieved by using " WHERE param1 = '%(name)s' AND ..." % {'name': 'fred', 'description': 'nothing'} in python strings, with a little bit extra besides, like testing for NULLs, which could probably all be done using HTMLTMPL.sf.net if you didn't mind the god-awful xml-like syntax.
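that substitution approach, sketched (illustrative table and parameter names; note that the values land in the SQL text verbatim, with no escaping or binding - slamb picks up on this below):

```python
# Python %-formatting with a mapping fills named parameters
# straight into the SQL string.
template = ("SELECT * FROM things"
            " WHERE param1 = '%(name)s'"
            " AND description = '%(description)s'")
params = {'name': 'fred', 'description': 'nothing'}
query = template % params
print(query)
# SELECT * FROM things WHERE param1 = 'fred' AND description = 'nothing'
```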

with the additional confusion that it's now specified with an XML syntax. oh GOD i am so glad i did xvl (xml virgule language) back in 2000 - it SO taught me never to use xml as a programming language. (at linuxcare's expense. xxxx me, they owed me a big enough xxxxing favour)

oh - so my suggestion of using htmltmpl wouldn't be so bad after all, ha ha :)

Not enough intelligence applied, posted 15 Mar 2007 at 21:29 UTC by lkcl » (Master)

slamb - ORM being the "vietnam of sql" - is a _fascinating_ concept.

i especially like the "surely it's better to carry on than to back out, now" idea.

this assumes an insufficient degree of intelligence on the part of the people involved with the project [which would explain why vietnam was such a goddamn mess].

for anyone who may be interested, i recommend the exploration of "advaita vedanta" and things like this:

http://www.maharishi.org/overview/vedic_science.html

note how closely this stuff fits with "observer, observed and process of observation" and with quantum mechanics.

note also how 2nd and 3rd and successive orders of differentiation of quantum state functions fit with "strategy" or the "how" in the zachman framework.

i am _constantly_ re-evaluating what i have done - and i have the intelligence and confidence to say, at each step where i run into difficulties, "yep - i'm still doing ok so far" :)

the reason i mention this in the context of the sql thing is that everyone is in a mess because nobody will add "write" capability to sql views in their database: it takes six months to work through the required concepts, and then to have the confidence or time to bite into the source code of a free software database server (i'm certainly not going to help mysql out!)

anyway. i'm rambling - and should go to sleep. goodnight :)

Axamol SQL Library, posted 16 Mar 2007 at 06:28 UTC by slamb » (Journeyer)

Yes, Axamol SQL Library's scope is pretty small, and it'd be less code in Python.

It is a bit more complex than you say - your "select '%(foo)s' % {'foo': 'asdf'}" doesn't use prepared statements. It's slow and vulnerable to SQL injection attacks. You can't just use an existing generic templating language to make the query, either, because you have to build the list of binds as well as the string.
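The difference is easy to demonstrate with sqlite3's `?` placeholders (illustrative table and data; with binding, the driver receives the value separately instead of splicing it into the string):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (name TEXT)")
con.execute("INSERT INTO users VALUES ('fred')")

hostile = "fred' OR '1'='1"

# String interpolation: the attacker's quote closes the literal,
# and the trailing OR '1'='1' clause matches every row.
injected = con.execute(
    "SELECT count(*) FROM users WHERE name = '%s'" % hostile
).fetchone()[0]

# Parameter binding: the whole hostile string is treated as one
# value, so no row matches.
bound = con.execute(
    "SELECT count(*) FROM users WHERE name = ?", (hostile,)
).fetchone()[0]

print(injected, bound)  # 1 0
```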

The XML was because I didn't want to implement my own grammar. Doing that well takes effort. Most people who implement their own grammars do a really bad job of it - lots of things are impossible to represent (bad quoting behavior), Unicode handling is bad, and there's the disadvantage of obscurity: no automatic syntax highlighting or structure checking in my text editor, etc. So while you and many others hate XML, it does have good qualities.

MSSQL templates, posted 16 Mar 2007 at 18:05 UTC by lkcl » (Master)

slamb, you are right about the syntax highlighting. (then again, your programming language of choice also has syntax highlighting.)

mssql has the means to upload templates (somehow - never done it) to the database, such that you can instead pass in the parameters to that template: it solves the sql injection problem, at least.

in the end, it entirely depends on whether you are using your database for near-2nd-normalised-form (effectively as a spreadsheet), 3rd-normalised-form, or 4th-normalised-form.

ironically, the form that most people use is 3rd normalised form - which is the one that causes the most grief.

and 4th normalised form without caching is Hell On Earth, but at least it's _simple_, with the right database abstraction layer, which you only have to code once and never worry about again.

this all tends to suggest that SQL needs a _significant_ overhaul / upgrade.

Godel Escher Bach, Page 656, posted 17 Mar 2007 at 12:21 UTC by lkcl » (Master)

"When it is hard to build a template because the preprocessor finds too much diversity, that should serve as a clue that concepts on a higher level of abstraction are involved that the preprocessor knows nothing about"

this is an absolutely brilliant lesson to know about programming: first understand your concept. then model it. then use that model to improve your understanding. lather, rinse, repeat until success or bankruptcy.

all of which illustrates why proprietary software will always fail to satisfy.

update, posted 28 Mar 2007 at 13:11 UTC by lkcl » (Master)

i've since realised what the problem is: it's that SQL is a language. therefore, i need a SQL parser, interpreter and compiler, to complete the work i wish to achieve.

no _wonder_ people freak out at this :)
