Python Database Access

Posted 27 Dec 2002 at 00:03 UTC by lkcl

After developing the fourth python / SQL / HTML based application in a row, I decided to abstract the job of accessing SQL databases by a few layers, using python classes.

the results can be seen in a package called Custom, which contains the libraries and also some example usage.

SQL consists of statements to select, insert, update and delete rows, and to create and alter tables. The syntax is close to ordinary english, and is consistent enough to merit auto-generation functions that read or create SQL. variants of SQL are similar enough that ODBC could be created, and similar enough to make database abstraction layers practical.

So, I set out, eighteen months ago, to create some python classes to manipulate SQL, supporting as many types of SQL databases as i needed to access (MySQL and MS-SQL 2000). starting with the statements that create WHERE, FROM and table rules, i ended up with functions on a per-SQL-table basis that can insert, update and delete from SQL database tables.

in the end, the top-level functions are so simple and structured that it became worthwhile to write auto-generating code. the auto-generating code, sqlgen.py, takes a SQL file with INSERT, CREATE and ALTER TABLE FOREIGN KEY statements and generates all the functions needed to manipulate the database.
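purely as an illustration (the column types here are guessed; only the table and field names come from the examples further down), the kind of input sqlgen.py works from looks like this:

    CREATE TABLE invoices (
        id INTEGER NOT NULL PRIMARY KEY,
        cust_id INTEGER NOT NULL,
        ref VARCHAR(64),
        status VARCHAR(16),
        FOREIGN KEY (cust_id) REFERENCES customers (id)
    );

from a definition like that, the generated module ends up with the per-table insert, update and delete functions, plus a query constructor along the lines of the invoices_start() shown below.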

JOINs were a little tricky. JOINs are what make good SQL databases useful and manageable. however, by doing JOINs, you end up having to write quite complex SQL queries. the techniques i developed help avoid all of the expected complexity, constructing the SQL queries on your behalf.

the db_whererule_convert function takes a list of WHERE rules that need to be ANDed together. take the list [('name', "%search%"), ("id", [1,2,3])]: because of the way that the db_in_or_equal function works, the result is a WHERE clause along the lines of name LIKE "%search%" AND id IN (1, 2, 3) AND ....

AND was chosen rather than OR because it is more common to make rules using AND. any other rules (such as NOT, or OR) can be handled by making one of the items in the list a plain string, in which case you write that part of the rule yourself. out of approximately 150 functions in the example program, custom, only five to ten required extensions to the WHERE clauses that could not be covered by simple usage of db_whererule_convert.

from string import join   # python2-era string join: join(list, sep)

def db_whererule_convert(whererule):
    """ turn whererule list into string.  ignore empty strings.
    """
    if type(whererule) != type([]):
        whererule = [whererule]

    l = []

    for i in whererule:
        if type(i) == type(''):
            if i is not None and len(i) > 0:
                l.append(i)
        elif type(i) == type(()):
            (key, val) = i
            l.append("(" + db_in_or_equal(key, val) + ")")

    return join(l, " AND\n\t\t")
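db_in_or_equal itself isn't shown above; a minimal sketch of the idea (not the exact Custom implementation, just the behaviour described earlier) is something like this:

def db_in_or_equal(key, val):
    """ sketch only.  turn a (key, val) pair into a single comparison:
        a list becomes IN (...), a string containing % becomes LIKE,
        anything else becomes a plain equality test.
    """
    if type(val) == type([]):
        # e.g. ("id", [1, 2, 3])  ->  id IN (1, 2, 3)
        return key + " IN (" + join(map(str, val), ", ") + ")"
    if type(val) == type('') and val.find('%') != -1:
        # e.g. ("name", "%search%")  ->  name LIKE "%search%"
        return key + ' LIKE "' + val + '"'
    return key + " = " + repr(val)

# usage: db_whererule_convert ANDs the pieces together, e.g.
#   db_whererule_convert([('name', "%search%"), ("id", [1, 2, 3])])
# gives, roughly:
#   (name LIKE "%search%") AND
#           (id IN (1, 2, 3))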

once we have the whererule conversion and also an AS convert function, it becomes possible to create a complete SELECT statement:

def table_open(self, table, selectdict={'*': None},
               whererule=None, sortrule=None):

    whererule = db_whererule_convert(whererule)
    selectrule = db_as_convert(selectdict)
    table = db_as_convert(table)

    # create sql select command
    command = "SELECT " + selectrule + "\nFROM " + table
    if whererule is not None and whererule != "":
        command += "\nWHERE (" + whererule + ")"
    if sortrule is not None and sortrule != "":
        command += "\nORDER BY " + sortrule

    return self.__db_execute(command)

then, we have a function that constructs the list of SELECT rules, a dictionary of fields to return, a list of tables and some SORT rules. the db_wheretype function takes a dictionary that contains types as keys and field names as values. in this way, if the id is an int, long, string or even a DateTime or a weird class, the db_wheretype function selects the correct field name.
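db_wheretype itself is not listed in this article; a rough sketch of the idea (the type-to-field mapping in the docstring is invented for illustration, and the real Custom version may differ in detail) would be:

def db_wheretype(fields, value):
    """ sketch only.  fields maps python types to sql field names,
        e.g. {type(0): 'invoices.id', type(''): 'invoices.ref'};
        pick the field whose type matches value and hand back a
        (fieldname, value) tuple ready for db_whererule_convert.
    """
    for (fieldtype, fieldname) in fields.items():
        if isinstance(value, fieldtype):
            return (fieldname, value)
    raise TypeError("no field defined for values of type %s" % type(value))

the per-table function that uses it then looks like this: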

def invoices_start(self, id=None, from_id=None, cust_id=None):
    # rules
    l = []

    if id is not None:
        l.append(db_wheretype(self.invoices_id_fields, id))
    if from_id is not None:
        l.append(db_wheretype(self.invoices_from_id_fields, from_id))
    if cust_id is not None:
        l.append(db_wheretype(self.invoices_cust_id_fields, cust_id))

    # fields
    fields = {'invoices.id': 'invoices_id',
              'invoices.log_id': 'invoices_log_id',
              'invoices.from_id': 'invoices_from_id',
              'invoices.cust_id': 'invoices_cust_id',
              'invoices.ref': 'invoices_ref',
              'invoices.status': 'invoices_status',
              'invoices.bank_ref': 'invoices_bank_ref',
              'invoices.createdate': 'invoices_createdate'}

    rules = DPyDBrules(['invoices'], fields, l,
                       "invoices_status, invoices.createdate")

    return self.__dpydb.db_open(rules)

finally, we come to example usage. Andy Dustman's MySQLdb code returns a cursor, which has a fetchone() method that the display_table() function uses to fetch... er... one row from the query, and... er... displays it.

def display_invoices_info(self):

    cur = self.dpydb.invoices_start(id=self.id, cust_id=self.cust_id)

    fields = [('invoices_id', "InvID", invoices_id_disp_fn),
              ('invoices_cust_id', "AcctID", customers_id_disp_fn),
              ('invoices_ref', "Cust Ref", None),
              ('invoices_id', "Cost", invoice_cost_disp_fn),
              ('invoices_status', "Status", self.invoice_status_fn),
              ]

    return self.iso.display_table(cur, html_action, "Invoices", fields)
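display_table itself isn't reproduced here, but the rough shape of the loop is this (a simplified sketch, not the real iso.display_table; html_action, which the real version uses for per-row links, is accepted but ignored):

def display_table(cur, html_action, title, fields):
    """ simplified sketch: walk the cursor with fetchone() and build
        an html table, applying each column's display function where
        one is given.  html_action is unused in this sketch.
    """
    html = "<h2>" + title + "</h2>\n<table>\n<tr>"
    for (key, heading, disp_fn) in fields:
        html = html + "<th>" + heading + "</th>"
    html = html + "</tr>\n"

    # MySQLdb cursors return plain tuples, so map the column aliases
    # (invoices_id, invoices_status, ...) to tuple positions
    colnames = [d[0] for d in cur.description]

    row = cur.fetchone()
    while row is not None:
        html = html + "<tr>"
        for (key, heading, disp_fn) in fields:
            value = row[colnames.index(key)]
            if disp_fn is not None:
                value = disp_fn(value)
            html = html + "<td>" + str(value) + "</td>"
        html = html + "</tr>\n"
        row = cur.fetchone()

    return html + "</table>\n"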

so, i realise that this looks like quite a lot of work. however, the construction of rules - especially where JOINs are involved - can be done by adding extra rules into an instance of the DPyDBrules class. LEFT joins are a small pain that require more thought on how to integrate properly.

in all, i guarantee you that the amount of SQL you have to write by hand is drastically reduced, leaving only a few hundred lines of boringly regular python code to write (or auto-generate).

in fact, the abstraction is almost at the point where an XML or LDAP back-end could be added rather than a SQL back-end, with the minimum of fuss.

last but not least i would like to thank my wife for her drunken assistance and harassment that helped me to write this article. teehee.


I have one of these too, posted 27 Dec 2002 at 18:37 UTC by sab39 » (Master)

I've been writing a tool like this for about 3 years now.

I'm trying to advocate within my company for it to be released as open source, which you'd think would be reasonable since I wrote it myself essentially in its entirety (albeit partially on company time, although probably 60-70% on my own time). It's output-language independent (C# and Java currently supported, but new languages are pretty easy to add) and has specific support for Oracle and MS SQL Server. Not only that, but as well as creating the tables themselves, it can also do "alter table add column" or "alter table drop column" (and add and delete primary keys, indexes, foreign keys, etc) if you change the table definition and re-run the tool. And it integrates with CVS and with Visual Studio .NET.

I'm fairly proud of it, in case you can't tell that already. Of course, it's useless to *you* unless I can persuade the company to release it... :(

Stuart.

integrates with cvs?, posted 27 Dec 2002 at 23:54 UTC by lkcl » (Master)

sounds pretty good - three years of "i need a tool to make my daily life easier" is a long, long time :)

what's it written in?

how does it integrate with cvs?

creating and altering tables wasn't intended to be part of the original spec for the sql management project, a la webmin with the mysql plugin.

definitely not.

what _was_ intended was that a sql database designer could turn the sql database into useable python code.

going a la webmin (effectively porting webmin from perl to python) would be... imo... too much for this project.

there are always tools _like_ webmin - which i believe your program is similar to - for doing off-line design and then run-time maintenance later on.

the focus of the pysqldb project is fast and flexible data manipulation at run-time, specifically in python.

more details, posted 30 Dec 2002 at 17:41 UTC by sab39 » (Master)

I guess I'll start from the ground up with some of the inspiration for the tool and go from there to how it works and why.

First of all (problem #1) I was faced with a similar situation to yours: fed up of writing the same code to perform gets, updates, inserts etc. Essentially for every table I wanted a class with static getter methods, fields corresponding to each database column, and update() and delete() methods (where update() performed an insert for items that weren't already in the DB). Thus I could write code like:

Item item = Item.getById(id);
if (item == null) item = new Item();
item.someField = someValue;
item.update();

(this code is a mishmash between Java and C# naming conventions because I work in both - the tool is sufficiently language-independent that python, perl etc would be easy to add).

Secondly (problem #2) I usually deal with systems that have a development environment separate from the live environment. The traditional way of moving changes from development to live has been to copy the code and then manually try to remember what changes were made to the database. As you can imagine, this is horribly error-prone and takes a certain risk with the production system.

Of course (problem #3) the database schema isn't stored in source control either, so it's impossible to roll back without going to backups, which may not be what you want.

In addition (problem #4) the development environment usually has multiple developers sharing a database, although (as is usual with CVS) they each have a working copy of the code. This isn't a problem as such, but the solution needs to deal with that reality.

Problem #1 could be solved pretty much as you did, reading the database schema one way or another and generating code (although you would still need to do some guessing or manual work to cover the fact that the database schema doesn't give enough information to identify all the Get methods you might want). However, it doesn't cover problems #2-4.

Basically, I figured that solving problem #3 would imply a solution to #2 as well. If the database schema could be somehow stored in source control with the capability of rolling back, it would also be possible to do a metaphorical "cvs update" on the live database to bring it to the latest code without any error-prone manual intervention.

Instead of trying to invent a source-control system for databases, I decided to go the other way: make the database definition actually *be* source code that can be stored in source control in the usual way. Thus I defined a format which evolved into something that looks like this:

table item {
  fields {
    int id integer notnull readonly [the ID field];
    string name varchar(50) nullable readwrite [the name];
    int other_table_id integer notnull readwrite [the id of another table];
  };
  pkey sequenced id;
  get multi {fields {name}};
  references single other_table {fkey; by {other_table_id id}};
};

Given this file format, I wrote a tool that performed both the tasks I was interested in: it talks to the database and performs the necessary CREATE TABLE commands to construct the table itself (and, on oracle, creates a sequence and trigger to implement the sequenced primary key - on SQL Server it just makes the field an IDENTITY). And then it also generates source code in the language of your choice defining the fields and methods necessary (a property with getter/setter methods for each DB column, a getById method from the primary key, a getByName method for the "get multi", and a getOtherTable() method from the "references"). The contents of the "get" can end up much more complicated than that, with facilities for joins, arbitrary parameters, orderby clauses, and arbitrary SQL in the WHERE clause - it covers almost anything you might want to do.

The integration with CVS is simply to solve problem #4: If person A adds a field FOO to table BAR and runs the tool, the column will be added to the shared database. If person B runs the tool without having A's modified definition for BAR, the tool would notice the existence of a field that shouldn't be there and drop the FOO column again. The CVS integration ensures this doesn't happen - a (horribly complicated :( ) system of locking and version-awareness allows the tool to be aware that B's copy of BAR is out of date, so it doesn't attempt to modify that table in the shared DB. (I should add that the horrible complexity is all under the hood - neither A nor B need to do anything special, and they'll be notified if they both try to modify the same table in divergent ways).

Oh, and the tool itself is written in Java (it should run just fine on Free VMs) and the output is (as I think I mentioned) language independent. I added C# recently and it took about a week of my spare time, much of which was figuring out how to achieve the results I wanted in C# code in the first place, and getting integration with Visual Studio to work (that's hell, btw).

oo!, posted 2 Jan 2003 at 14:24 UTC by lkcl » (Master)

tingly feeling of something really cool here - i think you should really, really see if you can get it open sourced.

i like the bit about the object access.

how would you deal with Item's getbyId(id) function returning more than one entry in the query?

or do you simply treat it as if it _is_ one entry?

also, a getbyID would have to record the conditions under which the SELECT succeeded, such that the SELECT criteria could then be used on the UPDATE.

... very very interesting. esp. the running commands on cvs commits.

i preferred to stick with a standard SQL file format on the grounds that not many people would want to learn a new tool file format.

i used the FOREIGN KEY statements, even though MySQL doesn't support them, to indicate JOINs between tables, and that seems to suit my needs.

i like the object idea (class Item) very much, i think i will have to see if i can make that work, if that's okay with you.

single vs multi gets, posted 2 Jan 2003 at 15:32 UTC by sab39 » (Master)

If you have a "get single" that includes only "fields", like this:

get single {fields {something_id; something_else_id}};

then the tool will create a unique index based on those fields, as well as creating the getByWhatever() function. If you have something more complex so that it's not possible to create a corresponding unique index, the code just assumes you'll only get a single record (I forget exactly what I did, but I think it will just return the first result and ignore whether there are any more - that saves one rs.next() instruction). If it returns zero rows, you get null.

If you declare a "get multi", you get back a standard list type in your chosen output language (List in java, ArrayList in C#) with all the returned items. Of course, in that case you can do a test of the number of results returned in the usual way for your language.

I'm going to talk to someone today to explore yet another option on how I can legitimately open source it. It seems like the attitude here is "we don't care whether you open source it, but you'd have to talk to the lawyers and we're not going to pay the money to do that". So the trick is to determine, one way or another, that it's "unquestionably" okay to do it in a way that the lawyers don't have to get involved...

Whoo, I got permission!, posted 2 Jan 2003 at 19:13 UTC by sab39 » (Master)

Turns out my boss had been reading Larry Lessig's latest book over the holidays and was willing to go on record as supporting the idea. I'm going to see if the Savannah folks will allow me to start up a project there (the current dependencies on non-free java and non-free databases make it iffy) and I don't have any kind of homepage or anything yet but you can get the tarball at http://rainbow.netreach.net/~sballard/nrdo.tar.gz.

There's documentation in the doc/ folder which might be slightly out of date but should be mostly accurate (there haven't been any significant file format changes since it was written, except for a few additional keywords in some files that you almost certainly won't actually need for anything).

I'm going to try to put together at least a rudimentary homepage shortly, and we'll see what the Savannah people say. I really want a public readonly CVS repository and public mailing lists, both of which would be so much easier to do if I let the Savannah people worry about maintaining them...

fantastic!, posted 4 Jan 2003 at 21:44 UTC by lkcl » (Master)

great!!!!

well you have a couple of other options - take a look at apache.org (ASF): you might want to consider assigning the code to them. but also take a look at sourceforge.net if you want to maintain it under a non-ASF license.

then there's always setting up your own server but that's only for people who have the knowledge and resources.

hey, i don't care, it's just great to see another project underway.

don't worry about the dependencies: if the code is interesting to enough people then i'm positive that people will run with it and see where they can go.

automatic detection of single / multi, posted 5 Jan 2003 at 13:45 UTC by lkcl » (Master)

If you have a "get single" that includes only "fields", like this:

get single {fields {something_id; something_else_id}};

then the tool will create a unique index based on those fields, as well as creating the getByWhatever() function. If you have something more complex so that it's not possible to create a corresponding unique index, the code just assumes you'll only get a single record (I forget exactly what I did, but I think it will just return the first result and ignore whether there are any more - that saves one rs.next() instruction). If it returns zero rows, you get null.

hey, hold on a mo.

it should be easy to detect whether a query will return a single or a multi result.

even on table JOINs.

if the selection criteria includes ANDs on each and every UNIQUE or PRIMARY key, then there's guaranteed to be only one result.

SELECT * from customers, attributes WHERE customers.id = 5 AND attributes.cust_id = customers.id AND attributes.id = 399

this is guaranteed to return only one result if customers.id and attributes.id are both PRIMARY keys.

i think... it is necessary to have a JOIN between one of the PRIMARY keys...
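to make that concrete, a quick sketch (not part of any existing package - names and shape are made up) of the check i have in mind:

def query_is_single(primary_keys, pinned_fields):
    """ primary_keys maps each table in the query to its primary key
        column; pinned_fields is the list of columns the WHERE clause
        fixes to a single value with '=' (directly, or via a join to
        a column that is already fixed).  if every table's key is
        pinned, at most one row can come back.
    """
    for (table, keycol) in primary_keys.items():
        if table + "." + keycol not in pinned_fields:
            return 0
    return 1

# the example above:
#   query_is_single({'customers': 'id', 'attributes': 'id'},
#                   ['customers.id', 'attributes.id'])
# returns true: both primary keys are pinned, so it is a single-row get.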

replies, posted 6 Jan 2003 at 18:30 UTC by sab39 » (Master)

The savannah people accepted the project so it's set up at http://savannah.nongnu.org/projects/nrdo

They wouldn't let me put my code in their CVS until the dependencies are fixed, but I can use their mailing lists and bug/feature tracking software. So I've done that and created a mailing list for it :)

As far as single vs multi, it's not possible to make a determination in the general case because of two things:

1) nrdo allows specifying arbitrary SQL to be included in the WHERE clause, which it doesn't attempt to interpret. That condition could be sufficient to make what would otherwise be a multi get into a single get.

2) The get might be a single get due to a business rule that isn't enforced by the database, but by other logic. An example would be a table that includes a start and end date and has a business rule that ensures they never overlap. Then a get like:

get single {params {Date today notnull []};
            where [:today between start_date and end_date]};

is guaranteed to be a single get, but there's no way nrdo could determine that without being told.

Btw, since you are apparently familiar with MySQL, would you be interested in helping add support for MySQL, so that I have support for a Free DB and can upload the code?

Stuart.

YAPDB class, posted 6 Jan 2003 at 20:49 UTC by jmg » (Master)

Hmmm, reminds me of my Python DB class (or set of classes) I wrote. I was tired of not being able to use Python's spiffy dict emulation to make database access really simple. So, you want the entry where customer_id = 5 from the table customers, db business:

db = dbwrap.base()[(pgdb, ":business")]
print db['customers'][('customer_id', 5)]

This will return a dict from the query with the keys being the column names. The code is very ugly since I'm neither a db person, nor have I spent the time to make it clean enough to release.

Why emulate SQL when Python already has a method for handling the hierarchical nature of SQL? Sure this probably wouldn't do much good for handling JOINs, etc, but for most small DB apps, it's more than enough.
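To give a feel for the approach, here's a minimal sketch of the idea - not the actual dbwrap code, and it assumes a DB-API module that accepts %s placeholders with a tuple (MySQLdb does; adjust for your module's paramstyle):

class TableWrapper:
    """ Sketch only: one table, indexed dict-style by (column, value). """
    def __init__(self, conn, tablename):
        self.conn = conn
        self.tablename = tablename

    def __getitem__(self, key):
        (column, value) = key          # e.g. ('customer_id', 5)
        cur = self.conn.cursor()
        cur.execute("SELECT * FROM " + self.tablename +
                    " WHERE " + column + " = %s", (value,))
        row = cur.fetchone()
        if row is None:
            raise KeyError(key)
        # return a dict keyed by column name, as described above
        names = [d[0] for d in cur.description]
        return dict(zip(names, row))

class DBWrapper:
    """ Sketch only: indexing by table name gives a TableWrapper. """
    def __init__(self, conn):
        self.conn = conn

    def __getitem__(self, tablename):
        return TableWrapper(self.conn, tablename)

# roughly equivalent usage to the example above:
#   db = DBWrapper(MySQLdb.connect(db="business"))
#   print db['customers'][('customer_id', 5)]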

YAPdb - where?, posted 6 Jan 2003 at 21:18 UTC by lkcl » (Master)

jmg, that sounds exactly like the sort of thing i want to do: have you the code for download somewhere?

stuart, my curiosity might be my downfall :)

constructing sql queries, posted 7 Jan 2003 at 16:25 UTC by lkcl » (Master)

hiya stuart, hiya jmg,

okay: little more thought on the issue of queries.

in some ways, it doesn't exactly matter about single or multi queries. it can always be made the responsibility of the programmer to use "getlistofitems()" or "getdictsingleitem()" or equivalent. [the python list and dict objects can be subclassed in python2.2 for exactly this purpose, to derive your own result classes].
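for example, a tiny sketch of the list-derived class i mean (the names are illustrative, not from any existing package):

class QueryResults(list):
    """ subclass the builtin list (possible since python2.2) so a
        result set behaves like an ordinary list of rows, but can
        also be asked for exactly one row.
    """
    def getsingleitem(self):
        if len(self) != 1:
            raise ValueError("expected exactly one row, got %d" % len(self))
        return self[0]

# rows = QueryResults(cur.fetchall())
# for row in rows: ...                # treat it as a multi get
# row = rows.getsingleitem()          # treat it as a single get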

second, if you split the SQL statement (parse it, or ask the programmer to do it) down into select, update, sort, group by etc. rules, it then becomes a simple enough matter to reconstruct the query.

third, it is possible to emulate views: jmg, as you have done in your code you can extend that system by creating "virtual" tables - db['virtual_table_name'][('customers',1)].

by keeping the rules "split" it is easy to add things - to the WHERE clause in the example above it is easy to add "AND customers.id = 1" onto the rules already stored in the db class under the name 'virtual_table_name'.

it would help if MySQL had views, or it would help to use a database system that had views. your technique, jmg, would then be all that was needed, because for any more complex queries a VIEW could be created, and the "reconstruction" above would be handled automatically.

there is another way to handle this, in MySQL: using temporary (memory) tables. it's possible to create a table from a query, where the created table is stored in memory. further manipulation using simpler rules could then be done on that table.
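a sketch of that temporary-table trick (the connection details and the "unpaid" status value are made up for illustration; TYPE=HEAP is the older MySQL spelling, newer versions use ENGINE=MEMORY):

import MySQLdb

conn = MySQLdb.connect(db="custom")
cur = conn.cursor()

# build an in-memory temporary table from the complex query...
cur.execute("CREATE TEMPORARY TABLE tmp_inv TYPE=HEAP "
            "SELECT invoices.id, invoices.cust_id, invoices.status "
            "FROM invoices, customers "
            "WHERE invoices.cust_id = customers.id")

# ...then run simpler follow-up queries against it
cur.execute("SELECT * FROM tmp_inv WHERE status = %s", ("unpaid",))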

so there are many options / ways to skin the cat.

stuart, i am a little confused by savannah's decision. in order for people to help you convert the project to open source, it must be made available. so it uses proprietary libraries: so what?? that's fantastic: it will encourage open source developers to call the proprietary vendors and ask for free versions of their code!!!

btw i'm not at all confused, i was being diplomatic. there's got to be a better reason than that.

savannah, posted 7 Jan 2003 at 20:09 UTC by sab39 » (Master)

Savannah's views are very transparent: this is, after all, a branch of the FSF. You can't expect them to be anything other than extremist with regard to free software: it's their whole raison d'etre. They provide an extremely valuable service to the community and I have no problem accepting it on the terms that it's offered - I have no intention of complaining about the (very few) strings attached. It's not like I don't already have savannah pointing to an external page where you can download the existing tarball, proprietary dependencies be damned.

Besides, I don't think the reason that Sun, Microsoft and Oracle haven't open sourced their respective products is due to lack of users asking them to...

As far as single vs multi queries, in nrdo it is the responsibility of the programmer, because it's expected to be the programmer creating the dfn file. The programmer, thus, defines that the "get by id" method is a single get, but the "get by parent id" is a multi get. nrdo uses that information to create unique indexes where appropriate, but even when it's not appropriate to create a unique index, the programmer is still expected to specify the right thing. Among other things, it determines whether the return type of the generated function is a single instance of the class, or a list of instances.

YAPdb is actually dbwrap, posted 8 Jan 2003 at 06:15 UTC by jmg » (Master)

lkcl:
I've posted the code to dbwrap.py. A warning though: the code is very hackish. It's had some design changes mid way through, and has only been tested with PostgreSQL's pgdb module. It does nothing special, so any DB-API 2.0 module that has working cursors should work fine. It should even be somewhat thread safe due to the use of cursors, but I've never tested it; I only tried to write it with that in mind.

There is some documentation included, but it's not great. No warranties for the code, and of course, please email any changes/improvements back to me.

Thanks.

interesting somewhat related article, posted 8 Jan 2003 at 08:33 UTC by nixnut » (Journeyer)

evolutionary databases
anybody read this?

Personally I tend to stick to xml or just pickling my objects till I'm pretty much sure my classes won't change anymore and then make the move to persisting objects to a database.
A good framework for persisting objects to relational databases would be nice. Haven't taken a good look at that part of webware yet. Anybody got any experience with it?

Other options..., posted 3 Feb 2003 at 05:31 UTC by ianb » (Journeyer)

Well, this discussion is a little old, but I thought I'd point out the Python HigherLevelDatabaseProgramming Wiki page. There are a number of modules along these lines (one of which I wrote, SQLObject).

The name I've heard most often for these is object-relational mapper, which is to imply that the library creates language-native objects out of database rows. Some of the options are lighter than that, essentially just generating common SQL queries in a more convenient manner (e.g., SQLDict).

I think anyone who does database programming for long without using a library will end up creating an abstraction layer of their own. That's how SQLObject came into existence, and I think that's the case of most of the other (free software) options. You have to really like to watch yourself type to not create at least a little library.
