Older blog entries for dsnopek (starting at number 12)

Okay folks, this is the way not to do things. Yesterday at work, I managed to erase all of the status information in our database. The status information is how we do all of our accounting and the only way we know where any given item is. Fortunately, we have been able to restore about 98% of the information from the last 6 months. All information before that is lost forever.

Here is what happened: I just added the complex where statements to Xmldoom. I tested them, but not very well. There was a bug that essentially stopped complex statements from being passed to the above statement. A simple fix. But while I was looking at the code I made another change that did the opposite of the first bug. Complex wheres worked, but simple wheres weren't passed up properly. When looking at the code it seemed like an obvious optimization (one less loop). But anyway, it caused the property Set* methods to omit the key! So what should have been:

UPDATE table SET data = 'yup' WHERE id = 0

Became:

UPDATE table SET data = 'yup'

Thus setting the value of the data column to 'yup' for EVERY SINGLE ROW. The really sad part is that if I had simply run the unit tests before running my script, I would have noticed them fail. I have included as many checks in my code as possible to avoid this EXACT mistake. But alas, my ability to screw up vastly exceeds my ability to double check my work.
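A minimal sketch of the kind of check that guards against this exact failure. The build_update helper and its arguments are hypothetical and are not Xmldoom's actual code; the point is only that the query builder refuses to produce an UPDATE when the key is missing.

```python
def build_update(table, values, key):
    """Return (sql, params) for an UPDATE restricted by a key.

    Hypothetical helper, not part of Xmldoom: `values` maps column names
    to new values and `key` maps key column names to key values.
    """
    if not values:
        raise ValueError("nothing to update")
    if not key:
        # Refuse to emit an UPDATE that would touch every row.
        raise ValueError("refusing to build UPDATE without a WHERE clause")

    set_sql = ", ".join("%s = ?" % col for col in values)
    where_sql = " AND ".join("%s = ?" % col for col in key)
    sql = "UPDATE %s SET %s WHERE %s" % (table, set_sql, where_sql)
    # Parameters follow the order of the placeholders: SET first, then WHERE.
    return sql, list(values.values()) + list(key.values())
```

Called as build_update("table", {"data": "yup"}, {"id": 0}) it yields the keyed statement; called with an empty key it raises instead of quietly producing the all-rows version.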

I don't think they'll fire me today.

Xmldoom

Just an update on the list I posted on Sep 9th. I completed complex where statements (CVS update coming this afternoon). I haven't had a chance to test in real operation but the SQL is generated properly. I got the parser working for optional, list, and named arguments. But the compiler currently ignores these. This is going to require massive changes to the compiled format. The XML format and the parser now allow multi-key objects, but the compiler will just crash when given them. Nothing else has been touched.

I am still trudging on!

10 Sep 2003 (updated 10 Sep 2003 at 22:52 UTC) »

I started work on the Xmldoom argument stuff as laid out in my last diary entry.  I started working and things were going well but then I chased this red herring for much too long.  The libxml2 Relax-NG validator said:

Extra element string in interleave
RNG validity error: file temp line 7 element string

It led me to believe that I had set up the <interleave/> in my schema wrong.  After all, I hadn't quite figured out if I had it right.  I want the ability to mix-and-match any number of column types in any order, which is something I had lots of trouble with in W3C Schema.  And a quick look showed something like this:

<int name="id"/>
<string name="name"/>
<int name="quantity"/>

So obviously I thought the schema wanted <int/>, <int/>, <string/> and was wondering what the hell this <string/> smack in the middle of the <int/>'s was.  But alas, a quick run through Sun's MSV and I got a simple:

Element "string" is missing attribute "size".

After seeing this, it all flashed into my head: libxml2 reports layered errors in reverse order.  It has some sort of element order handler that does the <interleave/> and <choice/> tags.  This handler makes sure the order is good and then passes control to an element handler.  The element handler sees the missing attribute and says, "validity error: ... element string" and returns to the interleave handler, which exclaims "Extra element string in interleave."  Wow, do these error messages suck.  Sun's MSV is such a beautiful piece of code.  Unfortunately it is written in Java and licensed under the Apache license.  I could provide a hook to allow external validation but this is a secondary solution.  I know the last thing I need right now is to get sidetracked but the best solution would be to fix libxml2.  It's LGPL and written in C with a powerful Python wrapper.  Everything a man could ever want!

10 Sep 2003 (updated 10 Sep 2003 at 03:37 UTC) »

Alright, I am compiling a list of all the things I need to fix in Xmldoom before the next release.  Most of this is in previous diary entries but I need it all in one place:

  • Method interface stuff:
    • Argument lists. For use inside complex where clauses. Ex. A list of statuses to search for.
    • Optional arguments. When data isn't given it isn't added to the WHERE clause.
    • Named arguments. For keyword arguments in the languages that support them.
    • Complex where statements. Allow nested AND and OR, with various <constraint .../>, <argument .../>, and <argument-list .../> tags inside.
  • Connecting table work:
    • Multi-table SELECT in SelectQuery.
    • Rework the KeyTree, Join code.  Make table <join .../>'s more low level and separate the types into individual classes.
    • Multi-key objects.
    • Transaction objects.
  • Fix Autologon in the code generator.
  • Work on improving the compiled format (Optional).
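To make the method-interface items concrete, here is a rough sketch of how optional arguments, argument lists, and nested AND/OR could combine into a WHERE clause. The tuple-based tree and the render_where name are assumptions made for this illustration; they are not the Xmldoom XML format or its compiled format.

```python
def render_where(node, args):
    """Render a nested condition tree into (sql, params).

    Node shapes (all assumptions for this sketch):
      ("and", [children]) or ("or", [children])
      ("arg", column, name, optional)       -> column = ?
      ("arg-list", column, name, optional)  -> column IN (?, ?, ...)
    Returns (None, []) when nothing is left to constrain on.
    """
    kind = node[0]
    if kind in ("and", "or"):
        parts, params = [], []
        for child in node[1]:
            sql, child_params = render_where(child, args)
            if sql is not None:
                parts.append(sql)
                params.extend(child_params)
        if not parts:
            return None, []
        if len(parts) == 1:
            return parts[0], params
        joiner = " %s " % kind.upper()
        return "(" + joiner.join(parts) + ")", params

    _, column, name, optional = node
    if args.get(name) is None:
        if optional:
            # Optional argument not given: leave it out of the WHERE clause.
            return None, []
        raise TypeError("missing required argument: %s" % name)
    if kind == "arg":
        return "%s = ?" % column, [args[name]]
    if kind == "arg-list":
        values = list(args[name])
        if not values:
            raise ValueError("empty list for argument: %s" % name)
        marks = ", ".join("?" for _ in values)
        return "%s IN (%s)" % (column, marks), values
    raise ValueError("unknown node type: %s" % kind)
```

Named arguments fall out of passing args as a dict. Note that a tree made only of optional arguments can render to None, so the caller still has to decide whether running with no WHERE clause at all is acceptable.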

If I actually work, this is about a month or two of work.  If I work to the full of my ability, it's a solid month.  Unfortunately, this list is really daunting and all of my projects dependent on Xmldoom need this stuff.  So it wouldn't be unlike me to start working on a totally unrelated project to avoid it.  Let's hope for the best.

Okay, the Xmldoom definition format needs to change drastically.  There are a couple of new rules that need to go into effect regarding the object structure in order to allow for more complex operations.

  • No "master" objects. Previously, every database would have a "master" object that wasn't attached to an object table which would be able to add parentless objects. These will be replaced with the ability to make "object" objects without any table. You can have an arbitrary number of these. "Why would you ever need multiple "master" objects?", you ask. Well, eventually you'll be able to add and extend the "object types" in both the PyRE and the generated code. You can break up Xmldoom functions into logical objects and then add non-Xmldoom functionality to them that addresses the same logical separation.
  • An object key will refer to a whole table and all its primary keys. This means we will have objects with more than one key value.
  • Transaction objects. This is a totally new idea for Xmldoom. It's an object that inherits properties (plus gets and adds) from a table object and additional properties (not sure about gets and adds) defined on a connecting table. For example, consider the tables: items, orders and items_ordered. "items" and "orders" are obvious, but "items_ordered" is a connecting table that attaches items to a particular order along with a quantity and sale price. We want an Order object with an AddMethod like: "Order.AddItem(item_id, quantity, price)" and a GetMethod that returns an Item object augmented with the properties "Quantity" and "Price".
  • Table-less objects can't add any object that has a foreign key reference or parent.
  • Table objects can't add any object that doesn't have a foreign key reference to (or parent of) its primary key.
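As a hand-written illustration of what the transaction object rule is aiming for, the sketch below builds the Order.AddItem / GetItems pair on top of sqlite3. Everything here is an assumption: the OrderedItem class, the quantity and price column names, and the use of sqlite3 stand in for whatever Xmldoom would actually generate.

```python
import sqlite3


class OrderedItem:
    """An Item plus the extra properties stored on the connecting table."""

    def __init__(self, item_id, description, quantity, price):
        self.ItemId = item_id
        self.Description = description
        self.Quantity = quantity
        self.Price = price


class Order:
    def __init__(self, conn, order_id):
        self.conn = conn
        self.order_id = order_id

    def AddItem(self, item_id, quantity, price):
        # The add goes to the connecting table, not to items itself.
        self.conn.execute(
            "INSERT INTO items_ordered (order_id, item_id, quantity, price) "
            "VALUES (?, ?, ?, ?)",
            (self.order_id, item_id, quantity, price))

    def GetItems(self):
        # One query returns the item columns and the connecting-table columns.
        rows = self.conn.execute(
            "SELECT items.item_id, items.description, "
            "items_ordered.quantity, items_ordered.price "
            "FROM items_ordered "
            "JOIN items ON items.item_id = items_ordered.item_id "
            "WHERE items_ordered.order_id = ? "
            "ORDER BY items.item_id",
            (self.order_id,))
        return [OrderedItem(*row) for row in rows]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
        CREATE TABLE items (item_id INTEGER PRIMARY KEY, description TEXT);
        CREATE TABLE items_ordered (
            order_id INTEGER, item_id INTEGER, quantity INTEGER, price REAL);
        INSERT INTO orders (order_id) VALUES (1);
        INSERT INTO items (item_id, description) VALUES (7, 'widget');
    """)
    order = Order(conn, 1)
    order.AddItem(7, 3, 9.5)
    return order.GetItems()
```

The object returned by GetItems carries both the item's own properties and the Quantity and Price from items_ordered, without a second query.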


Overall the current xml format has outlived the original design goals.  It was never meant to do all the shit I am trying to make it do.  But I just don't have the energy or the insight to rewrite it now.  What I really want is an "object only" version.  It is getting really annoying dealing with table definitions.  I have gotten a lot of good xml design ideas from Relax-NG which absolutely soars in abstraction.  But a ground up change is something I would like to avoid right now.  Once I start working on the compiled xml format (which knows nothing about tables at all), I'll start porting the object specific features back into the definition.  Hopefully, that will give me a real understanding of the abstraction required for an "object only" format.

I give this a big "Argh!"

I decided it was finally time to add multi-layer joins to Xmldoom because I needed to create MANY-TO-MANY relationships.  For example:

<table name="orders">
    <columns>
        <int name="order_id" unsigned="true" auto="true" primary-key="true"/>
        <datetime current="true"/>
    </columns>
</table>
<table name="items_ordered">
    <columns>
        <int name="order_id" unsigned="true"/>
        <int name="item_id" unsigned="true"/>
    </columns>

    <!-- TODO: could use a more succinct syntax? -->
    <join column="order_id" link="orders.order_id" relationship="parent"/>
    <join column="item_id" link="items.item_id" relationship="parent"/>
</table>
<table name="items">
    <columns>
        <int name="item_id" unsigned="true"/>
        <string name="description" length="80"/>
    </columns>
</table>

Anyway, we would obviously want to be able to retrieve the items on the order.  This is where the multi-layer join comes in.  We ask the definition, "How do I get from orders to items?"  The answer is:

orders.order_id -> items_ordered.order_id
items_ordered.item_id -> items.item_id
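One way to answer that question is a breadth-first search over the joins declared in the definition. This is only a sketch: the JOINS list is transcribed by hand from the <join .../> tags in the example, and find_join_path is a made-up name, not the KeyTree code.

```python
from collections import deque

# Each join is (from_table, from_column, to_table, to_column), copied from
# the <join .../> tags in the example definition.
JOINS = [
    ("items_ordered", "order_id", "orders", "order_id"),
    ("items_ordered", "item_id", "items", "item_id"),
]


def find_join_path(joins, start, goal):
    """Breadth-first search for the shortest chain of joins from start to goal."""
    # A join can be followed in either direction.
    edges = {}
    for t1, c1, t2, c2 in joins:
        edges.setdefault(t1, []).append((t1, c1, t2, c2))
        edges.setdefault(t2, []).append((t2, c2, t1, c1))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for step in edges.get(table, []):
            if step[2] not in seen:
                seen.add(step[2])
                queue.append((step[2], path + [step]))
    return None  # the tables are not connected


def format_path(path):
    return ["%s.%s -> %s.%s" % step for step in path]
```

For the example tables, format_path(find_join_path(JOINS, "orders", "items")) gives the same two hops listed above.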

In the end, we'd have an Order.GetItems() method that would return the items on the order, right?  Well, not exactly.  What if items_ordered contained more data about each item ordered?  For example, the quantity or the sale price.  Not only common but almost essential.  So I could almost see the need for a new object: ItemOrdered with properties for Quantity and SalePrice.  But this isn't exactly what we want because an object must have a single object key.

Here are a number of options:

  • Allow an object to be identified by two keys. This also opens the possibility for simply declaring the keys in the table and <object-key ...> will only refer to a table. This would also remove the need for <unique/>.
  • Performing the two-layer join but also merging the Item object with the ItemOrdered object so that we can get all of its properties without another query. Or maybe return a dict or tuple containing the packed Item object and the extra ItemOrdered data? Maybe the ability to have ItemOrdered be a descendant of the Item object?


The answer will probably be a combination of the above ideas.  I am just upset that I never thought of this situation before now.

I recently decided it was time to start working on an XML Schema for Xmldoom. This led me down a path I never knew existed. First, I started working with the XML Schema format from W3C because I thought it was the "standard". The format was difficult to understand but I did manage to learn how to use it. Then I started looking for a validator so that I could test my schemas. I tried xmllint (from libxml2) because I was always very fond of libxml2 from my SAGElib days. Anyway, it coughed on some things I thought should *definitely* have worked. So I searched their mailing list for info and found that W3C support was incomplete but Relax-NG support was complete. At first I thought, "Oh man, I have to find a new validator." Then I read another post on the libxml2 mailing list, bashing the W3C format altogether and supporting Relax-NG. After battling all day with the W3C format, the arguments rang clear.

And now, I am a Relax-NG man! Expect Xmldoom schemas very soon.

OK, this is just proof that I write far too many custom projects to accomplish something simple. In Pyml, we need a simple mechanism for adding entries to the sys.path. Currently, I use this snippet:

<?py import os, sys; sys.path.insert(0, os.path.join(PymlPath, "..")) ?>

While this works, it is clunky in size and is not totally correct. What we really need is something that easily adds the path to sys (possibly without import'ing os?) and then removes it after execution (can we have separate sys.path's for each script?). A possible syntax includes:

Does os.path.join(PymlPath, '..')
<?path '..' ?>
Does os.path.join(PymlPath, '..', 'lib')
<?path '..', 'lib' ?>

My temporary solution is to add another .htaccess to each "top-level" directory.
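The add-then-remove behaviour wished for above could look something like the following on the Python side. This is a sketch only: extra_path is a hypothetical helper, it uses contextlib from a newer Python than Pyml targets, and how a <?path ... ?> instruction would call it is left open.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def extra_path(base, *parts):
    """Temporarily put os.path.join(base, *parts) at the front of sys.path."""
    path = os.path.normpath(os.path.join(base, *parts))
    sys.path.insert(0, path)
    try:
        yield path
    finally:
        # Remove one matching entry again, even if the script raised
        # or changed sys.path itself in the meantime.
        try:
            sys.path.remove(path)
        except ValueError:
            pass
```

A script body would then run inside "with extra_path(PymlPath, '..', 'lib'):" and leave sys.path as it found it. It does not give each script its own sys.path; it only limits how long the extra entry lives.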

2 Sep 2003 (updated 2 Sep 2003 at 14:34 UTC) »

In working progressively on an Xmldoom database that cannot be destroyed in between extensions, there are a couple of features that would be useful for SQL script generation:

1. Add a --partial option that will only generate the definition for the table given on the command line. (This I need now!) That way you can add tables to an existing database.

2. Add an --alter option that would cause the SQL script to use ALTER syntax (instead of CREATE) for each table specified. (This is a "blue sky" feature, i.e. I don't need it immediately). This way we can specify all the new and changed tables with --alter and --partial. The only weirdness I see is if you want to "alter" only a single table: "xmldoom --alter table1 --partial table1".
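A sketch of how the two options might interact, assuming both take a table name and can be repeated. Neither flag exists yet, and parse_args and plan are invented names for this illustration; the real generator would emit SQL instead of (action, table) pairs.

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser(prog="xmldoom")
    # Both flags are proposals from this entry, not existing options.
    parser.add_argument("--partial", action="append", default=[],
                        metavar="TABLE",
                        help="only generate SQL for this table (repeatable)")
    parser.add_argument("--alter", action="append", default=[],
                        metavar="TABLE",
                        help="use ALTER instead of CREATE for this table")
    return parser.parse_args(argv)


def plan(all_tables, partial, alter):
    """Return a list of (action, table) pairs in definition order."""
    unknown = [t for t in partial + alter if t not in all_tables]
    if unknown:
        raise ValueError("unknown table(s): %s" % ", ".join(unknown))
    # Without --partial every table is generated.
    selected = [t for t in all_tables if not partial or t in partial]
    return [("ALTER" if t in alter else "CREATE", t) for t in selected]
```

With this reading, "xmldoom --alter table1 --partial table1" plans a single ALTER for table1, which is the slightly odd case mentioned above.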

In the future, I could see setting up a build system where each table is held in a separate file (either: the main .xml file is generated at build time OR with a cool new tag like <include filename="..."/>). And when the table definition is changed, an SQL script is created to alter the database, which is then automatically updated. We could even do it with a single .xml file with SCons and a new node type!

Just singing about the sky ...

28 Aug 2003 (updated 28 Aug 2003 at 16:05 UTC) »

In my never-ending attempt to refactor Xmldoom, it is turning into a monster. Here is the layout I envision for the future.

Move Definition.py and Parser.py into a new Definition/ module. These two are so tied together anyway. Then create a Compile/ (or "Compiler"?) module that contains the contents of SQL.py split into SQL.py and Compile.py. We can then include a parser/generator for a now-imaginary XML format for the compiled code. I am also considering eliminating Config.py and putting the SQL config stuff in the new Compile/ module. This would open up the possibility of moving the backend specific config into loadable modules like in Output/. The only problem is where to move the API config stuff since it is needed by both Output/ and PyRE.py.

This would make the following structure:

Xmldoom/
Xmldoom/Definition/ # dealing with the XML database definition
Xmldoom/Definition/Parser.py # parsing the XML
Xmldoom/Definition/Definition.py # the abstract in-memory version
Xmldoom/Definition/__init__.py # connects the parser and definition
Xmldoom/Compiler/SQL.py # the SQL query building code
Xmldoom/Compiler/Compiler.py # turns a definition into our compiled format
Xmldoom/Compiler/Parser.py # XML parser for the XML compiled format
Xmldoom/Compiler/__init__.py # connects all the parts into simple functions
Xmldoom/Output/ # does code generation
Xmldoom/PyRE.py # runtime engine (maybe make into directory?)

This would break the whole Xmldoom process into distinct packages in the source tree:

Definition --> Compiler --> PyRE

- or -

Definition --> Compiler --> Output

Each part would, of course, have its own internal process and interaction of its modules that may be non-linear. But the new design would have the advantage of organizing all the major parts into their distinctly linear relationship.

This code base is slowly turning from a convoluted mess into a coding masterpiece!
