Older blog entries for dsnopek (starting at number 7)

I give this a big "Argh!"

I decided it was finally time to add multi-layer joins to Xmldoom because I needed to create MANY-TO-MANY relationships.  For example:

<table name="orders">
    <columns>
        <int name="order_id" unsigned="true" auto="true" primary-key="true"/>
        <datetime current="true"/>
    </columns>
</table>
<table name="items_ordered">
    <columns>
        <int name="order_id" unsigned="true"/>
        <int name="item_id" unsigned="true"/>
    </columns>

    <!-- TODO: could use a more succinct syntax? -->
    <join column="order_id" link="orders.order_id" relationship="parent"/>
    <join column="item_id" link="items.item_id" relationship="parent"/>
</table>
<table name="items">
    <columns>
        <int name="item_id" unsigned="true"/>
        <string name="description" length="80"/>
    </columns>
</table>

Anyway, we would obviously want to be able to retrieve the items on the order.  This is where the multi-layer join comes in.  We ask the definition, "How do I get from orders to items?"  The answer is:

orders.order_id -> items_ordered.order_id
items_ordered.item_id -> items.item_id
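
As a rough sketch of how the definition could answer that question, the joins can be treated as edges in a graph and searched breadth-first. This is not Xmldoom's actual code: the JOINS list is a hypothetical in-memory form of the <join> tags above, and find_join_path is a made-up name.

```python
from collections import deque

# Hypothetical in-memory form of the <join> tags above:
# (child_table, child_column, parent_table, parent_column)
JOINS = [
    ("items_ordered", "order_id", "orders", "order_id"),
    ("items_ordered", "item_id", "items", "item_id"),
]

def build_graph(joins):
    """Treat every join as an edge that can be walked in either direction."""
    graph = {}
    for child, child_col, parent, parent_col in joins:
        graph.setdefault(child, []).append((child_col, parent, parent_col))
        graph.setdefault(parent, []).append((parent_col, child, child_col))
    return graph

def find_join_path(joins, start, goal):
    """Breadth-first search for the shortest chain of joins from start to goal."""
    graph = build_graph(joins)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        table, path = queue.popleft()
        if table == goal:
            return path
        for col, other, other_col in graph.get(table, []):
            if other not in seen:
                seen.add(other)
                step = (f"{table}.{col}", f"{other}.{other_col}")
                queue.append((other, path + [step]))
    return None  # the two tables are not connected by any chain of joins

path = find_join_path(JOINS, "orders", "items")
for left, right in path:
    print(left, "->", right)
```

For the three tables above this prints the same two hops listed by hand.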

In the end, we'd have an Order.GetItems() method that would return the items on the order, right?  Well, not exactly.  What if items_ordered contained more data about each item ordered, for example the quantity or the sale price?  That is not only common but almost essential.  So I could almost see the need for a new object: ItemOrdered, with properties for Quantity and SalePrice.  But this isn't exactly what we want, because an object must have a single object key and ItemOrdered would be identified by two columns (order_id and item_id).

Here are a number of options:

  • Allow an object to be identified by two keys. This also opens the possibility of simply declaring the keys in the table, so that <object-key ...> only refers to a table. This would also remove the need for <unique/>.
  • Perform the two-layer join but also merge the Item object with the ItemOrdered object so that we can get all of its properties without another query. Or maybe return a dict or tuple containing the packed Item object and the extra ItemOrdered data? Maybe the ability to have ItemOrdered be a descendant of the Item object?


The answer will probably be a combination of the above ideas.  I am just upset that I never thought of this situation before now.
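
To make the two options a little more concrete, here is a minimal sketch of how they might look side by side in Python. None of this is generated Xmldoom code: the class layout, the key property and the get_items function are assumptions for illustration, and the rows live in plain Python lists instead of a database.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    item_id: int
    description: str

@dataclass(frozen=True)
class ItemOrdered:
    order_id: int
    item_id: int
    quantity: int
    sale_price: float

    @property
    def key(self):
        # First option: the object is identified by two keys at once.
        return (self.order_id, self.item_id)

def get_items(order_id, items_ordered_rows, items_by_id):
    """Second option: return (Item, ItemOrdered) tuples in a single pass."""
    return [
        (items_by_id[row.item_id], row)
        for row in items_ordered_rows
        if row.order_id == order_id
    ]

# Stand-in data instead of real query results.
items_by_id = {1: Item(1, "widget"), 2: Item(2, "gadget")}
rows = [
    ItemOrdered(10, 1, quantity=3, sale_price=4.50),
    ItemOrdered(10, 2, quantity=1, sale_price=20.00),
    ItemOrdered(11, 1, quantity=7, sale_price=4.25),
]
order_items = get_items(10, rows, items_by_id)
```

Each tuple carries the Item together with the quantity and sale price for that particular order, so no second query is needed to read them.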

I recently decided it was time to start working on an XML Schema for Xmldoom. This led me down a path I never knew existed. First, I started working with the XML Schema format from W3C because I thought it was the "standard". The format was difficult to understand but I did manage to learn how to use it. Then I started looking for a validator so that I could test my schemas. I tried xmllint (from libxml2) because I was always very fond of libxml2 from my SAGElib days. Anyway, it choked on some things I thought should *definitely* have worked. So I searched their mailing list for information and found that W3C support was incomplete but Relax-NG support was complete. At first I thought, "Oh man, I have to find a new validator." Then I read another post on the libxml2 mailing list, bashing the W3C format altogether and supporting Relax-NG. After battling all day with the W3C format, the arguments rang clear.

And now, I am a Relax-NG man! Expect Xmldoom schemas very soon.

OK- This is just proof that I write far too many custom projects to accomplish something simple. In Pyml, we need a simple mechanism for adding entries to sys.path. Currently, I use this snippet:

<?py import os, sys; sys.path.insert(0, os.path.join(PymlPath, "..")) ?>

While this works, it is clunky in size and is not totally correct. What we really need is something that easily adds the path to sys.path (possibly without import'ing os?) and then removes it after execution (can we have separate sys.path's for each script?). A possible syntax includes:

Does os.path.join(PymlPath, '..'):
<?path '..' ?>
Does os.path.join(PymlPath, '..', 'lib'):
<?path '..', 'lib' ?>
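
One way the "add it, then remove it after execution" part might work is a context manager wrapped around the script's execution. This is only a sketch: extra_path is a hypothetical helper, not part of Pyml, and pyml_path below is a made-up stand-in for PymlPath.

```python
import os
import sys
from contextlib import contextmanager

@contextmanager
def extra_path(base, *parts):
    """Temporarily put os.path.join(base, *parts) at the front of sys.path."""
    path = os.path.normpath(os.path.join(base, *parts))
    sys.path.insert(0, path)
    try:
        yield path
    finally:
        # Take one matching entry back out, even if the script raised.
        try:
            sys.path.remove(path)
        except ValueError:
            pass  # the script already removed it itself

pyml_path = os.path.join("srv", "site", "pages")  # stand-in for PymlPath
before = list(sys.path)
with extra_path(pyml_path, "..", "lib") as added:
    inside = list(sys.path)  # the new entry is visible only in here
after = list(sys.path)
```

A <?path '..', 'lib' ?> instruction could then translate into running the rest of the script inside such a block, so each script sees its own additions and nothing leaks into the next one.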

My temporary solution is to add another .htaccess to each "top-level" directory.

2 Sep 2003 (updated 2 Sep 2003 at 14:34 UTC) »

In working progressively on an Xmldoom database that cannot be destroyed between extensions, there are a couple of features that would be useful for SQL script generation:

1. Add a --partial option that will only generate the definition for the table given on the command line. (This I need now!) That way you can add tables to an existing database.

2. Add an --alter option that would cause the SQL script to use ALTER syntax (instead of CREATE) for each table specified. (This is a "blue sky" feature, i.e. I don't need it immediately.) This way we can specify all the new and changed tables with --alter and --partial. The only weirdness I see is if you want to "alter" only a single table: "xmldoom --alter table1 --partial table1".
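
A minimal sketch of how the two options could combine, using argparse. Neither flag exists yet, so the parser and the plan function are assumptions about one possible behavior: --partial selects which tables get output, and --alter switches a selected table from CREATE to ALTER.

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="xmldoom")
    parser.add_argument("--partial", action="append", default=[], metavar="TABLE",
                        help="only generate SQL for this table (repeatable)")
    parser.add_argument("--alter", action="append", default=[], metavar="TABLE",
                        help="use ALTER instead of CREATE for this table (repeatable)")
    return parser

def plan(all_tables, partial, alter):
    """Return (table, statement) pairs for the tables that should be output."""
    selected = partial if partial else all_tables
    return [(table, "ALTER" if table in alter else "CREATE") for table in selected]

tables = ["table1", "table2"]  # stand-in for the tables in the definition

# The "weird" single-table case from above.
args = build_parser().parse_args(["--alter", "table1", "--partial", "table1"])
single = plan(tables, args.partial, args.alter)

# No flags at all: every table gets a CREATE, as today.
args = build_parser().parse_args([])
everything = plan(tables, args.partial, args.alter)
```

Under these assumptions the single-table case still needs both flags, which is exactly the weirdness noted above; having --alter imply --partial would be one way around it.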

In the future, I could see setting up a build system where each table is held in a separate file (either: the main .xml file is generated at build time OR with a cool new tag like <include filename="..."/>). And when the table definition is changed, an SQL script is created to alter the database, which is then automatically updated. We could even do it with a single .xml file with SCons and a new node type!

Just singing about the sky ...

28 Aug 2003 (updated 28 Aug 2003 at 16:05 UTC) »

In my never-ending attempt to refactor Xmldoom, it is turning into a monster. Here is the layout I envision for the future.

Move Definition.py and Parser.py into a new Definition/ module. These two are so tied together anyway. Then create a Compile/ (or "Compiler"?) module that contains the contents of SQL.py split into SQL.py and Compile.py. We can then include a parser/generator for a now-imaginary XML format for the compiled code. I am also considering eliminating Config.py and putting the SQL config stuff in the new Compile/ module. This would open up the possibility of moving the backend-specific config into loadable modules like in Output/. The only problem is where to move the API config stuff since it is needed by both Output/ and PyRE.py.

This would make the following structure:

Xmldoom/
Xmldoom/Definition/ # dealing with the XML database definition
Xmldoom/Definition/Parser.py # parsing the XML
Xmldoom/Definition/Definition.py # the abstract in-memory version
Xmldoom/Definition/__init__.py # connects the parser and definition
Xmldoom/Compiler/SQL.py # the SQL query building code
Xmldoom/Compiler/Compiler.py # turns a definition into our compiled format
Xmldoom/Compiler/Parser.py # XML parser for the XML compiled format
Xmldoom/Compiler/__init__.py # connects all the parts into simple functions
Xmldoom/Output/ # does code generation
Xmldoom/PyRE.py # runtime engine (maybe make into directory?)

This would break the whole Xmldoom process into distinct packages in the source tree:

Definition --> Compiler --> PyRE

- or -

Definition --> Compiler --> Output

Each part would, of course, have its own internal process and interaction of its modules that may be non-linear. But the new design would have the advantage of organizing all the major parts into their distinctly linear relationship.

This code base is slowly turning from a convoluted mess into a coding masterpiece!

    This diary entry exists only to test my new Advogato client.  It's written in
Python and uses XML-RPC, and it lets me write plain text, which is converted
into equivalent HTML.  My diary needs are simple: I require no formatting
(other than lists, tabs and white space), I want to use Vim to compose my
entries, and I must be able to enter XML/HTML tags as literals.

Here is some test XML, that I hope will get formatted properly:
   <document>
     <paragraph> My, What a wonderful paragraph you have there? </paragraph>
     <paragraph> I thought so too! !! !     ;-)    ! !! </paragraph>
   </document>

Hey look!  A list:

  • test

26 Aug 2003 (updated 26 Aug 2003 at 21:45 UTC) »

I was trying to test my new script by posting a diary entry but instead posted the script itself. Eliminated to hide my password.

I have been working on the RMA database at work, and have discovered that the following features need to be added to Xmldoom:

1. Enumeration support. This means that the existing <column ...> col_name </column> tags need to be changed to <column name="col_name" .../> so that we can support syntax like:

<enumeration name="status"> <item value="open"/> <item value="closed"/> </enumeration>

2. Argument lists. Currently arguments only accept single values. You should be able to pass a list of arguments for a where clause that get AND'd or OR'd together. I see a new tag <argument-list mode="..." />.

3. Optional arguments. So that we don't have to make all sorts of methods like: "FindItemBySerial" and "FindItemBySerialAndStatus". We could just define "FindItemBySerial" with an optional status argument. This would seriously improve usability.

4. Named arguments. Add an optional 'name="..."' to all argument types (single and list). This way we can support "keyword arguments" for all languages that support them, namely Python. Arguments added by default (i.e. not with an <argument ...> tag) should either use the column name or provide a superfluous <argument ...> tag in order to name them.

5. More complex <where> usage. Currently we can only set up where clauses that are all AND'd together. I want to support nested AND and OR clauses. Proposed usage:

<where mode="AND"> <constraint column="status" comparison="IS" value="Closed"/> <constraint column="status" comparison="IS" value="Denied"/> </where>
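
To get a feel for items 3 and 5 together, here is a rough sketch of how a nested where tree might be rendered to SQL, with an optional argument simply leaving its constraint out. Everything here is an assumption for illustration: the tuple layout, the render_where and find_item_by_serial_where names, the "serial" column, the %s placeholder style, and the mapping of the proposed comparison="IS" onto SQL "=".

```python
# Assumed mapping from the proposed comparison names to SQL operators.
COMPARISONS = {"IS": "=", "IS NOT": "<>"}

def render_where(node, params):
    """Render a nested where tree to SQL text, collecting values in params.

    A group is (mode, [children]) with mode "AND" or "OR";
    a constraint is (column, comparison, value).
    """
    if len(node) == 2:
        mode, children = node
        parts = [render_where(child, params) for child in children]
        return "(" + f" {mode} ".join(parts) + ")"
    column, comparison, value = node
    params.append(value)
    return f"{column} {COMPARISONS[comparison]} %s"

def find_item_by_serial_where(serial, status=None):
    """Optional argument: the status constraint is only added when given."""
    constraints = [("serial", "IS", serial)]
    if status is not None:
        constraints.append(("status", "IS", status))
    return ("AND", constraints)

# A nested tree: serial matches AND (status is Closed OR status is Denied).
tree = ("AND", [
    ("serial", "IS", "A100"),
    ("OR", [("status", "IS", "Closed"), ("status", "IS", "Denied")]),
])
params = []
sql = render_where(tree, params)
print(sql, params)
```

Keeping the values in a separate params list means the generated SQL can be handed to a database driver without any string quoting in the generator itself.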

This is just what I thought of today. I guess I have my work cut out for me!
