Older blog entries for agntdrake (starting at number 2)

I'm a bad Canadian.

After 20 years of not ice skating, I somehow got the idea the other day that what I really wanted to do was go skating. My hockey career was cut short at the tender age of six, when my parents decided to move from Ontario (the province, not the town in California) to Saudi Arabia. I haven't laced up the skates since; that is, until this past weekend. Amazingly enough, I wasn't all that bad at it (that's not to say I was good, but at least I didn't bail). They have these crummy little plastic skates you can rent, which seemed to do the trick, but it was a little intimidating when people would jet by you at Mach 10 wearing a $350 pair of ice skates and you're moving at slightly faster than walking speed with a piece of plastic digging into your foot.

Actually, what was scarier than that was the onslaught of six-year-old skaters tripping and falling in front of you every couple of minutes, whom you repeatedly have to dodge. I guess it makes you a better skater in the long run, though.

More on XML:

So what exactly is all this key->key->value stuff that I've been talking about anyway? The best way of describing it would be a means of traversing a tree. The XML document itself is really just a big tree of nodes which holds data about particular things.

<foo>
  <key>value</key>
  <foo2>
    <key>value</key>
  </foo2>
</foo>
Could be represented logically as:
                foo
                / \
              key foo2
              /     \
           value    key
                      \
                      value
We can build something like this fairly easily with the XML::Parser module by passing Style => 'Tree'. Here's a snippet that uses Data::Dumper to print out the resulting data structure:

use XML::Parser;
use Data::Dumper;

my $xml = <<END;
<foo>
  <key>value</key>
  <foo2>
    <key>value</key>
  </foo2>
</foo>
END

# Style => 'Tree' makes parse() return the document as nested array references.
my $x = XML::Parser->new( Style => 'Tree' );
my $y = $x->parse( $xml );

print Dumper( $y );

Which should spit out something with lots of references, roughly in the shape of the tree that we're trying to build. The problem is that it's still a ways off, and you'll still need to do a fair bit of processing to get it into any kind of reasonable shape. A module like XML::Simple will do this for you (I believe someone mentioned it on here), although I have to admit, I'm not crazy about the syntax.
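To give a sense of the remaining work, here is a minimal sketch of collapsing that Tree-style structure into a hash of hashes. The tree literal is written out by hand in the shape XML::Parser documents for Style => 'Tree' (a tag name followed by an array reference whose first element is the attribute hash, with text stored under the pseudo-tag 0). The helper name tree_to_hash is made up for this example, and it deliberately ignores attributes, mixed content, and repeated sibling tags.

```perl
use strict;
use warnings;

# Hand-written copy of what Style => 'Tree' returns for the <foo> document
# above: [ tag, [ \%attributes, child_tag, child_content, ... ] ].
# Text nodes use the pseudo-tag 0 with a plain string as their content.
my $tree = [
    'foo', [
        {},
        0, "\n  ",
        'key', [ {}, 0, 'value' ],
        0, "\n  ",
        'foo2', [
            {},
            0, "\n    ",
            'key', [ {}, 0, 'value' ],
            0, "\n  ",
        ],
        0, "\n",
    ],
];

# Hypothetical helper: turn one element's content array into either a
# string (leaf element) or a hash reference (element with child elements).
sub tree_to_hash {
    my ($content) = @_;
    my @items = @$content;
    shift @items;                      # drop the attribute hash
    my %children;
    my $text = '';
    while (@items) {
        my ( $tag, $value ) = splice @items, 0, 2;
        if ( $tag eq '0' ) {
            $text .= $value;           # accumulate character data
        }
        else {
            $children{$tag} = tree_to_hash($value);
        }
    }
    # Whitespace between child elements is discarded along with $text.
    return %children ? \%children : $text;
}

my ( $root, $content ) = @$tree;
my $data = { $root => tree_to_hash($content) };

print $data->{foo}{key}, "\n";         # value
print $data->{foo}{foo2}{key}, "\n";   # value
```

Once the data is in this shape, the key->key->value lookups are ordinary chained hash accesses.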

When I initially attacked the problem, I didn't bother to use the 'Tree' style output with XML::Parser and was instead just setting up the different Start, End and Char handlers and then using a couple of stacks to build the tree. I've been struggling with this over the last few days, and every time I think I've just about got everything working, it turns out my logic is a bit off and the resulting data structure doesn't quite work. Oh well, c'est la vie. The struggles of a computer programmer.
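For illustration, here is one way a handler-and-stack approach could look. This is a sketch and not the code described above. It assumes XML::Parser is installed, that elements contain either text or child elements but not both, and that sibling tag names are unique. The variable names are invented for the example.

```perl
use strict;
use warnings;
use XML::Parser;

my %root;                 # the finished tree ends up here
my @stack = ( \%root );   # hashes being filled, innermost last
my @names;                # element names matching the open hashes
my $text  = '';           # character data seen since the last tag

my $parser = XML::Parser->new(
    Handlers => {
        # Opening tag: start a fresh hash for this element.
        Start => sub {
            my ( $expat, $element ) = @_;
            push @names, $element;
            push @stack, {};
            $text = '';
        },
        # Character data can arrive in several pieces, so append.
        Char => sub {
            my ( $expat, $string ) = @_;
            $text .= $string;
        },
        # Closing tag: attach the element to its parent, as a hash if it
        # had children and as a plain string otherwise.
        End => sub {
            my ( $expat, $element ) = @_;
            my $node = pop @stack;
            my $name = pop @names;
            $stack[-1]{$name} = %$node ? $node : $text;
            $text = '';
        },
    },
);

$parser->parse('<foo><key>value</key><foo2><key>value</key></foo2></foo>');

print $root{foo}{foo2}{key}, "\n";
```

Resetting $text on every Start and End is what keeps the whitespace between tags from leaking into leaf values.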

I'll admit it; I never really got the point of why anyone would use XML until very recently. Every experience I've ever had involving someone wanting to use XML generally involved some crazed XML zealot going on about how we really needed to use some twisted, bizarre functionality of XSL and somehow this was supposed to be good design practice. I always regarded the people who love XML as being just the same as people who love PHP.

Don't get me wrong here. You can use PHP to write some very wonderful things. In fact, I used to use PHP quite a bit when PHP v2 was the in-vogue web technology, before I was indoctrinated into the world of Perl. What is wrong with PHP, you ask? Being a programmer by nature, I have a very hard time letting non-programmers close to any of the code I work on (and even certain other programmers at times). On a lot of the projects I work on, I'm expected to play nice with everyone else on the team, and that's just not possible if joe-blow-artist, whose responsibility is to make a couple of JPEG images and cook up some HTML, is breaking all of the code I've been working on. Code and content do not mix very well.

One of the truly great things about Perl is its data types. I'm not sure why other languages haven't embraced the concept of an associative array the way Perl has with its hash data type (with the possible exception of Python with its dictionaries). About 95% of the code I work on involves loops and hashes.

So when designing the original Fez, I realized right away that something which would be really useful would be some way of scooping out data from a pile of RPMs and storing that information (along with the dependencies, individual files, versions, etc.) in a complex data structure which could be accessed quickly to pull out relevant information. Essentially the data looks something like:

key -> key -> key -> value

which is really nothing more than a complex hash. It started occurring to me after the umpteenth time I ran into a similar data structure that there was something to all this stuff. I was using a combination of Sleepycat's very nice DBM database and MLDBM to store all of this data, but with everything stored as binary data, it's a real pain in the ass to actually edit anything. This is of course what text files are for.
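As a rough illustration of the key -> key -> key -> value shape, here is a small sketch in plain Perl. The package names, fields and values are all made up for the example and are not taken from Fez.

```perl
use strict;
use warnings;

# Hypothetical package data: package name -> field -> (sub-field ->) value.
my %packages = (
    'example-pkg' => {
        version  => '1.0',
        requires => { 'example-lib' => '>= 2.0' },
        files    => { '/usr/bin/example' => 'executable' },
    },
);

# Walking key -> key -> key -> value is just chained hash lookups.
print $packages{'example-pkg'}{requires}{'example-lib'}, "\n";   # >= 2.0

# Intermediate levels spring into existence on assignment (autovivification).
$packages{'other-pkg'}{files}{'/etc/other.conf'} = 'config';

for my $name ( sort keys %packages ) {
    my @files = sort keys %{ $packages{$name}{files} };
    print "$name: @files\n";
}
```

A structure like this lives only in memory, which is why it needs something like a DBM file or a text format behind it to persist.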

One of the other things I needed for Fez was a way of creating configuration parameters for the software because, as I was saying before, code and content don't mix. Originally I had made just a simple text file with statements like:

key = "value";

which I would read out as a simple hash, but this is a little limiting because of name space problems (any given name should be able to belong to any number of unique sets). So I recently started writing something which looked like this:

[foo]
  key = "value";
  [foo2]
    key = "value";
  [/foo2]
[/foo]
along with a function that populates a complex hash from it, which makes it easy to represent a whole lot of data in a text file.
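A parser for that bracketed format can be fairly short. The following is only a sketch of the idea, not the actual Fez code. The function name parse_config is hypothetical, and the sketch assumes one statement per line, word-character names, and double-quoted values with no embedded quotes.

```perl
use strict;
use warnings;

# Hypothetical parser for the bracketed format shown above.
sub parse_config {
    my ($text) = @_;
    my %config;
    my @stack = ( \%config );   # innermost open section is last
    my @open;                   # names of the sections currently open

    for my $line ( split /\n/, $text ) {
        next if $line =~ /^\s*$/;                      # skip blank lines
        if ( $line =~ m{^\s*\[/(\w+)\]\s*$} ) {         # [/name] closes
            my $name = pop @open;
            die "unexpected [/$1]\n" unless defined $name && $name eq $1;
            pop @stack;
        }
        elsif ( $line =~ /^\s*\[(\w+)\]\s*$/ ) {        # [name] opens
            my $section = $stack[-1]{$1} ||= {};
            push @stack, $section;
            push @open,  $1;
        }
        elsif ( $line =~ /^\s*(\w+)\s*=\s*"([^"]*)"\s*;\s*$/ ) {
            $stack[-1]{$1} = $2;                       # key = "value";
        }
        else {
            die "cannot parse line: $line\n";
        }
    }
    die "unclosed section [$open[-1]]\n" if @open;
    return \%config;
}

my $config = parse_config(<<'END');
[foo]
  key = "value";
  [foo2]
    key = "value";
  [/foo2]
[/foo]
END

print $config->{foo}{foo2}{key}, "\n";   # value
```

Because each section gets its own hash, the same key name can appear in foo and foo2 without colliding, which is the name space problem the flat file could not solve.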

The epiphany hit only a short time ago after a co-worker asked me whether or not I was using XML with some of the new code I was working on at work. He handed me a couple of books (they were pretty much just Java and XML books, of course) which I perused while trying to figure out what the hell he was getting at. The strength of XML doesn't seem to be in the DOM, or XSL or any of the clutter which makes learning about XML so difficult. The strength is in being easily able to represent a complex data structure as:
<foo>
  <key>value</key>
  <foo2>
    <key>value</key>
  </foo2>
</foo>
which I could then represent in Perl as a hash of hashes. None of this is really an earth-shattering discovery, but what struck me as strange was that in the midst of all this I ran across an article on xml.com, written only a couple of weeks ago, entitled "What's wrong with Perl and XML?". There are some 35 different modules on CPAN in the XML directory, some of which do really similar stuff to this (like XML::Dumper, XML::Config, XML::Grove, XML::Registry), but why hasn't any one module become the de facto way of dealing with XML data easily? I guess sometimes even the simplest of ideas can be some of the most elusive.

After about a year of neglect, I'm getting fairly close to releasing the new version of Fezbox. RedHat has been driving me insane over the past little while, and the new v7.0 is no exception to the rule. The original Fez was of course written to work with v6.0, but since v6.1 was such a disaster (Kickstart was broken entirely), I hadn't really done anything with it until taking a job at VA Linux and allowing them to use it for their 'BTOS' (Build to Order Software) system.

You can check out the pre-alpha at this address. Right now, there is only a small set of packages from RedHat v6.2; however, now that I've broken my machine by installing v7.0 (where did /usr/dict/words go?), I'll probably index up a bunch of packages to work with it.
