Older blog entries for simonstl (starting at number 49)

I talked with various database-oriented folks at a woodworking picnic this weekend and I came back to discussion of XQuery at work, so I've been thinking a bit harder about what I learned from relational databases and how I apply it to XML.

I've spent most of my time in relational databases using smaller tools that had a grasp of relations and SQL but didn't spend enormous effort cramming more features into their SQL support. I started in Microsoft Access (I know, I know) and I've used MySQL and I've been very happy with their abilities to store and retrieve information. It's been my privilege never to have to work on "Enterprise" relational databases - I've documented an Oracle setup and tinkered with a copy of DB2 5.0 that IBM sent to my doorstep one day (dunno why), but that's it.

When I look at what I do when I'm writing programs to work with XML, I'm happiest when I can work in roughly the style I've used with relational databases - get a chunk of information with a basic query, then process it in my own (usually Java, but sometimes XSLT or other) environment. That way I can apply my existing skills without having to learn yet-another-goddamn-programming language.
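A minimal sketch of that "basic query, then process it myself" style, assuming the standard javax.xml.xpath API that ships with current JDKs (not necessarily anything used at the time of writing). The sample document, the element and attribute names, and the class and method names are all invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class QueryThenProcess {

    // Hypothetical sample data; these names do not come from the post.
    static final String SAMPLE =
        "<pens>"
        + "<pen wood=\"cherry\" price=\"12.50\"/>"
        + "<pen wood=\"walnut\" price=\"15.00\"/>"
        + "<pen wood=\"cherry\" price=\"14.00\"/>"
        + "</pens>";

    // Step 1: a plain XPath 1.0 expression pulls out a chunk of elements.
    // The expression is assumed to select element nodes only.
    static List<Element> select(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                expression,
                new InputSource(new StringReader(xml)),
                XPathConstants.NODESET);
            List<Element> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add((Element) nodes.item(i));
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath or XML: " + expression, e);
        }
    }

    // Step 2: everything after the query is ordinary Java.
    static double totalPrice(List<Element> pens) {
        double total = 0.0;
        for (Element pen : pens) {
            total += Double.parseDouble(pen.getAttribute("price"));
        }
        return total;
    }

    public static void main(String[] args) {
        List<Element> cherry = select(SAMPLE, "/pens/pen[@wood='cherry']");
        System.out.println(cherry.size() + " cherry pens, total " + totalPrice(cherry));
    }
}
```

The query stays small enough to read at a glance, and the summing, filtering, or reshaping lives in the host language rather than in the query language.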

This seems to be the opposite approach of what passes for the conventional wisdom these days. Stored procedures have been common for years, and SQL has grown far beyond the subset I consider sane. (Heck, there's even a book titled "SQL-99 Complete, Really".) XQuery is a full-blown language, as capable as XSLT but more procedural-looking, and complete with a type system drawn from XML Schema.

Looking at all of this, I guess I can see where it's useful to some people in some situations, but to me it's mostly just more junk to look at once and ignore. As fond as I am of XML, the notion that markup is an excuse to create not just one (XSLT) but two (XQuery) Turing-complete languages seems bizarre at best. E4X, a set of XML extensions to JavaScript, is at least a relatively minor modification, but still feels strange in the context of all of these things which have less and less to do with markup and more and more to do with programming.

I suspect that over time I'll be retreating to my own home-grown toolkit, adding XPath 1.0 to it but no more, and letting the behemoths create whatever they like for whoever is supposedly buying it. We can still exchange documents; there's no need to exchange superstructure.

I've always done most of my programming by myself, whether it was for work or my various open source projects. When I first heard about Extreme Programming and things like pair programming, I pretty much wrote it off as stuff that might be nice for people social enough to program in groups, and with a greater tolerance for "enterprise" work in general.

Okay, some aspects, like iterative development, seemed pretty cool. Code standards make sense to me, and I try hard to do things like comment my code and use meaningful variable names - in large part because I may have to reuse or modify it months or years later. Stuff like communication and shared ownership of code, though... do I really want to set up multiple personalities talking to myself and arguing about code direction?

The one piece of the XP puzzle that's always been on my "I wish I had energy to do that" list is testing. I've done plenty of testing on all kinds of code in all kinds of circumstances, but it's never been the circumstances I wanted. My testing style hadn't improved much since my ZX81 and AppleSoft days. The kind of code I tend to write is all about XML transformations, with various randomly nested and intermixed structures - not exactly easy unit testing fodder.

Today I finally changed that. I've been building a partial XML parser, and have a whole set of context-tracking structures that I need to have working reliably for the rest to make sense. I knew it worked fine on the particular cases I'd tested, but they were based on complete documents, and didn't necessarily set off every aspect. Writing unit tests (in JUnit) that explore specific aspects of these processes has been pretty easy and extremely useful.
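A rough sketch of what testing one aspect of context tracking in isolation might look like. The post's tests were written in JUnit; to stay self-contained this uses plain checks instead, and the ContextTracker class, its methods, and the check names are all hypothetical stand-ins, not the parser's actual structures.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Hypothetical context tracker: remembers which elements are currently open.
class ContextTracker {
    private final Deque<String> open = new ArrayDeque<>();

    void startElement(String name) {
        open.push(name);
    }

    void endElement(String name) {
        if (open.isEmpty() || !open.peek().equals(name)) {
            throw new IllegalStateException("unexpected end tag: " + name);
        }
        open.pop();
    }

    int depth() {
        return open.size();
    }

    // Path of open elements, outermost first, e.g. "/doc/section".
    String path() {
        StringBuilder sb = new StringBuilder();
        Iterator<String> it = open.descendingIterator();
        while (it.hasNext()) {
            sb.append('/').append(it.next());
        }
        return sb.length() == 0 ? "/" : sb.toString();
    }
}

// Each check exercises one behaviour directly, with no complete document needed.
class ContextTrackerChecks {
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    static void nestedElementsBuildPath() {
        ContextTracker tracker = new ContextTracker();
        tracker.startElement("doc");
        tracker.startElement("section");
        check(tracker.path().equals("/doc/section"), "path after two starts");
        check(tracker.depth() == 2, "depth after two starts");
    }

    static void closingRestoresParent() {
        ContextTracker tracker = new ContextTracker();
        tracker.startElement("doc");
        tracker.startElement("section");
        tracker.endElement("section");
        check(tracker.path().equals("/doc"), "path after closing child");
    }

    static void mismatchedEndTagIsRejected() {
        ContextTracker tracker = new ContextTracker();
        tracker.startElement("doc");
        boolean rejected = false;
        try {
            tracker.endElement("section");
        } catch (IllegalStateException expected) {
            rejected = true;
        }
        check(rejected, "mismatched end tag should be rejected");
        check(tracker.depth() == 1, "state unchanged after rejected end tag");
    }

    public static void main(String[] args) {
        nestedElementsBuildPath();
        closingRestoresParent();
        mismatchedEndTagIsRejected();
        System.out.println("all checks passed");
    }
}
```

Because each check drives the tracker with a handful of events rather than a whole document, a failure points at one behaviour instead of at the parser as a whole.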

I've already found a few bugs - largely in the way I was testing code before - and feel a lot brighter about the foundations of the code I'm using. In about two hours of test writing, I've reduced the number of places I might search for errors by about a third, and reduced my paranoia about making changes far more drastically. I still have some snarled code ahead of me, but I finally feel like I have the right tools for unsnarling it without creating even larger tangles.

I suspect unit testing is probably the piece of XP with the most potential benefit for solo programmers, though CVS has saved me from myself a few times and let me open a wider door to other people's work. Unit testing also combines immediate benefits for me with the prospect of an easier time for other developers who might someday want to build on this work, even if I never hear about it. Seems like a good thing all around.

20 Mar 2003 (updated 20 Mar 2003 at 13:57 UTC) »
dyork - I've got the Taig lathe as well, and really like it. It seems to cry out for tinkering, with a really basic foundation and all those attachments. Sadly, most of the attachments are for turning metal, and I have little clue how to do that, but making wooden pens is great for now. A four-hour drive to the nearest Lee Valley store is unfortunately too much for taking classes!

I've been busy on xml-dev, announcing a half-parser (yes, it's a parser that preserves the full text of the original XML document) and talking about Microsoft Office beta XML formats, with sample XML documents. (General, Access, Excel, Word).

I'll also be presenting on this stuff at the Open Source Conference in July. The last thing I presented there was Open Source, Open Data: What XML has to offer Open Source. Should be an interesting followup - I was thinking about .NET and XML then, but Office is both different and potentially more interesting in a lot of ways.

As for the rest of the world - ugh.

dyork - what kind of micro-lathe are you using to turn pens? I've probably made a dozen, and should get back into it.

The concrete nature of woodworking and the much greater sense of independence I get doing it make it tempting for all the reasons that make me doubt tech. You can learn from others and teach others in woodworking, but there's not nearly the same sense that your destiny is welded to other people's business decisions.

I'm still programming, still writing, still editing. Just not sure it's what I want to do for my next forty years. (I'm 32.)

Quiet's kind of nice

I've been enjoying myself lately by not participating in a number of things that I should probably care about.

I used to complain here about the madness of URIs, but this fine formulation has let me stop worrying about it. The W3C's Technical Architecture Group (TAG) isn't likely to accept that approach in my lifetime, but that's fine. I've stopped expecting my reality and their reality to have much in common, so I can safely ignore that august body's busy mailing list. I don't need Platonic Forms in my life, thank you very much.

I'm still active, though a little less so, on xml-dev, but even that list feels like it's mellowed a bit. More interestingly, though currently quiet, I started up the xml-hypertext mailing list. It's fairly peaceful there so far, but hopefully it'll grow with time.

Meanwhile, maybe I can get some work done.

Is civility harmful?

Mark Baker has an interesting response to a piece by Elliotte Rusty Harold on Web Services. Mark, despite calling himself a "Tech Curmudgeon", takes issue with Elliotte's use of the word "idiots", and asks "So please folks, try to keep it civil. Comments such as this one only serve to alienate, which is the last thing we need."

I'm not a particularly friendly or polite person, and didn't see Elliotte's comments as anywhere close to out of line. Still, I think I agree with Mark that "Comments such as this one only serve to alienate". Where I part company is that I think alienation is important, and that pretending we like each other more than we really do is likely to produce muddy compromises at best.

In the case of Web Services and XML, it's become clearer and clearer over time that these technologies are barely related and frequently in conflict. Web Services happens to use XML, but uses it quite badly, from the perspective of many XML people. As Elliotte says, "Web Services violate the fundamental design of XML". Not only that, but the ambitions of Web Services have been a driving force behind some rather toxic specifications, notably W3C XML Schema. I don't mind saying that Web Services is poisoning XML, turning what was once a simplification into a major new set of complications.

Does that alienate people? Yes! It should. I'd love to get the Web Services folks to rethink their foundations. Failing that, making clear that there are serious points of friction seems like the best course of action - and being civil has little to contribute to that. Forking is not a risk here - it's an opportunity.

Of course, I also found this rant well worth reading, though it's quite completely over the top. Expecting progress to come in neatly-wrapped boxes with thank-you notes attached seems like a lot too much to ask - and counter-productive, to boot.

Every now and then, something new and interesting surfaces in the world of URIs. The notion of a "probabilistic web" is both different and well worth considering. Maybe there is something to all those lines about light appearing in the darkest hour.

Eventually it becomes clear that any effort to discuss URLs or URIs is pointless.

Uniform Resource Identifiers are the strangest religion I've encountered on the Web. People use them in all kinds of largely incompatible ways, but somehow we're supposed to believe that since the URL part works and survives things like cache issues, these magical abstractions will solve all our identification problems.

Meanwhile, no one can give me a straight answer on how to identify a representation with a URI reference. The short answer, of course, is that you can't, though the fragment identifier part is amusingly representation-dependent and it seems like a representation must in some sense be a resource itself... but you'd better stay away from that hack of a DOS-like file extension or fall into sin.

The Web as a huge set of Platonic Forms would be hysterically funny if it weren't so thoroughly sad.

I came to XML for hypertext, and I'm still trying to get there. XLink/XPointer is just not that helpful, but its supporters seem pretty well convinced. Tim Bray, for instance, said:

I reviewed the XLink spec, and I thought about how I'd go about designing markup for multi-ended and out-of-band links, and I thought XLink presented a pretty compelling design for how you'd do those things.

I think disagreement should be accompanied by examples: "here's a better way to do a multi-ended/out-of-band/metadata-loaded hyperlink, and here's why it's better."

Here's phase 1 of that discussion - an initial proposal. The next phases include a more comprehensive example, a formal processing model, and an implementation.

And if that proposal seems like a lot of work to you for linking, don't worry! Micah Dubinko's been working on a much simpler set of linking constructs for in-line linking called SkunkLink. Given the chance, I think SkunkLink will take care of most common linking issues and let those of us interested in stranger stuff focus on difficult questions more cleanly.
