Older blog entries for simonstl (starting at number 55)

18 Jul 2003 (updated 18 Jul 2003 at 02:21 UTC) »

Last week was a blur of two conferences - OSCON and the Sells Brothers Applied XML Conference, both in the Portland, Oregon area. The two conferences were very different for a few fundamental reasons:

  • OSCON is completely dominated by open source cultures and values, despite the Microsoft-paid lunch, while Applied XML was largely a .NET fest, with a host presently employed by Microsoft and lots of Microsoft-oriented content.

  • While OSCON had a largely tutorial XML track (which I thought went very well, though I'm biased as I chaired it), Applied XML was a single-track conference where every session had something to do with, and generally focused on, XML.

Applied XML was smaller and more tightly focused, but it had a similar energy level to OSCON, at least in the hallway conversations. OSCON was amazing as usual for bringing together developers from a variety of different communities, and letting them explore sessions they found interesting. A lot of tracks benefited from crossovers between different kinds of developers.

One especially interesting bit was the crossover between the conferences. There was a BOF session Wednesday night on what it would take to implement dynamic languages (Perl, Python, Ruby, etc.) efficiently in the .NET environment, which was built with statically typed languages (think C#). That conversation's probably just getting started, but there were some fascinating bits. Peter Drayton and Brad Merrill of Microsoft, who hosted the BOF, were at both shows, and seemed happy in both contexts.

(There was also a surprising amount of RDF in conversations at both conferences, continuing a trend I noticed at OSCOM. It wasn't on either conference program, but it was in the bars and hallways.)

Though I'm really tired of traveling, I'm looking forward to one last conference this summer - one that combines the small size and XML focus of the Applied XML conference with the community approach of OSCON. Extreme Markup Languages, running from August 4-8, has seen fit to give me a Daily Polemic. I suppose it's appropriate given my usual style, but it's still a pretty scary responsibility.

To live up to that challenge, I've turned to the wonderful world of Playmobil. Playmobil figures make excellent computer consultants, especially now that Playmobil offers office equipment. I'm hoping to use SMIL for the slides, though maybe it'll be SVG. We'll see.

I'll be taking next week off - I decided it was time to spend some time working on my own projects and remember why I like doing this stuff. (Maybe I won't - we'll see!) In addition to the Playmobil photo shoot, I'm hoping to put some work in on my Java tools for processing XML. I haven't been able to spend more than one day at a time on them lately, with months between sessions.

A week of training

I spent last week in Ottawa, taking five days of training from Ken Holman on XSLT and XSL-FO. I had plenty of job-related reasons to take the training, from growing use of XSLT in my work to re-connecting with users in the field, but I'll admit it was a pleasant luxury to take a week to look at a complex technology up close.

I blogged each day over on my O'Reilly blog (1 2 3 4 5), but it's interesting to look back at the course as a whole. Three days of XSLT and two days of XSL-FO is a lot, but even that's kind of a forced march through the technologies, really only scratching the surface. Ken did a great job of covering the overall story, making us think our way through exercises on key features, and pointing out potential problems. Actually having worked through the details (and coming back with a handout covered in post-it notes) should help me remember what I did.

I haven't taken a formal training course in years, though. I've been mostly self-taught, with the occasional conference tutorial supplementing books, specs, and email. Face-to-face has some huge advantages, I have to say. Immediate interactions and immediate gratification are wonderful, as is having a pre-built set of exercises really intended for this kind of back-and-forth interaction.

I already knew (and heck, disliked) a good deal of XSLT before I went in, but this was an opportunity to check and expand on what I knew. Fortunately, the course was structured so that different users could get what they needed - including those of us who'd already been over a lot of it. (The XSL-FO class included at least one attendee who knew a lot more than I did about any of the material.)

Training isn't always an option for everyone, but after the past week, I'm thinking it's something I'll be encouraging a lot more.

I seem to have a lot of conferences on my schedule at the moment, just when I'm busiest with "real work". In any case, if anyone's interested in XML-related conferences, I wrote up what's on my radar. I may or may not make it to everything here, but suspect there may be a few folks here who will.

Content-rich keynote presentations seem to be pretty rare, though they've improved some since the bubble faded. I was lucky last week to go to a conference (XML Europe) which had two excellent keynotes back to back, both exploring the synergy of open source and XML.

I've written both of them up for xmlhack. Jon Bosak explored how he hopes open standards and open source can work together, while Daniel Veillard examined how XML is used by open source projects. They were very different presentations, kind of like looking through opposite ends of a telescope, but they complemented each other beautifully.

I've been thinking a lot lately about computing cultures. XML culture, for instance, feels very different from Java culture. Though I do most of my programming in Java, the work I do leads me into creating XML-oriented interfaces that are far removed from the suggestions in Effective Java, for instance.

While I program in Java, I don't think I'm part of Java culture - I even find some aspects of it profoundly disturbing. I've concluded over time that Python is probably a more appropriate medium for what I want to do, but I've got all this easily-mined work in Java...

I think similar issues arise in information modeling and storage. I wrote a short piece on it yesterday, "The (data) medium is the message". The bit I quoted from McLuhan, which I think is pretty much at the heart of the matter, is:

"Environments are not passive wrappings but active processes."

Programmers tend to think of ourselves as active and the environments we program as passive, but it's definitely a two-way street, even before you get into the environment-changing possibilities of open source.

I just got back from NYC, where I presented on the new Microsoft Office XML stuff. Much of the presentation was demo, so the slides are hardly a complete picture, but there's a rough outline there. I'd be curious to hear if anyone's interested in this stuff - there's sort of a "free love" opportunity here, if not free beer or free speech.

The conversations after the presentation were also interesting, and seemed to reflect some of the stories about XML that trouble me the most - the notion that we can agree on vocabularies and interop will come automatically with that agreement.

There are a lot of problems with this vision, but perhaps the most dangerous aspect of those problems is that they only tend to emerge as the scale of the work - measured in the number of users and the scope of the vocabulary - increases. In small cases, it's pretty easy to put together some basic stuff quickly and make it work. Configuration files that belong to one programmer and one program are a classic case. Moving from there, files that move within a small circle of people are easy to deal with. Sometimes these experiments bear fruit that works well - HTML, for instance - but they pretty much always encounter difficulties as the scope or audience grows.

The answer in the SGML community, and more generally in the XML and computing communities, has been to form committees of people who know markup issues and the relevant information and hope that their consensus reflects the reality of the information problems and how to solve them.

Committees, especially successful ones, are often reflections of the problems they have to solve. Who's invited? How big is the committee? Which view of a given subject is the right one? Even a single transaction can look very different from different perspectives. What kind of data is involved, and how is it communicated? How much can a single group of people accomplish when their information world is in constant flux?

Some committees accomplish a lot, others accomplish very little. Some stay in touch with a wider audience, and others put up barriers - sometimes to avoid information overload, sometimes to avoid criticism. Versioning is a constant problem for specifications, as the world moves on, and XML's intrinsic promise of 'extensibility' is infuriating for people who want to control extensions.

Schemas don't help this much, except to formalize solutions and give computers a chance of comprehending them. The formalisms provide a vocabulary in which developers and committees can express their intentions, but there's nothing intrinsic about a schema - whether it be a DTD, XML Schema, RELAX NG, Schematron, or even RDF - that pins down meaning in any immutable or unquestionable sense.

Some people seem intent on reaching for the semantic sky, pinning down vocabularies with labels like butterflies. Some of their results are quite beautiful, at least to fellow butterfly collectors, but live butterflies tend to flutter around a lot.

XML isn't going to solve the problems of people who want to pin down the meanings of information stored in computers. It's demonstrated that a consistent syntax for labeling and structuring information is useful for some people and some tasks, but that's about the limit. For some of us, that's too much already. For others of us (myself included), that's enough - going much beyond that seems to cost more than it's worth.

Cook up your own stuff, and get used to consuming what other people offer you, even if it isn't exactly in the form you wanted or expected. People have long been better at this kind of work than computers, but it's time to start accepting chaos rather than trying constantly to control it. That might even mean strengthening the role of humans in information processing again - strange to some, useful to others.

I talked with various database-oriented folks at a woodworking picnic this weekend and I came back to discussion of XQuery at work, so I've been thinking a bit harder about what I learned from relational databases and how I apply it to XML.

I've spent most of my time in relational databases using smaller tools that had a grasp of relations and SQL but didn't spend enormous effort cramming more features into their SQL support. I started in Microsoft Access (I know, I know) and have since used MySQL, and I've been very happy with their abilities to store and retrieve information. It's been my privilege never to have to work on "Enterprise" relational databases - I've documented an Oracle setup and tinkered with a copy of DB2 5.0 that IBM sent to my doorstep one day (dunno why), but that's it.

When I look at what I do when I'm writing programs to work with XML, I'm happiest when I can work in roughly the style I've used with relational databases - get a chunk of information with a basic query, then process it in my own (usually Java, but sometimes XSLT or other) environment. That way I can apply my existing skills without having to learn yet-another-goddamn-programming language.
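As a rough illustration of that style, here is a minimal sketch, not my actual toolkit: one basic XPath 1.0 query pulls out a chunk of information, and everything after that is ordinary Java. It assumes the javax.xml.xpath API that ships with later JDKs; the class name, method name, and sample document are all made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class: query first, then process in plain Java.
public class QueryThenProcess {

    // Run one basic XPath 1.0 query and return the text of each matching node.
    public static List<String> select(String xml, String path) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    path, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (XPathExpressionException e) {
            // Covers both a malformed expression and an unparseable document.
            throw new IllegalArgumentException("query failed: " + path, e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample data.
        String xml = "<pens><pen wood=\"maple\">slim</pen>"
                   + "<pen wood=\"walnut\">cigar</pen></pens>";

        // The query stays simple; the real work happens in Java afterwards.
        for (String wood : select(xml, "/pens/pen/@wood")) {
            System.out.println(wood.toUpperCase());
        }
    }
}
```

The query language only has to be good at finding things. Sorting, filtering, and reshaping stay in the environment I already know.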

This seems to be the opposite of what passes for the conventional wisdom these days. Stored procedures have been common for years, and SQL has grown far beyond the subset I consider sane. (Heck, there's even a book titled "SQL-99 Complete, Really".) XQuery is a full-blown language, as capable as XSLT but more procedural-looking, and complete with a type system drawn from XML Schema.

Looking at all of this, I guess I can see where it's useful to some people in some situations, but to me it's mostly just more junk to look at once and ignore. As fond as I am of XML, the notion that markup is an excuse to create not just one (XSLT) but two (XQuery) Turing-complete languages seems bizarre at best. E4X, a set of XML extensions to JavaScript, is at least a relatively minor modification, but still feels strange in the context of all of these things which have less and less to do with markup and more and more to do with programming.

I suspect that over time I'll be retreating to my own home-grown toolkit, adding XPath 1.0 to it but no more, and letting the behemoths create whatever they like for whoever is supposedly buying it. We can still exchange documents; there's no need to exchange superstructure.

I've always done most of my programming by myself, whether it was for work or my various open source projects. When I first heard about Extreme Programming and things like pair programming, I pretty much wrote it off as stuff that might be nice for people who are social enough to program in groups and who have a greater tolerance for "enterprise" work in general.

Okay, some aspects, like iterative development, seemed pretty cool. Code standards make sense to me, and I try hard to do things like comment my code and use meaningful variable names - in large part because I may have to reuse or modify it months or years later. Stuff like communication and shared ownership of code, though... do I really want to set up multiple personalities talking to myself and arguing about code direction?

The one piece of the XP puzzle that's always been on my "I wish I had energy to do that" list is testing. I've done plenty of testing on all kinds of code in all kinds of circumstances, but it's never been the circumstances I wanted. My testing style hadn't improved much since my ZX81 and AppleSoft days. The kind of code I tend to write is all about XML transformations, with various randomly nested and intermixed structures - not exactly easy unit testing fodder.

Today I finally changed that. I've been building a partial XML parser, and have a whole set of context-tracking structures that I need to have working reliably for the rest to make sense. I knew it worked fine on the particular cases I'd tested, but they were based on complete documents, and didn't necessarily set off every aspect. Writing unit tests (in JUnit) that explore specific aspects of these processes has been pretty easy and extremely useful.
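To give a flavor of what testing one aspect at a time looks like, here is a small sketch. The ContextTracker class below is a hypothetical stand-in, far simpler than the structures in my parser, and the checks in main are plain Java so the example runs without JUnit on the classpath. In JUnit, each check would become its own test method.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Hypothetical stand-in for a context-tracking structure: it remembers
// which elements are currently open while a document is being read.
public class ContextTracker {
    private final Deque<String> open = new ArrayDeque<>();

    public void startElement(String name) {
        open.push(name);
    }

    public void endElement(String name) {
        // An end tag must match the most recently opened element.
        if (open.isEmpty() || !open.peek().equals(name)) {
            throw new IllegalStateException("unexpected end tag: " + name);
        }
        open.pop();
    }

    public int depth() {
        return open.size();
    }

    // Slash-separated path from the outermost open element to the innermost.
    public String path() {
        StringBuilder sb = new StringBuilder();
        Iterator<String> it = open.descendingIterator();
        while (it.hasNext()) {
            sb.append('/').append(it.next());
        }
        return sb.toString();
    }

    private static void check(boolean condition, String what) {
        if (!condition) {
            throw new AssertionError(what);
        }
    }

    public static void main(String[] args) {
        // Each check pokes at one behavior directly; no complete document needed.
        ContextTracker tracker = new ContextTracker();
        tracker.startElement("doc");
        tracker.startElement("para");
        check(tracker.path().equals("/doc/para"), "path reflects nesting order");

        tracker.endElement("para");
        check(tracker.depth() == 1, "closing an element reduces depth");

        boolean rejected = false;
        try {
            tracker.endElement("para"); // only "doc" is still open
        } catch (IllegalStateException expected) {
            rejected = true;
        }
        check(rejected, "mismatched end tag is rejected");
        System.out.println("all checks passed");
    }
}
```

The mismatched end tag is exactly the kind of case a complete, well-formed test document never sets off.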

I've already found a few bugs - largely in the way I was testing code before - and feel a lot brighter about the foundations of the code I'm using. In about two hours of test writing, I've reduced the number of places I might search for errors by about a third, and reduced my paranoia about making changes far more drastically. I still have some snarled code ahead of me, but I finally feel like I have the right tools for unsnarling it without creating even larger tangles.

I suspect unit testing is probably the piece of XP with the most potential benefit for solo programmers, though CVS has saved me from myself a few times and let me open a wider door to other people's work. Unit testing also combines immediate benefits for me with the prospect of an easier time for other developers who might someday want to build on this work, even if I never hear about it. Seems like a good thing all around.

20 Mar 2003 (updated 20 Mar 2003 at 13:57 UTC) »
dyork - I've got the Taig lathe as well, and really like it. It seems to cry out for tinkering, with a really basic foundation and all those attachments. Sadly, most of the attachments are for turning metal, and I have little clue how to do that, but making wooden pens is great for now. A four-hour drive to the nearest Lee Valley store is unfortunately too much for taking classes!

I've been busy on xml-dev, announcing a half-parser (yes, it's a parser that preserves the full text of the original XML document) and talking about Microsoft Office beta XML formats, with sample XML documents. (General, Access, Excel, Word).

I'll also be presenting on this stuff at the Open Source Conference in July. The last thing I presented there was Open Source, Open Data: What XML has to offer Open Source. Should be an interesting followup - I was thinking about .NET and XML then, but Office is both different and potentially more interesting in a lot of ways.

As for the rest of the world - ugh.

dyork - what kind of micro-lathe are you using to turn pens? I've probably made a dozen, and should get back into it.

The concrete nature of woodworking and the much greater sense of independence I get doing it makes it tempting for all the reasons that make me doubt tech. You can learn from others and teach others in woodworking, but there's not nearly the same sense that your destiny is welded to other people's business decisions.

I'm still programming, still writing, still editing. Just not sure it's what I want to do for my next forty years. (I'm 32.)
