Older blog entries for simonstl (starting at number 58)

4 Jan 2004 (updated 4 Jan 2004 at 02:03 UTC) »

It's a new year, and I've been thinking about where to put my energy.

XML, great stuff though I still think it is, is pretty much complete. I think we've learned over the past few years that the stuff that was actually a simplification of existing practice was good, and the rest should be used cautiously or (in the case of W3C XML Schema and specs it's infiltrating) ignored. The RELAX NG folks have created a sane schema language, the interesting action in the space has largely moved away from the W3C, and we're now at the point where everyone can create whatever vocabulary they like.

Not that they create particularly good vocabularies, especially if they focus on W3C XML Schema as the path to new vocabularies, but there's only so much I can do to keep people from banging their heads against the wall.

So if XML is no longer my main technical focus, what's next?

I have two main areas of interest at the moment, one even more abstract than XML and one more (well, mostly) concrete.

On the abstract side, I've been reading Christopher Alexander's The Timeless Way of Building and A Pattern Language, as well as the first volume of The Nature of Order. It's a lot more exciting than the Gang of Four Design Patterns work that derives from it, and I think I'll write a lot this year on how programmers can learn to do better work by taking aesthetics seriously, on a lot of different levels.

On the more concrete side, I've been looking into mapping and computing. I've been writing a blog that focuses strictly on one 96-square-mile town, and maps are an important part of that, especially given the planning process that's currently in motion here. It's been interesting to see how most of the road network was in place by 1900, but a few key changes have had a dramatic impact. I'm also sorting through the avalanche of data that's available from the 2000 census, examining it both through GIS tools and through databases.

Between those two things (and a healthy continuing dose of XML, I'm sure), it should be a good 2004.

I just sealed my driveway. Two days of cleaning, prep work, and finally applying the sealer.

I should be so careful in my programming.

(Now let's just hope it doesn't rain before the sealer sets.)

"the web is no good unless it can be a sound foundation for the semantic web and web services too."

Gah. And my house's foundation is no good unless it can be a foundation for a skyscraper and a gas station too. Comments like these from people I think should know better occasionally drive me to rant.

In more exciting news, I'll be ranting next week at the Extreme Markup Languages conference in Montreal. It's always been one of my favorite shows, and this year they gave me a Daily Polemic. The original version of What can you do with half a parser? is pretty good (I think), but for the polemic it's getting a cast of Playmobil figures and some toxic black goo.

18 Jul 2003 (updated 18 Jul 2003 at 02:21 UTC) »

Last week was a blur of two conferences - OSCON and the Sells Brothers Applied XML Conference, both in the Portland, Oregon area. The two conferences were very different for a few fundamental reasons:

  • OSCON is completely dominated by open source cultures and values, despite the Microsoft-paid lunch, while Applied XML was largely a .NET fest, with a host presently employed by Microsoft and lots of Microsoft-oriented content.

  • While OSCON had a largely tutorial XML track (which I thought went very well, though I'm biased as I chaired it), Applied XML was a single-track conference where every session had something to do with, and generally focused on, XML.

Applied XML was smaller and more tightly focused, but it had a similar energy level to OSCON, at least in the hallway conversations. OSCON was amazing as usual for bringing together developers from a variety of different communities, and letting them explore sessions they found interesting. A lot of tracks benefited from crossovers between different kinds of developers.

One especially interesting bit was the crossover between the conferences. There was a BOF session Wednesday night on what it would take to implement dynamic languages (Perl, Python, Ruby, etc.) efficiently in the .NET environment, which was built with statically typed languages (think C#). That conversation's probably just getting started, but there were some fascinating bits. Peter Drayton and Brad Merrill of Microsoft, who hosted the BOF, were at both shows, and seemed happy in both contexts.

(There was also a surprising amount of RDF in conversations at both conferences, continuing a trend I noticed at OSCOM. It wasn't on either conference program, but it was in the bars and hallways.)

Though I'm really tired of traveling, I'm looking forward to one last conference this summer - one that combines the small size and XML focus of the Applied XML conference with the community approach of OSCON. Extreme Markup Languages, running from August 4-8, has seen fit to give me a Daily Polemic. I suppose it's appropriate given my usual style, but it's still a pretty scary responsibility.

To live up to that challenge, I've turned to the wonderful world of Playmobil. Playmobil figures make excellent computer consultants, especially now that Playmobil offers office equipment. I'm hoping to use SMIL for the slides, though maybe it'll be SVG. We'll see.

I'll be taking next week off - I decided it was time to spend some time working on my own projects and remember why I like doing this stuff. (Maybe I won't - we'll see!) In addition to the Playmobil photo shoot, I'm hoping to put some work in on my Java tools for processing XML. I haven't been able to spend more than one day at a time on them lately, with months between sessions.

A week of training

I spent last week in Ottawa, taking five days of training from Ken Holman on XSLT and XSL-FO. I had plenty of job-related reasons to take the training, from growing use of XSLT in my work to re-connecting with users in the field, but I'll admit it was a pleasant luxury to take a week to look at a complex technology up close.

I blogged each day over on my O'Reilly blog (1 2 3 4 5), but it's interesting to look back at the course as a whole. Three days of XSLT and two days of XSL-FO is a lot, but even that's kind of a forced march through the technologies, really only scratching the surface. Ken did a great job of covering the overall story, making us think our way through exercises on key features, and pointing out potential problems. Actually having worked through the details (and coming back with a handout covered in post-it notes) should help me remember what I did.

I haven't taken a formal training course in years, though. I've been mostly self-taught, with the occasional conference tutorial supplementing books, specs, and email. Face-to-face has some huge advantages, I have to say. Immediate interactions and immediate gratification are wonderful, as is having a pre-built set of exercises really intended for this kind of back-and-forth interaction.

I already knew (and heck, disliked) a good deal of XSLT before I went in, but this was an opportunity to check and expand on what I knew. Fortunately, the course was structured so that different users could get what they needed - including those of us who'd already been over a lot of it. (The XSL-FO class included at least one attendee who knew a lot more than I did about any of the material.)

Training isn't always an option for everyone, but after the past week, I'm thinking it's something I'll be encouraging a lot more.

I seem to have a lot of conferences on my schedule at the moment, just when I'm busiest with "real work". In any case, if anyone's interested in XML-related conferences, I wrote up what's on my radar. I may or may not make it to everything here, but suspect there may be a few folks here who will.

Content-rich keynote presentations seem to be pretty rare, though they've improved some since the bubble faded. I was lucky last week to go to a conference (XML Europe) which had two excellent keynotes back to back, both exploring the synergy of open source and XML.

I've written both of them up for xmlhack. Jon Bosak explored how he hopes open standards and open source can work together, while Daniel Veillard examined how XML is used by open source projects. They were very different presentations, kind of like looking through opposite ends of a telescope, but they complemented each other beautifully.

I've been thinking a lot lately about computing cultures. XML culture, for instance, feels very different from Java culture. Though I do most of my programming in Java, the work I do leads me into creating XML-oriented interfaces that are far removed from the suggestions in Effective Java, for instance.

While I program in Java, I don't think I'm part of Java culture - I even find some aspects of it profoundly disturbing. I've concluded over time that Python is probably a more appropriate medium for what I want to do, but I've got all this easily-mined work in Java...

I think similar issues arise in information modeling and storage. I wrote a short piece on it yesterday, "The (data) medium is the message". The bit I quoted from McLuhan, which I think is pretty much at the heart of the matter, is:

"Environments are not passive wrappings but active processes."

We programmers tend to think of ourselves as active and the environments we program as passive, but it's definitely a two-way street, even before you get into the environment-changing possibilities of open source.

I just got back from NYC, where I presented on the new Microsoft Office XML stuff. Much of the presentation was demo, so the slides are hardly a complete picture, but there's a rough outline there. I'd be curious to hear if anyone's interested in this stuff - there's sort of a "free love" opportunity here, if not free beer or free speech.

The conversations after the presentation were also interesting, and seemed to reflect some of the stories about XML that trouble me the most - the notion that we can agree on vocabularies and interop will come automatically with that agreement.

There are a lot of problems with this vision, but perhaps the most dangerous aspect of those problems is that they only tend to emerge as the scale of the work - measured in the number of users and the scope of the vocabulary - increases. In small cases, it's pretty easy to put together some basic stuff quickly and make it work. Configuration files that belong to one programmer and one program are a classic case. Moving from there, files that move within a small circle of people are easy to deal with. Sometimes these experiments bear fruit that works well - HTML, for instance - but they pretty much always encounter difficulties as the scope or audience grows.

The answer in the SGML community, and more generally in the XML and computing communities, has been to form committees of people who know markup issues and the relevant information and hope that their consensus reflects the reality of the information problems and how to solve them.

Committees, especially successful ones, are often reflections of the problems they have to solve. Who's invited? How big is the committee? Which view of a given subject is the right one? Even a single transaction can look very different from different perspectives. What kind of data is involved, and how is it communicated? How much can a single group of people accomplish when their information world is in constant flux?

Some committees accomplish a lot, others accomplish very little. Some stay in touch with a wider audience, and others put up barriers - sometimes to avoid information overload, sometimes to avoid criticism. Versioning is a constant problem for specifications, as the world moves on, and XML's intrinsic promise of 'extensibility' is infuriating for people who want to control extensions.

Schemas don't help this much, except to formalize solutions and give computers a chance of comprehending them. The formalisms provide a vocabulary in which developers and committees can express their intentions, but there's nothing intrinsic about a schema - whether it be a DTD, XML Schema, RELAX NG, Schematron, or even RDF - that pins down meaning in any immutable or unquestionable sense.

Some people seem intent on reaching for the semantic sky, pinning down vocabularies with labels like butterflies. Some of their results are quite beautiful, at least to fellow butterfly collectors, but live butterflies tend to flutter around a lot.

XML isn't going to solve the problems of people who want to pin down the meanings of information stored in computers. It's demonstrated that a consistent syntax for labeling and structuring information is useful for some people and some tasks, but that's about the limit. For some of us, that's too much already. For others of us (myself included), that's enough - going much beyond that seems to cost more than it's worth.

Cook up your own stuff, and get used to consuming what other people offer you, even if it isn't exactly in the form you wanted or expected. People have long been better at this kind of work than computers, but it's time to start accepting chaos rather than trying constantly to control it. That might even mean strengthening the role of humans in information processing again - strange to some, useful to others.

I talked with various database-oriented folks at a woodworking picnic this weekend and I came back to discussion of XQuery at work, so I've been thinking a bit harder about what I learned from relational databases and how I apply it to XML.

I've spent most of my time in relational databases using smaller tools that had a grasp of relations and SQL but didn't spend enormous effort cramming more features into their SQL support. I started in Microsoft Access (I know, I know) and I've used MySQL and I've been very happy with their abilities to store and retrieve information. It's been my privilege never to have to work on "Enterprise" relational databases - I've documented an Oracle setup and tinkered with a copy of DB2 5.0 that IBM sent to my doorstep one day (dunno why), but that's it.

When I look at what I do when I'm writing programs to work with XML, I'm happiest when I can work in roughly the style I've used with relational databases - get a chunk of information with a basic query, then process it in my own (usually Java, but sometimes XSLT or other) environment. That way I can apply my existing skills without having to learn yet-another-goddamn-programming language.
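To make that style concrete, here's a minimal sketch of "basic query, then process it yourself". It assumes the JDK's standard javax.xml.xpath API as a stand-in for whatever toolkit you actually use; the BasicQuery class, the select helper, and the sample document are all made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class BasicQuery {
    // Hypothetical helper: run one simple XPath 1.0 query against an XML
    // string and hand back the text of each matching node as plain strings.
    static List<String> select(String xml, String xpathExpr) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<String> values = new ArrayList<>();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    xpathExpr,
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
        } catch (XPathExpressionException e) {
            // Covers a malformed expression or unparseable input.
            throw new IllegalArgumentException("Query failed: " + xpathExpr, e);
        }
        return values;
    }

    public static void main(String[] args) {
        // Made-up sample data, not a real vocabulary.
        String xml = "<entries>"
                + "<entry year='2003'><title>A week of training</title></entry>"
                + "<entry year='2004'><title>New year</title></entry>"
                + "</entries>";
        // The query only fetches a chunk; everything after it is ordinary Java.
        for (String title : select(xml, "/entries/entry[@year='2003']/title")) {
            System.out.println(title.toUpperCase());
        }
    }
}
```

The point of the split is that the query stays small enough to read at a glance, and any real logic (sorting, filtering, formatting) lives in the language you already know.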

This seems to be the opposite of what passes for the conventional wisdom these days. Stored procedures have been common for years, and SQL has grown far beyond the subset I consider sane. (Heck, there's even a book titled "SQL-99 Complete, Really".) XQuery is a full-blown language, as capable as XSLT but more procedural-looking, and complete with a type system drawn from XML Schema.

Looking at all of this, I guess I can see where it's useful to some people in some situations, but to me it's mostly just more junk to look at once and ignore. As fond as I am of XML, the notion that markup is an excuse to create not just one (XSLT) but two (XQuery) Turing-complete languages seems bizarre at best. E4X, a set of XML extensions to JavaScript, is at least a relatively minor modification, but still feels strange in the context of all of these things which have less and less to do with markup and more and more to do with programming.

I suspect that over time I'll be retreating to my own home-grown toolkit, adding XPath 1.0 to it but no more, and letting the behemoths create whatever they like for whoever is supposedly buying it. We can still exchange documents; there's no need to exchange superstructure.
