Older blog entries for simonstl (starting at number 51)

I've been thinking a lot lately about computing cultures. XML culture, for instance, feels very different from Java culture. Though I do most of my programming in Java, the work I do leads me into creating XML-oriented interfaces that are far removed from the suggestions in Effective Java.

While I program in Java, I don't think I'm part of Java culture - I even find some aspects of it profoundly disturbing. I've concluded over time that Python is probably a more appropriate medium for what I want to do, but I've got all this easily-mined work in Java...

I think similar issues arise in information modeling and storage. I wrote a short piece on it yesterday, "The (data) medium is the message". The bit I quoted from McLuhan, which I think is pretty much at the heart of the matter, is:

"Environments are not passive wrappings but active processes."

Programmers tend to think of ourselves as active and the environments we program as passive, but it's definitely a two-way street, even before you get into the environment-changing possibilities of open source.

I just got back from NYC, where I presented on the new Microsoft Office XML stuff. Much of the presentation was demo, so the slides are hardly a complete picture, but there's a rough outline there. I'd be curious to hear if anyone's interested in this stuff - there's sort of a "free love" opportunity here, if not free beer or free speech.

The conversations after the presentation were also interesting, and seemed to reflect some of the stories about XML that trouble me the most - the notion that we can agree on vocabularies and interop will come automatically with that agreement.

There are a lot of problems with this vision, but perhaps the most dangerous aspect of those problems is that they only tend to emerge as the scale of the work - measured in the number of users and the scope of the vocabulary - increases. In small cases, it's pretty easy to put together some basic stuff quickly and make it work. Configuration files that belong to one programmer and one program are a classic case. Moving from there, files that move within a small circle of people are easy to deal with. Sometimes these experiments bear fruit that works well - HTML, for instance - but they pretty much always encounter difficulties as the scope or audience grows.

The answer in the SGML community, and more generally in the XML and computing communities, has been to form committees of people who know markup issues and the relevant information and hope that their consensus reflects the reality of the information problems and how to solve them.

Committees, especially successful ones, are often reflections of the problems they have to solve. Who's invited? How big is the committee? Which view of a given subject is the right one? Even a single transaction can look very different from different perspectives. What kind of data is involved, and how is it communicated? How much can a single group of people accomplish when their information world is in constant flux?

Some committees accomplish a lot, others accomplish very little. Some stay in touch with a wider audience, and others put up barriers - sometimes to avoid information overload, sometimes to avoid criticism. Versioning is a constant problem for specifications, as the world moves on, and XML's intrinsic promise of 'extensibility' is infuriating for people who want to control extensions.

Schemas don't help this much, except to formalize solutions and give computers a chance of comprehending them. The formalisms provide a vocabulary in which developers and committees can express their intentions, but there's nothing intrinsic about a schema - whether it be a DTD, XML Schema, RELAX NG, Schematron, or even RDF - that pins down meaning in any immutable or unquestionable sense.

Some people seem intent on reaching for the semantic sky, pinning down vocabularies with labels like butterflies. Some of their results are quite beautiful, at least to fellow butterfly collectors, but live butterflies tend to flutter around a lot.

XML isn't going to solve the problems of people who want to pin down the meanings of information stored in computers. It's demonstrated that a consistent syntax for labeling and structuring information is useful for some people and some tasks, but that's about the limit. For some of us, that's too much already. For others of us (myself included), that's enough - going much beyond that seems to cost more than it's worth.

Cook up your own stuff, and get used to consuming what other people offer you, even if it isn't exactly in the form you wanted or expected. People have long been better at this kind of work than computers, but it's time to start accepting chaos rather than trying constantly to control it. That might even mean strengthening the role of humans in information processing again - strange to some, useful to others.

I talked with various database-oriented folks at a woodworking picnic this weekend and came back to a discussion of XQuery at work, so I've been thinking a bit harder about what I learned from relational databases and how I apply it to XML.

I've spent most of my time in relational databases using smaller tools that had a grasp of relations and SQL but didn't spend enormous effort cramming more features into their SQL support. I started in Microsoft Access (I know, I know), I've used MySQL, and I've been very happy with both tools' ability to store and retrieve information. It's been my privilege never to have to work on "Enterprise" relational databases - I've documented an Oracle setup and tinkered with a copy of DB2 5.0 that IBM sent to my doorstep one day (dunno why), but that's it.

When I look at what I do when I'm writing programs to work with XML, I'm happiest when I can work in roughly the style I've used with relational databases - get a chunk of information with a basic query, then process it in my own (usually Java, but sometimes XSLT or other) environment. That way I can apply my existing skills without having to learn yet-another-goddamn-programming language.

This seems to be the opposite of what passes for the conventional wisdom these days. Stored procedures have been common for years, and SQL has grown far beyond the subset I consider sane. (Heck, there's even a book titled "SQL-99 Complete, Really".) XQuery is a full-blown language, as capable as XSLT but more procedural-looking, and complete with a type system drawn from XML Schema.

Looking at all of this, I guess I can see where it's useful to some people in some situations, but to me it's mostly just more junk to look at once and ignore. As fond as I am of XML, the notion that markup is an excuse to create not just one (XSLT) but two (XQuery) Turing-complete languages seems bizarre at best. E4X, a set of XML extensions to JavaScript, is at least a relatively minor modification, but still feels strange in the context of all of these things which have less and less to do with markup and more and more to do with programming.

I suspect that over time I'll be retreating to my own home-grown toolkit, adding XPath 1.0 to it but no more, and letting the behemoths create whatever they like for whoever is supposedly buying it. We can still exchange documents; there's no need to exchange superstructure.
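For what it's worth, here is a rough sketch of that "basic query, then process it in my own environment" style. It is not my actual toolkit: it assumes the XPath 1.0 support in the JDK's javax.xml.xpath package, and the class name QueryThenProcess, the select helper, and the sample pen document are all made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class QueryThenProcess {

    // Hypothetical helper: run one basic XPath 1.0 query against a document
    // and hand back the text content of each matching node.
    public static List<String> select(String xml, String xpathExpr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes =
                    (NodeList) xpath.evaluate(xpathExpr, doc, XPathConstants.NODESET);
            List<String> results = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                results.add(nodes.item(i).getTextContent());
            }
            return results;
        } catch (Exception e) {
            // Parsing and XPath errors are wrapped so callers need no checked exceptions.
            throw new IllegalArgumentException("Could not query document", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample data, just to have something to query.
        String xml = "<pens>"
                + "<pen wood=\"cherry\">slimline</pen>"
                + "<pen wood=\"maple\">cigar</pen>"
                + "</pens>";

        // The query only fetches a chunk of information...
        List<String> maplePens = select(xml, "/pens/pen[@wood='maple']");

        // ...and everything after that is ordinary Java.
        for (String name : maplePens) {
            System.out.println(name.toUpperCase());
        }
    }
}
```

The point of the split is that the query language stays small: it selects nodes and nothing else, and all the real logic lives in the language I already know.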

I've always done most of my programming by myself, whether it was for work or my various open source projects. When I first heard about Extreme Programming and things like pair programming, I pretty much wrote it off as stuff that might be nice for people social enough to program in groups, and with a greater tolerance for "enterprise" work in general.

Okay, some aspects, like iterative development, seemed pretty cool. Code standards make sense to me, and I try hard to do things like comment my code and use meaningful variable names - in large part because I may have to reuse or modify it months or years later. Stuff like communication and shared ownership of code, though... do I really want to set up multiple personalities talking to myself and arguing about code direction?

The one piece of the XP puzzle that's always been on my "I wish I had energy to do that" list is testing. I've done plenty of testing on all kinds of code in all kinds of circumstances, but it's never been the circumstances I wanted. My testing style hadn't improved much since my ZX81 and AppleSoft days. The kind of code I tend to write is all about XML transformations, with various randomly nested and intermixed structures - not exactly easy unit testing fodder.

Today I finally changed that. I've been building a partial XML parser, and have a whole set of context-tracking structures that I need to have working reliably for the rest to make sense. I knew it worked fine on the particular cases I'd tested, but they were based on complete documents, and didn't necessarily set off every aspect. Writing unit tests (in JUnit) that explore specific aspects of these processes has been pretty easy and extremely useful.
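To give a flavor of what "exploring specific aspects" means, here is a minimal sketch. It is not the parser's real code: ContextTracker and its methods are hypothetical stand-ins for a context-tracking structure, and the checks are written as plain Java so the example runs without JUnit on the classpath. Each check would map onto one small JUnit test method.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

public class ContextTracker {

    // Stack of currently open element names; the most recent is at the head.
    private final Deque<String> open = new ArrayDeque<>();

    public void startElement(String name) {
        open.push(name);
    }

    public void endElement(String name) {
        // An end tag must match the most recently opened element.
        if (open.isEmpty() || !open.peek().equals(name)) {
            throw new IllegalStateException("Unexpected end tag: " + name);
        }
        open.pop();
    }

    public int depth() {
        return open.size();
    }

    // Path from the root element down to the current one, e.g. "/doc/section".
    public String path() {
        StringBuilder sb = new StringBuilder();
        Iterator<String> fromRoot = open.descendingIterator();
        while (fromRoot.hasNext()) {
            sb.append('/').append(fromRoot.next());
        }
        return sb.length() == 0 ? "/" : sb.toString();
    }

    // Tiny stand-in for a test framework's assertion.
    private static void check(boolean condition, String label) {
        if (!condition) {
            throw new AssertionError("FAILED: " + label);
        }
    }

    public static void main(String[] args) {
        // Each block exercises one aspect in isolation, no complete document needed.
        ContextTracker t = new ContextTracker();
        check(t.path().equals("/"), "empty tracker reports root path");

        t.startElement("doc");
        t.startElement("section");
        check(t.path().equals("/doc/section"), "nested path is root-first");
        check(t.depth() == 2, "depth counts open elements");

        t.endElement("section");
        check(t.path().equals("/doc"), "closing an element pops it");

        boolean rejected = false;
        try {
            t.endElement("section"); // "doc" is open, not "section"
        } catch (IllegalStateException expected) {
            rejected = true;
        }
        check(rejected, "mismatched end tag is rejected");

        System.out.println("all checks passed");
    }
}
```

The mismatched-end-tag check is the kind of case that whole-document testing rarely sets off, which is exactly why small targeted tests earn their keep.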

I've already found a few bugs - largely in the way I was testing code before - and feel a lot brighter about the foundations of the code I'm using. In about two hours of test writing, I've reduced the number of places I might search for errors by about a third, and reduced my paranoia about making changes far more drastically. I still have some snarled code ahead of me, but I finally feel like I have the right tools for unsnarling it without creating even larger tangles.

I suspect unit testing is probably the piece of XP with the most potential benefit for solo programmers, though CVS has saved me from myself a few times and let me open a wider door to other people's work. Unit testing also combines immediate benefits for me with the prospect of an easier time for other developers who might someday want to build on this work, even if I never hear about it. Seems like a good thing all around.

20 Mar 2003 (updated 20 Mar 2003 at 13:57 UTC) »
dyork - I've got the Taig lathe as well, and really like it. It seems to cry out for tinkering, with a really basic foundation and all those attachments. Sadly, most of the attachments are for turning metal, and I have little clue how to do that, but making wooden pens is great for now. A four-hour drive to the nearest Lee Valley store is unfortunately too much for taking classes!

I've been busy on xml-dev, announcing a half-parser (yes, it's a parser that preserves the full text of the original XML document) and talking about Microsoft Office beta XML formats, with sample XML documents (General, Access, Excel, Word).

I'll also be presenting on this stuff at the Open Source Conference in July. The last thing I presented there was Open Source, Open Data: What XML has to offer Open Source. Should be an interesting followup - I was thinking about .NET and XML then, but Office is both different and potentially more interesting in a lot of ways.

As for the rest of the world - ugh.

dyork - what kind of micro-lathe are you using to turn pens? I've probably made a dozen, and should get back into it.

The concrete nature of woodworking and the much greater sense of independence I get doing it makes it tempting for all the reasons that make me doubt tech. You can learn from others and teach others in woodworking, but there's not nearly the same sense that your destiny is welded to other people's business decisions.

I'm still programming, still writing, still editing. Just not sure it's what I want to do for my next forty years. (I'm 32.)

Quiet's kind of nice

I've been enjoying myself lately by not participating in a number of things that I should probably care about.

I used to complain here about the madness of URIs, but this fine formulation has let me stop worrying about it. The W3C's Technical Architecture Group (TAG) isn't likely to accept that approach in my lifetime, but that's fine. I've stopped expecting my reality and their reality to have much in common, so I can safely ignore that august body's busy mailing list. I don't need Platonic Forms in my life, thank you very much.

I'm still active, though a little less so, on xml-dev, but even that list feels like it's mellowed a bit. More interestingly, though currently quiet, I started up the xml-hypertext mailing list. It's fairly peaceful there so far, but hopefully it'll grow with time.

Meanwhile, maybe I can get some work done.

Is civility harmful?

Mark Baker has an interesting response to a piece by Elliotte Rusty Harold on Web Services. Mark, despite calling himself a "Tech Curmudgeon", takes issue with Elliotte's use of the word "idiots", and asks "So please folks, try to keep it civil. Comments such as this one only serve to alienate, which is the last thing we need."

I'm not a particularly friendly or polite person, and didn't see Elliotte's comments as anywhere close to out of line. Still, I think I agree with Mark that "Comments such as this one only serve to alienate". Where I part company is that I think alienation is important, and that pretending we like each other more than we really do is likely to produce muddy compromises at best.

In the case of Web Services and XML, it's become clearer and clearer over time that these technologies are barely related and frequently in conflict. Web Services happens to use XML, but uses it quite badly, from the perspective of many XML people. As Elliotte says, "Web Services violate the fundamental design of XML". Not only that, but the ambitions of Web Services have been a driving force behind some rather toxic specifications, notably W3C XML Schema. I don't mind saying that Web Services is poisoning XML, turning what was once a simplification into a major new set of complications.

Does that alienate people? Yes! It should. I'd love to get the Web Services folks to rethink their foundations. Failing that, making clear that there are serious points of friction seems like the best course of action - and being civil has little to contribute to that. Forking is not a risk here - it's an opportunity.

Of course, I also found this rant well worth reading, though it's quite completely over the top. Expecting progress to come in neatly-wrapped boxes with thank-you notes attached seems like a lot too much to ask - and counter-productive, to boot.

Every now and then, something new and interesting surfaces in the world of URIs. The notion of a "probabilistic web" is both different and well worth considering. Maybe there is something to all those lines about light appearing in the darkest hour.

Eventually it becomes clear that any effort to discuss URLs or URIs is pointless.

Uniform Resource Identifiers are the strangest religion I've encountered on the Web. People use them in all kinds of largely incompatible ways, but somehow we're supposed to believe that since the URL part works and survives things like cache issues, these magical abstractions will solve all our identification problems.

Meanwhile, no one can give me a straight answer on how to identify a representation with a URI reference. The short answer, of course, is that you can't, though the fragment identifier part is amusingly representation-dependent and it seems like a representation must in some sense be a resource itself... but you'd better stay away from that hack of a DOS-like file extension or fall into sin.

The Web as a huge set of Platonic Forms would be hysterically funny if it weren't so thoroughly sad.
