It seems that processing instructions have been forgotten by the majority of people creating XML tools and specifications.
Am I the only one who thinks that the XML world has totally missed the power present in processing instructions? I have often looked at various XML standards and wondered why they don't do more with processing instructions.
For instance, XHTML Strict is supposed to be a step toward separating content from presentation - the presentation details are supposed to live in a CSS file. That's all fine and dandy, but the br tag still remains. This is terrible, especially since it could be fixed by simply converting it to a processing instruction, such as <?xhtml-br?>
Then there's XLink. Why not, instead of specifying exactly which attributes are links, create processing instructions that map the DTD's own elements/attributes to linkable ones? It sounds so simple, and would remove a lot of the crap going into attributes.
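To make the proposal concrete, here is a small sketch of how an application could pick up such PIs. The PI targets "xlink-map" and "xhtml-br" are hypothetical illustrations of the article's idea, not part of any standard; Python's stdlib minidom is used because it preserves processing instructions in the tree.

```python
# Sketch of the proposal above. The PI targets "xlink-map" and
# "xhtml-br" are invented for this example; xml.dom.minidom keeps
# PIs in the parsed tree, so an interested application can read
# them while every other tool simply ignores them.
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<doc>'
    '<?xlink-map element="ref" href-attr="target"?>'
    '<para>first line<?xhtml-br?>second line</para>'
    '</doc>'
)

def walk_pis(node, found):
    # Collect (target, data) for every PI, in document order.
    for child in node.childNodes:
        if child.nodeType == node.PROCESSING_INSTRUCTION_NODE:
            found.append((child.target, child.data))
        walk_pis(child, found)
    return found

pis = walk_pis(doc, [])
print(pis)
```

Note that the XML declaration in the prolog is not itself a PI node, so only the two application-level PIs are reported.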
Anyway, does anyone else feel this way, or am I all alone? Is there a way to contact the people making these standards to say "hey - this is better handled by processing instructions!"
You are missing something fundamental, which is that XML is just the same thing all over again. It is not as if structured content didn't exist before; it is just being hyped more - mostly by people who do not understand it. Are XML, XSLT, RDF etc. really less complex or that much different from what we had before? Do they really make people more productive? Do they really allow people who do not understand programming to "program" computers? I think not; it is just "programming" in another envelope.
Philosophically, the autopoiesis of such complex systems is really quite interesting. You are assuming that it is aimed to solve a real problem; that is however only part of the story...
I agree that XML is way overhyped. However, it _does_ solve a real problem, simply that everyone and their dog had their own method of doing structured content. XML gave everyone a standard parser to use for all sorts of structured content, which was roughly compatible with HTML. It basically standardized how structured content was handled. It's not the panacea that everyone makes it out to be, but it's nice because it's easy to learn, easy to use, and it's basically the same everywhere. Having a premade parser simplifies my job as a programmer because I don't have to invent a new syntax, and it simplifies the user's job because they don't have a million new escape characters to learn.
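A tiny illustration of that point: with a premade stdlib parser there is no invented syntax and no new escaping rules for the user to learn - the parser handles entities like &amp;amp; itself.

```python
# Minimal sketch of the "standard parser" argument: the premade
# parser (Python's stdlib ElementTree here) handles the syntax and
# the escaping, so neither the programmer nor the user invents
# anything new.
import xml.etree.ElementTree as ET

record = ET.fromstring(
    '<company><name>AT&amp;T</name></company>'
)

# The entity is decoded for us; no custom escape characters needed.
name = record.findtext('name')
print(name)
```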
When I worked at Wolfram Research, I started creating documentation for the Web team. I started with LaTeX, but found that it had a bunch of new special rules, and I was the only person willing to take the time to learn them. However, almost everyone knows HTML, and knows how to do basic entities and such with HTML. Therefore, I decided upon DocBook. Because of its SGML heritage, it was easy for people who knew HTML to transition to DocBook. Now everyone can maintain the documentation. That's the beauty of XML - it doesn't waste everyone's time by reinventing structured content for every project.
Personally, I hate XSLT. Stylesheets need to be done server-side with a _real_ programming language to be useful, but that's another issue altogether.
Anyway, I like XML a whole lot. I get bothered by people who blow it out of proportion. In fact I started reading a book the other day which basically said that dynamic, interactive web sites aren't possible without XML. ?????
I received the following reply to my question in email, and will reproduce it here (the person did not have an advogato account):
n.b.--I send this to you in response to your advogato article. I don't have an advogato account, so I can't add comments, but feel free to reproduce these comments.
One of the reasons processing instructions have largely been ignored is a strong institutional feeling against them at the W3C, which encompasses personal disdain from Tim Berners-Lee. (See <URL:http://www.w3.org/TR/xml-stylesheet/>, which suggests that PIs will not be used in W3C Recommendations in the future, and <URL:http://lists.w3.org/Archives/Public/www-tag/2002Feb/0057.html> and the ensuing discussion.)
In response to your specific examples,
the br tag isn't a PI because XHTML is HTML 4.01 hastily XML-ized, warts and all. What you're proposing for XLink sounds similar to Architectural Forms, but these haven't caught on in the web environment... oh, and they're invoked by a PI! See <URL:http://www.w3.org/MarkUp/future/papers/roconnor.html> and the references invoked thereby.
It reminds me of what was said about C++ twenty years ago (in The Mythical Man-Month), about the misuse of C++. :)
First, object-oriented programming grew strongly during that decade, through C++ development for instance. Fred Brooks highlights the great promise of OO programming:
Then he noticed that C++ adoption has been slow. One of the explanations he cites is that people used C++ as a tool or a language, though it is really a way of designing, and that this misuse of the language spread. Fred Brooks sees in this a severe case of the management malady concerning methodological improvement: no forecast was made of the expected return on investment, so managers expected short-term returns, while design methods are long-term investments. He also notices that, for instance, people wanted to reuse components without realizing that reuse roughly doubles the overall time for building software, due to additional component testing, documentation, and so on. Reusable components cannot be created ex nihilo: users and developers must share common notations, concepts, and so on. He supports his point with Jones's report, which shows that at most 10% of programmers and customers reuse components, and that in most cases this is due to organisational factors.
A native speaker routinely uses about 10,000 words. With component reuse, the vocabulary grows so enormously that, without some reflection on how to extend the language, people will be unable to use the components; and in any case, increasing the complexity of a language results in additional accidental difficulties.
Fred Brooks wrote this book in the mid-70s. The whole book is a reflection on whether or not we can prevent software from being late and buggy. First he tries to show that better practices exist. Then he states that there are no silver bullets (such as XML), no methods, and no technical tools able to make the uncertainties of software development disappear, because efficiency relies heavily on «peopleware».
In short, the problem is not XML, the W3C, or anyone in particular. It is more likely the fact that most organizations don't conceive that costless methods can be valuable. (The same goes for libre software.) If they bought $10,000 Linux/BSD distros, they would think them something worth studying. If the W3C sold $10,000 conferences on XML, they would value the advice and conform to the best practices.
To answer you: you are not alone in thinking this, and it has been happening for years in companies, with all kinds of interesting technologies and concepts.
From an architectural point of view, PIs are a problem in general: they are not structured information, their names are limited, and their processing is usually optional - and the more optional processing steps you stack up in a pipeline, the less reliable your output becomes (or the less interoperable you get).
Most recent developments of XML specifications tend to avoid them for those reasons, and there seems to be common agreement not to use them, at least at that level.
The "problems" you mention with processing instructions are not problems, they are the way they differ from elements.
For example, the problems you cite are the same problems that exist for other secondary standards such as XLink. Specifically, XLink forces its own set of attributes instead of using the DTD's. If a program is explicitly watching for XLink attributes, that is no different from watching for XLink processing instructions.
The fact that they are not structured information is precisely what makes them valuable. Consider the following process: a content creator develops structured content in XML. A typesetter then creates a stylesheet for use on that document type (or one was already created). Now suppose there is a standard element type, but for some aesthetic reason (not oriented on structured content), the typesetter wants to change the style of ONE element. If the distinction is not inherent in the document structure, putting it in the elements and attributes goes against the grain of structured documents. Thus, processing instructions are the best way to go, specifically because they do not correlate with document structure. DocBook tries to fake this with the "class" attribute, but that is quite hacky.
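The typesetter scenario might look like the following sketch. The PI target "style-override" is invented for this illustration: the one-off presentational hint rides in a PI that applies to the next element, so the element structure stays purely semantic and every other pipeline can ignore the hint.

```python
# Hypothetical typesetter scenario. The PI target "style-override"
# is invented for this sketch; the hint applies to the element that
# follows it, leaving the markup itself purely structural.
from xml.dom import minidom

doc = minidom.parseString(
    '<chapter>'
    '<?style-override font="small-caps"?>'
    '<title>Introduction</title>'
    '</chapter>'
)

overrides = {}
pending = None
for node in doc.documentElement.childNodes:
    if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
            and node.target == 'style-override'):
        pending = node.data          # remember the hint for the next element
    elif node.nodeType == node.ELEMENT_NODE:
        if pending is not None:
            overrides[node.tagName] = pending
            pending = None

print(overrides)
```

A stylesheet engine that doesn't know about "style-override" renders the document exactly as before, which is the whole point.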
Processing Instructions are optional for a reason - they are often application-specific. Architectural forms processing instructions are intended for a single application - the architectural engine.
For example, I have a program called xmltangle (http://literatexml.sourceforge.net/) which uses processing instructions to do Literate Programming. The processing instructions are specifically for my application.
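To illustrate the idea (this is not xmltangle's actual PI vocabulary, just a hedged sketch of the technique): a PI before each listing names the output file, the tangler collects listings per file, and every other DocBook tool ignores the PIs entirely.

```python
# Sketch of PI-driven literate-programming "tangling". The PI
# target "tangle" and its file pseudo-attribute are invented for
# this example; they are not xmltangle's real vocabulary.
from xml.dom import minidom

doc = minidom.parseString(
    '<article>'
    '<?tangle file="hello.py"?>'
    '<programlisting>print("hello")</programlisting>'
    '</article>'
)

chunks = {}
target_file = None
for node in doc.documentElement.childNodes:
    if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
            and node.target == 'tangle'):
        target_file = node.data.split('"')[1]   # crude attribute parse
    elif node.nodeType == node.ELEMENT_NODE and target_file:
        code = ''.join(t.data for t in node.childNodes
                       if t.nodeType == t.TEXT_NODE)
        chunks.setdefault(target_file, []).append(code)
        target_file = None

print(chunks)
```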
Your argument assumes a linear pipeline, but if the pipeline branches in several directions, you find processing instructions saving the day.
The reason to code things as elements rather than PIs is to enable them to be processed with the same tools as the rest of the document. Your literate programming example uses PIs as if they were elements, but you have no way of validating them (what if an end-PI is missing? what if they're badly nested?). You need another API to get at the data encoded in PIs.
Your argument seems to be that extending a DTD like DocBook is awkward, and that if you want to introduce new markup (like your literate programs) you should use PIs instead of tags so that the document still conforms to the original DTD. DocBook, though, was meant to be extended, as was XHTML and as can any decent XML DTD or schema. Namespaces let you combine XML applications in a principled way within the XML model.
If your typesetter really wants to change the appearance of one element in a document then it will need to be marked up in some way. You propose a PI, I'd suggest using a generic attribute like id which most DTDs provide. If such a generic attribute isn't present the stylesheet can always refer to that element directly using XPath.
The reason my program uses PIs is not that I don't want to extend DocBook, but rather that I don't want to be bound to _any_ specific DTD. In a later version, I'm going to add PIs which let you map DTD elements to functionality, but in any case, a PI is needed to carry information that is specific to my application.
As for the typesetter example, you are forgetting that one of the purposes of markup languages is to _separate_ content from presentation. Generic attributes are problematic, especially when your document goes through multiple pipelines. In addition, elements and attributes are structural features meant to convey the semantics of the document. Changing the structure to alter the layout in a single application is a bad road to tread.
You also point out that PIs can't be validated. That is a _good_ thing, because PIs are application-specific, while XML documents may be run through multiple applications. Thus, it is the application which is responsible for validating its own instructions.
There is one place in the standards where they do use PIs, and it works quite well: XML stylesheets. xml-stylesheet is a processing instruction. Why do you think this is? Since it is specific to a display application, it doesn't belong in the DTD, either as an element or as a generic attribute. The fact that it is a processing instruction means it can be used with DTDs that were never intended for this purpose. For example, if there were an XML document that wasn't intended for display, but I wanted to run it through a display system anyway, I would only need to add a processing instruction to tell the display system how to process it. It's not part of the structure of the document - it's specific to the processing done by a particular application.
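Reading that PI from a document's prolog is straightforward, since minidom exposes prolog PIs as top-level children of the Document node. The stylesheet filename "report.xsl" below is just an example value.

```python
# Reading the standard xml-stylesheet PI from a document prolog.
# minidom exposes prolog PIs as top-level child nodes of the
# Document; the href value "report.xsl" is a made-up example.
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="report.xsl"?>'
    '<report/>'
)

stylesheet = None
for node in doc.childNodes:
    if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
            and node.target == 'xml-stylesheet'):
        stylesheet = node.data

print(stylesheet)
```

The document element itself carries no trace of the stylesheet association, which is exactly the separation the comment above argues for.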