Structured data and the death of WYSIWYG

Posted 5 Dec 1999 at 05:36 UTC by Radagast

As commercial WYSIWYG word processors are close to dying from bloat, free software projects scramble to catch up, going in the exact same direction. But it might be time for structured data editing to emerge from the cloud of hype.

Today, almost all document editing is done in WYSIWYG applications. It's been hailed as the perfect way to write, where what you see on the screen exactly matches what you get on paper, and modern WYSIWYG word processors are extremely close to that ideal. It took a bit of work, though. In fact, in addition to computer games, the main driving force behind replacement and upgrading of hardware on the desktop has been Microsoft Office, with its endless cycle of upgrades and new releases running up the resource requirements. And when there was nothing much left for a word processor to do, they decided to suck up a few more resources by adding animated assistants to the thing.

So, where are we? The stuff on your screen is damn close to the stuff you get when you print. The fonts are antialiased, your image files appear in full 24-bit glory when inserted, and everything is well. Or is it? I've worked in office environments where MS Word is the de facto, or even official, standard for writing. Watching people struggle to get their templates to work, to make all the letters look the same, to keep the document from rewrapping so there's a single line on the final page, to synchronize the different versions of their documents, and generally just to make sense of the inane way Word makes bulleted lists, makes me think there's a better way.

This is where the structured document buffs stick their heads in. They usually come from technical writing backgrounds, and they have a remedy. Write the document in LaTeX, or an SGML application like DocBook, or something similarly structure-oriented, and let a backend take care of the formatting to paper. Trust the backend to be smart enough to handle everything, and there you are. Except... Where are the tools? "Oh, it's just plain text, you can use Emacs, or even Notepad." Well, how do I know what I just wrote will parse correctly? "Just save and do a test parse, and you'll get error messages that tell you what's wrong."
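To make concrete what the structured camp is asking people to type, here's the sort of fragment a DocBook author writes by hand (an illustrative sketch; the element names are DocBook's, the error message below is a paraphrase of typical validating-parser output):

  <!-- Pure structure, no formatting: the stylesheet decides how
       a title or a command name ends up looking on paper. -->
  <sect1>
    <title>Installing the Widget</title>
    <para>Unpack the archive and run
      <command>make install</command>.</para>
  </sect1>

Forget the closing </para> and nothing complains while you write; only the test parse fails later, with something like "end tag for 'para' omitted" -- exactly the feedback loop normal users won't put up with.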

For normal users, this doesn't cut it. The two camps remain: the technical writers with their structured formats and Emacs (or, if their employer is rich, FrameMaker/SGML), and the office workers with their MS Word, user-friendly as long as the macro virus doesn't get you and your hardware is fast enough.

But does it have to be that way? Structured formats clearly solve a lot of the problems people have with WYSIWYG: they let you concentrate on what you write, not how it looks, they make it easy to get all your documents to follow a standard, and their semantics generally allow for smarter searching and archiving. But the tools are in the way.

What if we rethought the authoring process around structure? We have the formats, we know the semantics. And after years and years of Word-driven upgrades, most people have enough desktop processing power for almost anything. So how can we apply this, and maybe get a few completely new benefits along the way?

At Simplemente, the company in Mexico City I work for, we recently sat down and thought about this. It came up in the course of developing a system for a client that had been using Word for a task it was entirely unsuited for, namely writing news bulletins for wire, paper and internet distribution. After some research, we figured out that XML would do exactly what we needed, but there were no tools suitable for end users without technical experience. So we set about making one. Now that the system is in beta at the client site, we decided to make something more of it, and Conglomerate was born. The current codebase, of which there are screenshots at the site, and of which we'll be releasing some test code very soon, is a bit messy and suboptimal because of limitations we ran into in the tools we used.

Here's a quick summary of some design goals, and what problems we ran into.

  • All the storage and searching should happen on a server, and the DTD and all other necessary data should be sent over the network to the client (see the sketch after this list).
  • The transformation to target formats (postscript, HTML) should happen on the server end.
  • We wanted to ensure document validity at all times, so the parser and transformation engine couldn't choke as a result of user actions.
  • The client should be lightweight, and give a highly graphical representation of the data structure. However, the data structure should be as abstract and logically descriptive as possible.
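As a rough illustration of the first goal, here's a minimal sketch of the kind of startup handshake it implies, in C since that's what our codebase is in. Everything here is hypothetical: the one-line request format, the port number, and the function name are illustrations, not Conglomerate's actual protocol.

  /* Hypothetical sketch: a Conglomerate-style client fetching the
   * document's DTD from the storage server at startup, so the client
   * never needs a local copy of the DTD on disk. */
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/socket.h>
  #include <netinet/in.h>
  #include <arpa/inet.h>

  #define SERVER_PORT 7327    /* made-up port number */

  int fetch_dtd(const char *server_ip, const char *doctype,
                char *buf, size_t buflen)
  {
      struct sockaddr_in addr;
      char request[256];
      ssize_t n, total = 0;
      int fd = socket(AF_INET, SOCK_STREAM, 0);

      if (fd < 0)
          return -1;

      memset(&addr, 0, sizeof(addr));
      addr.sin_family = AF_INET;
      addr.sin_port = htons(SERVER_PORT);
      addr.sin_addr.s_addr = inet_addr(server_ip);
      if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
          close(fd);
          return -1;
      }

      /* Ask for the DTD by document type name. */
      snprintf(request, sizeof(request), "GET-DTD %s\n", doctype);
      write(fd, request, strlen(request));

      /* Slurp the DTD text into the caller's buffer. */
      while (total < (ssize_t) (buflen - 1) &&
             (n = read(fd, buf + total, buflen - 1 - total)) > 0)
          total += n;
      buf[total] = '\0';

      close(fd);
      return (int) total;
  }

The transformation side would work the same way: the client posts the document tree back and asks the server to run the PostScript or HTML transformation, keeping the client itself thin.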

Problems and solutions:

  • As it turns out, there weren't many systems for doing distributed XML editing at all, and none of them were free. There's no real solution to this problem yet; we ended up using an RDBMS (Oracle, which the client already had) for storage, and made a server to interface with that. We're currently rewriting this part to build a lightweight object database, which will allow CVS-like revision control for XML documents.
  • There's also a noticeable lack of a good XML transformation engine right now. The only ones we were able to find were things like XT, which is a good XSL-T implementation, but is written in Java, while all of our codebase is in C. We're looking into making an XSL-T engine in C, which can run on the backend.
  • Ensuring document validity proved to be the real pain. There are extremely few free implementations of validating parsers out there, but we managed to find one, written in pure C and under the GPL, even, namely RXP, which proved to be quite fast and reliable. However, all XML parsers in existence today are made for pure I/O. That is, they assume all you want to do is read XML data from a file, or write it to a file. What happens in between is your business, and they're just a black box. To do efficient editing of XML documents, there is a clear need for a new type of parser, one which first parses to a usable in-memory data structure, and then lays bare the functions for validation, of the whole tree or just of subtrees. Ideally, it should also be possible to "validate in reverse", that is, ask the parser which tags are legal to insert starting at this point in the tree and ending over here. This is vital, since it lets the application present a list of legal tags to the user, for instance in a pop-up menu (the current Conglomerate codebase doesn't get this right, because RXP doesn't allow it, and it's tricky to parse the DTD on your own when all you want to do is know whether you can insert a tag or not). We are fixing it, however: we're hard at work on a new validating parser which is much more componentized, so that this type of thing and more becomes possible (see the API sketch after this list).
  • There's no accepted norm for displaying XML data while editing. The only approaches we've seen are ones like LyX, which comes close to a WYSIWYG approach, and thus is in danger of falling into the same hole as normal WYSIWYG word processors. Also, it's tricky to edit things like metainformation in a WYSIWYG environment (if you can't see it, you can't get it). We went in the opposite direction, providing a visualization of structure, as shown in the screenshots on the Conglomerate site. This is code that actually works, but the insides are messy.
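To make the "validate in reverse" idea concrete, here's a rough sketch of the shape such a parser's API could take. The names and types are hypothetical illustrations, not the interface of RXP or of our new parser:

  /* A node in the parser's in-memory document tree, and a parsed DTD. */
  typedef struct xml_node xml_node;
  typedef struct xml_dtd  xml_dtd;

  /* Validate only the subtree rooted at `node` against the DTD.
   * Returns 0 if valid, nonzero otherwise, so the editor can refuse
   * an edit before it ever reaches the document. */
  int xml_validate_subtree(const xml_dtd *dtd, const xml_node *node);

  /* "Validation in reverse": given a parent node and a position among
   * its children, fill `names` with up to `max` element names the DTD
   * allows at that point -- exactly what a pop-up menu of legal tags
   * needs. Returns the number of names found. */
  int xml_legal_insertions(const xml_dtd *dtd, const xml_node *parent,
                           int child_index, const char **names, int max);

With this split, the expensive whole-document validation only ever has to run once, at load time; every edit after that validates just the subtree it touches.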

These were some thoughts on what it'll take to get structured editing to the masses. We believe that once someone (not necessarily us, but we've started, at least) builds this framework, structured editing can get down from the hype, and nest nicely on everyone's desktop. All it needs is this, and some good example DTDs and transformation sheets, and everything from documentation writing to business letters to resumes becomes orders of magnitude easier to write.

I'd love to hear comments and suggestions on these ideas. I apologize if they're not as edited and polished as they could be, but the idea of peer review is to get feedback, so, feel free.


Documentors want one, posted 5 Dec 1999 at 10:16 UTC by Acapnotic » (Master)

There's definitely a call for it. The mailing lists for Simple End-User Linux, the Open Source Writers Group and the Linux Documentation Project are all full of people saying "We need to write documents in this structured format..." and then the authors replying, "...where are the tools?"
It's a big stumbling block for many documenters-to-be.

Note to one of the problems above: Emacs PSGML-mode does have point-and-click "What elements/attributes are legal here?" functionality. Between that and little pleasures like close-open-tag with a keystroke (ok, it's Emacs, so several keystrokes), it's today's editor of choice.

...but having a sexier display and some decent transformation tools tomorrow would sure be welcomed. ;)

LyX isn't that bad, posted 6 Dec 1999 at 10:41 UTC by notzed » (Master)

Actually, I think the approach used by LyX has a greater appeal, and is generally more useful. You don't want to be bogged down entirely in content structure, just as you don't want to be bogged down entirely in layout - but both are important. LyX lets you set some out-of-band data with inserts as well, which seems to work OK.

Thot is another structured editing framework. It allows multiple concurrent views on the same data - a WYSIWYG-like view, a structure view, and even a lynx-like text-only view. Each can be edited directly and updates the other views appropriately and in real time.

If done right, there should be no reason you could not offer a similar approach - many views and many ways of representing and editing the same structured data.

The question is where to put the focus, really, posted 6 Dec 1999 at 13:24 UTC by Radagast » (Journeyer)

I agree that it could be a good idea to have the possibility to view the same data in several ways, side by side. Actually, we do that (in an extremely rudimentary way) with the structure tree you can see on the left-hand side of the Conglomerate screenshot. One of our thoughts has been to use Mozilla's XML viewing, which will support stylesheets and all, to display a more graphically styled view of the data as you edit.

However, the LyX paradigm only holds if there is one target medium for your document. For LyX, the target medium is paper. Yes, it's possible to generate HTML from LaTeX, but all the web pages I've seen created in this way are inferior products; LyX and LaTeX are clearly paper-oriented ways of creating documents. Conversely, there are some rather good WYSIWYG editors for HTML out there; Dreamweaver (on Win and Mac) springs to mind. They work really well, if there's one target medium. Printed HTML looks semi-decent at best.

So that's the thing. If you want real single-sourced, multiple-media document authoring, you need to go to very abstract structure, and then there's no real canonical way of displaying it as WYSIWYG. So, yes, Conglomerate will probably let you have real-time previews of at least HTML/XML with stylesheets, and possibly also real-time previews of paper output, but I'm a little more doubtful whether it'll let you edit in those displays, since there's a lot of non-overlapping information.

One other thing we're really concerned about is using markup that is as abstract and structural as possible, to make the information easier to search and store. In the future (we hope) there are going to be gigantic repositories of information, some internal to companies and organizations, others open to the world, and searching and retrieval will be a nightmare unless there's a lot of context information (or AI gets really smart really fast). So that's another reason for using rich, context-bearing structural markup.

I'd love to hear more thoughts on this subject, though, it's fascinating.

Reports of WYSIWYG's death greatly exaggerated, posted 7 Dec 1999 at 07:50 UTC by raph » (Master)

Thanks for the post and the followup discussion.

The debate between "presentationists" and "structurists" has been raging for quite some time. Both sides have some impressive accomplishments; the WYSIWYG model dominates in word processors, yet XML is a hot new technology, and even Microsoft Word now uses style sheets.

Yet, a large gap remains between these two points of view. I've been interested for some time in finding a unified model that encompasses both. To this end, I believe that it's most helpful to analyze the quality of the final results, based on various parameters and the rendering context.

Simply put, the pure presentationist view is that the document author should have full control over presentation. PostScript and PDF are examples of pure presentationist file formats - they're basically vector-based, resolution-independent images of pages.

The pure structurist view is that the document author's job is to mark the document up with a complete description of the structure, but it is the responsibility of the renderer to try to present it with at least reasonably good quality. Presentation choices are ultimately up to the user on the viewing side, and also the context (such as page size and other parameters) may be highly variable.

In this perspective, the quality analysis hinges on one critical factor: whether the document is rendered in multiple different contexts, or in a single context. Radagast's example of the client requiring news bulletins to be distributed over wire, paper, and Internet is a classic multiple-context scenario. Any reasonably decent renderer should produce good enough quality for news bulletins; a purely presentationist format such as PostScript, on the other hand, would be hopeless. I also want to underscore the value of this approach for people with disabilities. If you have the document marked up structurally, it's pretty easy to imagine running it through a voice synthesizer such as Festival and having it read the bulletins to a blind person. With a purely presentationist format, again, it's hopeless.

Early word processors and today's Desktop Publishing applications, on the other hand, have one goal: to produce a paper document. And in this context, just marking up the document, throwing it at a random renderer and hoping for the best is not going to give as good results as a presentationist approach. The example that Radagast gave of ensuring that there isn't a straggling last page with a single line on it is a good one, not to mention making columns line up at the bottom of the page, positioning graphics near the text referencing them, and of course the usual fiddling with fonts and spacing.

The size of the quality gap depends a lot on the type of document being rendered. For book pages, a structurist approach with a good renderer should be able to do quite well. But for highly graphical, layout-intensive documents, you need control over the presentation. I did the Gimp 1.0 CD insert with LaTeX (yes, in 4 colors), and let me tell you, that's not an experience I'm looking forward to repeating.

HTML represents the worst of both worlds. You have an essentially presentationist approach, but the browser is free to substitute random fonts or use wonky line breaking algorithms. Thus, the author really has very little control over the appearance of the page, especially from browser to browser. Yet, you're not really getting the advantages of structurist markup either, because you need to organize your document around the <table> tags and so on used for layout. CSS helps a little; you can use more structurist tags for some text attributes, but basically it's just a layer of indirection.
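To illustrate the point with a sketch of my own (not from raph's post): here's the same page fragment, first in the table-layout style most 1999 sites actually use, then in the more structurist style CSS makes possible.

  <!-- Presentationist HTML: the layout is smuggled in through <table>,
       and the markup says nothing about what the content *is*. -->
  <table border="0"><tr>
    <td width="20%"><font size="2">Navigation links...</font></td>
    <td>Story text...</td>
  </tr></table>

  <!-- Structurist markup, with CSS carrying the presentation: -->
  <div class="navigation">Navigation links...</div>
  <div class="story">Story text...</div>
  <!-- stylesheet: .navigation { width: 20%; font-size: small; } -->

(Real side-by-side columns need CSS positioning on top of this, which 1999 browsers handle inconsistently - part of the problem.) And even in the second version, the browser still substitutes whatever fonts and line breaks it has, which is exactly the "very little control" described above.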

It's tempting to design a word processor based on HTML (or XML) and CSS (or XSL). However, such a system really doesn't let you optimize the document quality in the same way that a presentationist system would. Indeed, I'm fond of saying that a WYSIWYG HTML editor is a contradiction in terms.

If HTML represents the worst of both worlds, is it somehow possible to design a system that's the best of both worlds? I believe the answer is yes. The key innovation is consistent rendering. In other words, the document author should be able to specify the structure of the document, but also annotate it with information that optimizes the presentation. Then, this optimized document should display consistently when the presentation is appropriate for the context. Otherwise, the viewer can fall back on a traditional structurist technique.

Fortunately, this is an innovation with a long and respected tradition. The most prominent example is certainly TeX, which is fanatically crafted to produce an absolutely identical page image no matter what the platform or output device. LaTeX extends TeX with a set of macros that allow the document to be specified essentially structurally. Thus, if you spend the time fiddling with various layout parameters to get all the figures on the right page (a familiar process to anyone who has written a scientific paper using LaTeX), in the end you get a file that prints correctly on whatever printer or dvi viewer you point it at. In this way, TeX is just as much a specification as it is an implementation of a typesetting program.
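A tiny LaTeX example of what that structure-plus-annotation layering looks like in practice (my illustration, not raph's):

  % The \section and \caption commands are pure structure; the [t!]
  % placement option and the \looseness tweak are presentational
  % annotations, layered on without destroying the structure.
  \documentclass{article}
  \begin{document}
  \section{Results}
  Some paragraph text referencing Figure~\ref{fig:plot}.
  \looseness=-1   % squeeze this paragraph by one line if possible
  \begin{figure}[t!]   % force the figure to the top of a page
    \centering
    \fbox{(figure goes here)}
    \caption{A figure pinned where the author wants it.}
    \label{fig:plot}
  \end{figure}
  \end{document}

Strip the annotations and the document still makes sense; keep them and every printer produces the same pages.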

I believe these ideas can be extended into the '90s (and even into the '00s). Structured editing doesn't have to be in vi; it can be interactive (as Conglomerate so beautifully shows). Further, rendering can be interactive too, in the true WYSIWYG sense. I don't see a compelling reason why we can't design a system that can lay out ads as well as Adobe InDesign, yet still preserve the document structure so that documents can be adapted as needed.

What this vision requires, I think, is a specification for very high quality rendering of the text. Again, we can look to TeX for inspiration - the algorithms TeX implements for hyphenation and justification are considered the gold standard in the publishing industry. Indeed, it is only with the release of InDesign that proprietary page layout systems gained the paragraph-at-a-time line-breaking optimization pioneered by TeX.
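For the curious, the heart of that optimization fits in two formulas (paraphrased from the published Knuth-Plass line-breaking paper; TeX's real arithmetic has more cases). A line whose glue is stretched or shrunk by a ratio r gets a badness of roughly

  b = 100 |r|^3

and a candidate break with penalty p (and TeX's line penalty l) accumulates demerits of roughly

  d = (l + b)^2 + p^2    for 0 <= p < infinity.

The crucial point is that TeX chooses the set of breakpoints minimizing the sum of d over the whole paragraph, via dynamic programming, rather than greedily filling one line at a time - and that is the kind of thing such a specification would have to pin down.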

Specifying all this is a pretty ambitious goal. It's almost certainly beyond what the W3C is capable of. But I don't mind the idea of it taking a long time if the result is to finally get the word processor right. Having a really good structured editor is a major step in the right direction, but not the whole solution.

Maybe..., posted 7 Dec 1999 at 23:47 UTC by jdube » (Journeyer)

Maybe, but I think WYSIWYG is here to stay. Just as some people still use FrontPage, people will use these text editors. Personally, for school papers etc. I stick with Corel WordPerfect for Linux (please don't hurt me), but for all else I use vim/gvim (not trying to start an editor war, BTW...), which I find small, fast, and very good. I believe that, like HTML, Linux, and pretty much anything else that anyone with any computer experience whatsoever would find easy, this won't be easy for the masses. People come to me every day demanding I help them step by step through saving a file (click on the icon), or how to install a file (click on the icon), or open a file (click on the icon), or run IE (click on the icon), or... you get the point. Same people, same questions, all day every day. People who can't grasp "click on the icon" probably can't grasp something harder than what they already don't understand.

Learning and usage, posted 9 Dec 1999 at 06:22 UTC by Crimz » (Journeyer)

Tools obviously don't replace each other in an all-encompassing and instantaneous way. The use of an old tool approaches "none" asymptotically, at a rate depending on how many and how good other existing tools are and the degree of functionality overlap they have with it. Simplicity is one of the overlap factors - cases in point: joe and Notepad. Also, some people are still using WordPerfect for DOS, or old Word versions.

The usefulness of a tool depends on the job it's supposed to do. Creating a tool that does a better job than existing tools in both physical layout and conceptual structuring of data is rather ambitious - especially when you take into account the comparative user-end simplicity of existing tools that specialize in only one of the fields.

So, I think such a unifying tool would have its primary use in a third field. It wouldn't excel when applied to presentation-only (DTP) or conceptualization-only (database) tasks, but rather when applied to mixes of the two.

Make no mistake, this is a huge field. The internal information flow in workplaces and project groups, for instance, is vast, and it's ideally suited to this: such documents require layout (a high level of human-readability) as well as structuring (a high level of computer-readability). Structuring is currently a hard task if you're not into markup languages, and XML is still a very technical subject. Thus, most employers stick with Word for everything from internal memos to press releases. They lose all structure, and the layout is still minimal. Structure is important for keeping a "shared associative memory" for a company or an organization.

About the layout engine: I think these can be made very simple or very advanced. The important point is providing the tools for creating such engines. I'd say that if it was done right, you could do anything from a simple memo to a complete newspaper layout, provided you have a rich enough data structure. For one-time layouts, it could even introduce elements of randomness, as that's pleasing to a large section of the population. Obviously, a sane backend will require some plumbing however you choose to look at it.

About the frontend: The learning curve for this could actually be less steep than that of existing WYSIWYG editors, as you have "what you think is what you get" (conceptual) markup instead of clumsy "what you see is the result of your lack of comprehension of the Word bulleted list layout internals." People will have to think about what they want to express, instead of how they want it to look. This is a good thing, as many content providers don't know anything about good layout, and wouldn't want to bother with it anyway.

That about sums it up - I've answered the usability question and the applicability question, as well as providing some general thoughts. This might seem like random rambling to you, but I assure you, it made sense to me at the time.
