19 Feb 2008 caolan   » (Master)

ms binary docs

Since the release of the MS binary format documentation I’ve gotten a veritable blizzard of mails about it.

The move is welcome of course; efforts to improve interoperability are always appreciated. So thanks.

But the original ‘97 formats were released on MSDN around 97/98 and were available on the MSDN website for some months around then, so this isn’t as totally new as people seem to think. MS has done this once before. (I wrote ivt2html a stack of years ago just to read the .ivt format of that era.) Clearly the documentation for 2007 will include the changes since 97, but those changes to the format are compatible ones, and over the years OOo has already figured out pretty much all of the relevant additions. So the release of the formats is likely to help projects that want to start from scratch much more than it helps a project like OOo, which has already figured out the majority of what I assume (I don’t do much .doc import/export anymore) is in the documentation.

The remaining major compatibility issues, IMO, for .doc/.odt at least, break down into 3 categories:

  1. Misunderstanding of the format. That’s the smallest issue by far; maybe some table-in-table glitches fall into this category.
  2. Disjoint feature sets. There are some constructs in writer that don’t exist in word and are problematic to export to .doc, e.g. not all of the writer page style possibilities are expressible in the word section system. Vice versa, there are word features that don’t have a mapping in writer, e.g. highlighting as a setting separate from text foreground color, though that’s obviously easier to fix.
  3. Weirdness, i.e. the layout algorithm rules that determine what to do when faced with layout constraints that cannot be met, e.g. roughly a circular dependency where a graphic anchored to a paragraph affects its own layout in some nasty feedback way, especially in multi-column documents. As an example of an oddity that’s taken care of by the .doc import/export filters: the top left corner of a graphic in writer is position x,y of its properties and the border (if any) is drawn inside that extent, while in word the border is drawn outside it, i.e. the top left corner of a bordered graphic’s extent in word is x-borderwidth,y-borderheight, unless it is one of a small class of banded borders where only the first stripe or two is positioned outside the unbordered graphic’s extent and the other stripes are inside it. So when importing or exporting it you have to know what type of border is being applied and fudge the figures so that the total entity has the same size and position as in the other application.
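
To make the border fudge in item 3 concrete, here is a rough Python sketch of the arithmetic. This is not the actual OOo filter code; `Rect`, `word_to_writer` and `writer_to_word` are made-up names, and it assumes the border runs around all four sides with the same thickness left/right and top/bottom. For a plain border the "outside" amounts are the full border width and height; for the banded borders they would be only the thickness of the stripe or two that sits outside the unbordered graphic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Position and size of a graphic, in whatever unit the caller uses."""
    x: int
    y: int
    width: int
    height: int


def word_to_writer(rect: Rect, outside_w: int, outside_h: int) -> Rect:
    """Word's x,y describe the unbordered graphic and the border hangs
    outside it, so grow the rectangle to cover the whole bordered entity,
    which is what writer's x,y and size describe."""
    return Rect(
        rect.x - outside_w,
        rect.y - outside_h,
        rect.width + 2 * outside_w,
        rect.height + 2 * outside_h,
    )


def writer_to_word(rect: Rect, outside_w: int, outside_h: int) -> Rect:
    """Inverse fudge: shrink writer's bordered extent back to the
    unbordered graphic that word positions."""
    return Rect(
        rect.x + outside_w,
        rect.y + outside_h,
        rect.width - 2 * outside_w,
        rect.height - 2 * outside_h,
    )
```

The only part that depends on the border type is how much of it counts as "outside", which is why the filter has to know what kind of border it is dealing with before it can convert the figures.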

Syndicated 2008-02-19 14:13:21 from Caolan McNamara
