Older blog entries for cinamod (starting at number 8)

While Raph has a few good points in his most recent blog, I can't help but be more pragmatic.

When we examine real-world use cases with regard to handling MSWord documents (most importantly in places where MSWord doesn't exist), I've come up with the following 3 use cases:

1) Viewing: User wants to view the content of the document. Formatting doesn't have to be lossless, as the formatting is generally grossly secondary to content. An intermediate format such as HTML, TeX, or plaintext will serve well here, as they pass the "close enough" test.

2) Editing: User wants not only to view the document (ideally in a lossless fashion) but also to remove and append content as necessary. The intermediate format would have to be losslessly interchangeable with MSWord, and the layout engine would have to produce (nearly) identical results to MSWord's (whose layout engines aren't even truly compatible between version releases). Here, you want a tool such as AbiWord, OpenOffice, KWord, ... These tools all have the ability to do WYSIWYG printing, and are thus able to produce PDF and PS versions of their contents. Note that these tools aren't perfect yet, but do a good job and can be readily improved upon.

Raph conjectures that it might be useful to have a third option:

3) View losslessly. Layout engine would have to produce identical results to MSWord (see above #2 caveat) and produce an image of the document. PDF, PS, PNG all are good output formats for this use case.

Now, as far as most end-users (secretary, student) are generally concerned, the third use case hardly (if ever) exists or applies. However, in a server setup, it might be useful for #3 to exist for things like batch processing and archival. So let's keep it in for now.

Now note that #3 is a strict subset of #2's capabilities once you have a '--print' command line argument or similar interface. Raph concedes this. Now, if you believe cuenca's recent entries, you'll see that adding editing capabilities to a PIECE TABLE isn't too difficult on top of a static model. To boot, several static and dynamic piece tables have already been written and are freely available. The layout engine's *sole* purpose is only to render the contents of the piece table. The GUI uses a View to alter the piece table. Dynamic content is now not a factor whatsoever in the argument. We have shown that in theory AbiWord's (or OO's, ...) piece table and formatting engine should be able to be isolated from the program, and used in such a batch formatting engine, like one that Raph proposes.
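To make the piece table argument concrete, here is a minimal sketch of one in Python. It is not AbiWord's (or anyone else's) actual implementation; the names `Piece`, `PieceTable`, `insert`, `delete`, and `text` are made up for illustration. The point it tries to show is that the original buffer is never modified: a static, read-only table is just the one-piece case, and editing only adds list manipulation on top.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    source: str   # "original" or "added" (hypothetical labels)
    start: int    # offset into the chosen buffer
    length: int   # number of characters covered


class PieceTable:
    def __init__(self, text):
        self.original = text   # never modified after loading
        self.added = ""        # append-only buffer for new text
        self.pieces = [Piece("original", 0, len(text))] if text else []

    def _buffer(self, piece):
        return self.original if piece.source == "original" else self.added

    def text(self):
        # The only thing a renderer needs: the content, in order.
        return "".join(
            self._buffer(p)[p.start:p.start + p.length] for p in self.pieces
        )

    def _split(self, offset):
        """Ensure a piece boundary at offset; return the index after it."""
        pos = 0
        for i, p in enumerate(self.pieces):
            if offset == pos:
                return i
            if pos < offset < pos + p.length:
                cut = offset - pos
                left = Piece(p.source, p.start, cut)
                right = Piece(p.source, p.start + cut, p.length - cut)
                self.pieces[i:i + 1] = [left, right]
                return i + 1
            pos += p.length
        if offset == pos:
            return len(self.pieces)
        raise IndexError("offset out of range")

    def insert(self, offset, s):
        if not s:
            return
        i = self._split(offset)
        self.pieces.insert(i, Piece("added", len(self.added), len(s)))
        self.added += s

    def delete(self, offset, length):
        if length <= 0:
            return
        i = self._split(offset)
        j = self._split(offset + length)
        del self.pieces[i:j]
```

A batch renderer would only ever call `text()` (or walk `pieces`), while a GUI would call `insert` and `delete` through its View. Either way, the layout code sees the same structure.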

Now, true, the advice Raph gives is sound - the best test for any new layout engine would be to use it + your piece table + graphics class as its own batch renderer. While this is useful for testing, development, and debugging purposes, I think that it's worth the little bit of extra effort to slap a MVC architecture on top of it and make it editable via a GUI. IMO, it'd be a shame if you didn't.

IMO, creating your own renderer now would be useful for novelty alone, as there are already several quality existing ones. Not that there's anything wrong with novelty. I wouldn't expect to get it right anytime during this lifetime, though. I'd personally rather have you working on Abi or OO. But it is your time to spend as you please. In my ideal world, #3 only exists as a step to #2, and #2 already exists in several incarnations. YMMV.

Finally, using The Gimp vs. ImageMagick is, in my opinion, a poor and flawed analogy. Tools like IM (as described here) turn an image to a RGBA buffer. No layout-like operations are necessary during this conversion. The Gimp's operations place more RGBA paint on top of an existing RGBA canvas at specific (X,Y) positions. Again, no layout-like operations need to happen here. While it is useful to first have the "gimp image library" be able to decode the images into RGBA buffers first, the levels of complexity regarding what needs to happen with MSWord aren't even *near* comparable. They're apples and lawnmowers.
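As a rough illustration of why painting RGBA onto a canvas involves no layout, here is a sketch of "source over" compositing at a fixed (X,Y) position. It is not code from The Gimp or ImageMagick; `over` and `paint` are hypothetical names, and pixels are assumed to be non-premultiplied (r, g, b, a) tuples of floats between 0 and 1.

```python
def over(src, dst):
    """Composite one non-premultiplied RGBA pixel over another."""
    sr, sg, sb, sa = src
    dr, dg, db, da = dst
    out_a = sa + da * (1.0 - sa)
    if out_a == 0.0:
        return (0.0, 0.0, 0.0, 0.0)

    def blend(s, d):
        return (s * sa + d * da * (1.0 - sa)) / out_a

    return (blend(sr, dr), blend(sg, dg), blend(sb, db), out_a)


def paint(canvas, layer, x, y):
    """Paint `layer` onto `canvas` in place, top-left corner at (x, y).

    Both are lists of rows of RGBA tuples. Pixels that fall outside
    the canvas are simply clipped.
    """
    for row, line in enumerate(layer):
        for col, pixel in enumerate(line):
            cy, cx = y + row, x + col
            if 0 <= cy < len(canvas) and 0 <= cx < len(canvas[cy]):
                canvas[cy][cx] = over(pixel, canvas[cy][cx])
```

Every output pixel depends only on the two input pixels at that position. Nothing here has to decide where content goes, which is the part a word processor's layout engine spends all of its effort on.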

Does it make sense for #3 to exist? Sure, I guess. Does #3 already exist? I believe so, via #2. Should you feel compelled to write your own #3 instead of improving the layout engines in #2? Well, that's for you to decide. I'd strongly advise against it. But that's probably just my pragmatism, experience, and desire to have more people on my own projects speaking. IMO, there isn't the diversity necessary within the OSS community to support multiple #3's (except maybe vanity). But that can be said of a lot of OSS projects nowadays.

Raph - it has been said that all great men have stood on the shoulders of giants. And for this, I sincerely thank you.

As for your MSWord batch-formatting system, well, I have some experience there between my work on wvWare and AbiWord. In order to do something like DOC->PDF conversion you'd necessarily need a layout engine and some sort of graphics class, like the PDF solution you describe. Re-inventing yet another layout/formatting engine is, in my opinion, not really worthwhile outside of academia. Things like AbiWord and LaTeX have layout engines and graphics classes already. Using these sorts of tools, you can already do DOC->PS/PDF without any of the "klunkiness" that a GUI might impose.

AbiWord --print=file.ps file.doc

wvLaTeX file.tex file.doc

You can then use 3rd-party tools to convert to PDF if you'd really like.
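For the server-side batch case, the AbiWord invocation above could be scripted along these lines. This is only a sketch: the binary name and the `--print=` flag are copied from the command shown in this post and may differ in other AbiWord versions, and `print_commands` is a hypothetical helper that only builds the argument lists and does not run anything.

```python
from pathlib import Path


def print_commands(doc_paths, out_dir, binary="AbiWord"):
    """Build one 'AbiWord --print=<out>.ps <in>.doc' argv list per input.

    The flag spelling follows the command quoted in the post; treat it
    as an assumption for any other AbiWord release.
    """
    out_dir = Path(out_dir)
    commands = []
    for doc in map(Path, doc_paths):
        ps = out_dir / (doc.stem + ".ps")
        commands.append([binary, f"--print={ps}", str(doc)])
    return commands
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` as is. Keeping the commands as lists rather than shell strings avoids quoting problems with odd filenames.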

Basically, in my opinion, time would be best spent on improving wvWare or wv2 and the AbiWord layout engine. I believe that getting 100% correct layout is not only possible, but likely.

As you may have read on Slashdot (much to my chagrin since I didn't want this posted to /.), the AbiWord tip jar was recently robbed. The tip jar happened to be run via PayPal.

I've gotten a slashdotting's worth of emails on the subject, and I'm thinking that a lot of people are missing my point - I'm not directly blaming PayPal for my being robbed, though I entertain the possibility that the problem might be on their end. I'm blaming PayPal for being insensitive and unresponsive to a customer's needs, and for not living up to the terms and conditions of their membership agreement.

PayPal charges a fee in exchange for providing a service. Several terms of said service are not being lived up to, and in my estimation, PayPal is not acting in good faith.

If PayPal had merely responded saying "We're investigating this charge" *EVEN* if they came back saying that my charge had no merit, I would not have sent this email. I refer you to these quotes from paypal's own site:

"PayPal will investigate your complaint and attempt to recover any funds you are owed. You will be entitled to the return of any funds PayPal is able to collect on your behalf. However, fund recovery is not guaranteed."

See also: https://www.paypal.com/cgi-bin/webscr?cmd=p/gen/terms#insurance and https://www.paypal.com/cgi-bin/webscr?cmd=p/gen/terms#consumer_protection

I'm not a demanding person. I don't necessarily want my money back, though that would be the ideal resolution to this problem. What I want is for PayPal/EBay to send me a fscking email saying "we're investigating your complaint." That's all. Even better, they could find out the address where the camera was sent and give that to me, or contact some authority about the problem. Or at least give me the ability to do so myself without having to get a subpoena against them.

Dom

Hey,

It's been a while since my last entry...

Life's been rough lately. It's been a really confusing ride.

I recently graduated college and started working full-time. The place I work for is great. The people there are wonderful. I just wish that some of them were closer to my age. The next closest person (age-wise) is 11 years older than me. There are a lot of interesting, smart people that work there. The problem is that most of them work remotely from places like New Hampshire, so their years of wisdom and experience are lost on me. No "water-cooler" meetings to speak of really. Herb Brooks would be disappointed. I'm not a solitary hacker...

After graduation, my friends are scattered across the country. Some are still in the greater Philadelphia region, or at least are for the next few months (I say "greater" because Philly is a really large place, so it could take hours to get to these friends' places). To boot, my girlfriend of 3 years, whom I love dearly, has been visiting Europe for a month. It's tough, considering we've never really been apart for more than a few days these past few years (and we were never apart for more than a few seconds when we were both in school ;-)).

I'm having trouble finding motivation right now. But I guess I have a thing or two to look forward to.

Today was a happy day - the last day of classes of my senior year. I'm glad to be (almost) done with school, at least for a while. I know that I'll definitely miss it and a lot of the people I've met here @ UPenn.

I'm hacking on some random AbiWord stuff right now. We should have template support in by tomorrow, which will be neat. I'm trying to squeeze in as many quality/expected features into Abi as is humanly possible. So I guess that I'll start on a new imaging framework tomorrow too.

Speaking of images, my GdkPixbuf patches got committed upstream, and Federico released a new version (0.11) in honor of them. Nothing big, just C++ compilation fixes.

I was working on trying to represent vector images as raster images in GdkPixbuf (WMF and SVG) but got a bit dismayed because this problem is *hard*. I'm sure that I'll start up on this again because at least ImageMagick has support for this. Having support in GdkPixbuf for this will allow Gnumeric and Abi to handle embedded WMFs rather trivially (instead of writing our own handlers using libwmf2 or some such lib). Apparently KDE has 3 WMF-handling programs. Apparently, they all suck and are API-incompatible. Let's one-up them #:^) Well, ok, _maybe_ I'm a *bit* biased, seeing as how I'm a Gnome developer *and* a libwmf2 contributor, but that's just because they're better #:^)

/me ducks the flames

Dom

Wow, so apparently a lot of people read that bit about Europeans smoking. Seems that most people don't know how much I tend to exaggerate about certain things :-)

Lots of flames in the AbiWord camp - me against the other 7 or so active developers. But I think that we came to a resolution that everyone can live with and be fairly happy with.

The good news is that development on AbiWord as a whole has been going at a furious pace these past few months. Nothing pleases me more. Unfortunately, feature-parity between the platforms is starting to drift apart. On some levels, this makes me happy - we're using native controls to integrate as well as possible with the native desktop environment. Of course this means that you can't load a JPEG on Windows, but since we're sticking to some standards (like only storing raster images as PNGs) we're going to be fine XP, which makes me happy.

19 Apr 2001 (updated 19 Apr 2001 at 05:24 UTC) »

Well, it's been a while. Well, to start off it's my 22nd birthday and I'm writing papers for my Dutch, German, CSE, and Philosophy classes - which really sucks.

I'm totally swamped with schoolwork and OSS coding stuff. More so with the schoolwork, as I can choose to ignore coding for a while.

I'm procrastinating right now instead of writing papers, which is so much like me. It's sad, because I'm actually a quite good writer. I'd love to write O'Reilly books someday for a supplementary income.

I've been to GUADEC in Copenhagen recently. That was a lot of fun. I'll have to post links to pics there. I find it horribly amusing that more pictures were taken of the part-time windows guy who hacks AbiWord than of me, its maintainer who gave 2 speeches at GUADEC. Maybe I'm not photogenic enough. Oh well :)

From this conference, I've come to the conclusion that every European smokes, which is a shame. Everyone is concerned about not polluting the environment (which is to be commended), but they think nothing of polluting their lungs with carcinogens. It disappoints me, to say the least.

So this whole GNOME integration with Abi thing is getting better. I did a cool GdkPixbuf hack today and sent the patch off to Federico. Need to make an SVG loader for that too. Splat - it's on my TODO list.

Abi now has cool multi-lingual spell-checking which you can fine-tune down to a per-word level. Quite cool. Need to hack in opendict/gdict support sometime too because that'd be cool beyond belief.

Anyway, got to get back to my papers.

Today's been pretty long and busy, but some basketball in another few hours should make it all better.

I've started up re-engineering (read: designing, not coding) AbiWord's layout backend. It should be quite an endeavor. I expect it to be done and have a good bit of it coded in a few months probably.

What'll this accomplish? Well, first off everything will be much cleaner and faster if things go according to plan. Most importantly, it'll enable us to be a lot more modular and flexible. New layout primitives will be easy to define, implement, and integrate into AbiWord. For all you user-types out there, that has one immediate implication: tables.

Well, I'm optimistic and not tired for once, so I better start writing some ideas down.

Well, today I signed up for an Advogato account, so I think I'll write a bit about myself to start off. I'm 21 years old and hail from Philadelphia, PA. I'm currently a senior at the University of Pennsylvania studying Computer Science and Germanic Languages & Literature. Those German authors are a little bit "off their rockers" IMHO.

So, anyways, I like to work on lots of different free software projects. I've been involved with Gnome for a while now. I have code in a bunch of their modules, but mainly I've worked on Office-related software and its supporting libraries.

Right now my most serious projects are AbiWord and wvWare.

Anyway, enuff for today. We'll see how current I keep this :)

Dom
