Older blog entries for jeffalo (starting at number 8)

My web publishing adventure has been placed on the back burner while I work on normalizing a database from another project. It's a lot of hard work, especially since there's only a partial data dictionary and many of the constraints are handled by VB code.

I'm not a VB coder, but I do have an outline of the object model which, of course, looks only remotely like the relational model. Anyhow, two days into it so far and I've got most of it hashed out. Call it Iteration One. I suspect there's some many-to-many's lurking, but most of the duplicates will be outed, at least.

In other news, I found out to my chagrin that Xerces lacks the ability to build DOM structures from SAX events. That's not entirely true: I believe it constructs DOMs using SAX, but you can't get hold of the ContentHandler instance. That makes it hard to throw filters in front of it, doesn't it?

Had to go to DOM4J to get a DOMBuilder that implements ContentHandler. So far I've got a bit of GNU, a bit of DOM4J, and a bit of Xerces all rolled together to handle XML documents. Seems unlikely to remain version-stable, but it's the best I can do for now.
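
For the record, here is roughly the shape of what I was after: a filter sitting in front of something that is both a ContentHandler and a DOM builder. This is only a sketch, and it swaps in a different technique from the DOM4J one above: it uses the JDK's own identity TransformerHandler writing into a DOMResult. The SaxToDom and SkipFilter names, and the "skip" element the filter drops, are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

class SaxToDom {

    /** Hypothetical filter: drops every element named "skip" together with its subtree. */
    static class SkipFilter extends XMLFilterImpl {
        private int depth = 0; // > 0 while we are inside a skipped subtree

        SkipFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            if (depth > 0 || "skip".equals(local)) {
                depth++;
                return;
            }
            super.startElement(uri, local, qName, atts);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if (depth > 0) {
                depth--;
                return;
            }
            super.endElement(uri, local, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (depth == 0) {
                super.characters(ch, start, length);
            }
        }
    }

    /** Parses the XML text through the filter and returns the DOM built from the filtered events. */
    static Document parse(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();

        // An identity TransformerHandler is a ContentHandler that can write into a DOMResult.
        SAXTransformerFactory stf = (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler domBuilder = stf.newTransformerHandler();
        DOMResult result = new DOMResult();
        domBuilder.setResult(result);

        // reader -> filter -> DOM builder
        SkipFilter filter = new SkipFilter(reader);
        filter.setContentHandler(domBuilder);
        filter.parse(new InputSource(new StringReader(xml)));
        return (Document) result.getNode();
    }
}
```

Calling SaxToDom.parse on a document containing a "skip" element should give back a DOM without it. More filters can be stacked by handing one filter to the next as its parent.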

Later

1 Nov 2002 (updated 1 Nov 2002 at 01:34 UTC)

Well, I'm undergoing the total immersion method of learning web publishing. This means grappling with Tomcat, Cocoon, XSL-FO, XSLT, XSP, and FOP all at once.

Our team has the task of coming up with an editor for a large industry-standard XML data exchange file. The conservative approach would be to deploy it as a Java application. Since it needs to support graphic rendering and manipulation, that may ultimately be the way to go, although SVG could perhaps cover that requirement. I guess I'll add that to the heap-to-learn, too.

Not a lot of time to learn it in, either. It would help if Cocoon had better documentation. I managed to get something published, although more by guess and by golly than by actually understanding what I was doing.

Simultaneously trying to learn PHP to manage my new personal website. Frustrated right now because I can't get a simple redirect to work using the header() function. Tried a META element and no luck there, either. Hmmm, I'm running out of tricks. Or treats.

Happy Halloween!

 
 /\  /\
   /\

^^^^^^

Castor article published on the ONJava website.

I forgot to ask for money. :-(
However, it felt good just to have something published after my book on XML Schema was cancelled at virtually the last possible moment. grrr...

But I'm over that now. Somewhat.

Other things:

Still(!) working on a pseudo-universal namespace transform (PUNT) for XML namespace prefixes. I've got the encoding bit working (except for prefixes in content, but that's easy). Found a serious bug in Xerces 2.2.0, where it was dropping the uri values on the endElement() callback. Version 2.0.2 doesn't have this problem.

Decoding a PUNTed prefix is just a question of pattern matching through regex and replacing the binary-encoded nonName characters with the original URI characters. And, of course, add any missing namespace prefix declarations (which is the whole point of PUNT).
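
To make the regex idea concrete, here is a small sketch. The encoding in it is NOT the actual PUNT encoding, which isn't spelled out here; it is a stand-in I'm assuming for illustration: a "punt." marker, ASCII letters and digits copied as-is, and every other character written as _xHHHH_ (four uppercase hex digits). PuntCodec, encode and decode are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PuntCodec {
    // Assumed marker that makes the result start with a letter, as an XML name must.
    private static final String MARK = "punt.";
    // Assumed escape form: _xHHHH_ with four uppercase hex digits.
    private static final Pattern ESCAPE = Pattern.compile("_x([0-9A-F]{4})_");

    /** Turns a namespace URI into a string that is usable as an XML name. */
    static String encode(String uri) {
        StringBuilder out = new StringBuilder(MARK);
        for (int i = 0; i < uri.length(); i++) {
            char c = uri.charAt(i);
            boolean plain = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
            if (plain) {
                out.append(c);
            } else {
                // Everything else, including '_' itself, is escaped so decoding stays unambiguous.
                out.append(String.format("_x%04X_", (int) c));
            }
        }
        return out.toString();
    }

    /** Recovers the original URI by pattern matching on the escapes. */
    static String decode(String encoded) {
        if (!encoded.startsWith(MARK)) {
            throw new IllegalArgumentException("not an encoded prefix: " + encoded);
        }
        Matcher m = ESCAPE.matcher(encoded.substring(MARK.length()));
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char original = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and '\' in the URI from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(original)));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Under these assumptions, decode(encode(uri)) gives back the URI. Adding the missing namespace declarations would be a separate step once the URIs are recovered.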

No, a PUNTed document is really not XML; however, it is XML-conformant. There's a difference. XML documents should above all be readable, although we're well past the point of that actually being the case in practice, even within the W3C techs. Just look at XSLT :-)

But there are a number of problems with Namespace prefixes, and PUNT is a brute-force approach to tightly binding URIs with their local element and attribute names. Really ugly, but hard to screw up. Which is the point.

---

Also thinking more on a typed metalanguage. Jeni Tennison and Uche Ogbuji are exchanging ideas on a very interesting thread on xml-dev right now. Ideas worth stea^H^H^H^H borrowing.

---

PUNT has given me the opportunity to play around with XML filters in SAX. I haven't read a really good article on how SAX XML filters work and what they're good for, though.
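
In lieu of that article, a minimal sketch of one thing a filter is good for: watching the event stream as it passes through. This one records any endElement() whose namespace URI differs from the matching startElement(), which is the kind of problem described above with Xerces 2.2.0. The class and method names are hypothetical, and it assumes a namespace-aware parser underneath.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

class UriCheckFilter extends XMLFilterImpl {
    private final Deque<String> open = new ArrayDeque<String>(); // URIs of currently open elements
    final List<String> problems = new ArrayList<String>();

    UriCheckFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        open.push(uri == null ? "" : uri); // ArrayDeque rejects null, so normalize it
        super.startElement(uri, local, qName, atts); // pass the event along unchanged
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        String expected = open.pop();
        if (!expected.equals(uri)) {
            problems.add(qName + ": start had '" + expected + "', end had '" + uri + "'");
        }
        super.endElement(uri, local, qName);
    }

    /** Runs the XML text through the filter and returns whatever mismatches were seen. */
    static List<String> check(String xml) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        UriCheckFilter filter = new UriCheckFilter(spf.newSAXParser().getXMLReader());
        filter.parse(new InputSource(new StringReader(xml)));
        return filter.problems;
    }
}
```

The filter looks like an XMLReader to whatever comes after it and like a set of handlers to whatever comes before it, so it can be dropped into an existing pipeline without either side knowing.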

Hmm...

8 Aug 2002 (updated 8 Aug 2002 at 19:35 UTC)

Just a few quick updates:

1) Looks like the Castor article is going to a webzine rather than the website. I'll let you know...

2) Made some progress on PUNT, which is a set of filters for converting Namespace prefixes in XML to encoded URIs. Results in real ugly documents, but it may be a useful tool for those still operating in namespace-unaware environs (or where namespaces are just handled badly by some third-party joker who won't listen to you).

3) Thinking more about a typed metalanguage (TML) that is backwards compatible with XML. I've had a couple of interesting debates with people on xml-dev of late.

For instance, Simon St. Laurent's a big fan of loose constraints, lexically specified, whereas I'm more the semantically understood datatypes advocate. The twain have to meet somewhere, though. Further exploration of the TML concept should help delineate those boundaries. The XML Schema datatypes guys have pointed the way, although they got lost somewhere on the trail, probably attributable to politics and compromise (and deadline pressure).

There's something to be said for being a cowboy.

Later

Okay, creative differences are getting sorted out. More fun on the way.

Well, Castor and I are having creative differences about what kind of documentation they need. Still, they're adding new docs, which is good. Any new information on their site regarding SQL binding is better than what's there now.

In the meantime, I'll finish up what I started and get it out in view somehow. Hmmm, if I could just get enough people to cert me here...

Still working on the SQL Binding docs, although I haven't touched them in the last two weeks. Trying to get an article accepted on applying dependency analysis in XML and "normalizing" the document structure.

Of course, this will raise the hairs on the back of the doc-heads, so I have to be careful to say that I'm talking about XML-formatted data records. I can't see why the rules for normalizing data records in XML would be any different from those for fixed-length data records or CSV records.

Maybe someone could clue me in to some fundamental difference, but it would be hard to convince me at this point. I've scrubbed too much bunged-up data in the past to believe that normalization isn't worthwhile.
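
A sketch of the kind of check I have in mind, assuming the records have already been pulled out of the XML (or CSV, or fixed-length file) into plain field-name/value maps. It tests whether one field functionally determines another; where it does, the dependent field is a candidate for moving out into its own record type. The class, method and field names are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class DependencyCheck {

    /** Convenience builder: record("custId", "7", "custName", "Acme") and so on. */
    static Map<String, String> record(String... keyValuePairs) {
        Map<String, String> fields = new HashMap<String, String>();
        for (int i = 0; i + 1 < keyValuePairs.length; i += 2) {
            fields.put(keyValuePairs[i], keyValuePairs[i + 1]);
        }
        return fields;
    }

    /**
     * Returns true if the functional dependency determinant -> dependent holds,
     * i.e. records with equal determinant values always have equal dependent values.
     */
    static boolean holds(List<Map<String, String>> records, String determinant, String dependent) {
        Map<String, String> seen = new HashMap<String, String>();
        for (Map<String, String> r : records) {
            String key = r.get(determinant);
            String value = r.get(dependent);
            if (seen.containsKey(key)) {
                if (!Objects.equals(seen.get(key), value)) {
                    return false; // same determinant, different dependent: no dependency
                }
            } else {
                seen.put(key, value);
            }
        }
        return true;
    }
}
```

Nothing in the check cares where the records came from, which is really the point: the same dependency analysis applies whatever the serialization happened to be.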

Dear me:

Got started on the Castor SQL binding docs. Pretty much followed the first two sections of the XML binding doc as a template. Now I'm stuck for a single good example, although I have a rather lame one that will work.

Spent considerable time on Saturday experimenting with hiding the Castor-genned classes (from an XML Schema source) behind a facade of objects inheriting from them. I'm not sure how viable the idea is, but two Castor features would be necessary to even try to make it work:

  1. null extension mappings (class extending another, but having no fields and using the same table as its base class mapping)
  2. the generation of interfaces from XML Schemas in addition to classes implementing them

Oh well... now I'll have to generate classes when the schema changes and hand merge them with changes already made to existing generated classes. Pain in the butt, but long term that's probably how it would end up, anyway.

Maybe a good beans editor could replace the XML Schema code generator so that I don't have to write those annoying accessor/mutator methods.

Ciao

Dear me:

My first post here. Let's see if the habit sticks...

Current projects:

Write Castor JDO for SQL document. Some of this, at least, will wind up on the Castor website as they have a sore lack of documentation in this area. It cost me a week of pain to learn how to make the JDO mapping work; I hope to reduce that for others to about a day.

Need to get started on an XML Namespace filter that replaces namespace prefixes in an XML document with a Name-character-encoded namespace identifier. The target document will still be a well-formed XML document, but the effect will be to replace QNames with encoded universal names (UNames). I have a feeling that some will see this as abuse of XML (encapsulating information w/o use of markup), but I'll leave it up to the practitioners to weigh the pros and cons for themselves.

Free advice: Listen to the experts, but don't let them decide for you.

Other immediate plans: Write XML Schema macros and plugins for Arachnophilia. The result will be a passable free XML Schema editor.

Try to find something useful to do with XSLT. So far my work just hasn't required it, which probably means I'm not looking hard enough.

Current ICC rating: 1808
