17 May 2002 raph   » (Master)

I'll start by responding to a few threads here, namely simonstl on ASN.1, and sej on EPS/PDF.

ASN.1

simonstl recently brought up ASN.1 as a possible alternative to XML. It's worth looking at, if for no other reason than to be aware of history so you don't have to repeat it. The advantages are clear: it is much more compact than XML for binary-oriented data, and it is quite mature. However, the associated schema language isn't especially nice (schema language design is a very hard problem), and it carries quite a bit of historical baggage.

In particular, implementations have tended to be horrible. In the olden days, most implementations were "compilers" that autogenerated C code to marshal and unmarshal ASN.1 data in and out of C structures. This mostly-static approach loses a lot of flexibility, of course. Code quality is another issue. I am unaware of any general ASN.1 tool that approaches the quality of open source XML tools such as DV's excellent libxml (although it's possible things have improved since I last looked).

Another major piece of baggage is X.509, a truly horrid data format for public key certificates using ASN.1 as the binary syntax. Most sane people who have encountered it run away screaming. See Peter Gutmann's x.509 style guide for a more detailed critique.

So it's worth looking at ASN.1, but it's certainly no magic bullet.

Incidentally, ASN.1 is one of many attempts to define low-level binary data formats for tree structured data. I hear good things about bxxp/beep, but haven't studied it carefully enough to critique it. Off the top of my head, there's sexp, which is the binary data format for SDSI (the name reflects its kinship with Lisp S-expressions). Further, most RPC mechanisms define binary encodings. Don't forget IIOP (Corba), XDR (Sun RPC), and so on. I'm sure all have advantages and disadvantages.

What exactly are the goals? As I see it, the main problems with XML are bloat (for general data; XML bloat in marked up text is quite acceptable) and complexity (XML is only medium-bad here). Binary formats help a lot with the former, and may or may not with the latter. But there are a lot of other considerations, including:

  • Quality of implementations (XML wins big).

  • Associated schema languages (messy; importance varies).

  • Suitability for dynamic mutation (XML's DOM is a reasonable, if inefficient, API for local tree manipulation; most other formats don't have anything similar, and their more rigid nature would make it harder).

  • Scaling. XML itself doesn't scale particularly well to huge trees, but is usually implemented in conjunction with URI's, which do. A binary data format can either help or hinder this goal.

Update: ASCII vs Binary, by David Reed.

Transparency, PostScript, PDF

sej brings up the desire to handle vector objects with transparency. Basically, PostScript can't do it, and probably never will. PDF 1.4, on the other hand, has very rich transparency capabilities. It's a bit more complex, and caution is needed when dealing with a standard controlled by a highly proprietary organization such as Adobe, but it has compelling technical advantages.

Here are two signs that the rest of the world is moving to PDF: pdflatex appears to be far more vital than dvips. Mac OS X uses PDF for the imaging and printing metafile.

ETCon

I went to the O'Reilly Emerging Technology Conference (formerly known as the P2P conference) on Wednesday. It was great fun. I met a lot of people, some of whom I've known for years (such as Jim McCoy). Others I've known online, but met for the first time in person (Kevin Burton, Aaron Swartz, Dave Winer).

I particularly enjoyed hanging out with Jim, Roger Dingledine, Bram Cohen, Zooko, and Wes Felter. It is such a luxury to be able to interact with people on such a deep level. In my opinion, the best people to associate with are those you can teach and learn from. Thanks all of you for making my Wednesday a day of intense teaching and learning.

A lot of people have heard of my work, and are reading my thesis. I think Roger is largely to blame for this; a number of people mentioned hearing about it from him. I had great conversations with the MusicBrainz and nocatauth folks, both of which would be killer applications for attack-resistance.

I have this uneasy feeling there's something important that I left out. Also, one risk of naming names is that people might infer something from a name being left out. Not to worry - I genuinely enjoyed meeting everybody I met at ETCon.

McCusker

Reading David McCusker's log fills me with sadness. The personal troubles he's going through resonate deeply for me, as I've been through similar things. I'm in a much happier space now, for which I consider myself very fortunate.

His technical work is also quite compelling. When we meet, one of the main things I want to talk about is whether to join forces on some of our pie-in-the-sky dream projects. Collaboration can be a nightmare when the other person doesn't understand what you're trying to do, or particularly share your goals. I get the feeling that working with him might be otherwise.

David talks about being busy and not having much time for things. I can certainly relate to that. He does, of course, have time to meet with me, whether he realizes it or not. Other things will of course slide, so it's worth considering their importance in the long run.

Latest blog entries     Older blog entries

New Advogato Features

New HTML Parser: The long-awaited libxml2 based HTML parser code is live. It needs further work but already handles most markup better than the original parser.

Keep up with the latest Advogato features by reading the Advogato status blog.

If you're a C programmer with some spare time, take a look at the mod_virgule project page and help us with one of the tasks on the ToDo list!