I'll start by responding to a few threads here, namely
simonstl on ASN.1, and sej on EPS/PDF.
ASN.1
simonstl recently
brought up ASN.1 as a possible
alternative to XML. It's worth looking at, if for no other
reason than
to be aware of history so you don't have to repeat it. The
advantages
are clear: it is much more compact than XML for
binary-oriented data,
and it is quite mature. However, the associated schema
language isn't
especially nice (schema language design is a very hard
problem), and
it carries quite a bit of historical baggage.
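To make the compactness claim a bit more concrete, here is a minimal sketch that encodes the same short list of integers two ways: as an ASN.1 SEQUENCE of INTEGERs using DER, and as a small XML fragment. The XML element names (`seq`, `int`) and the sample values are assumptions made up for illustration. The encoder is hand-rolled and covers only these two ASN.1 types, so treat it as a sketch rather than a usable ASN.1 implementation.

```python
def der_length(n: int) -> bytes:
    """Encode a DER length: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def der_integer(value: int) -> bytes:
    """Encode an INTEGER (tag 0x02) as minimal two's complement."""
    magnitude = value if value >= 0 else ~value
    size = magnitude.bit_length() // 8 + 1  # leaves room for the sign bit
    body = value.to_bytes(size, "big", signed=True)
    return b"\x02" + der_length(len(body)) + body


def der_sequence(values) -> bytes:
    """Encode a SEQUENCE (tag 0x30) whose members are all INTEGERs."""
    body = b"".join(der_integer(v) for v in values)
    return b"\x30" + der_length(len(body)) + body


def xml_sequence(values) -> bytes:
    """Encode the same values with hypothetical <seq>/<int> elements."""
    items = "".join(f"<int>{v}</int>" for v in values)
    return f"<seq>{items}</seq>".encode("ascii")


values = [1, 2, 3, 65535]
der = der_sequence(values)
xml = xml_sequence(values)
print(len(der), "bytes as DER;", len(xml), "bytes as XML")
```

The gap grows with the share of the payload that is numeric or binary, which is why the bloat matters much less for marked-up text.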
In particular, implementations have tended to be
horrible.
In the
olden days, most implementations were "compilers" that
autogenerated
C code to marshal and unmarshal ASN.1 data in and out of C
structures. This mostly-static approach loses a lot of
flexibility, of
course. Code quality is another issue. I am unaware of any
general
ASN.1 tool that approaches the quality of open source XML
tools such
as DV's excellent libxml (although it's possible things have
improved
since I last looked).
Another major piece of baggage is X.509, a truly horrid
data
format
for public key certificates using ASN.1 as the binary
syntax. Most
sane people who have encountered it run away screaming. See
Peter Gutmann's X.509 Style Guide for a more detailed critique.
So it's worth looking at ASN.1, but it's certainly no
magic
bullet.
Incidentally, ASN.1 is one of many attempts to define low-level
binary data formats for tree-structured data. I hear good things
about BXXP/BEEP, but haven't studied it carefully enough to
critique it. Off
the top of my head, there's sexp,
which is
the binary data format for SDSI (the name reflects its
kinship with
Lisp S-expressions). Further, most RPC mechanisms define
binary
encodings. Don't forget IIOP (CORBA), XDR (Sun RPC), and so
on. I'm
sure all have advantages and disadvantages.
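As an illustration of how simple a binary tree format can be, here is a minimal sketch of the canonical S-expression encoding as I understand it from Rivest's draft: an atom is written as a decimal length, a colon, and the raw bytes, and a list is its members wrapped in parentheses. The function names and the sample tree are assumptions for illustration, and the sketch ignores the optional display hints and the other transport encodings.

```python
def encode(node) -> bytes:
    """Encode bytes atoms and nested lists as a canonical S-expression."""
    if isinstance(node, bytes):
        return str(len(node)).encode("ascii") + b":" + node
    return b"(" + b"".join(encode(child) for child in node) + b")"


def decode(data: bytes, pos: int = 0):
    """Decode one node starting at pos; return (node, next position)."""
    if data[pos:pos + 1] == b"(":
        pos += 1
        items = []
        while data[pos:pos + 1] != b")":
            item, pos = decode(data, pos)
            items.append(item)
        return items, pos + 1  # skip the closing parenthesis
    colon = data.index(b":", pos)
    length = int(data[pos:colon].decode("ascii"))
    start = colon + 1
    return data[start:start + length], start + length


tree = [b"cert", [b"issuer", b"alice"], b"\x00\x01"]
encoded = encode(tree)
decoded, end = decode(encoded)
print(encoded)
```

Because every atom carries its own length, arbitrary binary data needs no escaping, and the parser never has to scan for a terminator.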
What exactly are the goals? As I see it, the main
problems
with XML
are bloat (for general data; XML bloat in marked up text is
quite
acceptable) and complexity (XML is only medium-bad here).
Binary
formats help a lot with the former, and may or may not help with
the latter. But there are a lot of other considerations,
including:
- Quality of implementations (XML wins big).
- Associated schema languages (messy; importance
varies).
- Suitability for dynamic mutation (XML's DOM is a
reasonable,
if inefficient, API for local tree manipulation; most other
formats
don't have anything similar, and their more rigid nature
would make it
harder).
- Scaling. XML itself doesn't scale particularly well to huge
trees, but is usually implemented in conjunction with URIs, which
do. A binary data format can either help or hinder this goal.
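For the dynamic mutation point in the list above, here is a small sketch of what local tree manipulation through the DOM looks like, using Python's standard `xml.dom.minidom`. The document content (a `drawing` with `rect`, `circle`, and `line` elements) is invented for illustration.

```python
from xml.dom import minidom

# Parse a small document into a mutable in-memory tree.
doc = minidom.parseString("<drawing><rect id='a'/><circle id='b'/></drawing>")
root = doc.documentElement

# Remove one node in place.
rect = root.getElementsByTagName("rect")[0]
root.removeChild(rect)

# Create and attach a new node without touching the rest of the tree.
line = doc.createElement("line")
line.setAttribute("id", "c")
root.appendChild(line)

# Serialize the mutated tree back to text.
result = root.toxml()
print(result)
```

With a format compiled into fixed C structures, the same edit typically means unmarshalling, rebuilding the structure, and marshalling again, which is the rigidity the list item refers to.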
Update: ASCII vs
Binary, by David Reed.
Transparency, PostScript, PDF
sej brings up the desire to
handle vector objects
with
transparency. Basically, PostScript can't do it, and
probably never
will. PDF 1.4, on the other hand, has very rich transparency
capabilities. It's a bit more complex, and caution is needed
when
dealing with a standard controlled by a highly proprietary
organization such as Adobe, but it has compelling technical
advantages.
Here are two signs that the rest of the world is moving to PDF:
- pdflatex appears to be far more vital than dvips.
- Mac OS X uses PDF for the imaging and printing metafile.
ETCon
I went to the O'Reilly Emerging Technology Conference
(formerly known
as the P2P conference) on Wednesday. It was great fun. I met
a lot of
people, some of whom I've known for years (such as Jim
McCoy). Others
I've known online, but met for the first time in person
(Kevin Burton,
Aaron Swartz, Dave Winer).
I particularly enjoyed hanging out with Jim, Roger
Dingledine, Bram
Cohen, Zooko, and Wes Felter. It is such a luxury to be able
to
interact with people on such a deep level. In my opinion,
the best
people to associate with are those you can teach and learn
from. Thanks to all of you for making my Wednesday a day of
intense
teaching and learning.
A lot of people have heard of my work, and are reading
my
thesis. I
think Roger is largely to blame for this; a number of people
mentioned
hearing about it from him. I had great conversations with the
MusicBrainz and nocatauth folks; both projects would be killer
applications for attack resistance.
I have this uneasy feeling there's something important
that I left out. Also, one risk of naming names is that
people might infer something from a name being left out. Not
to worry - I genuinely enjoyed meeting everybody I met
at ETCon.
McCusker
Reading David McCusker's log fills me with sadness. The
personal
troubles he's going through resonate deeply for me, as I've
been
through similar things. I'm in a much happier space
now, for
which I consider myself very fortunate.
His technical work is also quite compelling. When we
meet,
one of the
main things I want to talk about is whether to join forces
on some of
our pie-in-the-sky dream projects. Collaboration can be a
nightmare when the other person doesn't understand what you're
trying to do, or doesn't particularly share your goals. I get the
feeling that working with him might be otherwise.
David talks about being busy and not having much time
for
things. I
can certainly relate to that. He does, of course, have time
to meet
with me, whether he realizes it or not. Other things will
inevitably slide, so it's worth considering their importance in the
long run.