Recent blog entries for dchud

Zarg! Fittle, z00ty wx lmplinky zot zarg.

Never thought much about making diary entries a regular thing, but the new setup enables less direct interaction with other people so maybe it's a good time to post a daily run-on or two here.

First things first. I've left the job at Yale, effective last week. Spent most of last week catching up on sleep, moving email archives around, leaving/joining lists from old/new accounts, and generally letting the transition take hold psychically. I don't really have much time to spare but in retrospect it was a good idea to allow some buffer time.

Here's the plan: I'm forming a non-profit corporation. Have the board and a lawyer and everything. The broad goal I want to work toward is making the net work much more like a big library. The purpose of the organization will be to seed/support a handful of projects which provide free pieces of that big global library infrastructure, starting with jake. The general shape of a project the company will take on is anything which enables use of functional metadata. By functional, I mean the explicit reorganization of (usually) bibliographic information into structures which can be generically useful in an unbounded range of software or publishing projects. In support of this, projects will have an information gathering component, a collective data diff/patch structure (i.e. open source data maintenance), and a collection of well-defined APIs and free code libraries for access.

I can explain this a bit further in the context of the jake project. The data in jake exists elsewhere... in MARC/AACR2 catalog records, in Ulrich's International Directory of Periodicals, in proprietary content services. But nowhere is this information (which is largely factual and therefore arguably public domain) either architected for modular use in a wide range of applications or freely available under an open source-style license. Basically MARC/AACR2 is difficult to hack because it lives in hard-to-hack access systems (Z39.50 doesn't scale well in today's implementations), its content rules are often implicit syntax and not explicitly tagged, and its useful metadata components (such as fields for ISSNs, ISBNs, and the like) often reference external naming systems whose content is only accessible under license (and usually not any more hackable).

For jake we're removing each of those problems by putting the information most generically useful for hacking journal access systems into a generic data structure with obvious hooks for other applications. We reference external identifiers but generate our own internally. And even though the project's only halfway to 1.0 there are people using it in ways we never predicted.
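To make the "reference external identifiers but generate our own internally" idea concrete, here's a rough sketch in Python. It is not jake's actual code or schema; the `Title` and `Registry` names, the field layout, and the sample title and ISSN are all made up for illustration.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical internal id generator: ids are ours, not borrowed
# from any external naming system.
_next_id = count(1)


@dataclass
class Title:
    """One journal title (illustrative layout, not jake's real schema)."""
    name: str
    # External identifiers are only referenced, keyed by scheme,
    # e.g. {"issn": "1234-5679"}.
    external_ids: dict = field(default_factory=dict)
    internal_id: int = field(default_factory=lambda: next(_next_id))


class Registry:
    """Keeps titles by internal id and indexes them by external id."""

    def __init__(self):
        self._by_internal = {}
        self._by_external = {}

    def add(self, title):
        self._by_internal[title.internal_id] = title
        for scheme, value in title.external_ids.items():
            self._by_external[(scheme, value)] = title.internal_id
        return title.internal_id

    def find(self, scheme, value):
        # Resolve an external identifier to our own record, or None.
        internal_id = self._by_external.get((scheme, value))
        return self._by_internal.get(internal_id)


registry = Registry()
example = Title("Example Journal", {"issn": "1234-5679"})  # made-up data
registry.add(example)
print(registry.find("issn", "1234-5679").name)
```

The point of the split is that other applications can hook in through whichever identifier they already hold, while the records themselves never depend on a licensed naming system to stay addressable.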

So that's the general idea. Libraries need to expose their data to the hacker community better. Hackers need to understand that much of the work of librarianship, such as authority control in cooperative cataloging, is an absolutely vital piece of the puzzle. By seeding a few projects that demonstrate this to both communities hopefully the company will define a niche area where immediate collaboration is necessary.

Right now I'm setting up a dedicated jake site and migrating it out of Yale. It's going fairly well but there's a lot to deal with, including a site redesign, moving the data, rewriting code to build a cleaner query environment, and timing support requests to the very gracious provider so's not to interfere with their own internal hardware upgrade. Hopefully we can shoot for the end of the month for the new site and an 0.6 release.

So I'm working at home on this stuff, along with putting together paperwork for the company. It's funny deciding which lists to subscribe to at this point. Because I don't work in a library anymore I think a lot of the lists I used to follow aren't really germane to someone whose work now revolves around thinking of the net as one big library. :)

Hmm. Feels good to blabber on here.

30 Nov 2000 (updated 30 Nov 2000 at 14:54 UTC)

un wee experiment: to be flamed by crackmonkey?

Wondering why the heck he'd be wondering why the heck there aren't already decent personal library systems. Found the monkeymonk thread on MARC and Z39.50, read the archives, see ubersubscriber addresses in all their glory. Some of 'em know me for one reason or another, including one i worked with a while back. Clearly tho there's some secret entry, right, like smailing tcp packets in hex format written out onto denny's placemats wrapped in dot matrix blue-n-white lined 120 pinfeed with carbons to some postal address that looks like an mx record.

Which means, if i get this whole thing, that my "but golly i'm not using a win machine so please let me on via the mailman interface" approach is sure to get me happily bounced. The anticipation is killin. :)

(next day) tee-hee, that didn't hurt too much.

jake is now at 0.5.2. It's definitely at the halfway point: its design does the job and people are using it, but the most important pieces of making it sustainable remain to be done. Fortunately there are several people stepping up to work on these, so the future looks good.

Friday's a big day... we get to take the jake roadshow to Harvard to show it off for folks there and from MIT. Must add snazzy new features... in the meantime I guess you know you've got a good project going when executive directors and presidents of thisdotcom and thatdotedu start sending you email about it.

Our apartment move is done but there are still boxes galore. DSL is ordered but won't be up for three weeks. They're giving me free hardware inc. an extra nic though so I can't complain I guess.

Today was the latest in a long series of crazy days, hopefully the last for a while with the holiday this week. First thing in the morning the Dr. tells me I might have an ulcer. Sheesh.

Things definitely picked up after that, at least. Since I posted the docster article late Friday folks finally started getting around to reading it today (librarians don't work weekends). It generated a solid discussion on oss4lib-list but a lot of the folks just don't seem to see the distributed model yet. I would let it slide but they can't use the old "Metallica sued us so we can't even ping napster.com" excuse like we can here... anyway there was enough positive feedback to make me think we could tweak gnutella a bit and get a trial going with a few libraries soon enough.

Highlight of the day was hearing John Simpson, chief editor of the OED, speak at Yale about goings on with the major revision. Evidently all they've done to change their submission model is to take the slip format they used to use and make a web form based on it. I got a chance to speak with him and another editor about creating an "appeal to hackers" a la their 100+-year-old "appeal to readers", asking for help creating automated tools for managing 100s of 1000s of submissions and such. Fortunately the second person he directed me to groks some perl and thinks it's a good idea worth exploring. He's based in CT too so I'm going to try to buy him lunch when I get back from the holiday.

Second highlight was my very own girlfriend making off with the snazzy event poster. M. if you ever see this I'll make it up to you with homegrown mojitos. ;) It'll look good in our new place.

Played with zope for a while, considering building a jake version running on zope. It would be good to see what sort of distribution and maintenance efficiencies might come about from leveraging some of the available xml services in there.

Made a condolence call tonight. Long crazy day indeed. Tomorrow we're off for Detroit in a rentacar.

Moved oss4lib over to www.oss4lib.org. No more "Yale won't be held responsible" disclaimer. It's about time, too. I wonder if it's a good idea to pull the listserv off of the .edu machine also and use one of the sf lists.

Put through requests for irreference.org but no shell yet. Registered unalog at sourceforge but haven't ported cvs over yet. Plenty of work to do, but if I want to catch today's puzzle on npr the alarm will need to be set for three hours from now. Something's gotta give... off to bed.
