24 Feb 2001 RyanMuldoon   » (Journeyer)

I took a look at www.canonicaltomes.org; it is a very cool idea. It reminded me of a project I wanted to do 3 or 4 years ago but have yet to get off the ground, and have not really seen anyone else do either. The project would be a central compilation of every public domain work that has been digitized. The goal would be to provide a nice, navigable, searchable interface to all the extremely useful research materials out there.

It would have to have the following features:

  • A Yahoo-like category interface for browsing casually
  • Each work would have extensive metadata, covering a standard like Dublin Core
  • A search interface that lets you perform all the basic searches, but also searches by metadata, so you can dynamically regroup works to your liking
  • A nice gdict-like desktop application that is a search gateway
  • Palm/WAP interface?
  • Cross-referencing
  • Text prettification, so you don't get stuck with ASCII if you don't want to; some HTML or XML with stylesheets would be nice
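
To make the metadata and "regroup works to your liking" items a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Work` class, the `regroup` helper, and the three sample entries are hypothetical, and only a handful of the Dublin Core elements are shown.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Work:
    # Hypothetical record holding a few Dublin Core style elements.
    # The actual standard defines more elements than are shown here.
    title: str
    creator: str
    date: str
    language: str = "en"
    subjects: list = field(default_factory=list)


def regroup(works, element):
    """Group work titles by the value(s) of one metadata element."""
    groups = defaultdict(list)
    for work in works:
        value = getattr(work, element)
        # A multi-valued element (such as subjects) files the work
        # under every one of its values.
        values = value if isinstance(value, list) else [value]
        for v in values:
            groups[v].append(work.title)
    return dict(groups)


# Sample entries, chosen only to have something to group.
CATALOG = [
    Work("Leviathan", "Thomas Hobbes", "1651",
         subjects=["philosophy", "politics"]),
    Work("The Prince", "Niccolò Machiavelli", "1532",
         language="it", subjects=["politics"]),
    Work("On the Origin of Species", "Charles Darwin", "1859",
         subjects=["biology"]),
]
```

Calling `regroup(CATALOG, "subjects")` or `regroup(CATALOG, "language")` gives two different views of the same collection, which is roughly what a browsing interface would build its categories from.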

I'd imagine that the technical side would be the easy part. It would basically just have to be a big database with a well-thought-out schema. The hard part is definitely finding all the content, organizing it, and attaching the metadata. It would also be good for the collection to be mirrored. Eventually it should be able to act as part of a distributed filesharing system. It would be an invaluable research tool.
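
As a rough idea of what such a schema might look like, here is a small sketch using Python's built-in `sqlite3` module. The table names, column names, and helper functions are all made up for this example; a real design would need much more thought about the metadata elements, full-text search, and cross-references.

```python
import sqlite3

# Hypothetical schema: one row per work, plus one row per metadata
# statement (element name and value), so a work can carry any number
# of elements without changing the table layout.
SCHEMA = """
CREATE TABLE work (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    body  TEXT NOT NULL
);
CREATE TABLE metadata (
    work_id INTEGER NOT NULL REFERENCES work(id),
    element TEXT NOT NULL,   -- e.g. 'creator', 'subject', 'date'
    value   TEXT NOT NULL
);
CREATE INDEX metadata_lookup ON metadata(element, value);
"""


def open_catalog():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_work(conn, title, body, metadata):
    """Insert a work and its (element, value) metadata pairs."""
    cur = conn.execute(
        "INSERT INTO work (title, body) VALUES (?, ?)", (title, body)
    )
    work_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO metadata (work_id, element, value) VALUES (?, ?, ?)",
        [(work_id, element, value) for element, value in metadata],
    )
    return work_id


def find_by_metadata(conn, element, value):
    """Return the titles of works carrying the given metadata pair."""
    rows = conn.execute(
        "SELECT work.title FROM work "
        "JOIN metadata ON metadata.work_id = work.id "
        "WHERE metadata.element = ? AND metadata.value = ? "
        "ORDER BY work.title",
        (element, value),
    )
    return [title for (title,) in rows]
```

Storing metadata as separate rows keeps the schema open-ended, at the cost of a join for every metadata search; a fixed set of columns would be the simpler alternative if the element list were settled up front.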

With GNUpedia and other similar efforts to create free-license encyclopedias under way, it seems to me that a much more worthwhile effort would be to work on something like what I describe above. An encyclopedia is only useful once there is a collection of works to reference. This would probably go further toward accomplishing what RMS wanted to get done: there is already no copyright on this material, so no competing interest can do anything about it. Once there is a community around it, it can be extended in all sorts of directions.

Of course, I think that the Library of Congress should provide such a resource, but the person running it seems to disagree with me. Ah well. Maybe one day when I have free time I'll try to get something like this started.
