17 Jul 2002 guerby   » (Master)

I'm looking for a free software solution to index a large (CD-sized) collection of HTML documents: articles from a monthly news publication spanning a few years. The plan is to ship pregenerated static indexes and all documents in plain HTML, so that the collection is usable everywhere, and then to offer some additional software for boolean queries (by word, date, etc.) and for managing query results. The software must work on Linux, Win9x and above, and MacOS X. Maybe it's possible to develop, reuse or adapt a plugin for IE and Netscape.
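To make the plan above more concrete, here is a rough sketch in Python of a pregenerated word index with boolean AND / OR queries. It is only an illustration, not an existing tool: the function names, the file names and the JSON index format are all assumptions, and it uses only the Python standard library.

```python
import json
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring the tags.

    Simplification: text inside <script> or <style> would be indexed too.
    """

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def words_in_html(html):
    """Return the set of lowercase words found in the text of an HTML page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return set(re.findall(r"\w+", " ".join(parser.chunks).lower()))


def build_index(documents):
    """Map each word to the sorted list of document names containing it.

    `documents` is a hypothetical dict of {file name: HTML source}.
    """
    index = {}
    for name, html in documents.items():
        for word in words_in_html(html):
            index.setdefault(word, []).append(name)
    return {word: sorted(names) for word, names in index.items()}


def query_and(index, *words):
    """Documents that contain every one of the given words."""
    sets = [set(index.get(w.lower(), [])) for w in words]
    return sorted(set.intersection(*sets)) if sets else []


def query_or(index, *words):
    """Documents that contain at least one of the given words."""
    sets = [set(index.get(w.lower(), [])) for w in words]
    return sorted(set().union(*sets))


# Hypothetical sample data standing in for the article collection.
docs = {
    "2001-05.html": "<html><body><p>Free software news</p></body></html>",
    "2001-06.html": "<html><body><p>Software patents debate</p></body></html>",
}
index = build_index(docs)

# The static index would be generated once and shipped with the documents;
# JSON is just one possible format (an assumption, not a requirement).
static_index = json.dumps(index, sort_keys=True)
```

A query tool would then load the static index and call, for example, `query_and(index, "software", "news")` or `query_or(index, "news", "patents")`. Date queries could follow the same pattern with a second index keyed by publication date.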

If you have any links or ideas on how to achieve this, or know of better places to ask, please let me know, either in your diary or by email at guerby@acm.org

If there is interest, I'll post a front page article. Thanks for any help!

