Ankh is currently certified at Master level.

Name: Liam Quin
Member since: 2000-02-18 03:07:39
Last Login: 2014-06-16 01:37:21

Homepage: http://www.holoweb.net/~liam/

Notes:

Living (as a home owner) near Milford, Ontario, an SGML and XML Guru, text retrieval (lq-text), Unix and C programming since 1981 (urp!), Open Source and freeware since 1983 (well, that predates the FSF and GNU and the term Open Source, OK), IRC (Ankh usually), SGML since 1987, co-author of The XML Specification Guide (Wiley 1999), author of the Open Source XML Database Toolkit (Wiley, 2000), and one of three authors of Mastering XML Premium Edition (Sybex).

I currently work for W3C as XML Activity Lead.

Have also been involved in, or worked with, the X Window System, typography, DSSSL, XSLT, Scheme, C, Canadian font standards representative / advisor for ISO-related work, known as the barefoot programmer, what else should I say?

In spare time I scan old photos and engravings from antiquarian books, and put them on the Web together with extracts from the books.

trying to do a Gtk front end to lq-text, going for long walks in bare feet

You can email me as liam at holoweb.net if you like. Tell me what colour socks you're wearing.

Ankh certified:

  • Graydon, whom I knew when he worked for me
  • trance9, whom I've known for even longer than graydon, but who wears shoes more
  • zodiac, who is my brother
  • deus_x, who is writing an interesting content management system (EFnet/#Perl)
  • jwz and jef, who were both giving away X and graphics utilities long before Mosaic was born
  • milambar, whom I know from SorceryNet and have met
  • jivera, halcy0n and Mysidia, also from SorceryNet
  • some people as apprentice so they could post, or at their request

Recent blog entries by Ankh

stan, Python already has regular expression support... if you want only ^.*$ then the simplest and most efficient way might be to prefix all others with \ and use the existing regexp support. Most implementations of Perl-style regular expression matching these days can use Boyer-Moore-style delta tables to go massively faster in many common cases. If the code was for your own understanding, though, that's fine, and in any case Rob Pike rocks :-)
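
The escaping idea can be sketched in a few lines of Python. This is only an illustration: the names `restricted_to_re` and `restricted_matches` are made up for the example, and it assumes that ^ . * $ should behave exactly as they do in Python's `re` module, which is not quite the same as a hand-written matcher in every corner case (a pattern that starts with `*`, for instance, makes `re` raise an error).

```python
import re

# The only metacharacters the restricted syntax keeps; everything else is literal.
KEPT = set("^.*$")


def restricted_to_re(pattern: str) -> str:
    """Escape every character except ^ . * $ so re treats it literally."""
    return "".join(ch if ch in KEPT else re.escape(ch) for ch in pattern)


def restricted_matches(pattern: str, text: str) -> bool:
    """Search text for the restricted pattern using the stock re engine."""
    return re.search(restricted_to_re(pattern), text) is not None


if __name__ == "__main__":
    print(restricted_to_re("a+b"))                 # the + is now escaped
    print(restricted_matches("^a.*c$", "abbbc"))   # ^ . * $ still work
    print(restricted_matches("a+b", "aab"))        # + is a literal plus here
```

The point of doing it this way is that all the matching work is left to the existing engine, so any speedups it has come for free.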

I spent some time with Marc Lehmann's String::Similarity module, which seems to do reasonably well on finding similar strings that were OCR'd independently. I wish Google would get a clue and make higher resolution scans: the OCR error rate would drop hugely, they'd get more of the punctuation and footnotes, and they might even start capturing some of the diagrams! The problem is that it's more lucrative to have millions of badly scanned crap than to have hundreds of thousands of well-scanned books, it seems.
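
For anyone who wants to try the same sort of matching without Perl, here is a rough sketch in Python. It uses the standard library's `difflib.SequenceMatcher` as a stand-in, which is a different algorithm from String::Similarity and will give different scores. The function names, the 0.8 threshold and the sample lines are all invented for the illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a score between 0.0 and 1.0; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


def best_match(line, candidates, threshold=0.8):
    """Return the candidate most similar to line, or None if none reaches threshold."""
    best, best_score = None, threshold
    for cand in candidates:
        score = similarity(line, cand)
        if score >= best_score:
            best, best_score = cand, score
    return best


if __name__ == "__main__":
    # Two hypothetical OCR readings of the same line, plus an unrelated line.
    clean = "ABERCROMBIE, John, a physician"
    others = ["ABERCR0MBIE, John, a physician", "ZENO, a philosopher"]
    print(best_match(clean, others))
```

Once two lines from different scans have been paired up like this, the places where they disagree are good candidates for OCR errors.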

Been spending a lot of time working on a 200-year-old 32-volume dictionary of biography that I own (I got it in a second-hand bookshop in Oxford, missing two volumes that I later got elsewhere). I found several versions that had been OCR'd really badly, and have been cleaning up one version enough that I can then try to use the other versions to detect errors.

The current version, converted first to XML and thence to HTML, is at words.fromoldbooks.org if anyone is interested. I'm hoping to be able to feed the cleaned up text back to Project Gutenberg and archive.org eventually, and to generate RDF.

Lots of interesting text processing challenges, so a useful diversion for a while.

Clearing the undo history of this image will gain 428.6 MB of memory.

Image editing is going much better with 8 Gigabytes of memory. I've been able to get three or four images done for FromOldBooks.org in the time it used to take to do one.

On the other hand, the only reason I get any images scanned and edited at all is because I get too tired to do much else; it's pretty insanely busy here.

Unfortunately, Google's ads almost entirely stopped working on my Web site (Google downgraded my pagerank from 8 to 4 a few months ago), and with the fall in the US dollar (it's been bushed), we're struggling a bit more than we'd like. OK, a lot more than we'd like.

Luckily, my spam says that I won the UK Microsoft email lottery, and the prize is either (1) all of Nigeria, or (2) more spam. Speaking of which, SpamAssassin seems to be working better after a one-line fix (I filed a bug for it). Or at least it's not complaining as much.

So, today's image (no, I won't post them every day) is an ammeter from an 1892 book:

ammeter from 1892

24 Jul 2008 (updated 24 Jul 2008 at 22:30 UTC) »
Cats and Dogs

It's been the rainiest July on record here - and the month isn't over yet, of course. We discovered that the swimming pool can indeed fill above the top of its liner.

And during the storms, the dog, who is possessed by a daemon, becomes uncontrollable, or controllable only with difficulty.

I still miss being able to have time to concentrate, to focus enough to write reasonable amounts of code, to program. Working at W3C means I get to have a vague warm fuzzy feeling about helping the world a teeny bit, but it isn't always enough compensation.

In what little spare time I have, I scan pictures from old books. Someone recently made a set of Photoshop brushes from the 16th century demonic seals from the Goetia, and the two sets have each had over 900 downloads (they are here and here if you are into such things). I have well over 2,000 images now, with sometimes fairly substantial extracts from the books, captions and other metadata. And there's an encyclopædia, some dictionaries of slang (including Brewer's Phrase and Fable), most of a vitriolic satirical political dictionary from the 1790s, and a bunch of other stuff.

Most of the text is in XML, so every now and then I update the XSLT that makes the HTML files and add smarts to find more cross-references. I want to do geotagging and links to maps, but this is harder than it sounds because the placenames I have are usually from when the books were published, not today.

Today's addition is some pictures of fonts, from a book I bought in Boston a couple of weeks ago, although these are not font samples as most people here would expect them to be, I suspect :-)

I did get to do some programming recently, though, and added some XML support to my ancient text retrieval package, lq-text. The changes aren't yet released, until I finish with some UTF-8 issues, but if you are interested, drop me a line. I wrote a short paper on it for the Balisage markup conference, too. I hope soon I'll use lq-text for the search function on my Web site, alongside the XQuery-based search that I have now.

Spending time on XML as character strings makes the world of RDF seem even further away, but I'm reading an interesting book on Ontology Matching to make up for it, in between scanning pictures and working on stuff for XQuery and for XSL-FO 2.0.

Now it's time to go and sedate the dog with some herbal calmer.

After playing with "ajax" a little for random images on my pictures from old books Web site, I spent some time investigating other XQuery engines, and in particular what used to be Sleepycat's dbxml, and is now Oracle Sleepycat DB XML.

I was using the Perl interface, and maybe that's a mistake, because it's obvious that they don't spend as much effort on it as on the C API. The documentation is very minimal, for example. But in the end, and after uninstalling all unwanted versions of bsd db from my laptop, it worked. Query time went down from 11 seconds to 2 seconds, partly because the 11-second version is starting a JVM for each query, partly because dbxml is in C, and partly because I had to remove some features from the query because I couldn't get them to work.

After help from one of the people maintaining the software, I discovered that I'll be able to get the other features to work. The search engine on my Web site isn't actually too slow for most queries (try it here) but it's using more memory than I'd like, and there are some queries on my photographs that do take too long.

The good thing about using XQuery to develop these things is that it's relatively easy to make changes. So maybe some changes are coming.


Ankh certified others as follows:

  • Ankh certified graydon as Master
  • Ankh certified DV as Master
  • Ankh certified deusx as Journeyer
  • Ankh certified jwz as Master
  • Ankh certified jef as Master
  • Ankh certified trance9 as Journeyer
  • Ankh certified zodiac as Journeyer
  • Ankh certified argent as Master
  • Ankh certified esr as Master
  • Ankh certified wmperry as Master
  • Ankh certified macricht as Journeyer
  • Ankh certified piman as Journeyer
  • Ankh certified jtauber as Master
  • Ankh certified Malx as Apprentice
  • Ankh certified Iain as Journeyer
  • Ankh certified eikeon as Apprentice
  • Ankh certified Lordy as Apprentice
  • Ankh certified Tux as Apprentice
  • Ankh certified jdub as Master
  • Ankh certified ndw as Master
  • Ankh certified Milambar as Apprentice
  • Ankh certified Mysidia as Journeyer
  • Ankh certified jivera as Apprentice
  • Ankh certified vdv as Journeyer
  • Ankh certified ger as Journeyer
  • Ankh certified halcy0n as Apprentice
  • Ankh certified hugo as Journeyer
  • Ankh certified connolly as Master
  • Ankh certified skx as Journeyer
  • Ankh certified titus as Apprentice

Others have certified Ankh as follows:

  • scottj certified Ankh as Apprentice
  • graydon certified Ankh as Master
  • mathieu certified Ankh as Master
  • topher certified Ankh as Journeyer
  • andrei certified Ankh as Journeyer
  • deusx certified Ankh as Journeyer
  • whump certified Ankh as Master
  • chbm certified Ankh as Apprentice
  • Uruk certified Ankh as Journeyer
  • kelly certified Ankh as Journeyer
  • jmason certified Ankh as Master
  • Trakker certified Ankh as Master
  • lauris certified Ankh as Master
  • DV certified Ankh as Master
  • whatever certified Ankh as Master
  • jenglish certified Ankh as Master
  • mjs certified Ankh as Master
  • synap certified Ankh as Master
  • nixnut certified Ankh as Master
  • tetron certified Ankh as Journeyer
  • kimusan certified Ankh as Journeyer
  • link certified Ankh as Journeyer
  • ErikLevy certified Ankh as Journeyer
  • kanikus certified Ankh as Master
  • jtc certified Ankh as Master
  • dneighbors certified Ankh as Journeyer
  • beppu certified Ankh as Master
  • piman certified Ankh as Master
  • walken certified Ankh as Master
  • LaForge certified Ankh as Master
  • voltron certified Ankh as Master
  • suso certified Ankh as Journeyer
  • menesis certified Ankh as Master
  • RyanMuldoon certified Ankh as Master
  • MikeGTN certified Ankh as Master
  • jtauber certified Ankh as Master
  • Malx certified Ankh as Master
  • bratsche certified Ankh as Master
  • duncanm certified Ankh as Master
  • cuenca certified Ankh as Master
  • monk certified Ankh as Master
  • demoncrat certified Ankh as Master
  • rupert certified Ankh as Master
  • async certified Ankh as Master
  • eikeon certified Ankh as Master
  • Lordy certified Ankh as Master
  • olea certified Ankh as Master
  • bjf certified Ankh as Master
  • aaronsw certified Ankh as Journeyer
  • fxn certified Ankh as Master
  • murrayc certified Ankh as Master
  • fcrozat certified Ankh as Master
  • Milambar certified Ankh as Master
  • jivera certified Ankh as Master
  • jamesh certified Ankh as Master
  • simonstl certified Ankh as Master
  • vdv certified Ankh as Master
  • ger certified Ankh as Master
  • halcy0n certified Ankh as Master
  • trs80 certified Ankh as Master
  • pphaneuf certified Ankh as Master
  • derupe certified Ankh as Master
  • hugo certified Ankh as Master
  • connolly certified Ankh as Master
  • zbowling certified Ankh as Master
  • rillian certified Ankh as Master
  • cooly certified Ankh as Master
  • liam certified Ankh as Master
