Older blog entries for Ankh (starting at number 113)

It's Pride Weekend here in Toronto. A friend from Sorcerynet (Ben/riot) is visiting us, and he has been enjoying seeing thousands (literally) of openly Queer people wandering about, holding hands and kissing in the street.

Probably I should post some notes from Guadec, but I have been too busy, and others have no doubt said all that I would say.

Life here has been insanely busy since April, and is likely to stay that way for another month or so, unless I run out of health and strength!

I have been generating some SVG with XML Query (using Galax and also the latest Saxon). This is a lot of fun.

I've been experimenting with SVG recently. I posted some notes on my initial experiences. Probably I should get a book on SVG and benefit from someone else going through this phase. The most irritating discovery so far is that Adobe's SVG Zone crashes the Adobe SVG plugin. (I won't give a link here, for obvious reasons)

You can see what I've been up to at www.holoweb.net/~liam/download/yggdrasil6.svgz (if it doesn't work for you, try the .svg file instead; if you get a 404 Not Found, it's because that's a temporary area and this is a work in progress; I'll post a proper link when there's something more stable).

I also updated my short technical booklist of XML and other books, to add the XSLT cookbook and the RSS book from ORA, both pretty good.

Finally put up a binary RPM package for lq-text, my text retrieval package. It's compiled on Mandrake Cooker, feedback wanted - download it here if you want. I don't have a devel or src package yet, sorry. I will put up the tarball with the .spec file in it when I get a chance.

I wrote lq-text in 1989, after discovering that the cheapest commercial text indexing packages of any use cost upwards of US$30,000. Well, they had a lot more features than lq-text, but I posted the source on net.sources or comp.sources.unix or whatever it was called back then, and it was quite popular for a while. Now, since it doesn't have a GUI, it's probably less interesting, but I've used it in a few Web projects, so it's not dead.

Someone contacted me a year or two ago and said they'd been reviewing commercial text retrieval products, and that mine was the only indexer they found that could get through all their test data. Probably this is because I assumed most data structures would not fit in memory, back when 4MBytes was a lot of memory for a process to use. Well, outside of editors written at MIT :-)


Tomorrow I fly to XML Query face to face meetings, and then to Budapest for www 2003, the Web Conference. The streets of Budapest will soon embrace my bare feet. Lucky Budapest.

Off to XML Europe in London tomorrow. Advogatoers, gnomers, XMLers, SorceryNetters, Gimpers, Friends, Relatives and beautiful youths in London are invited to say hello (as long as they are not wearing white socks, of course) -- although I can't promise to have net access, unfortunately, so if I don't reply to email at once it's not because I am ignoring you.

I got formal approval to go to GUAD4C today. Watch out, streets of Dublin! Now I have to finish the paper. Which is fine, but I am also working on slides for WWW2003 where I'm both chairing a panel on the W3C track and speaking in the same panel.

I'm trying the new Opera 7.1 snapshot for Linux. The dynamic (shared) one for Mandrake 9 worked for me, although it didn't find the Java plugin. Probably just as well, it's less likely to crash. It's using Qt; Qt is definitely ahead of Gtk+ on the eye candy right now. I'm not sure if that's good or bad: the Gnome project has perhaps more of a focus on usability and accessibility than KDE has had, and this has resulted over the past year or two in a reduction in gratuitous features and weird graphic effects. On the other hand, galeon gives me white text on a white background in text entry fields, so it's not so useful for Advogato diary entries right now.

More on Opera 7 after I've used it for a while.

MichaelCrawford, hmm, I wrote a long comment on your article Use Validators and Load Generators to Test Your Web Applications, only to realise I had not noticed the next link at the foot of the page! This was because it appeared beneath links to other articles, so I assumed it was a link to the next article, not to the next section.

A useful technique for larger sites, especially where there are many authors, is log based validation, where you take the most popular 100 pages (say) and validate them every day or two.
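The selection step of log based validation could look something like the sketch below. It is only an illustration, assuming an access log in Common Log Format; the function name `top_pages` and the sample log lines are made up, and actually handing each page to a validator is left out.

```python
import re
from collections import Counter

# Matches the request and status in a Common Log Format line,
# e.g. "GET /index.html HTTP/1.0" 200
REQUEST_RE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')

def top_pages(log_lines, n=100):
    """Return the n most requested paths among successful (200) requests."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2) == "200":
            counts[match.group(1)] += 1
    return [path for path, _ in counts.most_common(n)]

# Hypothetical log lines, for illustration only.
sample = [
    '10.0.0.1 - - [01/May/2003:10:00:00 -0400] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.2 - - [01/May/2003:10:00:05 -0400] "GET /castles.html HTTP/1.0" 200 2210',
    '10.0.0.3 - - [01/May/2003:10:00:09 -0400] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.4 - - [01/May/2003:10:00:12 -0400] "GET /missing.html HTTP/1.0" 404 210',
    '10.0.0.5 - - [01/May/2003:10:00:20 -0400] "GET /castles.html HTTP/1.0" 200 2210',
    '10.0.0.6 - - [01/May/2003:10:00:31 -0400] "GET /index.html HTTP/1.0" 200 1043',
    '10.0.0.7 - - [01/May/2003:10:00:40 -0400] "GET /fonts.html HTTP/1.0" 200 512',
]

pages_to_validate = top_pages(sample, n=2)
```

The resulting list would then be fed to whatever validator you use, from a daily cron job for instance.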

There is also a CSS validator that can help find errors in stylesheets; you mention this briefly without really explaining it.

It's not enough just to have valid markup: web pages should also be accessible to as many people as possible. Some countries have laws about this, too. People can start by reading the W3C Web Accessibility Initiative and also maybe try out tools such as Bobby to get a better feel for the issues involved here.

chromatic, like you I also have known the joy of a finished book. Or, the relief, I should say. Go and celebrate!

Mozilla is ten: I remember being at the ACM SigIR conference in 1993 when the programme was interrupted to announce that Mosaic was released, and Marc (and I think Eric?) gave a demo. I went back to SoftQuad and managed to get Yuri Rubinsky and others excited about the new browser. The short-term result was work done on integrating an SGML parser into Mosaic for use on a CD-ROM. After that, though, in 1994, Yuri and I visited NCSA, and agreed to produce an HTML editor. We made HoTMetaL available as a free download (with an upgrade of course, but the free version wasn't a demo, it really worked). It wasn't open source, but it was still pretty radical in our industry at the time.

The Web itself, of course, was already almost 4 years old when Mosaic was released: it was far from the first browser, and not even the first to handle images. But Mosaic brought a great many Internet users beyond Gopher, into postgopherism, a world where text, links, images, even sound and video, were all intertwingled.

It doesn't feel that long since I was first reading Usenet on a VAX (1982 I think, maybe 1983) and suddenly feeling connected to people all round the world.


Web Searches: I get maybe 1,000 visitors a day to my web site, with between 5,000 and 15,000 hits per day. Most people arrive by way of a search engine, and usually I can see what they were looking for in the referrer; I then look at which pages they visited to get an idea of whether my site helped them or not, and to see what to improve. (I wrote a script that sorts the logs for me to make this easier.)
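Pulling the search phrase out of a referrer can be sketched as below. This is not the script mentioned above, just a minimal illustration; the function name `search_terms` and the list of query parameter names (`q`, `query`, `p`) are assumptions, and different search engines may use other names.

```python
from urllib.parse import urlparse, parse_qs

def search_terms(referrer, params=("q", "query", "p")):
    """Return the search phrase found in a referrer URL, or None.

    The parameter names checked are assumed, not an exhaustive list.
    """
    query = parse_qs(urlparse(referrer).query)
    for name in params:
        if name in query:
            return query[name][0]
    return None

# Hypothetical referrers, for illustration only.
from_search = search_terms("http://www.google.com/search?q=fur+on+men&hl=en")
from_link = search_terms("http://example.org/links.html")
```

Grouping log entries by the extracted phrase, and then listing the pages each visitor went on to view, gives the kind of overview described above.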

Some of the searches that people do show that they have no real concept of how web search engines work. Maybe they shouldn't have to understand, but given the current state of the art, a clearer understanding would help them a lot.

Some of the searches are amusing, strange, or unexpected - here are some recent examples, in no particular order but numbered for reference.

  1. What is a well-trained horse expected to do when given the command "gee"?
  2. greek dictionary using letter j
  3. have some one seen a real atom
  4. show me a crowd of people in a street
  5. fur on men
  6. How to put warning in word files after 20 word
  7. up skirts
  8. honore de balzac woman thirty
  9. Babe From Hell
  10. do chubby girls with big feet grow tall
  11. generating XML from Oracle database
  12. parent directory naked
  13. england wank
  14. what's the Earth's Core made of?
  15. What is the finger print ridge pattern distribution for the entire population of the united states (matched a story by Daniel Defoe)

Presumably before long primary schools (kindergartens?) will start teaching web search skills.

hub, two things I'd like to see in font chooser dialogues, although they are hard to do...

  1. a get more fonts button to take you to myfonts.com or a fonts.gnome.org portal or whatever;
  2. drag handles on the font BBox in the text preview area, that, if dragged let you change the default linespacing for use with that font, and compress or expand the fonts horizontally.

get more fonts: Let's work on establishing the idea that fonts are copyrighted entities (outside the USA) or at least that they are things of value. The destination portal needs to have Free fonts (there are not many) but also link to individual type designers' pages, and resellers. A useful adjunct would be if the font dialogue could display the font copyright information.

line spacing and font matrix: Very few fonts were designed to be used with single line spacing: traditional type was cast onto blocks of metal. For this to work usefully the individual metal pieces of type were made as small as possible, so you could set type close if you needed to. But that meant that for normal text, you had to add thin strips of lead between the lines of type. Hence the term leading for extra spacing. 10% to 20% is usual (e.g. 10 on 12pt text) but it varies with the typeface, largely depending on the ratio of the height of an x to the height of an X in Western (Latin) text.
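The arithmetic behind "10 on 12" is simple enough to spell out. The helper below is only an illustration (the name `leading` is made up): the extra space is the line height minus the type size, expressed as a percentage of the type size.

```python
def leading(font_size_pt, line_height_pt):
    """Return (extra space in points, extra space as a percentage of the type size)."""
    extra = line_height_pt - font_size_pt
    return extra, 100.0 * extra / font_size_pt

# 10pt type set on a 12pt line: 2 points of leading.
extra_pt, percent = leading(10, 12)
```

A font dialogue with a drag handle could show this percentage as the user adjusts the line spacing.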

It feels like early Summer here. I have no idea whether I'll be able to go to XML Europe next week: it will depend on SARS travel restrictions here in Toronto. I am supposed to be on a panel there, which I think is one I suggested, so I'd feel guilty if it didn't work out!

On the other hand, not long afterwards I have XML Query meetings in Gaithersburg MD (USA), and then www 2003 in Budapest, Hungary. So I could do with less travel!

I had started to write an article about why people get involved in open source and Free projects, but I see there were two other articles on the same theme, so I'll put mine aside for now.

I have been playing with Galax, an open source implementation of the latest public XML Query draft. It doesn't do collections yet, so you can only query on a single XML document at a time, but it's useful for getting more familiar with the language. Be warned, if you try it, that its W3C XML Schema support is incomplete, and in particular doesn't handle mixed content right. The underlying query engine does handle mixed content, though, and the binary release seemed to work fine under Mandrake Linux 9.1 that I use.

I don't yet have a good feel for where XML Query will be used. I'm sure it will be used. There are already quite a few implementations, both open source and proprietary, and companies like IBM and Oracle and Microsoft are represented on the Working Group, so it will surprise me if there isn't support for DB2, Oracle, and SQL Server once the spec is final. Well, I have been surprised before. But if there is vendor support, I can see XML Query being used by database people to produce XML that's then processed by XSLT by webheads. And in that case, the fact that XML Query has a syntax closer to SQL and C than to XML may actually be an advantage, I suppose.

jaldhar, I apologise for oversimplifying. The article I mentioned didn't say that overall mental health (or any other sort of health) is better in India, though, but only that long-term recovery rates for specific conditions were better.

I strongly agree with you about community.

I noticed (by watching my log analysis program that I should really release some day if I can find the time) that people coming to my web site from Advogato, most likely because of sye's article about my pictures scanned from old books, are much more likely than other visitors to look at the Oratio Dominica (the Lord's Prayer in over 100 languages and scripts) or at my scans from Fry's Pantographia, another old book along the same lines. Most visitors are looking for pictures of ruined castles, I think.

The upshot of this is that I am encouraged to scan more of Fry's Pantographia. If there are any particular pages that interest you, let me know (liam at holoweb dot net). There's an index to give you an idea of what is there; I have scanned I think 50 out of almost 800 pages.


Chicago, life doesn't end at 40. Or if it does, I'm doomed, because I shall be 41 in September.


It turns out that the country with the best long-term cure rates for many so-called mental disorders is India, with a massively higher success rate over 10 years (I seem to recall 65% to 85% compared to 18% in the US, but that's from memory and probably not accurate). The difference seems to be that instead of prescribing over-priced drugs to treat the symptoms (drugs that can also lead to violent deaths), they try to help the person live with their situation and be in control, through meditation and insight. This is very third-hand; browse the Mad Pride issue of Adbusters (http://prozacspotlight.org/) for some references. I'm posting about it because I think several people here may find it interesting and useful. Plus, giving Adbusters more publicity can't be bad :-)


badvogato, some of the best sex I ever had was when I was a slave.

k, I love that track too. That and the No Nathanial! one, which tends to stick in my brain and go round and round and prevent me from thinking for days at a time. Some people would say I don't think much anyway, of course!

elanthis, sometimes there are secondary advantages to using XML that you don't predict. In some ways we're seeing the effects that LISP programmers hoped for more than 30 years ago: when you use a common format for information, and when you use declarative representations for relationships on that data, you can treat programs as data, and manipulate data with data, programs with programs. This is part of the success of XSLT. Yes, it's verbose, I grant you that. You might get some mileage by having a short-form syntax that is converted into the same representation automatically, so that you can still use XSLT (or something similar) to manipulate objects. I don't know. But don't write off XML because it's verbose. Typically, clear internal documentation of a format ends up being more valuable over a ten-year period than size of data. When you say, "these are data files that need to be edited by humans", you are saying, "the user interface to my program is humans editing configuration files". That's the problem you have, not XML. Or so it seems to me.

MichaelCrawford, at 40, I know exactly what you mean, I don't want to grow up either :-). I have had three text books published, one solo and two co-authored, and never saw any royalties, although I did get an advance for each of them.

Had Easter brunch with my brother zodiac and with graydon at one of my favourite restaurants in downtown Toronto, where the food is good but the service is always delightfully and frustratingly slow. So we got to talk a lot and catch up.

A sad article (one of many) at /. about security. Or rather, the responses were sad. I'd love to see more Linux distributions using stack protection, or automatic detection of buffer overruns with random stack cookies; I'd also like to see experiments with making shared library data memory read-only to the application by default. It doesn't matter whose fault it is, or who is the victim, I still want my computer to survive attacks.

Trying Uraeus' gnome theme (screenshot, also in (much bigger file) PNG format). Doing this shows me several buggy applications that honour text foreground colour but not background colour, so I get white text on a white background. Glowers at Galeon. Or it may be a theme bug.

If you like gay erotica and slightly goth fantasy, Storm Constantine's The Thorn Boy is highly recommended. She clearly enjoys writing about boys who are kept by lords or kings as pets or slaves or lovers.

What work shall have been done, what wrong

Shall the bird's song cover, the green tree cover, what wrong

Shall the fresh earth cover?

Spent some time playing with Galax (why on earth does galeon pop up a new window with that URI when I paste it in this text field? I have been liking Galeon less and less of late), an open source implementation of the XML Query draft. Jerome and Mary contribute a lot to the XML Query Working Group, and the software is very interesting.

I've decided I'll try and use XML Query to implement the next version of my pictures from old books site (pictures of castles, ruins, etc. scanned from books I, as someone not a copyright lawyer, believe to be out of copyright). I want to add searching by location, keywords, browsing by category and so forth, which I could actually do now since the metadata is mostly there, but I also want to do something with XML Query.

Speaking of holoweb (the server), upgraded (sidegraded?) it from a very old FreeBSD to Mandrake Linux. We never found time to figure out how to get cvsup working, so it just got more and more out of date and harder to install stuff on as the ports stopped working. My brother prefers to administer Linux, and it's in his office, so that was an easy decision.

I mostly use Mandrake Linux these days; urpmi does most or all of what I needed from apt-get with (for me) less hassle, and I spend less time tuning the system or playing with configuration options, and more time getting work done or enjoying myself when I do want to fiddle. I don't want to start a distribution war: if some other system works well for you that's fine. But I have been enjoying using Mandrake Linux.

Two new O'Reilly books arrived this week: RSS Syndication and XSLT Cookbook. More on them later, for now I'll just say they both look pretty good.

And, finally, we're getting to the time of year when I don't need shoes outdoors any more here in Toronto! yay!

I've been in San José for XML Query working group meetings. I'm actually pretty pleased about the way XML Query is heading; late would be an understatement, but it will need to become a Recommendation (assuming it makes it that far) around the same time as XSLT 2, I think. Given that, and given the strong demand I hear at conferences and from both vendors and customers for stronger type checking, I think it might end up pretty popular. I also think people will be generating XSLT transformations with XML Query, which is slightly mind-boggling but may help work around the limitations of higher-order functions.

On Wednesday had a good dinner with Raph, Jaye, Tony, Andy, David, Joyce, James, Aaron, and others, somewhere north of Palo Alto. It was well worth the cab fare. (Raph has a better list of who was there, including the "Open Source Air Force" people (OSAF).)

The Holiday Inn Silicon Valley has an outdoor heated swimming pool and hot tub open 24 hours, so I've enjoyed swimming in the warm rain.

Last week at XML 2002 I had an exhibition of some of my calligraphy, which was fun. One booth boy came up to me while I was setting it up and asked, What are you selling? When I said, nothing, he asked, Which company are you with? and I realised that the concept of art for art's sake, not selling anything, was beyond him. I think he came from the Microsoft booth, but he could have been from a Bay area startup for all I know, except perhaps for his dark socks.

The IBM Almaden research centre is pretty spectacular. The stone floors can be a little uneven, so if like me you go barefoot at every opportunity, you need to watch out. I wish I'd gone for barefoot hikes in the hills, though.

Some replies...

TheCorruptor, a good way to explore Mandrake Linux is to play with the Mandrake Control Center. Also investigate the Penguin Liberation Foundation for some useful packages that aren't in the main distribution, and for sample addmedia scripts. The rpmdrake user interface in the Control Center is a front end to urpmi, a command-line program similar in many ways to apt-get. I find myself using the What to do menu a lot (in Gnome at least).

For the sake of productivity, avoid frozen-bubble :-)

thomasvs, life is better when your feet are bare, put the new shoes aside :-)
