Older blog entries for Ankh (starting at number 151)

Hmm, my Advogato article on the future of XML was picked up over at xml.com today.

One of my favourite books as a teenager was Titus Groan, by Mervyn Peake. I keep thinking about it when reading Advogato's recent diary page for some reason. Titus often turned on his heel, and a schoolfriend of mine sometimes used to reach down to his heel and say "click" before walking off in parody of this phrase.

deekayan, congratulations!

pphaneuf, think of all the time you'd save in the morning if you worked naked :-) Or in pyjamas, I suppose.

titus, I'd noticed that spam too, with some variants. Luckily spamassassin is using AI techniques, so we'll see. AI is a term usually only applied to things that neither work nor are perceived as useful: once a technique is useful it becomes "inference" or "compilers" (yes, that was AI once) or "statistical analysis". I think WaterBot would be more effective if it detected when you left the tap running and gave you a beating. Actually I think the author should be given a beating for using language like "positive auditory messages" and "feedback modalities".

On file formats, CSV is long dead ;-) :-)

zhaoway, one function per file in many programming languages means you lose the benefit of file scoping. However, I've written libraries with only one public API-visible function per file, or with very few per file, along with SGML (XML today) documentation in a comment at the start of each function, and it has worked very well.

I spent some time thinking about writing a Linux font management tool that can inspect all fonts in a given directory and preview them (like gfontview, which appears to be defunct unfortunately) and also enable/disable sets of fonts, probably by linking them into ~/.fonts and running the fc-cache program. There are some bits missing, but I think I can begin to see how to get something HIGgish. One rather hackish approach is to copy ~/fonts.conf and add an entry for ".", so that the installer can examine the current directory; I want to use the same rendering routines for installed and non-installed fonts, of course.
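The enable/disable part of that idea can be sketched in a few lines. This is only a rough sketch, assuming fonts are enabled by symlinking them into a per-user directory such as ~/.fonts and then running fc-cache on it; the function names `enable_fonts` and `disable_fonts` are made up for illustration, and previewing is not covered.

```python
import shutil
import subprocess
from pathlib import Path


def enable_fonts(font_files, fonts_dir, refresh_cache=True):
    """Symlink each font file into fonts_dir; return the links created."""
    fonts_dir = Path(fonts_dir).expanduser()
    fonts_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for font in map(Path, font_files):
        link = fonts_dir / font.name
        if link.is_symlink() or link.exists():
            continue  # already enabled, or a real file is in the way
        link.symlink_to(font.resolve())
        created.append(link)
    # Only refresh the cache if something changed and fc-cache is installed.
    if refresh_cache and created and shutil.which("fc-cache"):
        subprocess.run(["fc-cache", str(fonts_dir)], check=False)
    return created


def disable_fonts(font_names, fonts_dir):
    """Remove symlinks (never real files) with the given names."""
    fonts_dir = Path(fonts_dir).expanduser()
    removed = []
    for name in font_names:
        link = fonts_dir / name
        if link.is_symlink():
            link.unlink()
            removed.append(name)
    return removed
```

Only symlinks are ever removed, so a font that was copied into the directory by hand is left alone.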

A danger is that badly formed fonts can cause weird problems - e.g. I found the "save as" dialogue not working in gtk programs a day or two ago, and it seemed to be because of a bad TrueType font.

Also thought about what it would take to add some XPath support to my text retrieval package, but came to the same conclusion I did last time I thought about this: it would take more time than I have available for hacking on it, unfortunately.

I wonder if the people doing SVG drawing tools have looked at Visual Thought? This was a vastly overpriced diagramming tool for Solaris, and also available for Windows. You can download it for free now, although it's still closed source. I used to like its connection smarts as I recall.

On a rather different topic, I finally decided to make public a Web page that I made a year or two ago, One In Christ, about tolerance. It also has the Sponsored Bible: what would it be like if corporations could sponsor words and phrases in religious texts? It's actually slightly scary.

Today Clyde (my husband (yes, husband)) and I made an offer on a house, and it was accepted. Buying a house in North America seems to involve a lot more legalese than it did fifteen years ago in England, when most of what I needed was done on a handshake. The new house is in the country, and a major worry for me is that it might not have good Internet access for quite some time. On the other hand, there's enough land to start a nudist colony if we wanted.

Thought about trying to make my lq-text text retrieval package useable for Yelp, but I think it'd be more work than I have time for, sadly. The package is fast and stable, and was first released in 1989, but it doesn't have enough XML smarts.

I've also worked more on my pictures from old books site, and it's slowly coming alive again. I should figure out how to do a useful RSS feed, one that actually has image thumbnails in it; last time I tried, none of the RSS viewers seemed able to show images.

Several meetings coming up this year about the future of XML, mostly at conferences. The first such session is at the W3C Technical Plenary in the first week of March, but that's by invitation only for non-Members. I think the next public one will probably be in Japan at the WWW conference, and another perhaps at the XTech conference in Amsterdam.

id, the trouble with most visualisations like the one you suggest is that they don't scale well. You most need categorization tools when you have, say, 10,000 or more files. But at that point the tools mostly break. Even Nautilus isn't happy with a directory containing thousands of files.

pbor, perhaps I wasn't very clear (sorry): I'm very aware that the Gnome project has come a long way. I've been watching it closely over the past three or four years. None the less, right now, gedit is unquestionably slower than textedit on my computer.

I think three or four years ago developers were being targeted. I think (as I tried to say) it has changed.

None the less, I think overall that Gnome still isn't as customer-oriented (if you'll excuse the expression) as some commercial offerings. I don't think that's a necessarily bad thing, and I both use Gnome and recommend it to others.


Working on a trip to New Delhi, hoping to meet some potential W3C Members who can contribute a lot to our specifications and to the World Wide Web.

The article I posted here on Advogato about the future of XML (go look at the front page) has attracted a lot of interest on the xml-dev mailing list and elsewhere. Some people feel XML should never ever be changed, that the Specification is a Holy and Sacred Text. Some feel that the specification could be changed as long as no line of code had to be changed and no documents were affected. Some are concerned mostly with interoperability, and are OK with some code changes as long as no documents are affected. Some are willing to change documents but not code. And some are willing to change both, but only beneath the SAX or DOM level. There has even been talk of sacrificing children, something not currently mandated by any W3C specification (nor, as far as I am aware, by any specification from IEEE, IETF, OASIS, ISO, ICE, nor any other organisation or consortium in the computer industry).

It's nice to see that people are listening, though, and expressing opinions. How else can we learn how people feel?


xach, maybe W3C should define Assembly Language for the Web, with no curly braces. Maybe there could be an arbitrary number of registers each defined by a URI. Text handling in assembly tends to be hard, but hey, real programmers don't use library functions! :-)

pjrm, you might like to look at the XSL-FO specification as a way of formatting arbitrary XML content. If you do come up with new table layouts, the XSL Working Group may be interested to hear about them -- but note that there's been a lot of research on table layouts over the past five hundred or so years, so although there's certainly the possibility of innovation, it helps your case if you study what has gone before. I can give you some references, although for computer algorithms I'm afraid that most of the work has been proprietary.

You mention minimising wasted space, so I should note that an experienced graphic designer will often increase spacing in tables to improve readability. One of the guiding principles is to have related items clearly aligned one with the other, of course.

I wouldn't hold out too much hope of getting changes into HTML at this point, but if you (or your organisation) wants to join W3C and participate in the HTML or CSS Working group, I can give you the necessary information, of course.

mirwin, I'm very pleased that you like the pictures from old books. Interestingly, I note that quite a few people visit the page when I mention it in Advogato, but look only at that single Web page, and hence don't get to see the pictures. I should redesign the page. But I digress. For those pictures that are marked on their corresponding HTML pages as being in the public domain, you can do anything you like with them as long as it's in accordance with the usage guidelines on my page. For the others, I have sent you email.

dfenwick, I recently tried running textedit (it comes with the xview-clients package). It starts massively faster than gedit on my system. It has some features that seem weird at first (e.g. the ability to surround the selection with |> and <|, a feature used by Sun's calendar manager) but it has drag and drop, and from a user perspective the Sun OpenWindows desktop was probably slightly more integrated than Gnome.

So why the huge slowdown? Partly the number of shared libraries I expect, and partly the use of more layers. Is the code easier to maintain? That's not clear to me; having been inside both XView and Gtk+, the glue is more visible in XView, and it's not so clean, but then it's more than ten years older. Gedit has syntax highlighting I think (I've never seen it) but as a notepad replacement it actually seems less useful than textedit, which can easily reformat a paragraph or insert the output of a command.

The difference might be in the focus of the development team. The open source community tends to have a hard time thinking of people trying to get everyday tasks done, rather than programmers using their code. The Gnome project has made huge and fabulous advances here, and the Gnome human interface guidelines have also helped dramatically, so we are definitely seeing improvements. But it's a pain that a 600MHz Pentium system should feel even slightly sluggish.

Last week I posted an article about the possible future of XML at W3C that has had some comment also on the xml-dev mailing list.

cdfrey, you might like to investigate XSLT or even XML Query as alternatives that, after an initial culture shock, can sometimes be a lot cleaner.

I finally broke down and added Google AdSense ads to my pictures scanned from old books and my eighteenth-century dictionary of thieving slang. The revenues are adding up to about US$2/day, which isn't a lot but will probably pay for Web hosting if we need to move the server.

In order to make it less likely we need to move server I have made a bandwidth theft image that's used if people use an img element to embed one of my images in their Web page. I've had several mail messages as a result, requesting permission. In most cases people are happy using a lower resolution image -- Google's image search tends to find the largest resolution available, and many people assume they can use images they find on the Web however they please.

Actually it's quite common for people to use width and height attributes to scale down a one megabyte image to generate a thumbnail, not understanding the performance implications. Sigh.


gicmo, x means you can make the directory your current working directory (also needed for descending into subdirectories). r means you can read the directory file itself, which lets you get a directory listing. Making a Unix directory mode drwx--x--x (0711) means that other users can fetch files from the directory only if they know the filenames, and is sometimes used for incoming ftp servers or for use with a Web server such as Apache to prevent a directory listing.
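To make the r/x distinction concrete, here is a small sketch that decodes the two bits for one class of user. The function name `directory_access` is just for illustration; the constants come from Python's standard `stat` module.

```python
import stat


def directory_access(mode, who="other"):
    """Return (can_list, can_enter) for one permission class on a directory."""
    bits = {
        "owner": (stat.S_IRUSR, stat.S_IXUSR),
        "group": (stat.S_IRGRP, stat.S_IXGRP),
        "other": (stat.S_IROTH, stat.S_IXOTH),
    }
    read_bit, exec_bit = bits[who]
    can_list = bool(mode & read_bit)   # r: read the names stored in the directory
    can_enter = bool(mode & exec_bit)  # x: chdir into it, open entries by name
    return can_list, can_enter


# Mode 0711: others can reach files whose names they know, but cannot list them.
print(stat.filemode(stat.S_IFDIR | 0o711), directory_access(0o711, "other"))
```

This only interprets the mode bits; it does not check ownership or ACLs, which also affect what a real user can do.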

I've been mulling over ways to garner public feedback - especially from open source developers - about what we (W3C) should be doing with one of the sets of specifications we publish - the XML family as I sometimes think of it. Is it time to start XML 2.0?

Where would you go (or post) to ask people why they're not using XML? There are lots of good reasons not to use XML, and lots of good reasons to use it, so I'm particularly interested in people who would like to go with XML but who feel they can't.

Performance is the most commonly-cited reason but it's not the only one. If we made some sort of more compact way to transmit XML that was more efficient to process, but still reasonably easy to deal with, would it make a difference?

What about the other specs in the family?

For my part, I'd like to see more work on integrating SVG and XSL-FO, and notes on working with both W3C XML Schema and Relax NG. I don't know how to make those things happen. I'd like to see some hot next-generation linking and multimedia love for the World Wide Web, but the most widely-used browser is quietly stagnating. I'd like to see more use of client-side XML, but even though there is XSLT in all the browsers that seem to me worth the time of day :-), the interaction between CSS, XSLT, DOM and HTML (and especially HTML forms) is too ill-defined. I want to run client-side XSLT on CSS files too, but that's another matter.

It's hard to say to people that we need better typography on screens. Even when typographic research indicates that typeface design, rendering quality, line and word spacing, choice of typeface, kerning, and many other factors, can all affect reading time, comprehension, recall and tiredness, none of that really translates into product decisions for most people. Oops, got into a rant there!

Anyway, maybe I'll post something on the front page of Advogato and see if anyone replies. Maybe an article on xml.com or somewhere too, if the editor will accept one from a man in bare feet.


Added Google Adwords to some of my Web pages. The income isn't huge, but it would be enough to pay for Web hosting, I think, if it keeps up at the rate of the first five days.

I also decided to do something about people who link to my images without giving me credit. A lot of people use an img element with height and width attributes to scale an image down, and set the src attribute to point to my Web site. If they do that now, and point to any but the smallest resolution image, they get this image, thanks to some mod_rewrite trickery.
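The real thing is done with mod_rewrite rules in the Web server, which I haven't reproduced here. The decision those rules make can be sketched in Python, though. Everything named below (the host names, the thumbnail path marker, the replacement image path) is a made-up placeholder, not the actual configuration.

```python
from urllib.parse import urlparse

# All three values are hypothetical placeholders for illustration.
OWN_HOSTS = {"www.example.org", "example.org"}
SMALL_IMAGE_MARKER = "/thumbs/"              # where the smallest images live
REPLACEMENT = "/images/bandwidth-theft.png"  # image served to hotlinkers


def image_to_serve(path, referer):
    """Return the path that should actually be sent for an image request."""
    if not referer:
        return path  # no Referer header: a direct visit, or a privacy tool
    host = urlparse(referer).hostname or ""
    if host in OWN_HOSTS or SMALL_IMAGE_MARKER in path:
        return path  # my own pages, or the smallest resolution image
    return REPLACEMENT
```

Requests with no Referer are let through here, on the assumption that blocking them would hurt ordinary visitors more than it would stop embedding.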

I've had a few mail messages from people worried about their blogs, and served the image about 3,000 times so far, saving maybe as much as a gigabyte in transfer. I don't pay for the transfer right now, but I might have to shortly.


It's too cold in Toronto for long barefoot walks. There's snow everywhere. My husband and I are planning on buying a house shortly, and one of the candidates has over 80 acres of land. It's not very expensive, either, since it's quite some way outside Toronto, with the main concern I have being the difficulty of getting to the airport. Buying a house is very, very different in North America to England, where I last did it some eighteen years ago, mostly because of the multi-listing service (MLS).

Enough for now. Still recovering from trip to Australia. This travel thing can be tiring.

dwmw2 oops, I shouldn't post arguments when I'm jet-lagged, sorry! Although I have in fact read your article, I was confusing it with another, and for that I apologise.

I'm not going to respond further right now, simply because I'm clearly too tired, but I'll try to do so tomorrow, or whenever the fog clears, if it ever does.

On IP spoofing, I should try and clarify: the problem is one of traceability and accountability. People have suggested trust networks, but those fail in the face of forged mail.

There are North American ISPs that block port 25 too (other than to the designated mail server, of course), and although (as you rightly say) it doesn't block all forged mail, it does, when combined with disallowing forged IP addresses, leave a much clearer audit trail.

I think I was trying to agree with you about the forwarding, by the way.


I'm back from Brisbane, but (as expected) fairly tired. Kudos to DSTC for hosting us. (Us here is the XML Query, XSL and XML Schema Working Groups of the W3C, maybe a total of under 30 people working on Schema 1.1, XML Query, XSLT 2.0, and a number of shared specs between XSL and XQuery such as XPath and the Formal Semantics document). The meetings went pretty well, although it was a lot of work.

I've been wondering if I have the energy to go to Guadec this year. Probably not, unfortunately, although I'll be at XTech (used to be called XML Europe) and very possibly at the WWW conference in Japan. I might also go to Doors of Perception, a conference on design that's interested me for over a decade.

A lot of talkback today...

nutella, yes, zhaoway's link was very interesting. I've also been reading Anthony Hall's The American Empire and the Fourth World [referrer link; see also www.goodminds.com for the Native-owned booksellers where I bought my copy], a book that I recommend, although I found I had to ask someone from the US to explain terms with which I was unfamiliar from time to time, such as Manifest Destiny. I don't think a conspiracy theory is needed - I agree with Noam Chomsky (e.g. see his essay reproduced in You are being lied to) - you only need to believe in the idea that wealth and power often (sometimes?) lead to greed and ruthlessness. I've often thought that people elected into power should lose all their possessions and monies, so that gaining personal wealth could never be a reason to stand for election. The main problem with that idea is that if you make politicians go naked then the politicians who would be elected would mostly be porn stars, I suppose. But California seems to be discovering that film stars aren't always entirely bad news, and maybe it'd help the US get over its inexplicable fascination with and simultaneous disapproval of sex and nudity.

badvogato - I remember a breakfast conversation between Tim Berners-Lee and Alan Kay (and I think connolly and some others, maybe Yuri Rubinsky was there although I'm not sure now) as a result of Alan Kay saying in his keynote that HTML was the MS-DOS of the Web. There was no resolution, but I'll note in passing that the strength of marked-up text is that the reader, the user, the consumer, says how they want to use the text, not the author or producer.

dcoombs, if your socks are frictionless how do they stay on your feet? Maybe it would be safer if you were to work barefoot?

elanthis don't worry, you're still young :-) Arpeggios are closely related to broken chords; I'm not sure if there's really a difference at all in practice except that "broken chord" doesn't sound as pretentious! Morals as preached are, however, clearly different from morals as practiced. Sometimes this can be good (e.g. the Roman Catholic Church may have been responsible for tens of millions of deaths in Africa by telling lies about condoms; if the people had not gone with the morals the priests espoused, they'd perhaps be uninfected today). Similarly, Augustine's teaching that the Christians needed to keep the Jews around as a reminder that "they" killed Christ, but not let them have power, may have been a major factor in the preaching that led to the massacres of Jews by so-called Christians in the Crusades (literally, Cross Wars) and later. Moral teaching must, it seems to me, be limited to general principles: absolutism in details is barely distinguishable from fanaticism. The Puritans whom you mention chose to forget that Jesus is described in the Gospels as turning water into wine, and actually instructing his followers to drink wine. He also received an erotic massage from a prostitute, and it would be pretty weird to believe that someone who hung out with prostitutes would be a virgin. The "morality" of abstaining from sex and drugs (including alcohol) is mostly about trying to enforce a "work ethic" to keep poor people busy and productive. Harm no-one, love those around you, and try to accept people as they are, and everything else falls into place, I think. But it's easier to say than to accomplish.

deekayen, neat, the BBC survey said I'm a spatial thinker. If that's so, how come I get lost so easily? :-)

cdfrey, I think most people read http://www.advogato.org/recentlog.html once every day or two, or use the RSS feed. I suppose you could then search that page for your advogato nickname if you're too busy to read all the diaries :-)

dwmw2, although that old "why not SPF" article reappears every now and again, it's not at all clear that the arguments in it are sound. Certainly the author does not substantiate them very well. Neither a whitelist nor a blacklist seems to solve all use cases.

I personally continue to believe that the first step in addressing spam is actually for ISPs to be liable for forged src addresses in IP packets that they forward, and for forged email to be treated as fraud. If a cable modem sends a packet from a home computer up to the ISP, the ISP should check that the src field of the packet is correct before accepting the packet. This is a low cost thing to do and would make fake email headers massively easier to deal with.
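The check I have in mind is cheap. Here is a minimal sketch, assuming the ISP knows which address prefixes it has assigned to each customer line; the function name and the example addresses (taken from the ranges reserved for documentation) are made up for illustration.

```python
import ipaddress


def source_is_plausible(src, customer_prefixes):
    """True if the packet's source address lies in a prefix assigned to the sender."""
    addr = ipaddress.ip_address(src)
    return any(addr in ipaddress.ip_network(prefix) for prefix in customer_prefixes)


# Hypothetical customer line that has been assigned 203.0.113.0/29.
assigned = ["203.0.113.0/29"]
print(source_is_plausible("203.0.113.5", assigned))   # inside the assigned prefix
print(source_is_plausible("198.51.100.7", assigned))  # forged: outside the prefix
```

A real router would do this in its forwarding path rather than in a script, but the test itself is no more than this membership check.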

This, combined with disallowing a direct connection to or from port 25 except by arrangement, which would remove the ability in most cases for a virus or worm to send or receive email, would make the use of armies of trojan'd Windows XP systems (or Linux systems or MacOS systems) to send spam pretty much a non-starter.

For now, SPF represents a good step forward, although it's true that there can be problems with people who use forwarding services that don't themselves publish SPF records.

We do publish an SPF record at W3C although I don't yet do so on the server that I share with my brother (holoweb.net) because of a (cough) difference of opinion between Network Solutions and my brother. If solutions are liquids that dissolve things, network solutions are presumably things that take away networking, and our expectations at times have been met :-)

I'm in Brisbane for XML Query Working Group meetings, kindly hosted by DSTC on the Queensland University Campus. I and some colleagues also spoke at a workshop on Friday.

Much more thinking about XML things than open source things of late, although as usual I've been following Mandrake Linux (cooker). We're slowly getting to the point where more and more of users' needs are met, but meeting 80% of everyone's needs is not as good as meeting all of the needs of 80% of people.

Someone wrote to me recently to ask for an office suite and database to help her run her small business. I don't think postgresql or mysql are the sort of thing she had in mind :-) although OpenOffice will probably meet her needs in practice, perhaps along with gnumeric. We still don't have anything like Quark or InDesign, although scribus is starting to approach an old version of PageMaker, and passepartout is also interesting for a more technical user.

The open source world can meet most of the day to day needs of a great number of people, which is a solid and remarkable achievement. It's not clear to me that computers have reached anything like their full potential, and meeting all of the uses people might ever have for computers is obviously not possible, so the right question is to ask where the growth in computing capability is most aligned with people's needs. That's somewhere that open source does seem to have a clear advantage.


steved, I've been interested in the reciprocality project for a while, even though the Web site isn't perhaps as grokkable as I'd wish ;-)

haruspex, well, Bush is now a King, it seems.
