Older blog entries for Ankh (starting at number 180)

Papers papers papers. Next week a trip to Paris for W3C XML Query face to face meetings, colocated with the XML Schema and XSL Working Groups. These meetings are often highly productive and helpful.

One of the papers I have to write is for IEEE Spectrum magazine, introducing XML to engineers. This interests me because engineers are often more focused on immediate problems than on interoperability (and I say this as someone with at least some engineering background). A solution that appears to be less than optimal for their needs may be rejected, even if in fact using it would give a huge benefit that would outweigh the perceived (or real) inefficiencies.

People, in a way, don't benefit from standardised shoe sizes either: you always end up with shoes that don't quite fit right. But you quickly learn which size is closest, and it's massively cheaper than having shoes specially made, so you see fewer barefoot people wandering about. Actually I wish you saw more barefoot people wandering about, not least since shoes are not necessarily healthy, but that's beside the point.

halcy0n, I'm sorry that you are feeling disillusioned. All projects involving multiple individuals have politics, although I've seen quite a few open source projects in which hostile flame wars are the exception rather than the rule. Both Gnome and Mandriva Linux (cooker) seem to me to fall into the latter category. One thing both projects have in common is outreach, that is, that they aim to provide software not only for themselves but for other sorts of people entirely. Perhaps Eric Raymond might call this scratching someone else's itch, I don't know. Another thing they have in common is having a wide age range of people involved, and again, I think that sometimes helps.

sktrdie, you say, "The great thing about advogato is its simpleness and elegancy" and then you say, "I've been thinking in starting an open-source service such as advogato, in the meanwhile add more features to it." Beware that in adding features you will reduce the simplicity, and your project may lose the thing you most desire, by the very virtue of your work on it.

Working a little on the content mismanagement system I use for my pictures as well as for my images scanned from old books. Managed to wreck the RSS feed briefly, but it's back now. Probably just as well.

The cache for the image search engine now gets deleted more aggressively when it's invalidated; it gets up to two or three hundred megabytes per day, which is fine as long as the cached query results are useful. I don't bother with LRU; the pattern is that they are all invalidated if I upload a new image, and since that generally happens at least once a day, all I needed was for the out-of-date files to be deleted automatically. They were in any case being ignored when out of date, so I had got that part right.
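A minimal sketch of this kind of purge, in C. It is an illustration only, not the actual code behind the site: the function names `cache_entry_is_stale` and `purge_stale_cache`, the flat cache directory, and the idea of comparing each file's modification time with the time of the latest upload are all assumptions.

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* A cached result is stale if it was written before the latest upload. */
int cache_entry_is_stale(time_t entry_mtime, time_t last_upload)
{
    return entry_mtime < last_upload;
}

/* Delete every stale regular file in cache_dir.
 * Returns the number of files removed, or -1 if the directory
 * cannot be opened. */
int purge_stale_cache(const char *cache_dir, time_t last_upload)
{
    DIR *dir = opendir(cache_dir);
    if (dir == NULL)
        return -1;

    int removed = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        char path[4096];
        int n = snprintf(path, sizeof path, "%s/%s", cache_dir, entry->d_name);
        if (n < 0 || (size_t)n >= sizeof path)
            continue;   /* path would not fit in the buffer: skip it */

        struct stat st;
        if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
            continue;   /* skips ".", "..", and any subdirectories */

        if (cache_entry_is_stale(st.st_mtime, last_upload) && unlink(path) == 0)
            removed++;
    }
    closedir(dir);
    return removed;
}
```

No LRU bookkeeping is needed with this pattern: a single timestamp for the latest upload decides the fate of every cached file, and a lookup can apply the same `cache_entry_is_stale` check to ignore a file that has not been purged yet.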

Next is to manage a queue of pending images to upload, and to make a suitable front end so that other people can contribute images more easily.

I should mention that I'm interested in other people's collections of high-quality scans; let me know if you find any cool ones :-) and maybe we can merge or I can link to them. High quality ideally means at least 1200dpi scans, though, in most cases, so as to be without murky grey bits everywhere.

All this by way of procrastination: I'm supposed to be working on a paper on microformats for Extreme Markup, which I think of as the XML conference I find most interesting and thought-provoking; I'm also supposed to be working on an article for IEEE Signal Processing Magazine on XML.

Image of the day: The Discovery of Tin in Britain (a cartoon from the 1890s). Caution, it's a bad joke.

The piano finally arrove, so I had to move office to make room for it (it's a baby grand that had been in storage, after being repaired). So I set myself up with a second monitor. The ATI Radeon card in this Dell notebook seems unable to drive the external screen at 1600x1200@75Hz, so I'm running a dual-head setup. Unfortunately it's fragile: the ATI installer overwrites one of the xorg X11 libraries, which means that the packages tend to fight one another. It's a good opportunity for /etc/alternatives, which Mandriva already uses for other purposes.

I'm still going through photographs from 2004, trying to get up to date. Last night I added pictures of Pendennis Castle in Falmouth, Cornwall. Right now I'm just getting the pictures the right way round and giving them very simple captions; later I'll probably do a gallery of the ones I like the most. The Search link currently takes you to the Search pictures from old books page; I'll fix that when I've finished uploading the 2004 photos.

I also made some preliminary notes on Linux font management; this is still very very sketchy, and I'd appreciate contributions by email (liam at holoweb dot net). I'm most interested in font management under Gnome but I'll take pointers for KDE as well. I am not going to add command-line programs that require you to issue SQL statements, because my goal is to get Linux useable by design professionals, whose focus tends not to be in grokking such things.

Every once in a while I'm reminded why I use Mandriva Linux. I watched someone try to plug in a digital camera and upload pictures. Her husband insists on Debian (OK, GNU/Debian Linux[tm]) but this meant instead of "plug in the camera. click on the icon that appears on the desktop" it's "find the device in /dev and mount it". A small difference perhaps, but a big one in outlook. Of course one could configure GNU/Debian Linux[tm] to behave the same way but her system administrator husband looks down on things that are too easy. So he has a computer system that's designed to appeal to a system administrator who looks down on people who are not system administrators. Maybe Ubuntu would be a good compromise for the pair of them, based on Debian but produced by people who care about using computers to do other things.

zanee, you are right: choose your battles.

badvogato, why should I tell you my husband's name when I don't know your name? Pictures of your ankles on a postcard please :-)

OK, I relent, he's called Clyde.


I got behind with digital photos, so I've been uploading basically unsorted pictures; I'm up to 2004, and in particular to the holiday in the UK that my husband [yes, live with it] and I had in September of that year. Some of the pictures are pretty good, but of course most aren't, so I'll have to try and make some selections eventually.

Last week I spent some time explaining to someone the difference between XML's name-based typing and the structure-based typing that was in an early draft of XML Query. I suppose you could say the structure-based typing was like an early version of the C Programming Language, in which two types were entirely compatible if they had the same storage classes. You could assign bar = foo, in other words, if the number of bits in each variable was the same (more or less). By 1978 C had evolved past this, and there were implementations in which if you did
    typedef int hatsize;
    typedef int shoesize;

you couldn't call a function expecting a shoesize and pass a hatsize without at least getting a warning. In Java or C++ it would be absurd for an assignment across classes to be anything but an error, regardless of storage sizes. And it's absurd in XML too, in most cases.
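A side note for anyone trying this with a current compiler: in standard C a typedef is only an alias, so the two typedefs above name the same type and a conforming compiler will accept one where the other is expected. A common way to get name-based distinctions that the compiler does enforce is to wrap each value in its own struct. The sketch below assumes that approach; `shoe_fits` is a made-up function for illustration.

```c
/* Two distinct types with identical structure: each holds one int. */
typedef struct { int value; } hatsize;
typedef struct { int value; } shoesize;

/* Accepts only shoesize arguments. Returns 1 if the shoe is at least
 * as large as the foot, 0 otherwise. */
int shoe_fits(shoesize foot, shoesize shoe)
{
    return shoe.value >= foot.value;
}

/* Given
 *     hatsize hat = { 7 };
 *     shoesize shoe = { 10 };
 * the call shoe_fits(hat, shoe) is rejected at compile time, even though
 * sizeof(hatsize) == sizeof(shoesize): the names differ, so the types do. */
```

This is the name-based behaviour in miniature: the two types have exactly the same structure, yet they are not interchangeable.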

Slowly working on my XML blog; I should add stuff about strong typing.

My husband installed a codec for a Web site he trusted, which turned out to've been misplaced trust, as it installed some virulent malware that keeps popping up saying you've been infected with adware or spyware, and need to buy their anti-adware tool. Of course, to make this credible, it also installs some adware in the background.

Part-way through a new Windows installation using the Acer recovery disks, we discovered that one of the disks was missing. And this left the laptop unusable. Well, usable by Linux :-) Luckily, Acer agreed (for a surprisingly small fee) to ship replacement CDs by overnight courier, so we should have them in a couple of days (you have to add a day for the border, usually).

Stupid marketing flyer of the week comes from The Source by Circuit City, which used to be Radio Shack. On the laptop with the smallest screen and least memory they say "Increased memory and larger screen is ideal for gaming and graphics"; on a mouse, "feel the precision with an optical mouse" (all their mice shown are optical)... there's a memory stick shown with the caption "SONY 512MB Memory Stick PRO is smaller than a stick of gum" -- possibly, but the stick shown clearly says 256MB on it. Two adjacent cameras have captions, (1) "6MP digital camera has everything you need to capture your best shots" and (2) "Everything you need in 6MP digital camera". I'm not sure how those captions are supposed to help differentiate the products. Looks like maybe The Source isn't long for this world.

On a more positive note, some of my calligraphy was used on the front cover of an American current affairs magazine called Time, which is cool.

rmathew, I'm not sure I'd take Ian Hixie's rant quite as strongly as you seem to've done. With only a little care you can serve XHTML documents as text/html and use XML tools with them just fine; I suspect Ian Hixie doesn't use XML tools very much. Opera (where he worked until recently) was very much dragged kicking and screaming into a world in which XML support was a given, and they have only recently added client-side XSLT to their browser.

On the subject of renaming folders, it's worth putting a redirect into your .htaccess or apache.conf so that you don't break the Web. Well, so that you don't break your bit of it :-)

Been going through piles of old digital photos, pictures from 2004, slowly catching up. Most of them I took for use as stock for people into photomanipulation, and a lot of them have been used. But I had only posted some of them. Also expanded my Calligraphy booklist somewhat.

I spent much of today patching holes in the upstairs of the barn that we use as an art gallery, so the birds can't nest in it and then spill their poop onto the artwork below. I hope that we get to a point where my husband and I can concentrate on making art, writing software, the garden, and life, instead of concentrating on working on the house. But it's going to take a while!

Distressingly, some scanned engravings from a book on torture have turned out to be fairly popular. Maybe I should not be surprised. I'm glad the images are not in colour.

In the unexpected light relief department, I was going from Toronto airport into the US one day last year, and the US immigration official asked me my job. I said (for the sake of simplicity) that I work for a standards organisation. He promptly looked at me and said, “what have you got in that suitcase?”
“Clothes and toiletries” said I, whereupon he asked,
“You're sure you've not got any metric in there? We don't want any of that!”
I assured him that the metric system was from a different organisation and that we (W3C) don't do that, and he grudgingly let me pass. Was that a twinkle in his eye?

Back from travel to France (W3C Tech Plenary) and California (Unicode conference). The Unicode conference reminds me (it doesn't take much to remind me) of some of the work that is still needed to tame fonts on Linux.

Afterwards, spent some time putting up more scans from old books on the Web. I did some reasonably high resolution scans of some 16th century type (2400dpi I think, I forget) but the files tend to be too large for comfortable Web viewing or downloading. I'm willing to digitise more type samples if they are of use to anyone, and also of course to host them, together with metadata and a search interface. I'm more likely to get requests for pretty initials (drop caps) or for castle plans, though, most of the time.

I wanted to play with the Google map interface to try and provide another interface to locations depicted in the images, but I haven't yet found the time.

titus mentioned Jon Udell's blog entry quoting R0ml as saying that open source means you don't need standards, because you take away the concept of ownership of a core technology. This is a bogus argument in oh-so-many ways. First, being open doesn't always mean that the core technology is not owned by some group or individual. A fork isn't always feasible. But that minor quibble aside, standards do not address the issue of who owns technology. They are about having multiple implementations that work together.

Standards are the reason I can list over 100 open source IRC clients that work together, and the reason there can be so many different clients for the World Wide Web on so many different sorts of hardware and for so many environments.

We need both open implementations and open specifications, and we need specifications to be freely available (as in beer) so that people can afford to implement to the spec directly.

zanee asked about how to go about improving an open source project where the design may be questionable but the maintainers and developers don't admit a problem.

The act of wondering how to act is a necessary first step, and you've taken it :-) (I sound like a horrorscope from a cheap paper).

It's often a case of having to be very tactful, and also having to get the developers to want to make the changes, and being confrontational is unlikely to do that. Supplying a patch may help, as might convincing your company that they need faster time-to-market/turnaround, or whatever their jargon happens to be, and that it's unreasonable for the software to take 14 hours to run. The first approach gets you working with the developers, and the second gets pressure applied on them to improve the product.

I should note, by the way, that there is nothing about writing object-oriented code in Perl that prevents you from using use strict, and also that Devel::DProf should certainly work, although there are problems with thread-enabled Perl on some platforms that might conceivably interfere, and native (XS) method calls might also be a problem. You may find use strict; no strict 'vars'; of use; see the perldoc page for strict.

Your rant about how "people off the street" should be able to compile some complex Linux package is not I think well spoken. If software is too difficult to configure by the people who intend to use it, it's the software's problem. Every time.

vab, welcome to MIT; it's a fun place to work.

For people reading this blog syndicated, the original article is currently on advogato.org/recentlog.html which shows the most recent few entries.

Wow, lots of snow fell.

pesco, why invent a new syntax? The advantage of XML is not that it's particularly elegant, but rather that it's widely used. There are some nice features -- the end tags add a level of redundancy/error checking for example that is particularly useful for markup that can't easily be checked by computer as "right" or "wrong". And there are some less nice features. But most of the interesting research is at the edges (as Tommie Usdin said in her keynote at the Extreme Markup conference this year).

To be sure, there are experiments with alternate syntaxes, such as LMNL, an experimental markup language supporting overlap and structured attributes. Overlap is probably the biggest driving force in markup research at the syntax level today, although most people still try to stay within the bounds of XML in order to take advantage of all that XML software and understanding.

You mention functions. I claim that well-designed XML markup vocabularies are declarative in that they indicate a result or a meaning, if you will, rather than giving an algorithm. Of course, XSLT and XML Query straddle the boundary here. But they operate on XML documents, and those documents can be used and interpreted in many ways. Consider taking an SVG document and producing a set of colour swatches representing the colours used in the image described.

I see a lot of proposals for alternatives to XML, and I'm always interested to see use cases and hear what is being solved that XML can't do. Sometimes people think XML can't do things that it can; sometimes they think that a "more elegant" solution will appeal to a wider audience (but very rarely can people from such widely differing communities as XML users agree on "elegant"), forgetting that it is not elegance that drives the adoption of XML; sometimes they genuinely have new ideas, or things they really can't do with XML.

We recently created a Working Group at W3C to investigate efficient interchange of XML; even Microsoft, who vociferously and publicly opposed the creation of the Working Group, have since said they'll consider using the result if it meets their needs, or if their customers demand it (and they do). Of perhaps more interest to you, we also created an XML Processing Model Working Group to standardise upon a way to say how an XML document is to be processed. It's a sort of functional scripting language for XML, or that's what I hope it will be. The processing model work is being done in public view, so you can watch, and maybe also get involved.

[disclaimer, in case it is not obvious: I am the XML Activity Lead at W3C]

Too many blogs. What to do? I wanted to put together some notes on buying and owning a home but rather than start a new blog, I am just making static Web pages for now. Once I'm up to a dozen or so pages I'll maybe rethink things.

Also working on two papers, one for the Unicode conference next Spring (on XQuery) and one for XTech 2006 (the conference is subtitled "Building Web 2.0").

fxn, yes, I agree strongly with Tim's comments there. Before the FSF started, the Unix community used to share "public domain" software. However, I should also give Richard and the FSF credit for a unifying vision of a complete freely-available operating system -- I'll say freely available because like Tim I think the politics Free part caused some problems.

I actually do support Free software, but I also prefer to try and form consensus and agreement with all parties, and the FSF at that time wasn't known for flexible compromise. I don't think there are easy answers, though.
