I've been sleeping a lot this week. Part of it is that I was tired after a 40-hour door-to-door trip home from Japan, part is depression after losing one of our two pet cats, and part is being a bit overwhelmed by all the stuff we have to do in the next few weeks, including probably moving house and sub-letting this one. I'm tempted to cheer myself up by buying a computer, but I'm not sure that's very responsible! I did get to look at lots of digital cameras in Japan, although they're way too expensive there.
Spent some time getting a SourceForge page set up for some Web log summary scripts I wrote years ago; some of the documentation is up, but not the code yet, because in writing the install notes I decided it was all too horrible and I want to improve it first!
*
robocoder, I'm aware of two main dangers of relying on captchas (e.g. images of hard-to-OCR numbers, used to keep spambots out while letting people in). The first is that blind people can't use them, and in many cases this is discriminatory and illegal, so you have to provide an alternative method that's not so difficult as to be discriminatory in itself. The second is that these systems can easily be broken if there is a financial incentive: there have been reports of spammers using a system that relays the captcha questions onto a free porn site registration form, for instance, so that when someone registers there, the corresponding hotmail (or whatever) registration is completed by the software. One way round that is to use text questions that incorporate the name of your Web site in the answer, I suppose.
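That last idea can be as simple as a string comparison; here's a minimal Python sketch, where the question and the accepted answers are made-up examples, not anyone's real site:

```python
# Sketch of a site-specific text-question check.
# QUESTION and ACCEPTED_ANSWERS are hypothetical; a real form would
# rotate among several questions and rate-limit failed attempts.

QUESTION = "What is the name of this Web site?"
ACCEPTED_ANSWERS = {"example.org", "example"}  # hypothetical site name

def check_answer(answer):
    """Return True if the visitor's answer matches, ignoring case and spaces."""
    return answer.strip().lower() in ACCEPTED_ANSWERS
```

Because the answer is specific to your site, a relay attack would have to be customised per site, which removes the economy of scale that makes relaying generic captcha images worthwhile.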
Ingvar, if it takes a C program 14 seconds to read 43MBytes of data on a reasonably recent computer, either the data format is very, very intricate or it expands into an awful lot of memory when unpacked.
If you're not doing it already, use profiling tools such as gprof(1), and maybe consider using mmap(). If there's no obvious function using more than 10% of the time, consider inlining some frequently-called functions or turning them into macros (depending on which compiler you use). Compiler options can help too.
My hololog program reads a 50MByte or so httpd logfile in Perl in less time than that, including matching multiple regular expressions against each line, on a 250MHz Pentium "Pro" system with 128MBytes of RAM and slow 7200RPM disks. But I suspect it could be a lot faster if I work on it some more some time.
Sometimes a good compromise is to write a C program to read the data and extract some of it into a text format (e.g. XML-based), and then weed it further in Python or Perl, or even XSLT or XML Query.
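The weeding half of that pipeline might look like this minimal Python sketch, assuming the C extractor has emitted a simple XML file of per-line records (the element and attribute names here are made up for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical extract the C stage might emit: one <hit> element per
# log line, keeping only the fields the later stages still care about.
EXTRACT = """<log>
  <hit status="200" path="/index.html"/>
  <hit status="404" path="/missing"/>
  <hit status="200" path="/about.html"/>
</log>"""

def weed(xml_text, status="200"):
    """Return the paths of hits with the given status code."""
    root = ET.fromstring(xml_text)
    return [hit.get("path") for hit in root.iter("hit")
            if hit.get("status") == status]
```

The point of the split is that the C stage does the dumb, fast byte-shovelling once, and the scripting stage can then be rerun cheaply with different questions against the much smaller extract.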