Older blog entries for Ankh (starting at number 133)

ncm, choice of programming language makes a big difference.

let $second := xdt:dayTimeDuration("PT1S"), $time := xs:dateTime("2004-02-29T23:59:59") return $second + $time

works in XQuery (e.g. try it in Saxon).

Admittedly this doesn't have the lexical form you wanted for input or output, but it shows that some problems become a lot easier when one switches language.

It's also pretty easy in Perl :-)

OK, off to California for yet another business trip. Oracle has two buildings shaped like disk drives (deliberately so) in Redwood Shores. Possibly a trip to Fry's with a friend from Italy who will also be coming to the meeting.

We're working on a review of the latest drafts of XML Query and associated specifications; there were new public drafts just posted. XML Query seems at first to be a huge language, but in fact it's probably comparable to C, with the function library comparable to something like the C/POSIX libraries. Unlike C, though, it's declarative, and has XML items as fundamental data types. You can include XML fragments as literal constants too, of course.

AlanHorkan, I really liked the Inkscape calligraphy screenshot you posted. I'll have to try it when I get a chance to upgrade Inkscape, although I don't have a tablet and I suspect it's needed. If you like such things you can see some of my own calligraphy using more traditional media. :-)

pipeman yes, MikeLowe has been posting links to a commerce site here for some time, using a number of different usernames. I've marked him as uninteresting "1" and I think if enough people do that, maybe it will help.

Probably I (we) should take the time to complain to the ISP hosting the Web site. I get so much spam in my inbox (it's a pain that my email address at work is very public) that I usually only follow up two or three a week, mostly ebay or paypal scams.

haruspex *laugh*, no, that's not what I meant at all! It's a tiny detail, but what's interesting is to think about whether there comes a point at which tiny details don't matter, or if they all add up. It's rare that I've seen more than a megabyte or so of savings in a program by reordering structs, and usually there are larger gains to be had elsewhere. Those who start out not thinking about architectural details and code without worrying about memory will get bitten, though.

So a minor point, and all I can say is I was bored :-)

In order to declare a C struct sensibly you need to know a little about alignment and machine architecture, at least if you want to avoid objects with holes in them. Does it matter? Programs that use more memory than they need run slower simply because of the paging overhead. The case where all programs fit into main memory seems increasingly rare on an out-of-the-box Linux system, and even minimalist ubergeek systems like NetBSD run on computers with finite amounts of memory, typically with more disk storage space than memory. So application memory competes with the disk block buffer cache. Every little matters at least a little.

At one time there was some commercial software that stripped away duplicate copies of C++ class dispatch tables generated by inefficient C++ compilers, saving (they claimed) about 25% of executable space and boosting startup time.

I mention this for no particular reason except that I think it's all too easy to fall into the trap of thinking that memory is cheap and memory usage unimportant. Most profilers won't help you detect slowdowns caused by wasted space.

deekayan, yes, Mandrake is a Gnome release behind; it's not so much because of profitability but because the Gnome releases come out just after the Mandrake Linux release freezes. For my part it'd be worth delaying a Mandrake release by a month or so, but there's an obvious hit in revenue if they do that, not just for Mandrake but also for resellers.

anderson, what your diary entry didn't indicate to me was, why would I want to join DotNode? Saying that it's like Orkut isn't a strong incentive: once one has joined, Orkut loses its interest, or so it seems to me. Mostly because you can't link to pages within it. It's not bloggable.

Kerry, good luck! :-) :-)

On cross-platform development, I've always been interested in the idea of control objects (widgets, knobs, etc.) that hook up to (conceptually) remote interfaces. I suppose a modern way to say it would be a collection of control/manipulation/display objects that use RPC or Web Services to connect to an interface, and that are also connected to each other by ThingLab-style constraints.

Then you get a very distributed application indeed, where the user is in control -- it's a way to open up a model-view-controller architecture.

In terms of toolkits, of course I always liked The NeWS Toolkit, or did after I'd started to get somewhere with it, and in a way it shares some of this design: you could send a method to an object without needing to know where the object lived, at least in principle, whether in the client part of your application or in some display server somewhere. But it wasn't exactly cross-platform :-)

Haruspex, I wonder how many other active Advogatoers are in Toronto?

Someone asked me a question about refactoring code today so I thought I'd share part of my reply.

He was thinking of replacing lots of special-purpose functions (hasCar, hasTractor, hasTruck) with a pair, hasVehicle and getVehicle. This leads to:

if (hasVehicle(Marvin)) {
    theVehicle = getVehicle(Marvin);
    drive(theVehicle); /* or whatever */
}

A better idiom, though, might be

theVehicle = getVehicle(Marvin);
if (theVehicle) {
    drive(theVehicle); /* or whatever */
}

This avoids a number of problems. One is that it can be made thread safe, whereas the first idiom has a race condition. Another is that it's more efficient. Another is that the resulting code is slightly simpler, easier to read and understand, and hence less error-prone.

It does have the down side that theVehicle has scope outside the test, so you can improve it (in C) by introducing a block:

    {
        sometype theVehicle;

        /* above code goes here */
    }

In C++, Perl and other languages there are other idioms for short-lived local variables, of course.

Although my friend's question wasn't actually about vehicles, I should also note that in choosing a new name for refactored code you should choose a name whose meaning is obvious from the context in which it's used, so that people don't have to go and read the documentation for your function every time it's called. Not that documentation is a bad thing; rather, program code that's easier to understand is a good thing.

There was recent discussion about XLink and linking in XML on the xml-dev mailing list; I'm pleased to see a few people talking about the more general problem of representing and discovering relationships, which can then be implemented as hypertext traversal.

Anyone running some flavour of Unix and Apache willing to help test a web log analysis script? I've been using it since 1999 or so, and am in the process of cleaning it up for others to use. It'll be on Sourceforge when I'm satisfied it's a little cleaner :-)

I'm in upstate New York (Oswego) this week to help Clyde (my husband) move to Canada (yes, husband, get over it!). But it's a working week so I'm not sure how much I'll actually be able to help.

Oswego is a town of maybe 10,000 people, surrounded by American chain stores like Walmart, American fast food chains like Subway and Dunkin' Donuts and McDonald's (at least two of each of those, I think), two nuclear power plants and a campus of about 10,000 students. The campus and town have a slightly uneasy and distrustful relationship with each other. A couple of years ago a local pizza store refused to serve a couple of students on the grounds that they had dark skin. Then the place was picketed by students. Then the owner burned it down and collected the insurance.

I'm glad Clyde is getting out of here. The university, however, offers one of the best courses in graphic design and fine art in the area. It was very weak on typography -- I think Toronto's York University's joint programme with Sheridan College is much better from that point of view. But SUNY Oswego was stronger on ceramics, which interested Clyde, and the York degree wasn't offered when he started. Now he has graduated and it's time to finish moving to Canada and settle down :-)

I've been sleeping a lot this week. Part of it is that I was tired after a 40-hour trip door to door home from Japan, but another part is depression after losing one of our two pet cats, and part is being a bit overwhelmed with all the stuff we have to do in the next few weeks, including probably moving house and sub-letting this one. I'm tempted to cheer myself up by buying a computer, but I'm not sure that's very responsible! I did get to look at lots of digital cameras in Japan, although they're way too expensive there.

Spent some time getting a sourceforge page set up for some Web log summary scripts I wrote years ago; some of the documentation is up, but not the code yet, because in writing the install notes I decided it was all too horrible and I want to improve it first!


robocoder, I'm aware of two main dangers of relying on captchas (e.g. images of hard-to-OCR numbers used to try to keep spambots out and let people in). The first is that blind people can't use them, and in many cases this can be discriminatory and illegal, so you have to provide an alternate method that's not so difficult as to be discriminatory in itself. The second is that these systems can be easily broken if there is a financial incentive. There have been reports of spammers using a system that relays the captcha questions onto a free porn site registration form, for instance. When someone registers, the corresponding hotmail (or whatever) registration is completed by the software. One way round that is to use text questions that incorporate the name of your Web site in the answer, I suppose.

Ingvar, if it takes a C program 14 seconds to read 43MBytes of data on a reasonably recent computer, either the data format is very very intricate or it expands into using an awful lot of memory when unpacked.

If you're not doing it already, use profiling tools such as gprof(1) and maybe consider using mmap(). If there's no obvious function using more than 10% of the time, maybe consider inlining some frequently-called functions or turning them into macros (depending on which compiler you use). Compiler options can help too.

My hololog program reads a 50MByte or so httpd logfile in Perl in less time than that, including matching multiple regular expressions on each line. On a 250MHz Pentium 1 "Pro" system with 128 MBytes of RAM and slow 7200RPM disks. But it should be a lot faster if I work on it some more some time, I suspect.

Sometimes a good compromise is to write a C program to read the data and extract some of it into a text format (e.g. XML-based), and then weed it further in Python or Perl, or even XSLT or XML Query.

My flight back from Tokyo was delayed by 17 hours, which I had to spend at the airport gate, where there's no restaurant. American Airlines didn't tell us the flight was delayed until the next day in time for us to book a hotel (they were all full), although they knew at least two hours earlier, since the pilot and crew had left, luggage and all. Other flights to the US took off after ours was canceled because of the typhoon, including at least one American Air flight with a similar aircraft. I value honesty very highly, and am not sure I want to continue flying with American Airlines, despite the extra room in economy and the laptop power outlet.

I'm thinking of offering a reward for anyone who can fix X11 (x.org) with an ATI graphics card in a laptop so that it can switch to and from an external monitor without restarting X or rebooting, just like Windows 98 manages so well. Maybe I'll start by offering a pair of socks. Black ones. Argyle if you like. It's such a pain to speak at conferences and when you need to restart X in the break before your session the AV person sneers, "oh, a Linux user, they always have problems". It's a worse pain at workshops when you can't suddenly decide to project without losing all your windows. And XFree86 3.x used to manage to switch just fine on the same hardware.

Someone ordered a CD of some of my pictures scanned from old books so I'm finishing off the images they requested before sending off the CD. It's making me wonder if I could make the GIMP faster by hand-optimising the convolution filter code. But it's hard to believe it isn't already pretty well tweaked. A convolution filter on a 5,000 x 4,000 pixel image takes too long for my comfort!

Thinking about my Web site reminds me to say that after a couple of people requested copies of my Web server analysis scripts, I've registered a sourceforge place to distribute them and maybe let others hack at them too. I'll post details later. The scripts give both a more "holistic" overview than most others, and also a more useful detailed view than others I've seen, and were designed to help build Web sites up from (say) a few hundred hits a day to a few thousand.


A couple of specific responses:

haruspex, on Linux you can try adding net.ipv4.tcp_keepalive_time=300 to /etc/sysctl.conf -- this fixes a symptom in which the router forgets NAT associations after 15 minutes or so of idle time. Another common problem is with the MTU, and setting that to 1400 with ifconfig may help.

raph, you are often in our thoughts.


Alan Cox on writing better software is interesting partly because people might take notice. I think he leaves out the single most important aspect of writing solid software, though: state of mind. You have to focus on robustness.

I remember once reading an article on writing robust software in an industry rag (Dr. Dobb's Journal?) that I don't usually bother with. It said that the key to robust software is to use lots of assert statements, and gave the example of an interactive editor.

Most users wouldn't consider an editor that crashed as soon as it detected an internal inconsistency to be very robust. The software might be robust from the programmer's point of view, in that many errors were found during testing and development, but assertions are a development tool (and a useful one), not a tool for helping software to be resilient in the face of errors.

So it's all a question of how you look at it.

If you are wondering, I'd suggest using a good exception mechanism that logs an error and lets your program return to a known safe state with minimum data loss.

If the editor's buffer is known to be corrupt, warn before letting the user overwrite the original (or any other) file, for example. Don't require saving to a new file (disk may be full), but don't make it too easy. Old versions of the vi editor used to do this very effectively, and whether you liked the program or not, it was written to try not to lose data.

A function that accepts an integer in the range 7 to 31 needs to check its argument so that errors don't spread. But in the error case it needs to signal an error to the calling function, with some documented out-of-band value. It's tempting to say the type system should handle this, but that's not actually true. For example, you might define a C++ object that can only hold integers in the right range, but your function needs to be robust against the case where someone uses a cast or otherwise overrides the type, and also where they edit the header file and redefine the type (legitimately or otherwise) and don't change your code. Relying on the calling function, the type system or the compiler to do your checking is an abrogation of responsibility (more commonly called laziness :-)) and is an example of not being sufficiently diligent in writing solid code. Diligence means never writing even a single line of code without asking yourself what the possible error conditions are and what the consequences of errors would be.

So writing better code is about wanting to write better code: if you're not motivated, you won't do it.

A business decision that you are going to trust all data on the local area network can save you a lot of money in programming, but then when your system is deployed on the Internet the cost of making it secure is high. The real cost (as one large operating system vendor has discovered) is that you get programmers with a mindset of trusting data to be correct, trusting values to be within bounds.

Alan is right in that we need to find/create and use tools that can help us to identify and fix problems. If you manage a team of programmers, reward them for fixing bugs and for having fewer defect reports from outside the group, and be careful not to punish them for their errors, but to ask each time, how can we work together to stop this from happening again? Well, OK, and maybe confiscate their shoes :-)

devmeet - organised a devmeet for "deviant artists" in and around Toronto, this Sunday. But I am not sure that I'll be there: my 'plane back from Japan may be delayed by a typhoon.

If you're a Deviant Art person feel free to join us!

Japan - been here for meetings about the efficient interchange of XML. Good meetings, although I still find Tokyo a little intimidating. Maybe more on Tokyo later!

Should there be more work on XLink, the W3C spec for linking? Or is hypermedia linking research dead forever?

Why can't I define implicit links in a Web browser (e.g. every <div class="link"> element adds a "look up in thesaurus" entry to the context menu, and when invoked, supplies the element content to the URI in the global linking definition)?

Why can't I have multi-way links that pop up a choice of targets, done without using a form with a drop-down list and a GO button and in a way Google can understand?

Why can't I load an external document into my Web browser that defines where all the links are in some other set of documents?

How should we get hot linking love going on? Or shouldn't we?


johnnyb, when we were designing XML we were very much coming from the world of technical documentation on the one hand and publishing on the other. Although XML was originally called SGML on the Web, publishing to print was also an important use case from the start, and many of us had already done high-quality print publishing from XML's parent, SGML.

Uraeus - the trick when you are losing whack-a-mole is to change the rules of the game, or to play a different game. Analogies only go so far, so to be plainer, I think that we have maybe a couple of years and then we'll be in danger of being seen to be playing catchup to Longhorn. We need to be ahead - or, better, going in a different direction - from the outset.

It's why I think we should be pushing the strengths of Linux (and BSD and other Unix variants) -- working hard on improving security, and on finding the use cases that prevent adoption of OSS and eliminating the obstacles. Miguel in particular has been doing a good job of eliminating obstacles.

AlanHorkan One approach is to turn up and mention that you think the defendant should be executed and is obviously guilty :-)

It's not clear to me that computer:// is actually a plausible or useful URI. Better might be gnome-administration://localhost/ with the implication that, given the authority, you could administer other computers. But, as you say, introducing a new URI scheme, especially one that's not universal but is limited to the current computer, seems pretty bad. And why can't this use http: or file: instead?

