Older blog entries for apenwarr (starting at number 17)

Curses Sucks, and there's No Excuse!

Okay, so I finally did it. As part of a project I was working on at work for the last few days, I decided to ignore curses entirely (for various reasons, most of them bad; leave me alone), and in the process, the library I wrote solved two problems:

- regardless of your TERM setting, it displays correctly (and I mean perfectly, modulo the lack of colour in win9x telnet) in the Linux console, xterm, rxvt, putty, minicom, Win2k telnet.exe, and - yes, really! - in Win9x telnet.exe.

- regardless of your TERM setting, in all of the above programs, my HOME, END, PGUP, PGDN, and INSERT keys work as they should (except for programs which happily refuse to send those codes at all - notably the Win2k/Win9x telnet programs).

How did I do it? I did what curses and ncurses never did. I followed the first and most important law of successful communication, attributed to Jon Postel: Be liberal in what you accept, and conservative in what you send.

My program sends only the most basic VT100 codes: gotoxy, change colours, change-to-wacko-line-drawing-font. We could be fancier, but then my output wouldn't work everywhere. Be conservative!

On the other hand, it accepts *any* of several possible codes for HOME, END, etc. Every stupid bloody terminal does it differently, and I don't care; I'll take them all. Nobody who says ESC[7~ doesn't mean HOME, even if not everybody who means HOME says ESC[7~. Be liberal!
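The liberal-input trick needs nothing fancier than a many-to-one lookup table. Here's a minimal sketch in Python - the particular sequences are illustrative common variants, from memory, not the actual library's table:

```python
# Liberal input: accept every escape sequence any common terminal might
# send for a given key. Conservative output: emit only basic VT100 codes.
# (Sequence list is illustrative, not exhaustive or authoritative.)
INPUT_KEYS = {
    "\x1b[1~": "HOME", "\x1b[H": "HOME", "\x1bOH": "HOME", "\x1b[7~": "HOME",
    "\x1b[4~": "END",  "\x1b[F": "END",  "\x1bOF": "END",  "\x1b[8~": "END",
    "\x1b[5~": "PGUP", "\x1b[6~": "PGDN",
    "\x1b[2~": "INSERT", "\x1b[3~": "DEL",
    "\x08": "BACKSPACE", "\x7f": "BACKSPACE",
}

def decode(seq):
    """Map whatever the terminal sent to one logical key, if we know it."""
    return INPUT_KEYS.get(seq, "UNKNOWN")

def gotoxy(x, y):
    """Conservative output: plain VT100 cursor positioning (1-based)."""
    return "\x1b[%d;%dH" % (y, x)
```

Note how many entries map to the same key - which is exactly what a one-to-one format like terminfo can't express.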

Of course, I don't support non-basically-vt100-compatible terminals. Now first of all, I don't care, because (hello, join the 1980's!) there aren't any. Secondly, nothing stops curses from doing my basic-vt100 thing by default, and different things if you *do* set your TERM specifically. You're the weirdo, you go suffer. Unfortunately, curses stupidly tries to do something optimal by default. Well, this is to cut down on wasted output, you'll say. Remember 2400 baud users, you'll say. I want my email reader app to be legible in this crappy terminal emulator without spending three hours trying to guess which of the 5 million 'xterm-*' terminfo settings is the right one! ARGH!, I'll say. This isn't so hard. Be conservative by default, and if I'm a weirdo with a 2400 baud modem, I can set TERM to something more efficient. Easy. The "output conservativeness" problem is only a fault of the people who write terminfo databases, so technically we won't blame curses for that.

Unfortunately, for input, curses made a fatal mistake: the terminfo format *itself* has a one-to-one mapping between escape sequences and input codes. There cannot be more than one HOME. That means, basically, there is no way to "be liberal in what you accept". This is a fundamental design flaw in the terminfo file format, and AFAIK you can't fix it in a backwards-compatible way. But you can still fix ncurses. I'd be more than happy if someone would just finally do so.

There are two flaws, however, that I haven't solved: the ridiculous "ESC is a key and also an automatic sequence" bug, which means pressing ESC to cancel a dialog is essentially never going to work right. And there's the ridiculous "nobody knows what code backspace is" bug, which originally (ie. in a VT10x/VT220) was never a problem, but eventually someone (I think it was the X Consortium) mangled completely by sending ASCII 127 for the keypad DEL key. There's no saving people with that keyboard mapping (backspace->8, DEL->127), and unfortunately there are a lot of those people. But I can save everyone else, because CTRL-H is backspace (shut up, emacs users), 127 is backspace, and several things like ESC[3~ are DEL.

There. I'm glad I got that off my chest. (I think I followed pphaneuf's rules for flaming because I went and implemented something better before I flamed the crappy library we poor losers have been suffering with for decades.)

Digital Cameras

I've finally half-joined the ranks of the "elite" who seem to all not only have cameras, but also now have digital cameras that cost twice as much. I say "half-joined" because I actually saved myself about 33% of the overall cost by skipping the "normal" camera phase entirely, buying only the overpriced digital kind.

dcoombs is doing a fine job of posting lots of culture-inducing photos in his NitLog, so I won't do that. I just bring it up because I finally figured out, after being pressured into buying one, what's so great about digital cameras: instant gratification. Perversely, it's the same reason people still buy books and CDs at stores instead of online: because you can get your book or CD now instead of waiting for it to arrive. Similarly, with a digital camera, you can have your photo now instead of waiting for it to be developed.

Moreover, you can learn a lot faster. In normal photography, you would have to try a lot of experiments at once, go get them developed, learn where you screwed up, try again, develop them again, and so on, usually with (at least!) a day's lapse in between. It's like Rapid Prototyping for wannabe photographers!

Cole Slaw and One-size-fits-all

The little-known-outside-Quebec-yet-popular chicken roasting chain St-Hubert (now with Business Class!) is the only restaurant I've ever been to that offers both creamy-based and vinegary-based cole slaw. The problem with cole slaw is that there are these two kinds, and for each kind, something approaching 50% of people like that kind and detest the other. (Actually, there's a third kind, KFC radioactive-green-based, but nobody at all seems to like that kind.)

Anyway, most restaurants serve only one kind of cole slaw or the other - people are so sure that their preferred kind of cole slaw is the best that they only serve that one kind, and they don't even label on the menu which kind that is. And yet, half the restaurants in the world still serve the other kind, so you'd think they'd notice. As it is, you have to ask the waiter before ordering cole slaw, "Is it the creamy kind?" and order or not order based on that... or, like most people, simply not order it at all.

At St-Hubert, unlimited cole slaw is included in *every* meal for free - and they offer both kinds, and people like it and eat it. The only way to have people eat cole slaw was to offer both options.

In this sentence, I was going to tie all that into software development and one-size-fits-all user interfaces and explain how sometimes taking away an option that's "the same for everyone anyway" might be a bad thing, but I think you can probably see where I was going, so I guess there's no need to insult your intelligence. If you have any.

The 80/20 Rule

pphaneuf pointed me at a comparison of successful/unsuccessful technologies based on various attributes. The supposed conclusion, although the survey is grossly statistically invalid, was that projects developed using the 80/20 rule are generally more successful (Tim Bray).

This reminded me of some other discussion of the 80/20 rule, notably Bloatware and the 80/20 Myth (Joel Spolsky) which is (from the title) obviously of the opposite opinion. And there's a related opinion regarding Java and its flood of APIs.

So that got me thinking: how valid is the 80/20 rule? When NITI started, it was just a small number of people, even fewer developers, and a brilliant (IMHO) idea - and we built and sold a few boxes, but we never suddenly started swimming in money. Ever since then, we've been adding developers and doing more work, but less and less of the work has been brilliant ideas. More and more of it has been normal, day-to-day stuff. And with each release, incorporating more normal, day-to-day stuff, we're able to sell more and more software; exponentially more. The 80/20 rule got us started, but it's the 99% rule (soon to become the 99.9% rule, if we get too popular) that gets you big.

This actually explains a few things. OpenSource is described, in Tim Bray's first article above, as tending to "just copy what works", regardless of whether it's very 80/20 or not. But this isn't the whole story. I think OpenSource is "bloatware", as described in the second article, from Joel Spolsky. It followed 80/20 (Tim Bray's version) to get started, but now anybody can extend it to get whatever they want - and people do. Linux now gets used everywhere you can imagine, precisely because someone was able to easily change it to do what they wanted, not because it really did a lot of stuff or was particularly good at something else.

The alternative approach is something like Windows, which isn't so easily extended (it's not that hard) but is already near-complete - it can do more stuff out-of-the-box than Linux can, and so it's popular for different reasons. Unfortunately, the cost of making something as feature-complete as Windows is exponentially more expensive than the cost of making something as extensible as Linux; I suppose that means Windows will be the one to fail, eventually. (It also means that the GPL was the magic ingredient that turned Unix from a failure into a success; and that the GPL has and would have nothing to do with the present/future success of Windows, since Windows' success doesn't come from its extensibility).

Do I have a point? Well, how about this: both Tim Bray and Joel are right, but they read different things into the 80/20 rule. Tim Bray says 80/20 is about starting simple and selling it before you're done - also known as Worse is Better. Joel is opposed to limiting your software to the necessary 80% in order to keep it simple and elegant, and then expecting to be wildly successful. It just doesn't happen. You start off mildly successful using 80/20, then you throw 80/20 aside and get serious... or you at least open source your thingy and use a really good plugin API so that people can fill in the other 19.9% of usefulness for you by doing the remaining 99% of the work.

4 Jan 2004 (updated 4 Jan 2004 at 19:13 UTC) »
Distributed Filesystems

I just discovered unfs3, a userspace implementation of NFSv3. Contrast with FunFS, a totally new filesystem, also in userspace, that I helped (a little bit) to design.

unfs3 is pretty good, but it will probably never be as fast as a kernelspace nfs server; FunFS, on the other hand, is already faster than any NFS, simply because the protocol is optimized for minimum latency and *real* caching, not cheeseball NFS-style caching.

On the other hand, the unfs3 server is already fully functional with a 54k binary; FunFS is still not fully functional, and it's much bigger (if you count its WvStreams dependency). Statelessness (ie. NFS) obviously gives several advantages, simplicity being a major one, but I wonder if these advantages will be worth the compromises (ie. stupid in-kernel servers vs. bad performance) we have to make?

28 Dec 2003 (updated 28 Dec 2003 at 02:43 UTC) »

Wow, has it really been that long since I posted something here? I guess I'm either lazy or overworked. (In fact, those two things are tightly related. Or the same?)

Branch Constraints Revisited

If you're looking for the actual article, you can find Branch Constraint Theory on Advogato.

It's been about 8 months since I posted the original paper and started implementing my own advice. So, how did it go? I've learned a few more things since then, but the advice turns out to be correct - math is a wonderful thing - and understanding it has led to some definite improvements in our inter-release time and bug counts. One missing detail: I didn't originally consider inverted CVS branches, where you create unstable branches and only merge into the release branch when the code stabilizes. This complicates the math considerably, but can allow shorter release cycles, especially for "big" features. There's more for me to say eventually...

Magazine Reviews

NITI products get lots of positive reviews in the press, but it seems that every computing product does. I used to think that this is because, as "everyone knows", you simply pay off a magazine and they give you a good review. Therefore I was suspicious when they - universally - not only didn't ask us for money before reviewing our products, but also gave us extremely positive reviews despite our lack of kickbacks or advertising. Someone finally explained it to me a few months ago: computer magazines depend on getting free samples of products in order to review them. If they got a reputation for putting out bad reviews, nobody would send them stuff to review. Only positive reviewers survive. So all reviews - even comparative ones - only say positive things. It's just that some are less positive, and some are more positive. If your product is really seriously sucky, you'll maybe get a neutral review (facts only), but you have to be really, amazingly bad.

Then I talked to a musician on the plane from Montreal to Thunder Bay, where I'm visiting for the Christmas holidays. He reminded me that in music, it's not the same at all: there are lots of music reviews that totally trash the music in question. And then he told me why that is: while in computers, a negative review could destroy you, in music, the worst possible thing that could ever happen to you is a "lukewarm" review. Art is about stirring up people's emotions, and, love it or hate it, as long as you care, they've been successful. So the hardest music review to find should be the lukewarm one.

Architecture, Implementation, and Glue

Speaking of plane-musician conversations, we also had some discussion about electronic music and the tendency to endlessly remix things, to the point where it's now an entire art form to simply mix other people's music together well. In the extreme other direction, I saw a concert by Kalmunity - Live Organic Improv in Montreal a few weeks ago. They generate their music in real-time, using only non-electronic instruments, and they're very good at it. Somewhere in between is a "traditional" musician who composes his own music and then assembles a band to play it with.

So programming is an art form, right? I see some parallels here: the above three types of musician correspond to gluers, virtuoso coders, and architects, respectively. I've been thinking of this a lot, because I'm trying to find a better way of organizing software development teams. Most effort in this area concentrates on finding a great architect (composer) and then combining him/her with a great team of virtuoso coders. If you can do that, you're doing really well. But open source, like electronic mixing, makes possible a new kind of developer that didn't exist before: the gluer. This person is an architect, in a way, but the architecture is imposed after the individual components have been built. The open source world doesn't have a lot of great architects or gluers. It seems to me that if you want to maximize your success with open source, the best way to do it is to become an excellent gluer yourself - you can become famous for assembling other people's work and making a final product. KDE is more famous than KMail.

Correlation, Continued

I've been playing more with document correlation in the last couple of days, this time trying to subdivide documents into groups. This is based on the vague feeling that our tech support call logs probably have some kind of trends, and if we could just identify the "most common" kinds of calls automatically, we could focus more effort on improving the product to get rid of that kind of support call. Auto-categories are kind of screwy, though, because they might just as easily group documents by support technician's name instead of by problem type - and that's obviously not what I want in this case, but I have no way to express that. (And don't even talk to me about metadata! I would have an entire rant about metadata, if anyone ever bothered to tell me what it was!)
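As a minimal illustration of why auto-grouping goes wrong: bag-of-words vectors plus cosine similarity will happily cluster on the technician's name just as readily as on the problem description. (The names and log texts below are invented, and this is a toy sketch, not the actual correlation code.)

```python
# Toy document similarity: word-count vectors and cosine similarity.
import math
from collections import Counter

def vectorize(text):
    """Turn a call log into a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

log1 = vectorize("bob: customer modem keeps dropping the line")
log2 = vectorize("bob: another modem dropping calls again")
log3 = vectorize("alice: printer jam paper tray problem")

# The modem logs score high together -- but the shared "bob:" token
# contributes exactly as much to that score as "modem" does, which is
# the problem: there's no way to tell the metric which words matter.
```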

What have I done?

According to http://net-itech.com/america/news/story_00025.htm, I said this about Guadec:

    "Net Integration produces and contributes to several Open Source projects in the process of making our Net Integrator and other products, and we're happy to be able to give back to the community by assisting at GU4DEC," says Avery Pennarun, Vice President Development and Senior Systems Architect, Net Integration Technologies. "UniConf, our advanced object-relational web-enabled remote management console with cross-platform transaction-oriented XML content delivery, will totally change the way people do business and revolutionize the world of Open Source."

So here's the bad news: I did, in fact, say that.

And the less bad news: I was kidding when I said it, I swear!

I honestly expected the people in marketing to see my joke, especially that last part about "totally change the way people do business" and "revolutionize the world of Open Source." Unfortunately, it's now a press release linked off our front page. I think I have just learned some kind of important life lesson.

I'll let you know when I figure out what it is.

Okay, so I lied somewhat in my UniConf paper for GUADEC. The paper admits that I lied, but doesn't say about what exactly. I'm going to spend some time this weekend and over the next week trying to make as many of those lies as possible come true, thus making myself retroactively more honest.

The hardest part to get right will be the sample monikers. I probably should have tried those before I included them in the paper.

Visited a Buddhist temple on Thursday. I saw an hour-long ceremony (more than 50% of which was silent meditation) and then they gave me some cookies, literature, bottles of blessed water, and a sacred tomato.

(Actually, I was later assured that the tomato was not sacred at all, but just any old tomato. But it sounds better if I call it a sacred tomato.)

4 May 2003 (updated 4 May 2003 at 05:35 UTC) »
Real-life Buffer Underruns

Okay, this is about a week old but I'm so impressed by it that I'll post it anyway. While I was visiting my family last week, I finally learned how water pressure tanks work.

Background info: my family lives outside of town, and gets their water from a well. Wells are in the ground, and typically the water would rather be in the ground than come up to your house and shoot out of your taps. In a city, they make the water come out by pressurizing the water at the Central Water Pressurizing Authority (or whatever), but out in the country, you're left to your own devices.

What you *can* do, and some people do this but it's bad, is to just turn on your pump whenever you turn on the tap. The pump works like an airplane propeller, sucking the water from the ground (or a holding tank, or whatever) and pushing it through your pipes into your tap. The problem: this means your pump has to turn on and off an awful lot, and response to water requests is rather sluggish.

The alternative is a pressure tank, in which you pump water into the tank, increasing the pressure so that the water wants to get out. The more water you add, the more it wants to get out. Now here's the catch: why does adding water make it want to get out more? What is "water pressure" exactly? Since liquids can't be compressed, how can adding more liquid increase the pressure?

My dad finally told me the answer to this last week. (It turns out he's always known this, but I never asked. Go figure.) The answer is: the pressure tank is actually a sealed tank full of air, and the water goes into an expandable baggie *inside* that. An empty tank has some amount of air pressure (with one valve for adding/removing air) and an empty baggie attached to the in/out water pipes.

Adding water to the tank decreases the volume of air, but not the amount of air. Thus, because of everyone's favourite chemistry formula, PV=nRT, the air pressure increases. The air wants to expand and push the water out of the tank, and the more water you add, the more the air pressure increases - and thus, the more the water pressure increases. The "water pressure" that you can measure on the water outflow pipe is exactly equal to the air pressure in the tank.
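The arithmetic here is just Boyle's law (PV = nRT with n and T held fixed): air pressure scales inversely with the remaining air volume. A tiny sketch, with made-up tank numbers:

```python
# Sealed tank: air initially fills the whole volume at pressure P0.
# Pumping water into the baggie shrinks the air space; at constant
# temperature, P * V stays constant (Boyle's law, the fixed-n, fixed-T
# case of PV = nRT). Tank size and pre-charge pressure are invented.
V_TANK = 65.0   # gallons, nominal tank volume
P0 = 40.0       # psi of air with the baggie empty (assumed pre-charge)

def air_pressure(water_gallons):
    """Air (= water) pressure after pumping in `water_gallons`."""
    air_volume = V_TANK - water_gallons
    if air_volume <= 0:
        raise ValueError("tank can't hold that much water")
    return P0 * V_TANK / air_volume

for w in (0, 10, 20, 30):
    print("%2d gal water -> %5.1f psi" % (w, air_pressure(w)))
```

Notice that doubling the pressure only gets you to half-full (32.5 gallons of water in a 65 gallon tank), which is why the tank is mostly air even when "full."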

Why do we care? Well, this means that in fact most water pressure tanks are more than 50% empty even when they're "full." They're mostly air. And if you tune them wrong, it's even worse. So don't expect that your 65 gallon water pressure tank will ever have 65 gallons of water in it, or you'll seriously confuse yourself.

And now we get into my area of interest, namely networking. What's the point of the tank again? To make it so the pump doesn't have to instantly respond to changes in water requests. Also to make it so that, even if the well runs dry temporarily, there'll still be water available to service our requests. But standard tank tuning algorithms (which my dad also knew, because he read the instructions for his tank) involve setting up the tank to activate the pump when the tank is almost empty and deactivate the pump when the tank is almost full (where "full" is using our new definition, meaning still more than half empty). In programming, we call these two points the "low water mark" (LWM) and the "high water mark" (HWM), respectively (snicker).

Problem is, if the well runs dry while you're nearing the low water mark, you have almost nothing left in your buffer, er, tank, to service the requests. So, for maximum recoverability, you want the low water mark to be as high as possible. But this decreases the "block size" of demands on the service provider, er, pump. Whereas with the tank almost empty you could say "send me 30 gallons of water", now you have to say "send me 2 gallons of water" much more often. In the typical heavily optimized case, your tank doesn't buy you anything because LWM=HWM, just like with no tank at all. You'll only notice the improvement if the service disappears altogether for a while.
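The pump logic is plain hysteresis control, and the LWM/HWM trade-off falls straight out of it. A minimal sketch (all numbers invented, not from any real tank's instructions):

```python
# Hysteresis control: the pump turns on when the buffered amount drops
# below the low water mark, and off once it passes the high water mark.
LWM, HWM = 10.0, 30.0   # gallons (made-up tuning values)

class Tank:
    def __init__(self, level=HWM):
        self.level = level
        self.pump_on = False

    def step(self, demand, supply):
        """One time step: `demand` gallons are drawn from the tank;
        `supply` is what the well can deliver if the pump runs."""
        self.level -= demand
        if self.level < LWM:
            self.pump_on = True      # nearly empty: start refilling
        elif self.level > HWM:
            self.pump_on = False     # nearly "full": stop
        if self.pump_on:
            self.level += supply
        return self.level
```

Raising LWM leaves more reserve for when the well runs dry, but shrinks HWM-LWM, so the pump cycles more often in smaller transactions - exactly the trade-off above, and the same one you face tuning any network buffer.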

Of course, you're also receiving water that's 30 gallons old - your tank has increased the staleness of the data, er, water, that you receive.

Okay, the other problem, and this is the point of the story: the reason my dad bought the tank in the first place. The well was running dry pretty often this winter, since it was a pretty dry year. To improve the situation, my dad thought it would make sense to buy a larger tank (buffer), so that if there was a temporary high demand for water followed by a long period of low demand, the tank could fill up and not stress the well too much. Now, this didn't really work, mostly because of the HWM/LWM tuning problem above, and partly because no buffer will save you if the long-term average bandwidth of the network is less than average demand. (Water caching is gross.)

In fact, the larger tank made things worse once spring came and the water was abundant again: because the transaction size (HWM minus LWM) was much larger than with the old tank, it would run the pump for longer in one shot. Unfortunately, HWM-LWM for the tank is more water than would fit in the entire well, so you're guaranteed to underrun your buffer every time you go to the network for more service, thus causing TCP to start limiting your bandwidth and... oh wait, I'm mixing my metaphors again. Anyway, the tank made it appear that there still wasn't enough bandwidth, even when the congestion went away.

The Moral of the Story

Adding buffers doesn't fix anything unless you tune them properly. And they sometimes make your system unusable.

More Mozilla

dan: I really did try to find something that the average person would find more useful in Mozilla than IE, but I stand by my opinion. The average person doesn't benefit. If my grandma could figure out what a popup was or how to configure blocking for it, that would be great; if IE didn't do perfectly good password management already (and even better with RoboForms) that would be wonderful.

I was honestly hoping (expecting?) that I'd find something in the list of 101 items, since I'm fully aware that Mozilla (and its variants) are the best non-IE browsers available. I use it myself, since I don't have a Windows desktop. But unfortunately Microsoft has us beat. If one of the things in the list had been "loads faster" or "renders faster" or "renders more pages that people actually visit" or "integrates better with your desktop" or "fits in with your existing desktop theme" or "isn't ugly", then you would have had me. Sadly, IE does all of those things better than Mozilla, and those are the things that the average person cares about.

I was even personally pleased to see that Mozilla is now claiming pipelined HTTP, which is one of my big concerns - sadly, average people don't care about that, either.
