Older blog entries for dwmw2 (starting at number 218)

Things I hate today include:

  • Symbian on my Nokia N97 — for spontaneously rebooting as soon as I got off the ferry.
  • Google Maps — for not caching the map tiles I'd carefully downloaded while I was on the free ferry wireless, showing my route to the hotel.
  • Mobile phone networks — for the insane amount of money it will have cost me to re-download the same map tiles again, as I was driving.
It's almost as if it's a conspiracy — especially between the latter two.

I really need to get myself an N900 and start using maemo-mapper again. Every time I try to use non-free software, it hurts.

I've just been working on Evolution's reply code, and have added a couple more of those annoying "nag pop-ups", including this one which I expect a lot of people will appreciate when they don't get the resulting mail:

Evolution nag pop-up for replying to too many recipients

It's currently set to trigger if you hit 'Reply to All' on a message with more than 15 recipients, unless it's a mailing list message. And of course, as you can see, it's trivial to turn it off if you never want to see it again.
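The trigger rule reads naturally as a small predicate. This is a hypothetical Python sketch, not Evolution's own code (Evolution is written in C). The function names, the use of `List-Id:`/`List-Post:` headers to spot mailing-list mail, and the counting of To: plus Cc: addresses are all assumptions for illustration.

```python
from email import message_from_string
from email.message import Message
from email.utils import getaddresses

REPLY_ALL_THRESHOLD = 15  # "more than 15 recipients" triggers the warning

def is_list_message(msg: Message) -> bool:
    # Assumption: a List-Id: or List-Post: header marks mailing-list traffic.
    return msg["List-Id"] is not None or msg["List-Post"] is not None

def should_nag_reply_all(msg: Message, enabled: bool = True) -> bool:
    """Return True if 'Reply to All' on msg should show the nag pop-up."""
    if not enabled or is_list_message(msg):
        return False
    # Hypothetical count: every address in To: and Cc:, duplicates not removed.
    fields = msg.get_all("To", []) + msg.get_all("Cc", [])
    return len(getaddresses(fields)) > REPLY_ALL_THRESHOLD

# Usage: sixteen direct recipients and no list headers.
to = ", ".join(f"user{i}@example.com" for i in range(16))
print(should_nag_reply_all(message_from_string(f"To: {to}\nSubject: hi\n\nbody\n")))  # True
```

The `enabled` flag stands in for the "don't show this again" checkbox.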

I've also taken a moment to write down and post some thoughts on the 'Reply to All' vs. 'Reply to List' debate for mailing list messages.

Yay Brazil! They're making it illegal to use DRM to prevent "fair dealing" with copyrighted works, or to prevent access to works which are in the public domain. It will also be legal to "crack" DRM if you're only doing it for the purpose of "fair dealing".

So, for example, it would be legal for me to crack the DRM on the eBooks I buy, which is necessary just so that I can read them. Currently I have to break the law just to be able to buy and use eBooks.

UK citizens, go here and add your vote; it's very simple to register if you haven't already done so.

apenwarr:

What was wrong with the SOCKS server that SSH provides? Playing transparent proxy tricks is cute, but why not make it work using SOCKS and then it would be more generically useful?
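For comparison, SSH's dynamic forwarding (`ssh -D port`) exposes a local SOCKS proxy, and any SOCKS-aware client can use it. Here is a minimal sketch of the client side of the SOCKS5 handshake from RFC 1928, building the greeting and a CONNECT request by host name. The function names are hypothetical, and authentication and parsing of the server's replies are left out.

```python
import struct

SOCKS_VERSION = 0x05
NO_AUTH = 0x00      # method: no authentication required
CMD_CONNECT = 0x01
ATYP_DOMAIN = 0x03  # address given as a length-prefixed host name

def socks5_greeting() -> bytes:
    # VER, NMETHODS, METHODS...: offer only "no authentication".
    return bytes([SOCKS_VERSION, 1, NO_AUTH])

def socks5_connect(host: str, port: int) -> bytes:
    """Build a CONNECT request asking the proxy to open host:port on our behalf."""
    name = host.encode("idna")
    if not 0 < len(name) <= 255:
        raise ValueError("host name must be 1..255 bytes")
    # VER, CMD, RSV, ATYP, then name length, the name, and the port in network byte order.
    header = bytes([SOCKS_VERSION, CMD_CONNECT, 0x00, ATYP_DOMAIN, len(name)])
    return header + name + struct.pack("!H", port)
```

A client would send the greeting to the local port given to `ssh -D`, read the server's method selection, then send the CONNECT request and read the reply. The remote end resolves the name, so the client's routing never needs touching.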

Better still, you can use an otherwise unused corner of IPv6 address space for your dynamic proxying so you aren't messing with the client's Legacy IP routing at all.

My God, I've been vaguely aware of the HTML5 video train wreck but I hadn't realised just how much of a fucking abortion the rest of the HTML5 'standard' is.

I had the misfortune to read the section on character encodings over the weekend, and it almost made me lose my lunch.

Not only does it codify the crappy and unreliable practice of applying heuristics to guess character encodings, it also requires that a user agent deliberately ignore the explicitly specified character set in some cases — for example, text explicitly labelled as US-ASCII or ISO8859-1 MUST be rendered as if it were Windows-1252!
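To see what that override does in practice, here is a small Python sketch contrasting the two behaviours. The `LABEL_OVERRIDES` table is a hypothetical, partial stand-in for the web's label mapping, and Python's `cp1252` codec is used as an approximation of Windows-1252.

```python
# Hypothetical, partial table: labels that get decoded as Windows-1252 regardless.
LABEL_OVERRIDES = {
    "us-ascii": "cp1252",
    "iso-8859-1": "cp1252",
}

def decode_with_override(body: bytes, label: str) -> str:
    """What the post objects to: swap the declared charset for Windows-1252."""
    codec = LABEL_OVERRIDES.get(label.lower(), label)
    return body.decode(codec, errors="replace")

def decode_as_labelled(body: bytes, label: str) -> str:
    """Honour the declared charset; bytes it does not define become U+FFFD."""
    return body.decode(label, errors="replace")

# Byte 0x80 is a C1 control in ISO-8859-1, invalid in US-ASCII, but the Euro sign in Windows-1252.
print(ascii(decode_with_override(b"\x80", "ISO-8859-1")))  # '\u20ac'
print(ascii(decode_as_labelled(b"\x80", "ISO-8859-1")))    # '\x80'
print(ascii(decode_as_labelled(b"\x80", "US-ASCII")))      # '\ufffd'
```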

It justifies this idiocy, which it admits is a 'willful violation', on the basis that it aids compatibility with legacy content. By which of course it means "broken content", since this was never actually necessary for anyone who published content correctly even with older versions of HTML.

But that doesn't make any sense — surely legacy content won't be identifying itself as HTML5? It might be reasonable to do these stupid things for legacy content, but not HTML5. The complete mess we have with charset labelling is a prime example of where the RFC1122 §1.2.2 approach of being lenient in what you accept has turned out to be massively counter-productive — if we'd simply refused to make stupid guesses about character sets in the first place, then people would have actually started getting the labelling right.

The sensible approach to take with HTML5 would just have been to say "All content which identifies itself as HTML5 MUST be in the UTF-8 character encoding. A conforming user agent MUST NOT attempt to interpret content as if it has any other encoding; any invalid UTF-8 byte sequences MUST be shown using the Unicode replacement character U+FFFD (�) or equivalent."
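The rule proposed above is nearly a one-liner in most languages. The Python sketch below uses hypothetical function names. It decodes everything as UTF-8 and lets the codec substitute U+FFFD for malformed sequences. It also includes a strict check a publisher could use to find out that their content is broken.

```python
def render_html5(body: bytes) -> str:
    """Interpret content as UTF-8 only; invalid byte sequences are shown as U+FFFD."""
    return body.decode("utf-8", errors="replace")

def is_valid_utf8(body: bytes) -> bool:
    """Strict check: True only if every byte sequence is well-formed UTF-8."""
    try:
        body.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True

print(ascii(render_html5("café".encode("utf-8"))))  # 'caf\xe9'
print(ascii(render_html5(b"caf\xe9")))               # 'caf\ufffd' (Latin-1 bytes, not UTF-8)
```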

Or, if we really must continue to permit the legacy crap 8-bit character sets, it should have said that the content MUST be in the character set specified in the HTTP Content-Type: header or equivalent <META> tag.

Keep the stupid heuristics for legacy content by all means, but it should be forbidden to render HTML5 content in a character set other than the one it is labelled with, and all invalid characters (including the C1 control characters in ISO8859-1 which in Windows-1252 would map to extra printable characters like the Euro sign) MUST be shown as U+FFFD (�). And then the people who publish broken crap would see that they're publishing broken crap, rather than thinking it's OK because the browser they use just happens to assume the same character set as the system they're publishing from.
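For the fallback where legacy 8-bit charsets are still permitted, here is a sketch of rendering strictly in the declared charset. It makes several assumptions. The charset comes from a Content-Type value parsed with Python's `email.message.Message`. Unlabelled content is treated as UTF-8. Every C1 control (U+0080 to U+009F) is treated as invalid, which is a simplification.

```python
import re
from email.message import Message

C1_CONTROLS = re.compile("[\u0080-\u009f]")

def render_as_labelled(body: bytes, content_type: str) -> str:
    """Decode in exactly the charset the Content-Type declares; show anything invalid as U+FFFD."""
    headers = Message()
    headers["Content-Type"] = content_type
    # get_content_charset() lower-cases the charset parameter; None if there isn't one.
    charset = headers.get_content_charset() or "utf-8"  # assumption for unlabelled content
    text = body.decode(charset, errors="replace")
    # No Windows-1252 fallback: C1 controls stay invalid instead of becoming e.g. the Euro sign.
    return C1_CONTROLS.sub("\ufffd", text)

print(ascii(render_as_labelled(b"\x80100", "text/html; charset=ISO-8859-1")))  # '\ufffd100'
```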

To me, HTML5 looks less like a standard and more like a set of broken hackish kludges to work around the fact that people out there aren't actually capable of following a standard.

Eww, this country is uncivilised. Just got back to my hotel room and my clothing reeks of smoke. I'd almost forgotten how horrid that was.

mjg59 writes:
"If the offence was unintended, an apology should be cheap."

An apology is cheap, it's true — but it's also counterproductive, because it reinforces the false belief that such an apology was necessary or appropriate.

Pandering to these people just contributes to the utterly idiotic culture of political correctness which blights our society.

Let's take a look at what he actually said, for crying out loud...


“A release is an amazing thing; I’m not talking about the happy ending..”: 3:02
It's crude, but I don't see it as being sexist. The terms 'release' and 'happy ending' could just as well be used to describe the female experience as the male experience, although ladies are less inclined to make such reference to it in public. It's not excluding women; it's excluding prudes.
“Your printer, and your mom’s printer, and your grandma’s printer”: 35:30
Oh, for crying out loud. Would it really have made that much difference if he'd said 'dad and grandma', or 'mom and grandpa'? No, it wouldn't. Some people must have been trying really hard to find something to take offence at.

Of course your mum is likely to be less technical than your dad. That's just the way the world is. Does your mum complain when she gets cheaper car insurance? Men and women are different, and we shouldn't be burned at the stake if that fundamental fact of life affects the minor details of how we phrase what we say.

My own mother died a few years ago; did I cry myself to sleep after Mark's keynote because I felt excluded by his choice of words? No. I didn't. Some people really do need to grow up.


“we’ll have less trouble explaining to girls what we actually do”: 35:55

There's another one which excludes me. I'm not single, so I don't spend my time trying to impress girls. Should I have been offended? Of course not.

In this context, I'd usually have said "normal people", meaning non-geeks, rather than "girls"; I tend to be quite self-deprecating about my geek nature.

But when I say "normal people" I often have to then explain what I meant by it. It makes far more sense to say "girls", because then people instantly recognise what I'm trying to say.

So I think it's entirely reasonable that Mark said "girls" in that context. When trying to communicate to a room full of people, of course you communicate in a way which will be understood by all of them without having to go back and explain yourself.

He certainly didn't mean to say "Hey, I think the Linux community is entirely comprised of single (or philandering) males, and lesbians."

If you draw that inference, then you are being bloody stupid!

(I should probably point out that the 'single or philandering' qualification in my above sentence applies to both males and lesbians. I didn't mean to suggest that lesbians aren't capable of a monogamous relationship. Please put the torch down and back away from my front door. But thank you for demonstrating just how stupid some people can be when they're looking for a way to take offence.)


There are problems in the geek community which make it hard for females to join in, and there are real problems with some of the things that people say sometimes. The geek feminist lobby certainly has a point, in the general case.

But Mark's keynote was not an example of this. By throwing their toys out of the pram over Mark's keynote, they cheapen the whole debate and perform a stunning ad hominem on themselves.

If you want to be treated with respect and integrate into society, you don't achieve that by behaving like a Jemima and kicking up a fuss over nothing. You could try contributing to the real debate, like talking about some of the other crap Mark was spouting in his keynote.

So no, I don't think an apology is a good idea. Unless it's offered by the people who have been making all this stupid fuss — and it's offered both to Mark, and also to the people who really want to promote the integration of women in the community.

Flash storage; a polemic

I originally posted this in response to the LWN coverage of the panel discussion at LinuxCon, but figure I should probably post it somewhere more sensible. So here goes... with a few paragraphs added at the end.


"The flash hardware itself is better placed to know about and handle failures of its cells, so that is likely to be the place where it is done, [Ted] said."

I was biting my tongue when he said that, so I didn't get up and heckle.

I think it's the wrong approach. It was all very well letting "intelligent" drives remap individual sectors underneath us so that we didn't have to worry about bad sectors or C-H-S and interleaving. But what the flash drives have to do to present a "disk" interface is much more than that; it's wrong to think that the same lessons apply here.

What the SSD does internally is a file system all of its own, commonly called a "translation layer". We then end up putting our own file system (ext4, btrfs, etc.) on top of that underlying file system.
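To make the "file system under a file system" point concrete, here is a deliberately tiny translation layer in Python. It is a hypothetical sketch, not how any particular SSD works. Logical sectors map to physical pages, and every write goes to a fresh page because flash cannot be overwritten in place. Superseded pages are merely marked obsolete until garbage collection reclaims them (not shown).

```python
from typing import Dict, List, Optional, Set

class TinyFTL:
    """Hypothetical minimal flash translation layer presenting 'sectors' on top of flash pages."""

    def __init__(self, num_pages: int) -> None:
        self.flash: List[Optional[bytes]] = [None] * num_pages  # None = erased page
        self.sector_map: Dict[int, int] = {}  # logical sector -> physical page
        self.obsolete: Set[int] = set()       # pages holding superseded data
        self.next_free = 0                    # next erased page to program (log-structured)

    def write(self, sector: int, data: bytes) -> None:
        if self.next_free >= len(self.flash):
            raise RuntimeError("no erased pages left: garbage collection needed")
        old_page = self.sector_map.get(sector)
        if old_page is not None:
            self.obsolete.add(old_page)  # old copy stays on flash until its block is erased
        self.flash[self.next_free] = data
        self.sector_map[sector] = self.next_free
        self.next_free += 1

    def read(self, sector: int) -> Optional[bytes]:
        page = self.sector_map.get(sector)
        return None if page is None else self.flash[page]

ftl = TinyFTL(num_pages=8)
ftl.write(0, b"v1")
ftl.write(0, b"v2")  # an "overwrite" of sector 0 lands on a new page
print(ftl.read(0), ftl.sector_map[0], ftl.obsolete)  # b'v2' 1 {0}
```

Everything the disk hides from us (the mapping, the obsolete set, when to collect) is ordinary file system state here, which is the point: it can be inspected, debugged and repaired.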

Do you want to trust your data to a closed source file system implementation which you can't debug, can't improve and — most scarily — can't even fsck when it goes wrong, because you don't have direct access to the underlying medium?

I don't, certainly. The last two times I tried to install Linux to a SATA SSD, the disk was corrupted by the time I booted into the new system for the first time. The 'black box' model meant that there was no chance to recover — all I could do with the dead devices was throw them away, along with their entire contents.

File systems take a long time to get to maturity. And these translation layers aren't any different. We've been seeing for a long time that they are completely unreliable, although newer models are supposed to be somewhat better. But still, shipping them in a black box with no way for users to fix them or recover lost data is a bad idea.

That's just the reliability angle; there are also efficiency concerns with the filesystem-on-filesystem model. Flash is divided into "eraseblocks" of typically 128KiB or so, and they're getting larger as devices get larger. You can write in smaller chunks (typically 512 bytes or 2KiB, but also getting larger), but you can't just overwrite things as you desire. Each eraseblock is a bit like an Etch-A-Sketch. Once you've done your drawing, you can't just change bits of it; you have to wipe the whole block.
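The Etch-A-Sketch rule can be captured in a few lines. This toy Python model assumes 2KiB pages and 64 pages per eraseblock (128KiB, matching the typical sizes above). Real NAND also has out-of-band areas, per-page program limits and more, all ignored here.

```python
from typing import List, Optional

class EraseBlock:
    """Toy eraseblock: each page is programmed once; only erasing the whole block frees it."""

    def __init__(self, pages: int = 64, page_size: int = 2048) -> None:
        self.page_size = page_size
        self.pages: List[Optional[bytes]] = [None] * pages  # None = erased
        self.erase_count = 0

    def program(self, index: int, data: bytes) -> None:
        if len(data) != self.page_size:
            raise ValueError("writes are whole pages")
        if self.pages[index] is not None:
            raise PermissionError(f"page {index} already written; erase the whole block first")
        self.pages[index] = data

    def erase(self) -> None:
        # Wiping the Etch-A-Sketch: every page in the block returns to the erased state.
        self.pages = [None] * len(self.pages)
        self.erase_count += 1

    @property
    def size(self) -> int:
        return self.page_size * len(self.pages)
```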

Our flash will fill up as we use it, and some of the data on the flash will be still relevant. Other parts will have been rendered obsolete; replaced by other data or just deleted files that aren't relevant any more. Before our flash fills up completely, we need to recover some of the space taken by obsolete data. We pick an eraseblock, write out new copies of the data which are still valid, and then we can erase the selected block and re-use it. This process is called garbage collection.
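Here is a Python sketch of that garbage-collection step, using a simple greedy policy: reclaim the used block with the least valid data. The victim-selection policy and the data structures are assumptions for illustration. Real implementations also weigh wear levelling, and this sketch assumes the erased spare block has room for the copied data.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Block:
    valid: Dict[int, bytes] = field(default_factory=dict)  # logical id -> still-valid data
    used: int = 0  # pages programmed since the last erase (valid + obsolete)

def garbage_collect(blocks: List[Block], spare: int) -> Tuple[int, int]:
    """Copy the still-valid data out of the cheapest used block into the erased
    spare block, then erase the victim. Returns (victim index, pages copied)."""
    used = [i for i, b in enumerate(blocks) if i != spare and b.used > 0]
    victim = min(used, key=lambda i: len(blocks[i].valid))  # fewest pages to copy
    copied = len(blocks[victim].valid)
    blocks[spare].valid.update(blocks[victim].valid)  # write out new copies
    blocks[spare].used += copied
    blocks[victim] = Block()  # erase: the whole block is reusable again
    return victim, copied

blocks = [Block({1: b"a", 2: b"b"}, used=4), Block({3: b"c"}, used=4), Block()]
print(garbage_collect(blocks, spare=2))  # (1, 1)
```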

One of the biggest disadvantages of the "pretend to be disk" approach is addressed by the recent TRIM work. The problem was that the disk didn't even know that certain data blocks were obsolete and could just be discarded. So it was faithfully copying those sectors around from eraseblock to eraseblock during its garbage collection, even though the contents of those sectors were not at all relevant — according to the file system, they were free space!
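The effect of TRIM on that copying can be shown with a toy Python model (the class and method names are hypothetical). Without TRIM, the device only ever learns that a sector was written, never that the file system freed it. Garbage collection therefore copies sectors that are really free space.

```python
from typing import Iterable, List, Set

class DiskLikeDevice:
    """Toy 'pretend to be a disk' device: it tracks which sectors it believes hold data."""

    def __init__(self) -> None:
        self.live: Set[int] = set()

    def write(self, sector: int) -> None:
        self.live.add(sector)

    def trim(self, sector: int) -> None:
        # The file system tells the device this sector is now free space.
        self.live.discard(sector)

    def sectors_to_copy(self, block_sectors: Iterable[int]) -> List[int]:
        # During GC, anything the device still believes is live must be copied.
        return [s for s in block_sectors if s in self.live]

dev = DiskLikeDevice()
for sector in range(4):
    dev.write(sector)
# The file system deletes the file in sectors 1 and 2, but the device is not told.
print(dev.sectors_to_copy(range(4)))  # [0, 1, 2, 3]
dev.trim(1)
dev.trim(2)
print(dev.sectors_to_copy(range(4)))  # [0, 3]
```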

Once TRIM gets deployed for real, that'll help a lot. But there are other ways in which the model is suboptimal.

The ideal case for garbage collection is that we'll find an eraseblock which contains only obsolete data, and in that case we can just erase it without having to copy anything at all. Rather than mixing volatile, short-term data in with the stable, long-term data, we actually want to keep them apart, in separate eraseblocks. But in the SSD model, the underlying "disk" can't easily tell which data is which; the real OS file system code can do a much better job.
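Here is a rough Python illustration of why the separation matters. It assumes the writer knows which data is short-lived, which the OS file system can know and the "disk" generally cannot. The names and the two-page eraseblocks are purely illustrative.

```python
from typing import List, Set, Tuple

Write = Tuple[str, bool]  # (name, is_short_lived)

def place(writes: List[Write], per_block: int, separate: bool) -> List[List[str]]:
    """Fill eraseblocks in write order, optionally keeping short-lived and long-lived data apart."""
    if separate:
        streams = [[w for w in writes if w[1]], [w for w in writes if not w[1]]]
    else:
        streams = [writes]
    blocks: List[List[str]] = []
    for stream in streams:
        for i in range(0, len(stream), per_block):
            blocks.append([name for name, _ in stream[i:i + per_block]])
    return blocks

def reclaim_cost(blocks: List[List[str]], obsolete: Set[str]) -> int:
    """Valid pages that must be copied before erasing every block holding obsolete data."""
    return sum(
        sum(1 for name in block if name not in obsolete)
        for block in blocks
        if any(name in obsolete for name in block)
    )

writes = [("tmp0", True), ("db0", False), ("tmp1", True), ("db1", False)]
obsolete = {"tmp0", "tmp1"}  # the short-lived data has since been replaced or deleted
print(reclaim_cost(place(writes, 2, separate=False), obsolete))  # 2
print(reclaim_cost(place(writes, 2, separate=True), obsolete))   # 0
```

With the data kept apart, the block full of temporary data can simply be erased; mixed in, every reclaim drags long-lived data along with it.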

And when we're doing this garbage collection, it's an ideal time for the OS file system to optimise its storage — to defragment or do whatever else it wants (combining data extents, recompressing, data de-duplication, etc.). It can even play tricks like writing new data out in a suboptimal but fast fashion, and then only optimising it later when it gets garbage collected. But when the "disk" is doing this for us behind our back in its own internal file system, we don't get the opportunity to do so.

I don't think Ted is right that the flash hardware is in the best place to handle "failures of its cells". In the SSD model, the flash hardware doesn't do that anyway: it's done by the file system on the embedded microcontroller sitting next to the flash.

I am certain that we can do better than that in our own file system code. All we need is a small amount of information from the flash. Telling us about ECC corrections is a first step, of course — when we had to correct a bunch of flipped bits using ECC, it's getting on for time to GC the eraseblock in question, writing out a clean copy of the data elsewhere. And there are technical reasons why we'll also want the flash to be able to say "please can you GC eraseblock #XX soon".
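Here is a Python sketch of the kind of hint meant here. Everything in it is hypothetical. A read result carries a corrected-bit count, and `ecc_strength` is the number of bits the ECC can correct per chunk. The policy queues an eraseblock for GC once a read needed at least half of the ECC's capacity.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ReadResult:
    data: bytes
    corrected_bits: int  # bit flips the ECC had to fix on this read

def note_read(block: int, result: ReadResult, ecc_strength: int,
              gc_queue: List[int], margin: float = 0.5) -> None:
    """Queue the eraseblock for GC (rewrite a clean copy elsewhere) once bit flips
    use up the given fraction of what the ECC can correct."""
    if result.corrected_bits >= ecc_strength * margin and block not in gc_queue:
        gc_queue.append(block)

queue: List[int] = []
note_read(7, ReadResult(b"...", corrected_bits=1), ecc_strength=8, gc_queue=queue)
note_read(9, ReadResult(b"...", corrected_bits=5), ecc_strength=8, gc_queue=queue)
print(queue)  # [9]
```

The decision stays in the OS, where it can be tuned; the flash only has to report what it corrected.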

But I see absolutely no reason why we should put up with the "hardware" actually doing that kind of thing for us, behind our back. And badly.

Admittedly, the need to support legacy environments like DOS and to provide INT 13h "DISK BIOS" calls or at least a "block device" driver will never really go away. But that's not a problem. There are plenty of examples of translation layers done in software, where the OS really does have access to the real flash but the driver still presents a block device to the rest of the system. Linux has about 5 of them already. The corresponding "dumb" devices (like the M-Systems DiskOnChip which used to be extremely popular) are great for Linux, because we can use real file systems on them directly.

At the very least, we want the "intelligent" SSD devices to have a pass-through mode, so that we can talk directly to the underlying flash medium. That would also allow us to try to recover our data when the internal "file system" screws up, as well as allowing us to do things properly from our own OS file system code.

Now, I'm not suggesting that we already have file system code which can do things better; we don't. I wrote a file system which works on real flash, but I wrote it 8 years ago and it was designed for 16-32MiB of bitbanged NOR flash. We pushed it to work on 1GiB of NAND (and even using DMA!) for OLPC, but that is fairly much the limit of how far we'll get it to scale.

We do, however, have a lot of interesting new work such as UBI and UBIFS, which is rapidly taking the place of JFFS2 in the real world. The btrfs design also lends itself very well to working on real flash, because of the way it doesn't overwrite data in-place. I plan to have btrfs-on-flash, or at least btrfs-on-UBI, working fairly soon.

And, of course, we even have the option of using translation layers in software. That's how I tested the TRIM support when I added it to the kernel; by adding it to our existing flash translation layer implementations. Because when this stuff is done in software, we can work on it and improve it.

So I am entirely confident that we can do much better in software — and especially in open source software — than an SSD could ever do internally.

Let's not be so quick to assume that letting the 'hardware' do it for us is the right thing to do, just because it was right 20 years ago for hard drives to do something which seems vaguely similar at first glance.

Yes, we need the hardware to give us some hints about what's going on, as I mentioned above. But that's about as far as the complexity needs to go; don't listen to the people who tell you that the OS would need to know all kinds of details about the internal geometry of the device, which will be changing from month to month as technology progresses. The basic NAND flash technology hasn't changed that much in the last ten years, and existing file systems which operate on NAND haven't had to make many adjustments to keep up at all.

2009-08-22 01:24:51 +0000 1MefLf-0004Xf-3x H=mailhost8a.rbs.com [155.136.80.166] F=<OnlineBanking@Information.natwest.com> rejected after DATA: Your message lacks a Date: header, which RFC5322 says it MUST have.
Dear Mr. Woodhouse,

Thank you for your call of 26th August about not being able to accept notification emails.
...
I have investigated the matter and can confirm that the statement notification emails are sent out with the date on. The rfc5322 is an internet protocol only and we do not have to abide by this.

Our records show that the notification emails failed delivery on the 21st August due to an invalid email address. I hope this is a satisfactory resolution to your complaint.


Christ, where do I start with this? Yes, if you're claiming to be sending Internet email then you really do have to follow RFC5322. That's the standard that defines what Internet email is.

But that seems to be a red herring; he also claims that they are including a Date: header. Unfortunately, he's wrong. He's probably looking at an email which had the Date: header added in transit by the recipient's mail server. That would be obvious to anyone with a clue, because you can compare the datestamps in the Received: headers and observe that the Date: header matches one of the later ones, not the first.
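That check is easy to automate. Here is a Python sketch using the standard `email` package, with a hypothetical function name. Received: headers are prepended by each hop, so the bottom-most one is the earliest. If the Date: header's timestamp matches a later hop rather than the first, it was most likely added in transit. The sketch assumes sane clocks and that each Received: header ends with "; <date>".

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

def date_added_in_transit(raw: str) -> bool:
    """Heuristic: True if the Date: header matches a later Received: stamp, not the first hop."""
    msg = message_from_string(raw)
    date = msg["Date"]
    received = msg.get_all("Received", [])
    if date is None or len(received) < 2:
        return False
    # Each Received: header ends in "; <timestamp>"; the list runs newest hop first.
    stamps = [parsedate_to_datetime(r.rsplit(";", 1)[1].strip()) for r in received]
    sent = parsedate_to_datetime(date)
    return sent in stamps[:-1] and sent != stamps[-1]

raw = (
    "Received: from mx.example.net by mail.example.org; Sat, 22 Aug 2009 01:24:51 +0000\n"
    "Received: from sender.example.com by mx.example.net; Sat, 22 Aug 2009 01:24:49 +0000\n"
    "Date: Sat, 22 Aug 2009 01:24:51 +0000\n"
    "Subject: Your statement is ready\n"
    "\n"
    "body\n"
)
print(date_added_in_transit(raw))  # True
```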

And his diagnosis of the reason for the failure seems to be complete nonsense too, given that the SMTP rejection notice contained precisely the above text: "Your message lacks a Date: header, which RFC5322 says it MUST have."
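For reference, the receiving side of this is a trivial check once the message data has arrived. RFC 5322 §3.6 requires exactly one origination date (Date:) field. In this Python sketch, the function name and the 550 reply code are assumptions; the rejection text is the one quoted in the log above.

```python
from email import message_from_bytes
from typing import Optional

REJECT_TEXT = "Your message lacks a Date: header, which RFC5322 says it MUST have."

def check_after_data(raw: bytes) -> Optional[str]:
    """Return an SMTP rejection reply if the message has no Date: header, else None (accept)."""
    msg = message_from_bytes(raw)
    if not msg.get_all("Date"):
        return "550 " + REJECT_TEXT
    return None

print(check_after_data(b"From: bank@example.com\r\nSubject: Statement\r\n\r\nHello\r\n"))
```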

Well done, Nat West. Bonus points for stupidity today.

Remember last year when British Telecom kept closing fault tickets without actually fixing the fault or reading what we'd told them? Well, it's official: it is BT policy to ignore all information provided in a fault ticket. They admitted it:

"CRM Teams and customers have also been advised that the only action taken on these 'Amend requests' is to complete them to allow the fault to progress. CS Ops do not actively respond to any information on these requests."

Their current game is attempting to charge me £128,000 for installing a new phone line. That's apparently the full cost of upgrading the line plant into the village, which has been desperately needed for a long time; although they costed it up years ago, they haven't got round to doing it yet. Perhaps they were just waiting for a single individual consumer to pay for it?
