Older blog entries for dwmw2 (starting at number 212)

13 Oct 2009 »

"If the offence was unintended, an apology should be cheap."

An apology is cheap, it's true — but it's also counterproductive, because it reinforces the false belief that such an apology was necessary or appropriate.

Pandering to these people just contributes to the utterly idiotic culture of political correctness which blights our society.

Let's take a look at what he actually said, for crying out loud...

“A release is an amazing thing; I’m not talking about the happy ending..”: 3:02

It's crude, but I don't see it as being sexist. The terms 'release' and 'happy ending' could just as well be used to describe the female experience as the male experience, although ladies are less inclined to make such reference to it in public. It's not excluding women; it's excluding prudes.

“Your printer, and your mom’s printer, and your grandma’s printer”: 35:30

Oh, for crying out loud. Would it really have made that much difference if he'd said 'dad and grandma', or 'mom and grandpa'? No, it wouldn't. Some people must have been trying really hard to find something to take offence at.

Of course your mum is likely to be less technical than your dad. That's just the way the world is. Does your mum complain when she gets cheaper car insurance? Men and women are different, and we shouldn't be burned at the stake if that fundamental fact of life affects the minor details of how we phrase what we say.

My own mother died a few years ago; did I cry myself to sleep after Mark's keynote because I felt excluded by his choice of words? No. I didn't. Some people really do need to grow up.

“we’ll have less trouble explaining to girls what we actually do" at 35:55

There's another one which excludes me. I'm not single, so I don't spend my time trying to impress girls. Should I have been offended? Of course not.

In this context, I'd usually have said "normal people", meaning non-geeks, rather than "girls"; I tend to be quite self-deprecating about my geek nature.

But when I say "normal people" I often have to then explain what I meant by it. It makes far more sense to say "girls", because then people instantly recognise what I'm trying to say.

So I think it's entirely reasonable that Mark said "girls" in that context. When trying to communicate to a room full of people, of course you communicate in a way which will be understood by all of them without having to go back and explain yourself.

He certainly didn't mean to say "Hey, I think the Linux community is entirely comprised of single (or philandering) males, and lesbians."

If you draw that inference, then you are being bloody stupid!

(I should probably point out that the 'single or philandering' qualification in my above sentence applies to both males and lesbians. I didn't mean to suggest that lesbians aren't capable of a monogamous relationship. Please put the torch down and back away from my front door. But thank you for demonstrating just how stupid some people can be when they're looking for a way to take offence.)

There are problems in the geek community which make it hard for females to join in, and there are real problems with some of the things that people say sometimes. The geek feminist lobby certainly has a point, in the general case.

But Mark's keynote was not an example of this. By throwing their toys out of the pram over Mark's keynote, they cheapen the whole debate and perform a stunning ad hominem on themselves.

If you want to be treated with respect and integrate into the society, you don't achieve that by behaving like a Jemima and kicking up a fuss over nothing. You could try contributing to the real debate, like talking about some of the other crap Mark was spouting in his keynote.

So no, I don't think an apology is a good idea. Unless it's offered by the people who have been making all this stupid fuss — and it's offered both to Mark, and also to the people who really want to promote the integration of women in the community.

4 Oct 2009 »

Flash storage; a polemic

I originally posted this in response to the LWN coverage of the panel discussion at LinuxCon, but figure I should probably post it somewhere more sensible. So here goes... with a few paragraphs added at the end.

"The flash hardware itself is better placed to know about and handle failures of its cells, so that is likely to be the place where it is done, [Ted] said."

I was biting my tongue when he said that, so I didn't get up and heckle.

I think it's the wrong approach. It was all very well letting "intelligent" drives remap individual sectors underneath us so that we didn't have to worry about bad sectors or C-H-S and interleaving. But what the flash drives have to do to present a "disk" interface is much more than that; it's wrong to think that the same lessons apply here.

What the SSD does internally is a file system all of its own, commonly called a "translation layer". We then end up putting our own file system (ext4, btrfs, etc.) on top of that underlying file system.

Do you want to trust your data to a closed source file system implementation which you can't debug, can't improve and — most scarily — can't even fsck when it goes wrong, because you don't have direct access to the underlying medium?

I don't, certainly. The last two times I tried to install Linux to a SATA SSD, the disk was corrupted by the time I booted into the new system for the first time. The 'black box' model meant that there was no chance to recover — all I could do with the dead devices was throw them away, along with their entire contents.

File systems take a long time to get to maturity. And these translation layers aren't any different. We've been seeing for a long time that they are completely unreliable, although newer models are supposed to be somewhat better. But still, shipping them in a black box with no way for users to fix them or recover lost data is a bad idea.

That's just the reliability angle; there are also efficiency concerns with the filesystem-on-filesystem model. Flash is divided into "eraseblocks" of typically 128KiB or so. And getting larger as devices get larger. You can write in smaller chunks (typically 512 bytes or 2KiB, but also getting larger), but you can't just overwrite things as you desire. Each eraseblock is a bit like an Etch-A-Sketch. Once you've done your drawing, you can't just change bits of it; you have to wipe the whole block.

Our flash will fill up as we use it, and some of the data on the flash will be still relevant. Other parts will have been rendered obsolete; replaced by other data or just deleted files that aren't relevant any more. Before our flash fills up completely, we need to recover some of the space taken by obsolete data. We pick an eraseblock, write out new copies of the data which are still valid, and then we can erase the selected block and re-use it. This process is called garbage collection.
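To make the garbage collection step concrete, here is a minimal toy model in Python. It is only a sketch: the `ToyFlash` class, the four-page eraseblock and the first-fit allocation are all invented for illustration, and real flash code has far more to worry about (wear levelling, bad blocks, power loss).

```python
from dataclasses import dataclass, field

PAGES_PER_BLOCK = 4  # toy value; real eraseblocks hold far more pages


@dataclass
class EraseBlock:
    # Logical page numbers in the order they were written into this block.
    pages: list = field(default_factory=list)

    def is_full(self):
        return len(self.pages) >= PAGES_PER_BLOCK


class ToyFlash:
    def __init__(self, n_blocks):
        self.blocks = [EraseBlock() for _ in range(n_blocks)]
        self.where = {}   # logical page -> (block index, slot) of the valid copy
        self.copies = 0   # pages relocated by garbage collection so far

    def _append(self, logical, avoid=None):
        # First-fit: use the first block with a free slot, skipping `avoid`.
        for i, blk in enumerate(self.blocks):
            if i != avoid and not blk.is_full():
                blk.pages.append(logical)
                self.where[logical] = (i, len(blk.pages) - 1)
                return
        raise RuntimeError("no free space; garbage collect first")

    def write(self, logical):
        # Writes never overwrite in place; any older copy becomes obsolete.
        self._append(logical)

    def valid_in(self, index):
        # A slot is still valid only if the map points at exactly that slot.
        blk = self.blocks[index]
        return [p for slot, p in enumerate(blk.pages)
                if self.where.get(p) == (index, slot)]

    def garbage_collect(self, index):
        # Copy the still-valid pages elsewhere, then erase the whole block.
        survivors = self.valid_in(index)
        for logical in survivors:
            self._append(logical, avoid=index)
            self.copies += 1
        self.blocks[index].pages = []
        return len(survivors)


flash = ToyFlash(n_blocks=3)
for page in range(4):
    flash.write(page)        # fills eraseblock 0
flash.write(0)
flash.write(1)               # new copies land in eraseblock 1
copied = flash.garbage_collect(0)
print(f"copied {copied} valid pages before erasing block 0")
```

Half of block 0 was obsolete by the time it was collected, so only the two pages still in use had to be rewritten before the erase.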

One of the biggest disadvantages of the "pretend to be disk" approach is addressed by the recent TRIM work. The problem was that the disk didn't even know that certain data blocks were obsolete and could just be discarded. So it was faithfully copying those sectors around from eraseblock to eraseblock during its garbage collection, even though the contents of those sectors were not at all relevant — according to the file system, they were free space!
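A rough illustration of what the device is missing without TRIM, as a Python sketch. The function name and the sector numbers are made up; the point is only that the "disk" can see overwrites for itself, but has to be told about deletions.

```python
def pages_to_copy(block_pages, overwritten, trimmed=frozenset()):
    """Return the logical sectors GC must relocate before erasing a block.

    block_pages: logical sector numbers stored in the eraseblock
    overwritten: sectors the device knows are stale (rewritten elsewhere)
    trimmed:     sectors the file system has declared free via TRIM
    """
    return [p for p in block_pages if p not in overwritten and p not in trimmed]


block = [10, 11, 12, 13]
overwritten = {10}         # the device saw sector 10 rewritten elsewhere
deleted_by_fs = {11, 12}   # freed by the file system; invisible to the device

without_trim = pages_to_copy(block, overwritten)
with_trim = pages_to_copy(block, overwritten, trimmed=deleted_by_fs)
print(without_trim, with_trim)
```

Without the hint, sectors 11 and 12 get copied faithfully even though the file system considers them free space.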

Once TRIM gets deployed for real, that'll help a lot. But there are other ways in which the model is suboptimal.

The ideal case for garbage collection is that we'll find an eraseblock which contains only obsolete data, and in that case we can just erase it without having to copy anything at all. Rather than mixing volatile, short-term data in with the stable, long-term data we actually want to keep them apart, in separate eraseblocks. But in the SSD model, the underlying "disk" can't easily tell which data is which — the real OS file system code can do a much better job.
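The benefit of keeping short-lived and long-lived data apart can be shown with a small Python sketch. Everything here is hypothetical (the names, the four-page eraseblock, the assumption that the file system knows up front which data is short-lived); it simply counts how many pages garbage collection would have to copy once all the short-lived data has become obsolete.

```python
def fill_blocks(writes, pages_per_block, separate):
    """Pack (name, is_short_lived) writes into eraseblocks.

    With separate=True, long-lived and short-lived data are grouped into
    different eraseblocks; otherwise they are packed in arrival order.
    """
    if separate:
        ordered = sorted(writes, key=lambda w: w[1])  # stable; long-lived first
    else:
        ordered = list(writes)
    return [ordered[i:i + pages_per_block]
            for i in range(0, len(ordered), pages_per_block)]


def gc_copies(blocks):
    """Copies needed to reclaim every block that holds dead data,
    assuming all short-lived data is now obsolete."""
    copies = 0
    for blk in blocks:
        live = [w for w in blk if not w[1]]
        if len(live) < len(blk):   # the block contains something to reclaim
            copies += len(live)
    return copies


# Four temporary files interleaved with four long-term ones.
writes = []
for i in range(4):
    writes.append((f"tmp{i}", True))
    writes.append((f"lib{i}", False))

mixed = gc_copies(fill_blocks(writes, 4, separate=False))
apart = gc_copies(fill_blocks(writes, 4, separate=True))
print(mixed, apart)
```

In the mixed layout every eraseblock is half dead and half live, so reclaiming the space means copying all the live data. Kept apart, the block of temporary data can simply be erased.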

And when we're doing this garbage collection, it's an ideal time for the OS file system to optimise its storage — to defragment or do whatever else it wants (combining data extents, recompressing, data de-duplication, etc.). It can even play tricks like writing new data out in a suboptimal but fast fashion, and then only optimising it later when it gets garbage collected. But when the "disk" is doing this for us behind our back in its own internal file system, we don't get the opportunity to do so.

I don't think Ted is right that the flash hardware is in the best place to handle "failures of its cells". In the SSD model, the flash hardware doesn't do that anyway; it's done by the file system on the embedded microcontroller sitting next to the flash.

I am certain that we can do better than that in our own file system code. All we need is a small amount of information from the flash. Telling us about ECC corrections is a first step, of course — when we had to correct a bunch of flipped bits using ECC, it's getting on for time to GC the eraseblock in question, writing out a clean copy of the data elsewhere. And there are technical reasons why we'll also want the flash to be able to say "please can you GC eraseblock #XX soon".
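As a sketch of how little information that is, the following Python function picks eraseblocks to garbage collect from just those two hints. The names and the threshold are assumptions for illustration; no real flash interface is being described.

```python
def blocks_to_gc(corrected_bits, threshold, device_requests=()):
    """Choose eraseblocks to garbage collect soon.

    corrected_bits:  eraseblock number -> bits ECC had to fix on the last read
    threshold:       hypothetical tuning value; at or above it we relocate
    device_requests: eraseblocks the flash itself has asked us to collect
    """
    urgent = {blk for blk, bits in corrected_bits.items() if bits >= threshold}
    urgent.update(device_requests)
    return sorted(urgent)


# Block 1 needed three bit corrections; the device separately asked for block 7.
print(blocks_to_gc({0: 0, 1: 3, 2: 1}, threshold=3, device_requests=[7]))
```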

But I see absolutely no reason why we should put up with the "hardware" actually doing that kind of thing for us, behind our back. And badly.

Admittedly, the need to support legacy environments like DOS and to provide INT 13h "DISK BIOS" calls or at least a "block device" driver will never really go away. But that's not a problem. There are plenty of examples of translation layers done in software, where the OS really does have access to the real flash but still presents a block device driver to the OS. Linux has about 5 of them already. The corresponding "dumb" devices (like the M-Systems DiskOnChip which used to be extremely popular) are great for Linux, because we can use real file systems on them directly.

At the very least, we want the "intelligent" SSD devices to have a pass-through mode, so that we can talk directly to the underlying flash medium. That would also allow us to try to recover our data when the internal "file system" screws up, as well as allowing us to do things properly from our own OS file system code.

Now, I'm not suggesting that we already have file system code which can do things better; we don't. I wrote a file system which works on real flash, but I wrote it 8 years ago and it was designed for 16-32MiB of bitbanged NOR flash. We pushed it to work on 1GiB of NAND (and even using DMA!) for OLPC, but that is fairly much the limit of how far we'll get it to scale.

We do, however, have a lot of interesting new work such as UBI and UBIFS, which is rapidly taking the place of JFFS2 in the real world. The btrfs design also lends itself very well to working on real flash, because of the way it doesn't overwrite data in-place. I plan to have btrfs-on-flash, or at least btrfs-on-UBI, working fairly soon.

And, of course, we even have the option of using translation layers in software. That's how I tested the TRIM support when I added it to the kernel; by adding it to our existing flash translation layer implementations. Because when this stuff is done in software, we can work on it and improve it.

So I am entirely confident that we can do much better in software — and especially in open source software — than an SSD could ever do internally.

Let's not be so quick to assume that letting the 'hardware' do it for us is the right thing to do, just because it was right 20 years ago for hard drives to do something which seems vaguely similar at first glance.

Yes, we need the hardware to give us some hints about what's going on, as I mentioned above. But that's about as far as the complexity needs to go; don't listen to the people who tell you that the OS would need to know all kinds of details about the internal geometry of the device, which will be changing from month to month as technology progresses. The basic NAND flash technology hasn't changed that much in the last ten years, and existing file systems which operate on NAND haven't had to make many adjustments to keep up at all.

29 Aug 2009 »

2009-08-22 01:24:51 +0000 1MefLf-0004Xf-3x H=mailhost8a.rbs.com [155.136.80.166] F=<OnlineBanking@Information.natwest.com> rejected after DATA: Your message lacks a Date: header, which RFC5322 says it MUST have.

Dear Mr. Woodhouse,
Thank you for your call of 26th August about not being able to accept notification emails.
...
I have investigated the matter and can confirm that the statement notification emails are sent out with the date on. The rfc5322 is an internet protocol only and we do not have to abide by this.
Our records show that the notification emails failed delivery on the 21st August due to an invalid email address. I hope this is a satisfactory resolution to your complaint.

Christ, where do I start with this? Yes, if you're claiming to be sending Internet email then you really do have to follow RFC5322. That's the standard that defines what Internet email is.

But that seems to be a red herring — he also claims that they are including a Date: header. Unfortunately, he's wrong. He's probably looking at an email which had the Date: header added in transit by the recipient's mail server. That would be obvious to anyone with a clue, because you can compare the datestamps in the Received: headers and observe that it matches one of the later ones, not the first.
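That comparison can be sketched in Python with the standard `email` module. The message below is invented (the `.example` host names and timestamps are placeholders), and the check is only a heuristic: it treats a Date: which matches a later hop, but not the first one, as probably added in transit.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def received_times(msg):
    """Timestamps from the Received: headers, oldest hop first."""
    stamps = []
    for header in msg.get_all("Received", []):
        # The timestamp is whatever follows the last ';' in the header.
        _, _, date_part = header.rpartition(";")
        stamps.append(parsedate_to_datetime(date_part.strip()))
    return list(reversed(stamps))  # each hop prepends, so reverse the order


def date_added_in_transit(msg):
    """Guess whether Date: was inserted by a server after the first hop."""
    if msg["Date"] is None:
        return False
    date = parsedate_to_datetime(msg["Date"])
    hops = received_times(msg)
    return len(hops) > 1 and date != hops[0] and date in hops[1:]


raw = (
    "Received: from mx.example by mailstore.example; Sat, 22 Aug 2009 01:24:55 +0000\n"
    "Received: from sender.example by mx.example; Sat, 22 Aug 2009 01:24:51 +0000\n"
    "Date: Sat, 22 Aug 2009 01:24:55 +0000\n"
    "Subject: statement\n"
    "\n"
    "body\n"
)
msg = message_from_string(raw)
print(date_added_in_transit(msg))
```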

And his diagnosis of the reason for the failure seems to be complete nonsense too, given that the SMTP rejection notice contained precisely the above text: "Your message lacks a Date: header, which RFC5322 says it MUST have.".

Well done, NatWest. Bonus points for stupidity today.

22 Aug 2009 »

Remember last year when British Telecom kept closing fault tickets without actually fixing the fault or reading what we'd told them? Well, it's official: it is BT policy to ignore all information provided in a fault ticket. They admitted it:

"CRM Teams and customers have also been advised that the only action taken on these 'Amend requests' is to complete them to allow the fault to progress. CS Ops do not actively respond to any information on these requests."

Their current game is attempting to charge me £128,000 for installing a new phone line. That's apparently the full cost of upgrading the line plant into the village, which has been desperately needed for a long time; although they costed it up years ago, they haven't got round to doing it yet. Perhaps they were just waiting for a single individual consumer to pay for it?

31 Jul 2009 »

Hahaha. Skype might have to shut down due to licensing problems.

I hope it does. Random crap using non-standard protocols and non-free software deserves to die — and the sheep who used it deserve what they get too.

8 Jul 2009 »

I'm accustomed to technical support being fairly incompetent and clueless, but Acer seem to have taken it to a new level. They have taken to telling direct lies and seem to be attempting to defraud their customers.

I don't think I'll ever be buying Acer hardware again.

I bought an Acer laptop a couple of months ago, through Misco. I phoned Misco and tried to get them to ship it to me without the preinstalled Windows Vista operating system. They said that it was not possible.

At that point I should have taken my business elsewhere, but this was quite a good deal — ISTR it was a return, or something like that, so it was quite cheap. So I ordered the laptop anyway, and then when it arrived I declined to accept the End User Licensing Agreement, installed Linux on it and contacted Acer for my refund as indicated.

Acer's first response was that they would be able to refund the £20.30 that Windows Vista was worth, but that they "will require a £51.99 payment to have the machine brought in to the repair centre so we may remove this for you. This will cover the courier and engineer's labour fee."

This seems to be an obvious scam to prevent customers from obtaining the refund to which they are entitled, and I didn't accept it. I wrote a letter to their head office, returning the Windows serial number sticker and giving photographic evidence that Linux had been installed on the system, wiping the old operating system. And demanding my refund within one month or court proceedings would be issued.

Acer responded to this, retracting the demand for a £51.99 payment but still claiming that the laptop had to be shipped back to them at my expense. They said that they needed to "action the following:

Validate that the Operating System has been removed from the Hard Disk.
Remove the Microsoft COA (Certificate Of Authenticity) label
Verify your proof of purchase to ascertain that you are in the specified timeframe to refund this product.
To verify if any back up recovery disks have been made and if so, recovered from you.
A signed form from you, which may be given to Microsoft and which agrees to hold Acer harmless from any claims by third parties in the event that you have produced any false information on the request."

I pointed out that it was not necessary for them to have the system shipped back to them to achieve their requirements. I offered them remote access to the system in order to verify that there was no trace of Windows left on the hard drive, and asked for a copy of the form they mentioned. I also gave them a copy of my proof of purchase, reminded them that I'd already sent back the sticker, and stated that I had made no backup copies.

At this point, they went silent and stopped responding to my email — even when I reminded them that the deadline was approaching and I was about to file the court claim for my refund. They did eventually start responding again after about two months, when I informed them that I had finally got round to filing the court case.

This did seem to get their attention, but they still claimed that they needed the system to be shipped back to them. When I spoke to an engineer on the telephone, he claimed that it wasn't sufficient merely to check that the hard drive had been wiped, and compare the serial number reported by its firmware with the one in their records. He said they had to actually take the laptop apart and read the serial number from the label on the hard drive, because I might have put a different hard drive into the laptop and flashed its firmware so that it pretended to have the same serial number as the original.

I pointed out that this was somewhat far-fetched, and if I was so inclined it would be much easier for me to just copy Windows off the original hard drive, send it back to them for validation, then put it back again afterwards. He agreed, but said that their agreement with Microsoft was that they must verify that the OS had been removed from the original hard drive — what happened after that wasn't their problem.

At this point, with the court proceedings already filed, they agreed to pay for the courier (and the court costs). Since it would only take a few days, I conceded. Before shipping it off to them, however, I took a screwdriver and carefully aligned all the screws so that I could tell if it had been opened.

Imagine my surprise when it came back and they hadn't opened the case! Despite all their protestation that they needed physical access, and that they had to open the case and physically read the serial numbers from the hard drive, when they finally got the opportunity to do so they didn't bother.

All they did was check the partitioning and serial number through software — which they could have done months ago, remotely.

As far as I can tell, it's just a huge scam to prevent customers from claiming the refund for the unlawfully-bundled software, by making it cost more to do so than they get in the refund. I certainly would have given up a long time ago if it wasn't for the principle of the thing.

Now that it seems entirely clear that Acer are simply attempting to defraud their customers, though, I shall be reporting it to Trading Standards to see what they have to say about the matter.

6 Jul 2009 »

Software makes me sad sometimes.

Every time the iwlagn driver crashes and has to be reloaded (and it does that distressingly often, since it doesn't seem to reset the device and recover when its closed-source firmware crashes), NetworkManager kills the connection and restarts completely. Not unreasonably, I suppose.

But then, all NFS mounts get automatically unmounted, which is a complete pain in the arse.

And my VPN connection is reset, and because Cisco are stupid I don't get the same VPN IP address next time I connect, even if it is still available. (I think I ought to be able to work around this from the client side, if I don't mind storing the authentication cookie on the client machine.)

Although having said that, the main reason I'd want my IP address to remain the same is so that my connection to the mail server can persist and I don't have to wait through Evolution's painfully slow startup.

Unfortunately, Evolution also responds to the network offline/online events by reporting -EAGAIN errors all the time when it auto-saves emails that you're composing, and stops being able to display mail folders — the index just comes up empty. So it needs to be killed and restarted too. (This has been in bugzilla since November last year).

8 Jun 2009 »

Software makes me sad sometimes.

Q: My application has a command-line option to use an SSL client certificate. What is the OpenSSL function to load and use the certificate from a file?

A: Well, we make this lots of fun for you — it would be boring if there was just one function which you could pass the filename to. You have to write 230 lines of code like this instead.... First you have to check for yourself what type of file it is — is it a PKCS#12 file, is it a PEM file with a key in it, or is it a TPM key 'blob'?

No, there's no function which determines that for you — you have to do it yourself. And depending on the answer, you have to do three entirely different things to load the key.

To make things even more fun, those three file types have wildly different ways to handle their passphrase/PIN:

For a PEM file, you can't tell OpenSSL the passphrase in advance — if the user gave it on the command line, you have to manually override the user interface function that OpenSSL will call, and make your replacement function return the pre-set passphrase. Or if you do ask the user, you've got no way to easily tell whether the user got the passphrase wrong; if they get it wrong (and type 4 or more characters) then the 'load key' function will fail and you have to compare against a special error code, which may differ from version to version of OpenSSL because it has internal function names. Just for variety, if the user enters a wrong passphrase with fewer than 4 characters, they'll get no feedback and will just be asked again immediately.
For a PKCS#12 file, it's the other way round — you have to give the passphrase in advance, so you have to ask the user for it yourself. Even if the file isn't actually encrypted — because you don't know that yet.
For a TPM key it's a bit saner — you can either set the PIN in advance or otherwise OpenSSL will ask the user for it if necessary. But you do have to jump through various other hoops to use the TPM 'engine', instead of just pointing OpenSSL at the file and having everything handled for you.
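The "check for yourself what type of file it is" step might look something like this Python sketch. It is a guess at a workable heuristic, not OpenSSL's API: the function name is invented, the `TSS KEY BLOB` PEM label is assumed to be what the TPM engine writes, and a leading 0x30 byte only says "DER", so the PKCS#12 parser still has to confirm it.

```python
def guess_key_format(data: bytes) -> str:
    """Guess which of the three loaders a key/certificate file needs."""
    # Assumed PEM label for a TPM key blob; check it before generic PEM.
    if b"-----BEGIN TSS KEY BLOB-----" in data:
        return "tpm"
    # Any other PEM armour: a certificate and/or key in PEM form.
    if b"-----BEGIN " in data:
        return "pem"
    # DER structures start with a SEQUENCE tag (0x30). PKCS#12 is DER
    # encoded, but so are other files, so this is only a first guess.
    if data[:1] == b"\x30":
        return "pkcs12"
    return "unknown"


print(guess_key_format(b"-----BEGIN RSA PRIVATE KEY-----\n"))
```

Even with that done, each of the three answers still leads to its own loading code and its own passphrase handling, which is the real complaint.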

Excuse me while I bash my head against a brick wall for a while...

And no, the answer is not "don't use OpenSSL then".

At least, not until one of the potential replacements actually starts to catch up with the features I need — support for using a TPM for certificates, and DTLS support.

22 May 2009 »

WTF? Case-sensitive, but not case-preserving...

27 Apr 2009 »

Why are people so bloody clueless about email? I received this in snail mail from my bank today:

Account Number xxxxxxxx Sort Code xx-xx-xx
Your statement
Your statement for the above account, is ready to view by logging in to online banking at www.natwest.com.
Unfortunately, we have been unable to deliver this alert to you by email. This may be because the email address we hold for you (DAVID@WOODHOU.SE) is incorrect.

That has to be almost the most clueless bug report I've ever seen. It should have included at least some of:

Precise date and time of the latest delivery attempt
Sender's email address
Sending server IP address
Which MX host was being delivered to
The rejection message from the MX host

If I hadn't been running my own mail server, I'd have had no way to work out what happened — no ISP is going to go trawling through their logs looking for a needle in a haystack based on virtually nothing.

Since I do run my own, I was able to log into all the MX hosts for that domain and look through the historical mail logs on each of them, and I happened to find their failed message among all the other people trying to fake mail from NatWest:

2009-04-21 00:38:20 +0000 1Lw40C-0002sE-3D H=mailhost7a.rbs.com [155.136.80.121] F=<OnlineBanking@Information.natwest.com> rejected after DATA: Your message lacks a Date: header, which RFC5322 says it MUST have.
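That one log line carries nearly everything a useful bug report needs. As a sketch, this Python regular expression pulls the fields out of it; the pattern is fitted to the exact shape of the line quoted above and is not a general-purpose log parser. The MX host isn't in the line itself, since it is simply whichever host the log was found on.

```python
import re

LOG_LINE = re.compile(
    r"^(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4}) "
    r"(?P<msgid>\S+) "
    r"H=(?P<host>\S+) \[(?P<ip>[^\]]+)\] "
    r"F=<(?P<sender>[^>]*)> "
    r"rejected after DATA: (?P<reason>.*)$"
)

line = (
    "2009-04-21 00:38:20 +0000 1Lw40C-0002sE-3D "
    "H=mailhost7a.rbs.com [155.136.80.121] "
    "F=<OnlineBanking@Information.natwest.com> "
    "rejected after DATA: Your message lacks a Date: header, "
    "which RFC5322 says it MUST have."
)

report = LOG_LINE.match(line).groupdict()
for key in ("when", "sender", "ip", "reason"):
    print(f"{key}: {report[key]}")
```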

Upon calling them to tell them of their problem, I was asked "who says our mails lack a Date: header?" and "who says that they should?".
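For the record, the answer to "who says that they should?" is section 3.6 of RFC 5322, which makes the origination date and originator fields mandatory. A minimal Python sketch of that check follows; the function name and sample message are invented, and it is not the check my mail server actually runs.

```python
from email import message_from_string

# RFC 5322 section 3.6: only the origination date and the originator
# address field(s) are required in every message.
REQUIRED_HEADERS = ("Date", "From")


def missing_required_headers(raw_message):
    """Return the names of required headers absent from a raw message."""
    msg = message_from_string(raw_message)
    return [name for name in REQUIRED_HEADERS if msg[name] is None]


raw = (
    "From: sender@bank.example\n"
    "Subject: Your statement\n"
    "\n"
    "Your statement is ready to view.\n"
)
print(missing_required_headers(raw))
```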

After dealing with that, I left the first-line support person with three items to pass on to NatWest's technical team:

The lack of Date: header on their outbound mail
The uselessness of the letter they send when they can't deliver email
The fact that they are converting email addresses to upper case, when localparts may well be case-sensitive
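On the last point: domain names are case-insensitive, but the local part belongs to the receiving system and may not be. If an address must be normalised at all, a sketch of the safe way in Python is to lower-case only the domain. The function name and example address are hypothetical.

```python
def normalise_address(address):
    """Lower-case the domain of an email address, preserving the local part."""
    local, at, domain = address.rpartition("@")
    if not at or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return f"{local}@{domain.lower()}"


print(normalise_address("Some.User@EXAMPLE.ORG"))
```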

I wonder what the odds are of any of them actually getting fixed?

Maybe I should have added "you're sending outbound mail without GPG-signing it" as a fourth item? :)
