Older blog entries for etbe (starting at number 953)

USB Flash Storage

For some years I have had my Internet gateway/firewall system in a cupboard in my bedroom. While I don’t mind some computer noise (I’ve slept near a server for most of the last 22 years) it’s good to have it as quiet as possible so getting rid of the hard drive is desirable.

I considered buying an IDE flash drive, but I’d like to continue my trend of not paying for hardware so I chose to use some USB flash devices that HP was giving away at a seminar (thanks HP – as an aside the P3 system is an old Compaq Desktop system). So I’ve got one 4G USB device for root and one for Squid.

For the past few months /var/spool/squid has been on a USB flash device. I considered using RAID-0 for that filesystem because the computer is a P3 with only USB 1.1, which has a maximum theoretical transfer rate of 1.5MB/s and a maximum real-world rate of about 1MB/s. But my ADSL connection doesn’t seem able to sustain much more than 1MB/s and Squid doesn’t write data synchronously, so in all my tests the USB speed hasn’t affected HTTP performance.

One issue that delayed my move to all-USB storage is booting: the P3 system in question doesn’t support booting from USB. I considered creating a boot CD that loads the kernel from the USB device, but that seemed a little painful and also relies on the CD-ROM drive working – which isn’t a great idea for a system that runs 24*7 in a dusty cupboard. I ended up using GRUB on the IDE hard drive to load the kernel and initrd, which then mount a USB device as root. This works, and the command “hdparm -S6 /dev/sda” in /etc/rc.local makes the hard drive go to sleep once the system is booted.
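A sketch of the setup described above (GRUB legacy syntax as used at the time; the kernel version, partition, and UUID here are all hypothetical):

```
# /boot/grub/menu.lst on the IDE disk – filenames and UUID are made up
title  Debian GNU/Linux (root on USB flash)
root   (hd0,0)
kernel /boot/vmlinuz-2.6.32-5-686 root=UUID=<root-uuid> ro
initrd /boot/initrd.img-2.6.32-5-686

# /etc/rc.local – -S6 means spin down after 6 * 5 = 30 seconds idle
hdparm -S6 /dev/sda
```

The IDE disk only has to survive long enough to hand over to the initrd; after that it can stay spun down.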

The only technical parts of the process were putting the UUIDs of the filesystems in /etc/fstab (because I can’t be sure which USB device will be found first) and creating a new initramfs with the USB storage modules listed in /etc/initramfs-tools/modules so that a USB device could be the root filesystem.
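For example (the UUID placeholders, filesystem type, and exact module list are hypothetical – the right modules depend on the host controller):

```
# /etc/fstab – refer to devices by UUID since USB probe order isn't stable
UUID=<root-uuid>  /                ext3  noatime  0  1
UUID=<squid-uuid> /var/spool/squid ext3  noatime  0  2

# /etc/initramfs-tools/modules – so the initramfs can mount a USB root
usb_storage
uhci_hcd
sd_mod

# then rebuild the initramfs:
# update-initramfs -u
```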

The firewall system is now a bit quieter and based on my last tests of hard drive power use will probably dissipate about 5-7W less heat. The next thing to do is wait and see if it keeps running or falls over. ;)

Related posts:

  1. flash for main storage I was in a discussion about flash on a closed...
  2. Flash Storage and Servers In the comments on my post about the Dell PowerEdge...
  3. IDE DMA and Flash I’ve just been working with a Flash device used as...

Syndicated 2012-03-11 11:58:27 from etbe - Russell Coker

SE Linux Status in Debian 2012-03

I have just finished updating the user-space SE Linux code in Debian/Unstable to the version released on 2012-02-16. Some changes to the upstream build system, combined with the new Debian multi-arch support, involved a fair bit of work for me. While I was at it I converted more of the packages to the new 3.0 (quilt) source format to make it easier to send patches upstream. In the past I have been a bit slack about sending patches upstream; my aim for the next upstream release of the user-space code is to have at least half of my patches included upstream – this will make things easier for everyone.

Recently Mika Pflüger and Laurent Bigonville have started work on Debian SE Linux. They have done some good work converting the refpolicy source (which is used to build selinux-policy-default) to the quilt format. Now it will be a lot easier to send policy patches upstream and to port them to newer versions of the upstream refpolicy.

Now the next significant thing that I want to do is to get systemd working correctly with SE Linux. But first I have to get it working correctly with cryptsetup.

Related posts:

  1. SE Linux Status in Debian 2011-10 Debian/Unstable Development deb http://www.coker.com.au wheezy selinux The above APT sources.list...
  2. SE Linux Status in Debian 2012-01 Since my last SE Linux in Debian status report [1]...
  3. Debian SE Linux Status At the moment I’ve got more time to work on...

Syndicated 2012-03-06 06:44:33 from etbe - Russell Coker

LUV Hardware Library

What is it?

Last month I started what I am calling the LUV Hardware Library. It’s a briefcase full of computer parts that are free to LUV members, which I plan to bring to all LUV meetings [1]. The issue is that there is a lot of hardware which has no great value to the people who own it and is often excess to requirements, but which is still somewhat expensive or difficult to obtain. Transferring it from the people who don’t need it to the people who do provides a significant benefit to the recipient at no real cost to the donor.

Currently my briefcase has a range of different types of DIMM, some PCI cards, some assorted cables, a laptop SATA disk, and a bunch of other random things. Most of the stuff has been in my spare parts pile for a year or two, but some of it has been donated by other people.

What we Need

The next thing we need is more packaging: anti-static bags for RAM and PCI cards, sealable bags for screws, and some way of storing motherboard batteries.

In terms of hardware that can be donated one thing that would be really useful is SATA disks. I’m sure that there are lots of people who have upgraded the storage of their computer and kept the old drive with no real chance of using it. But there are lots of people who need such disks, for example I’ve been giving away a bunch of P4 systems which lack disks and are not cabled for IDE disks – a supply of SATA disks would make them a lot more usable.

Also any other random electronic stuff is of interest, including non-Linux things such as mobile phones (but only if not network locked).

If you have something that’s big or heavy to donate then contact me via email first.

Other Options

Computerbank does great work in rebuilding old PCs and selling them cheaply to worthy people [2], but most of the spare hardware I get is below the minimum specs that they will accept. I’m not planning to compete with Computerbank in any way, I just want to provide a useful service to LUV members who want to upgrade their PC for free.

I encourage other people to do the same at other LUG meetings!

Related posts:

  1. Donating old Hardware On a recent visit to my local e-waste disposal place...
  2. Giving Away Hardware For the last few years I have been actively seeking...
  3. Shelf-life of Hardware Recently I’ve been having some problems with hardware dying. Having...

Syndicated 2012-03-05 10:33:13 from etbe - Russell Coker

Links February 2012

Sociological Images has an interesting article about the attempts to apply the word “Camping” to OWS and framing the issues [1].

Lester Macgurdy wrote an insightful article about “the snake”, a new technique for OWS protesters to beat riot police [2].

Ron Barassi suggests that “Australia Day” be celebrated on the 27th of May to commemorate the day in 1967 when the Australian constitution was amended to not be racist [3]. The current “Australia Day” is often referred to as “Invasion Day”. IMHO Ron deserves another “Best and Fairest” award.

Stefon Harris gave an entertaining TED talk about improv Jazz music titled “There Are No Mistakes on the Bandstand” [4]. It seems that his concepts can apply to some extent to many collaborative projects.

John Robb wrote an interesting article about the future of drone (UAV) warfare [5]. He suggests that having one person control each drone is a temporary thing and that the future is to have a cloud of cheap autonomous drones taking strategic control from one person. His comparison of Starcraft players to future drone fighters is interesting.

The OWS movement is branching out into other related areas; OccupyYourHomes.org is one of the latest [6]. When banks try to foreclose on homes without good cause the OWS people protest.

Cory Doctorow wrote an important article for The Guardian about corporations using the Youtube ContentID system to pirate works that other people have uploaded [7].

Matt Taibbi’s description of Goldman Sachs as “a great vampire squid wrapped around the face of humanity, relentlessly jamming its blood funnel into anything that smells like money” will never die [8]. It has spawned many other creative descriptions of the evil and greed of Goldman Sachs and even Lloyd Blankfein of Goldman Sachs describes his company as having “burned down the Reichstag, shot the Archduke Ferdinand and fired on Fort Sumter” – he was trying to use satire, but I don’t think that Goldman Sachs people would act differently to Fritz Thyssen.

Keith Packard wrote an interesting article about the Calypso CalDAV system which he uses with Android [9]. He makes lots of good points about how to improve calendaring and contacts on Android, unfortunately I lack time to fiddle with such things at the moment so I’ll stick with Google in spite of the risks.

Asheesh Laroia wrote a great article about the problems with short (32bit) GPG key IDs [10]. It seems that creating keys with matching IDs isn’t particularly difficult and that GPG doesn’t handle them as well as we would like, giving the possibility of, at best, annoying DoS attacks and, at worst, security problems due to using the wrong key.

Sociological Images has an interesting article about when game show audiences are trustworthy [11]. It seems that French people don’t want an undeserving person to win so they will intentionally advocate the wrong answer if the contestant should know it.

Paul Wayper gave a great lecture titled “SE Linux for Everyone” [12]. He covers the basics of SE Linux in a user-friendly way and explains some simple solutions to common problems which don’t involve compromising system security.

Paul Tassi wrote an insightful article for Forbes about piracy [13]. His conclusion is that the media companies should make it cheaper and easier to be a customer and not spend insane amounts of money on low quality products.

The Reid Report has an interesting article about Ron Paul’s racism [14]. Ron Paul is generally well regarded outside the US because he wants the US government to stop meddling in the affairs of other countries, but while he’s less bad than other US politicians in terms of foreign policy that doesn’t make him a good person.

Anonymous hacked some mailboxes belonging to a neo-Nazi group and found links to Ron Paul [15]. I’ve always been suspicious of the way Ron Paul wanted to avoid anti-racism legislation on supposed Libertarian principles.

The Reid Report has an interesting summary of Ron Paul news plus some criticism of Glenn Greenwald and others who associate with him [16].

Related posts:

  1. Links February 2011 Australia’s Department of Finance has mandated that the MS-Office document...
  2. Links January 2012 Cops in Tennessee routinely steal cash from citizens [1]. They...
  3. Links February 2009 Michael Anissimov writes about the theft of computers from the...

Syndicated 2012-02-15 15:19:51 from etbe - Russell Coker

Cooling a Thinkpad

Late last year I wrote about the way that modern laptops suck [1]. One of the problems that inspired that post was the excessive heat generated by my Thinkpad T61.

There is a partial solution to this, Fool Control explains how the kernel option pcie_aspm=force can be used on kernels from 2.6.38 onwards to solve a heat regression problem [2]. I applied this to my Thinkpad T61 and the result was that on a cool evening (ambient temperature about 24C) the temperature changed from 85C to 66C on the NVidia video card, and for the “virtual devices” it changed from 80C and 78C to 60C and 61C. I’m not sure exactly what each of those measurements refers to, but it seems that the change was somewhere between 17C and 20C.
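On a system using GRUB 2 the option can be made persistent by extending the kernel command line in /etc/default/grub (a sketch of the standard mechanism; the existing contents of the variable may differ):

```
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_aspm=force"

# then regenerate grub.cfg and reboot:
# update-grub
```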

This changes the system from being almost unbearable to use to being merely annoyingly warm.

I’m not going to make my laptop my primary computing device again though; the combination of a desktop system with a 27″ monitor and an Android phone is working quite well for me [3]. But I haven’t yet got version control systems working for all my software. Also Wouter suggested using NBD, which is something I haven’t got working yet and probably won’t until I can swap on it and therefore have a diskless workstation. Finally I still haven’t got the “Chrome to Phone” browser extension working such that a page I’m viewing at home can be loaded on my phone.

Related posts:

  1. Taking my Thinkpad Apart and Cooling Problems I’ve been having some cooling problems with my Thinkpad recently....
  2. thinkpad back from repair On Tuesday my Thinkpad was taken for service to fix...
  3. I Just Bought a new Thinkpad and the Lenovo Web Site Sucks I’ve just bought a Thinkpad T61 at auction for $AU796....

Syndicated 2012-02-13 04:14:06 from etbe - Russell Coker

Magic entries for BTRFS and Software RAID

I’ve just discovered that the magic database for the file(1) command in Debian/Unstable has no support for Linux Software RAID and that its support for BTRFS is lacking (no reporting of space used, number of devices, or the UUID). Below is my first draft of a change to fix these problems. I would appreciate it if someone with a big-endian system could test these out and let me know how they go; I suspect that I will have to change the “lelong” types to “long” but I’m not sure.

4096 lelong 0xa92b4efc Linux Software RAID
>4100 lelong x version 1.2 (%d)
>4112 belong x UUID=%8x:
>4116 belong x \b%8x:
>4120 belong x \b%8x:
>4124 belong x \b%8x
>4128 string x name=%s
>4168 lelong x level=%d
>4188 lelong x disks=%d

0 lelong 0xa92b4efc Linux Software RAID
>4 lelong x version 1.1 (%d)
>16 belong x UUID=%8x:
>20 belong x \b%8x:
>24 belong x \b%8x:
>28 belong x \b%8x
>32 string x name=%s
>72 lelong x level=%d
>92 lelong x disks=%d

# BTRFS
0x10040 string _BHRfS_M BTRFS Filesystem
>0x1012b string >\0 label "%s",
>0x10090 lelong x sectorsize %d,
>0x10094 lelong x nodesize %d,
>0x10098 lelong x leafsize %d,
>0x10020 belong x UUID=%8x-
>0x10024 beshort x \b%4x-
>0x10026 beshort x \b%4x-
>0x10028 beshort x \b%4x-
>0x1002a beshort x \b%4x
>0x1002c belong x \b%8x,
>0x10078 lequad x %lld/
>0x10070 lequad x \b%lld bytes used,
>0x10088 lequad x %lld devices
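To sanity-check the offsets used in the RAID entries above, here is a short Python sketch that builds a fake md v1.2 superblock (which lives 4096 bytes into the device) and parses it at the same offsets the magic entries use. All the values packed into the buffer are invented for the test:

```python
import struct

MD_MAGIC = 0xa92b4efc  # Linux Software RAID superblock magic

def parse_md12(buf):
    """Parse the fields the magic entries above look at, from a
    version 1.2 superblock located at offset 4096."""
    base = 4096
    magic, version = struct.unpack_from("<II", buf, base)
    if magic != MD_MAGIC:
        raise ValueError("not an md v1.2 superblock")
    uuid = buf[base + 16:base + 32].hex()                  # offset 4112
    name = buf[base + 32:base + 64].split(b"\0")[0].decode()  # 4128
    level = struct.unpack_from("<i", buf, base + 72)[0]    # offset 4168
    disks = struct.unpack_from("<I", buf, base + 92)[0]    # offset 4188
    return {"version": version, "uuid": uuid, "name": name,
            "level": level, "disks": disks}

# Build a fake superblock to parse (all values invented).
buf = bytearray(8192)
struct.pack_into("<II", buf, 4096, MD_MAGIC, 1)
buf[4096 + 16:4096 + 32] = bytes(range(16))   # fake UUID
buf[4096 + 32:4096 + 36] = b"md0\0"           # array name
struct.pack_into("<i", buf, 4096 + 72, 1)     # RAID level 1
struct.pack_into("<I", buf, 4096 + 92, 2)     # 2 disks

print(parse_md12(bytes(buf)))
```

The version 1.1 entries are the same structure at offset 0 instead of 4096.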

Related posts:

  1. Label vs UUID vs Device Someone asked on a mailing list about the issues related...
  2. Starting with BTRFS Based on my investigation of RAID reliability [1] I have...
  3. Software vs Hardware RAID Should you use software or hardware RAID? Many people claim...

Syndicated 2012-02-11 07:21:07 from etbe - Russell Coker

Starting with BTRFS

Based on my investigation of RAID reliability [1] I have determined that BTRFS [2] is the Linux storage technology that has the best potential to increase data integrity without costing a lot of money. Basically a BTRFS internal RAID-1 should offer equal or greater data protection than RAID-6.

As BTRFS is so important and so very different from any prior technology for Linux, it’s not something that can be deployed in the same way as other filesystems. It is easy to switch between filesystems such as Ext4 and XFS because they work in much the same way: a single block device is used by the filesystem to provide a single mount-point. BTRFS, however, supports internal RAID so it may have multiple block devices, and it may offer multiple mountable filesystems and snapshots. Much of the functionality of Linux Software RAID and LVM is covered by BTRFS. So the sensible way to deploy BTRFS is to give it all your storage and not make use of any other RAID or LVM.

So I decided to do a test installation. I started with a Debian install CD that was made shortly before the release of Squeeze (it was the first to hand) and installed with BTRFS for the root filesystem, then upgraded to Debian/Unstable to get the latest kernel as BTRFS is developing rapidly. The system failed on the first boot after upgrading to Unstable because the /etc/fstab entry for the root filesystem had the FSCK pass number set to 1 – which wasn’t going to work as no FSCK program has been written for BTRFS. I changed that number to 0 and it then worked.
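The fix amounts to changing the sixth fstab field (the fsck pass number) from 1 to 0 for the root entry, for example (UUID hypothetical):

```
# /etc/fstab – pass number 0 disables boot-time fsck for the BTRFS root
# was:  UUID=<root-uuid>  /  btrfs  defaults  0  1
UUID=<root-uuid>  /  btrfs  defaults  0  0
```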

The initial install was on a desktop system that had a single IDE drive and a CD-ROM drive. For /boot I used a degraded RAID-1 and then after completing the installation I removed the CD-ROM drive and installed a second hard drive, after that it was easy to add the other device to the RAID-1. Then I tried to add a new device to the BTRFS group with the command “btrfs device add /dev/sdb2 /dev/sda2” and was informed that it can’t do that to a mounted filesystem! That will decrease the possibilities for using BTRFS on systems with hot-swap drives, I hope that the developers regard it as a bug.

Then I booted with an ext3 filesystem for root and tried the “btrfs device add /dev/sdb2 /dev/sda2” again but got the error message “btrfs: sending ioctl 5000940a to a partition!” which is not even found by Google.

The next thing that I wanted to do was to put a swap file on BTRFS; the benefits of having redundancy and checksums on swap space seem obvious – and other BTRFS features such as compression might give a benefit too. So I created a file by using dd to copy data from /dev/zero, ran mkswap on it, and then tried to run swapon. But I was told that the file has holes and can’t be used. Automatically turning zero blocks into holes is a useful feature in many situations, but not in this case.

So far my experience with BTRFS is that all the basic things work (IE storing files, directories, etc). But the advanced functions I wanted from BTRFS (mirroring and making a reliable swap space) failed. This is a bit disappointing, but BTRFS isn’t described as being ready for production yet.

Related posts:

  1. Discovering OS Bugs and Using Snapshots I’m running Debian/Unstable on an EeePC 701, I’ve got an...
  2. Reliability of RAID ZDNet has an insightful article by Robin Harris predicting the...
  3. How I Partition Disks Having had a number of hard drives fail over the...

Syndicated 2012-02-10 09:52:16 from etbe - Russell Coker

More DRBD Performance tests

I’ve previously written Some Notes on DRBD [1] and a post about DRBD Benchmarking [2].

Previously I had determined that replication protocol C gives the best performance for DRBD, that the batch-time parameters for Ext4 aren’t worth touching for a single IDE disk, that barrier=0 gives a massive performance boost, and that DRBD gives a significant performance hit even when the secondary is not connected. Below are the results of some more tests of delivering mail from my Postal benchmark to my LMTP server which uses the Dovecot delivery agent to write it to disk, the rates are in messages per minute where each message is an average of 70K in size. The ext4 filesystem is used for all tests and the filesystem features list is “has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize“.

P4 2.8GHz test                          Messages/min
Default Ext4                                    1663
barrier=0                                       2875
DRBD no secondary al-extents=7                   645
DRBD no secondary default                       2409
DRBD no secondary al-extents=1024               2513
DRBD no secondary al-extents=3389               2650
DRBD connected                                  1575
DRBD connected al-extents=1024                  1560
DRBD connected al-extents=1024 Gig-E            1544

The al-extents option determines the size of the dirty area that needs to be resynced when a failed node rejoins the cluster. The default is 127 extents of 4MB each, so 508MB has to be synchronised. The maximum is 3389, for a synchronisation size of just over 13GB. Even with fast disks and gigabit Ethernet it’s going to take a while to synchronise things if dirty zones are 13GB in size. In my tests using the maximum al-extents value gives a 10% performance benefit in disconnected mode while a value of 1024 gives a 4% performance boost. Changing the al-extents size seems to make no significant difference for a connected DRBD device.
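The arithmetic behind those figures is simple; a quick sketch of how the al-extents value maps to the worst-case amount of data resynchronised after a node failure:

```python
AL_EXTENT_MB = 4  # each DRBD activity-log extent covers 4MB

def resync_mb(al_extents):
    """Worst-case data to resync when a failed node rejoins."""
    return al_extents * AL_EXTENT_MB

print(resync_mb(127))   # default: 508MB
print(resync_mb(1024))  # 4096MB = 4GB
print(resync_mb(3389))  # maximum: 13556MB, just over 13GB
```

So the 10% disconnected-mode gain from the maximum setting is bought with roughly 26x more data to resync than the default.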

All the tests on connected DRBD devices were done with 100baseT apart from the last one which was a separate Gigabit Ethernet cable connecting the two systems.

Conclusions

For the level of traffic that I’m using it seems that Gigabit Ethernet provides no performance benefit, the fact that it gave a slightly lower result is not relevant as the difference is within the margin of error.

Increasing the al-extents value helps with disconnected performance, a value of 1024 gives a 4% performance boost. I’m not sure that a value of 3389 is a good idea though.

The ext4 barriers are disabled by DRBD so a disconnected DRBD device gives performance that is closer to a barrier=0 mount than a regular ext4 mount. With the significant performance difference between connected and disconnected modes it seems possible that for some usage scenarios it could be useful to disable the DRBD secondary at times of peak load – it depends on whether DRBD is used as a really current backup or a strict mirror.

Future Tests

I plan to do some tests of DRBD over Linux software RAID-1 and tests to compare RAID-1 with and without bitmap support. I also plan to do some tests with the BTRFS filesystem, I know it’s not ready for production but it would still be nice to know what the performance is like.

But I won’t use the same systems, as they don’t have enough CPU power. In my previous tests I established that a 1.5GHz P4 isn’t capable of driving the 20G IDE disk to its maximum capacity and I’m not sure that the 2.8GHz P4 is capable of running a RAID at its full capacity. So I will use a dual-core 64bit system with a pair of SATA disks for future tests. The difference in performance between 20G IDE disks and 160G SATA disks should be a lot less than the performance difference between a 2.8GHz P4 and a dual-core 64bit CPU.

Related posts:

  1. DRBD Benchmarking I’ve got some performance problems with a mail server that’s...
  2. Some Notes on DRBD DRBD is a system for replicating a block device across...
  3. Ethernet bonding Bonding is one of the terms used to describe multiple...

Syndicated 2012-02-08 13:24:03 from etbe - Russell Coker

5 Principles of Backup Software

Everyone agrees that backups are generally a good thing. But it seems that there is a lot less agreement about how backups should work. Here is a list of 5 principles of backup software that seem to get ignored most of the time:

(1/5) Backups should not be Application Specific

It’s quite reasonable for people to want to extract data from a backup on a different platform. Maybe someone will want to extract data a few decades after the platform becomes obsolete. I believe that vendors of backup software have an ethical obligation to make it possible for customers to get their data out with minimal effort regardless of the circumstances.

Often when writing a backup application there will be good reasons for not using the existing formats for data storage (tar, cpio, zip, etc). But ideally any data store which involves something conceptually similar to a collection of files in one larger file will use one of those formats. There have been backward compatible extensions to tar and zip for SE Linux contexts and for OS/2 EAs – the possibility of extending archive file formats with no consequence other than warnings on extraction with an unpatched utility has been demonstrated.

For a backup which doesn’t involve source files (EG the contents of some sort of database) then it should be in a format that can be easily understood and parsed. Well designed XML is generally a reasonable option. Generally the format should involve plain text that is readable and easy to understand which is optionally compressed with a common compression utility (pkzip is a reasonable choice).
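To illustrate the point about standard formats: a backup stored as a compressed tar archive can be read on any platform with nothing more than a scripting language’s standard library – no vendor tool required. A sketch in Python (the file name and contents are invented):

```python
import io
import tarfile

# Create a tiny tar.gz "backup" in memory (contents invented).
raw = io.BytesIO()
with tarfile.open(fileobj=raw, mode="w:gz") as tar:
    data = b"hello backup"
    info = tarfile.TarInfo(name="etc/example.conf")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

# Decades later, extraction needs nothing vendor-specific.
raw.seek(0)
with tarfile.open(fileobj=raw, mode="r:gz") as tar:
    names = tar.getnames()
    content = tar.extractfile("etc/example.conf").read()

print(names, content)
```

The same archive is equally readable by GNU tar, busybox, 7-Zip, or a few hundred lines of new code written from the published format specification.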

(2/5) Data Store Formats should be Published

For every data store there should be public documentation about its format to allow future developers to write support for it. It really isn’t difficult to release some commented header files so that people can easily determine the data structures. This includes all data stores, including databases and filesystems. If I suddenly find myself with a 15yo image of an NTFS filesystem containing a proprietary database I should be able to find official header files for the version of NTFS and the database server in question so I can decode the data if it’s important enough.

When an application vendor hides the data formats it gives the risk of substantial data loss at some future time. Imposing such risk on customers to try and prevent them from migrating to a rival product is unethical.

(3/5) Backups should be forward and backward compatible

It is entirely unreasonable for a vendor to demand that all their users install the latest versions of their software. There are lots of good reasons for not upgrading which includes hardware not supporting new versions of the OS, lack of Internet access to perform the upgrade, application compatibility, and just liking the way the old version works. Even for the case of a critical security fix it should be possible to restore data without applying the fix.

For any pair of versions of software that are only separated by a few versions it should be possible to backup data from one and restore to the other. Even if the data can’t be used directly (EG a backup of AMD64 programs that is restored on an i386 system) it should still be accessible. If a new version of the software doesn’t support the ancient file formats then it should be possible for the users to get a slightly older version which talks to both the old and new versions.

Backups made on 64bit systems running the latest development version of Linux and on 10yo 32bit proprietary Unix systems are interchangeable. Admittedly Unix is really good at preserving file format compatibility, but there is no technical reason why other systems can’t do the same. Source code to cpio, tar, and gzip is freely available!

Apple TimeMachine fails badly in this regard, even a slightly older version of Mac OS can’t do a restore. It is however nice that most of the TimeMachine data is a tree of files which could be just copied to another system.

(4/5) Backup Software should not be Dropped

Sony Ericsson has made me hate them even more by putting the following message on their update web site:

The Backup and Restore app will be overwritten and cannot be used to restore data. Check out Android Market for alternative apps to back up and restore your data, such as MyBackup.

So if you own a Sony Ericsson phone and it is lost, stolen, or completely destroyed and all you have is a backup made by the Sony Ericsson tool then the one thing you absolutely can’t do is to buy a new Sony Ericsson phone to restore the data.

I believe that anyone who releases backup software has an ethical obligation to support restoring to all equivalent systems. How difficult would it be to put a new free app in the Android Market whose sole purpose is recovering old Sony Ericsson backups onto newer phones? It really can’t be that difficult, so even if they don’t want to waste critical ROM space by putting the feature in all new phones they can make it available to everyone who needs it. When compared to the cost of developing a new Android release for a series of phones the cost of writing such a restore program would be almost nothing.

It is simply mind-boggling that Sony Ericsson go against their own commercial interests in this regard. Surely it would make good business sense to be able to sell replacements for all the lost and broken Sony Ericsson phones, but instead customers who get burned by broken backups are given an incentive to buy a product from any other vendor.

(5/5) The greater the control over data the greater the obligation for protecting it

If you have data stored in a simple and standard manner (EG the /DCIM directory containing MP4 and JPEG files that is on the USB accessible storage in every modern phone) then IMHO it’s quite OK to leave customers to their own devices in terms of backups. Typical users can work out that if they don’t backup their pictures then they risk losing them, and they can work out how to do it.

My Sony Ericsson phones have data stored under /data (settings for Android applications) which is apparently only accessible as root. Sony Ericsson have denied me root access, which prevents me from running backup programs such as Titanium Backup, therefore I believe that they have a great obligation to provide a way of making a backup of this data and restoring it on a new phone or a phone that has been updated. To just provide phone upgrade instructions which tell me that my phone will be entirely wiped and that I should search the App Market for backup programs is unacceptable.

I believe that there are two ethical options available to Sony Ericsson at this time, one is to make it easy to root phones so that Titanium Backup and similar programs can be used, and the other option is to release a suitable backup program for older phones. Based on experience I don’t expect Sony Ericsson to choose either option.

Now it is also a bad thing for the Android application developers to make it difficult or impossible to backup their data. For example the Wiki for one Android game gives instructions for moving the saved game files to a new phone which starts with “root your phone”. The developers of that game should have read the Wiki, realised that rooting a phone for the mundane task of transferring saved game files is totally unreasonable, and developed a better alternative.

The best thing for developers to do is to allow the users to access their own data in the most convenient manner. Then it becomes the user’s responsibility to manage it and they can concentrate on improving their application.

Why Freedom is Important

Installing CyanogenMod on my Galaxy S was painful, but having root access so I can do anything I want is a great benefit. If phone vendors would do the right thing then I could recommend that other people use the vendor release, but it seems that vendors can be expected to act unethically. So I can’t recommend that anyone use an un-modded Android phone at any time. I also can’t recommend ever buying a Sony Ericsson product, not even when it’s really cheap.

Google have done a great thing with their Data Liberation Front [1]. Not only are they providing access to the data they store on our behalf (which is a good thing) but they have a mission statement that demands the same behavior from other companies – they make it an issue of competitive advantage! So while Sony Ericsson and other companies might not see a benefit in making people like me stop hating them, failing to be as effective in marketing as Google is a real issue. Data Liberation is something that should be discussed at board elections of IT companies.

Keep in mind the fact that ethics are not just about doing nice things, they are about establishing expectations of conduct that will be used by people who deal with you in future. Sony Ericsson has shown that I should expect that they will treat the integrity of my data with contempt and I will keep this in mind every time I decline an opportunity to purchase their products. Google has shown that they consider the protection of my data as an important issue and therefore I can be confident when using and recommending their services that I won’t get stuck with data that is locked away.

While Google has demonstrated that corporations can do the right thing, the vast majority of evidence suggests that we should never trust a corporation with anything that we might want to retrieve when it’s not immediately profitable for the corporation. Therefore avoiding commercial services for storing important data is the sensible thing to do.

Related posts:

  1. Galaxy S vs Xperia X10 and Android Network Access Galaxy S Review I’ve just been given an indefinite loan...
  2. Standardising Android Don Marti wrote an amusing post about the lack of...
  3. On Burning Platforms Nokia is in the news for it’s CEO announcing that...

Syndicated 2012-02-07 07:26:04 from etbe - Russell Coker

Reliability of RAID

ZDNet has an insightful article by Robin Harris predicting the demise of RAID-6 due to the probability of read errors [1]. Basically as drives get larger the probability of hitting a read error during reconstruction increases and therefore you need to have more redundancy to deal with this. He suggests that as of 2009 drives were too big for a reasonable person to rely on correct reads from all remaining drives after one drive failed (in the case of RAID-5) and that in 2019 there will be a similar issue with RAID-6.

Of course most systems in the field aren’t even using RAID-6. All the most economical hosting options involve just RAID-1, and RAID-5 is still fairly popular with small servers. With RAID-1 and RAID-5 you have a serious problem when (not if) a disk returns random or outdated data and claims that it is correct: you have no way of knowing which of the disks in the set has good data and which has bad data. For RAID-5 it is theoretically possible to reconstruct the data in some situations by determining which disk should have its data discarded to give a result that passes higher level checks (EG fsck or application data consistency), but this is probably only viable in extreme cases (EG one disk returning only corrupt data for all reads).

For the common case of a RAID-1 array if one disk returns a few bad sectors then probably most people will just hope that it doesn’t hit something important. The case of Linux software RAID-1 is of interest to me because that is used by many of my servers.

Robin has also written about some NetApp research into the incidence of read errors which indicates that 8.5% of “consumer” disks had such errors during the 32 month study period [2]. This is a concern as I run enough RAID-1 systems with “consumer” disks that it is very improbable that I’m not getting such errors. So the question is, how can I discover such errors and fix them?

In Debian the mdadm package does a monthly scan of all software RAID devices to try to find such inconsistencies, but it doesn’t send an email to alert the sysadmin! I have filed Debian bug #658701 with a patch to make mdadm send email about this. But this really isn’t going to help a lot, as the email will be sent AFTER the kernel has synchronised the data – with a 50% chance of overwriting the last copy of good data with the bad data! Also the kernel code doesn’t seem to tell userspace which disk had the wrong data in a 3-disk mirror (and presumably a RAID-6 works the same way), so even if the data can be corrected I won’t know which disk is failing.
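The count of inconsistent sectors that the monthly check finds is exposed by the kernel in sysfs, so a small script can report it without waiting for mdadm to grow the feature. Here is a minimal sketch of such a check; the sysfs path is the standard Linux md layout, while the choice of "root" as mail recipient and the mail invocation are just examples:

```shell
#!/bin/sh
# Report md arrays whose last check found mismatched sectors.

report_mismatch() {
    # $1 = an md sysfs directory, e.g. /sys/block/md0/md
    count=$(cat "$1/mismatch_cnt" 2>/dev/null) || return 0
    [ "$count" -gt 0 ] || return 0
    echo "$(dirname "$1"): $count mismatched sectors"
}

for md in /sys/block/md*/md; do
    [ -e "$md/mismatch_cnt" ] || continue
    msg=$(report_mismatch "$md")
    [ -n "$msg" ] || continue
    # Mail the admin; fall back to stderr if mail isn't available.
    echo "$msg" | mail -s "RAID mismatch on $(hostname)" root 2>/dev/null \
        || echo "$msg" >&2
done
```

Note that this tells you an inconsistency exists but, as described above, not which disk holds the bad copy – that limitation is in the kernel, not the reporting.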

Another problem with RAID checking is the fact that it will inherently take a long time and in practice can take a lot longer than necessary. For example I run some systems with LVM on RAID-1 where only a fraction of the VG capacity is used; in one case the kernel will check 2.7TB of RAID even though only 470G is in use!

The BTRFS Filesystem

The btrfs Wiki is currently at btrfs.ipv5.de as the kernel.org wikis are apparently still read-only since the compromise [3]. BTRFS is noteworthy for doing checksums on data and metadata and for having internal support for RAID. So if two disks in a BTRFS RAID-1 disagree then the one with valid checksums will be taken as correct!

I’ve just done a quick test of this. I created a filesystem with the command “mkfs.btrfs -m raid1 -d raid1 /dev/vg0/raid?” and copied /dev/urandom to it until it was full. I then used dd to copy /dev/urandom over some parts of /dev/vg0/raidb while reading files from the mounted filesystem. That worked correctly, although I was disappointed that it didn’t report any errors – I had hoped that it would read half the data from each device and fix some errors on the fly. Then I ran the command “btrfs scrub start .” and it gave lots of verbose errors in the kernel message log telling me which device had errors and where the errors were. I was a little disappointed that the command “btrfs scrub status .” just gave me a count of the corrected errors and didn’t mention which device had the errors.
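For anyone who wants to repeat the experiment without spare LVM volumes, the same test can be sketched with loopback files. The file names, sizes, offsets, and the RUN_DEMO guard below are all my own examples; the privileged steps are deliberately skipped unless you opt in as root, since they overwrite the devices involved:

```shell
#!/bin/sh
# Sketch of the corruption test above, using loopback files instead of
# LVM volumes.  The mkfs/mount/corrupt steps only run as root with
# RUN_DEMO=1 set, because they are destructive.
set -e
img1=/tmp/btrfs-a.img
img2=/tmp/btrfs-b.img
truncate -s 1G "$img1" "$img2"    # sparse backing files

if [ "${RUN_DEMO:-0}" = 1 ] && [ "$(id -u)" = 0 ]; then
    loop1=$(losetup -f --show "$img1")
    loop2=$(losetup -f --show "$img2")
    # Mirror both metadata (-m) and data (-d) across the two devices.
    mkfs.btrfs -m raid1 -d raid1 "$loop1" "$loop2"
    mkdir -p /mnt/test
    mount "$loop1" /mnt/test
    # Corrupt part of the second device behind the filesystem's back.
    dd if=/dev/urandom of="$loop2" bs=1M seek=64 count=16 conv=notrunc
    # -B runs the scrub in the foreground; per-device errors go to dmesg.
    btrfs scrub start -B /mnt/test
    btrfs scrub status /mnt/test
fi
```

As noted above, the per-device detail ends up in the kernel log rather than in the “scrub status” output.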

It seems to me that BTRFS is going to be a much better option than Linux software RAID once it is stable enough to use in production. I am considering upgrading one of my less important servers to Debian/Unstable to test out BTRFS in this configuration.

BTRFS is rumored to have performance problems. I will test this, but don’t have time to do so right now. Anyway I’m not always particularly concerned about performance; I have some systems where reliability is important enough to justify a performance loss.

BTRFS and Xen

The system with the 2.7TB RAID-1 is a Xen server and LVM volumes on that RAID are used for the block devices of the Xen DomUs. It seems obvious that I could create a single BTRFS filesystem for such a machine that uses both disks in a RAID-1 configuration and then use files on the BTRFS filesystem for Xen block devices. But that would impose the overhead of a filesystem within a filesystem. So I am considering using two LVM volume groups, one for each disk. Then for each DomU which does anything disk intensive I can export two LVs, one from each physical disk, and run BTRFS inside the DomU. The down-side of this is that each DomU will need to scrub the devices and monitor the kernel log for checksum errors. Among other things I will have to back-port the BTRFS tools to CentOS 4.
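As a sketch of that layout (the names vg_sda, vg_sdb, guest1 and the xvd* device names are all hypothetical examples), the DomU configuration would export one LV from each physical disk:

```
# Hypothetical Xen DomU config fragment: one LV from each physical disk,
# so the guest can do the mirroring itself.
disk = [ 'phy:/dev/vg_sda/guest1,xvda,w',
         'phy:/dev/vg_sdb/guest1,xvdb,w' ]
```

Inside the DomU the two devices would then be mirrored by the filesystem itself with “mkfs.btrfs -m raid1 -d raid1 /dev/xvda /dev/xvdb”, so a scrub run inside the DomU can identify which underlying disk returned bad data.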

This will be more difficult to manage than just having an LVM VG running on a RAID-1 array and giving each DomU a couple of LVs for storage.

BTRFS and DRBD

The combination of BTRFS RAID-1 and DRBD is going to be a difficult one. The obvious way of doing it would be to run DRBD over loopback devices that use large files on a BTRFS filesystem. That gives the overhead of a filesystem in a filesystem as well as the DRBD overhead.

It would be nice if BTRFS supported more than two copies of mirrored data. Then instead of DRBD over RAID-1 I could have two servers that each have two devices exported via NBD and BTRFS could store the data on all four devices. With that configuration I could lose an entire server and get a read error without losing any data!

Comparing Risks

I don’t want to use BTRFS in production now because of the risk of bugs. While it’s unlikely to have really serious bugs, it’s theoretically possible that a bug could deny access to data until the kernel code is fixed, and it’s also possible (although less likely) that a bug could result in data being overwritten such that it can never be recovered. But with the current configuration (Ext4 on Linux software RAID-1) it’s almost certain that I will lose small amounts of data, and it’s most probable that I have already silently lost data on many occasions without realising it.

Related posts:

  1. Some RAID Issues I just read an interesting paper titled An Analysis of...
  2. ECC RAM is more useful than RAID A common myth in the computer industry seems to be...
  3. Software vs Hardware RAID Should you use software or hardware RAID? Many people claim...

Syndicated 2012-02-05 14:46:36 from etbe - Russell Coker
