Older blog entries for etbe (starting at number 946)

More DRBD Performance tests

I’ve previously written Some Notes on DRBD [1] and a post about DRBD Benchmarking [2].

Previously I had determined that replication protocol C gives the best performance for DRBD, that the batch-time parameters for Ext4 aren’t worth touching for a single IDE disk, that barrier=0 gives a massive performance boost, and that DRBD gives a significant performance hit even when the secondary is not connected. Below are the results of some more tests of delivering mail from my Postal benchmark to my LMTP server which uses the Dovecot delivery agent to write it to disk. The rates are in messages per minute and each message averages about 70K in size. The ext4 filesystem is used for all tests and the filesystem features list is “has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize”.

Configuration                            p4-2.8 (messages/minute)
Default Ext4                             1663
barrier=0                                2875
DRBD no secondary al-extents=7            645
DRBD no secondary default                2409
DRBD no secondary al-extents=1024        2513
DRBD no secondary al-extents=3389        2650
DRBD connected                           1575
DRBD connected al-extents=1024           1560
DRBD connected al-extents=1024 Gig-E     1544

The al-extents option determines the size of the dirty area that needs to be resynced when a failed node rejoins the cluster. The default is 127 extents of 4M each, which means up to 508MB to be synchronised; the maximum is 3389 extents, or just over 13G. Even with fast disks and gigabit Ethernet it’s going to take a while to synchronise things if dirty zones are 13GB in size. In my tests using the maximum al-extents value gives a 10% performance benefit in disconnected mode while a value of 1024 gives a 4% boost. Changing the al-extents value seems to make no significant difference for a connected DRBD device.
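
For reference, the al-extents setting goes in the DRBD resource configuration. The following is only a minimal sketch assuming the DRBD 8.3 syntax shipped with Debian/Squeeze (where al-extents lives in the syncer section); the host names, addresses, and device paths are hypothetical:

    resource r0 {
      protocol C;
      syncer {
        # each extent covers 4M of the device, so 1024 extents means
        # up to 4G may need to be resynchronised after a node failure
        al-extents 1024;
      }
      on node-a {
        device    /dev/drbd0;
        disk      /dev/sda4;
        address   192.168.0.1:7788;
        meta-disk internal;
      }
      on node-b {
        device    /dev/drbd0;
        disk      /dev/sda4;
        address   192.168.0.2:7788;
        meta-disk internal;
      }
    }

After editing the file, something like “drbdadm adjust r0” should apply the new value to a running resource.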

All the tests on connected DRBD devices were done with 100baseT apart from the last one which was a separate Gigabit Ethernet cable connecting the two systems.

Conclusions

For the level of traffic that I’m using it seems that Gigabit Ethernet provides no performance benefit. The fact that it gave a slightly lower result is not significant as the difference is within the margin of error.

Increasing the al-extents value helps with disconnected performance, a value of 1024 gives a 4% performance boost. I’m not sure that a value of 3389 is a good idea though.

The ext4 barriers are disabled by DRBD so a disconnected DRBD device gives performance that is closer to a barrier=0 mount than a regular ext4 mount. With the significant performance difference between connected and disconnected modes it seems possible that for some usage scenarios it could be useful to disable the DRBD secondary at times of peak load – it depends on whether DRBD is used as a really current backup or a strict mirror.
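
For anyone who wants to experiment with that, disconnecting and reconnecting the secondary is just a couple of drbdadm operations on the primary; DRBD marks the areas written while disconnected and resynchronises them on reconnect. A sketch assuming a resource named r0:

    # before peak load: stop replicating to the secondary
    drbdadm disconnect r0

    # after peak load: reconnect and let DRBD resync the dirty extents
    drbdadm connect r0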

Future Tests

I plan to do some tests of DRBD over Linux software RAID-1 and tests to compare RAID-1 with and without bitmap support. I also plan to do some tests with the BTRFS filesystem, I know it’s not ready for production but it would still be nice to know what the performance is like.

But I won’t use the same systems as they don’t have enough CPU power. In my previous tests I established that a 1.5GHz P4 isn’t capable of driving the 20G IDE disk at its maximum capacity and I’m not sure that the 2.8GHz P4 is capable of running a RAID at its full capacity. So I will use a dual-core 64bit system with a pair of SATA disks for future tests. The difference in performance between 20G IDE disks and 160G SATA disks should be a lot less than the performance difference between a 2.8GHz P4 and a dual-core 64bit CPU.

Related posts:

  1. DRBD Benchmarking I’ve got some performance problems with a mail server that’s...
  2. Some Notes on DRBD DRBD is a system for replicating a block device across...
  3. Ethernet bonding Bonding is one of the terms used to describe multiple...

Syndicated 2012-02-08 13:24:03 from etbe - Russell Coker

5 Principles of Backup Software

Everyone agrees that backups are generally a good thing. But it seems that there is a lot less agreement about how backups should work. Here is a list of 5 principles of backup software that seem to get ignored most of the time:

(1/5) Backups should not be Application Specific

It’s quite reasonable for people to want to extract data from a backup on a different platform. Maybe someone will want to extract data a few decades after the platform becomes obsolete. I believe that vendors of backup software have an ethical obligation to make it possible for customers to get their data out with minimal effort regardless of the circumstances.

Often when writing a backup application there will be good reasons for not using the existing formats for data storage (tar, cpio, zip, etc). But ideally any data store which involves something conceptually similar to a collection of files in one larger file will use one of those formats. There have been backward compatible extensions to tar and zip for SE Linux contexts and for OS/2 EAs – the possibility of extending archive file formats with no consequence other than warnings on extraction with an unpatched utility has been demonstrated.

For a backup which doesn’t involve source files (EG the contents of some sort of database) then it should be in a format that can be easily understood and parsed. Well designed XML is generally a reasonable option. Generally the format should involve plain text that is readable and easy to understand which is optionally compressed with a common compression utility (pkzip is a reasonable choice).

(2/5) Data Store Formats should be Published

For every data store there should be public documentation of its format to allow future developers to write support for it. It really isn’t difficult to release some commented header files so that people can easily determine the data structures. This applies to all data stores, including databases and filesystems. If I suddenly find myself with a 15yo image of an NTFS filesystem containing a proprietary database I should be able to find official header files for the version of NTFS and the database server in question so I can decode the data if it’s important enough.

When an application vendor hides the data formats it creates a risk of substantial data loss at some future time. Imposing such a risk on customers to try and prevent them from migrating to a rival product is unethical.

(3/5) Backups should be forward and backward compatible

It is entirely unreasonable for a vendor to demand that all their users install the latest versions of their software. There are lots of good reasons for not upgrading which include hardware not supporting new versions of the OS, lack of Internet access to perform the upgrade, application compatibility, and just liking the way the old version works. Even for the case of a critical security fix it should be possible to restore data without applying the fix.

For any pair of versions of software that are only separated by a few versions it should be possible to backup data from one and restore to the other. Even if the data can’t be used directly (EG a backup of AMD64 programs that is restored on an i386 system) it should still be accessible. If a new version of the software doesn’t support the ancient file formats then it should be possible for the users to get a slightly older version which talks to both the old and new versions.

Backups made on 64bit systems running the latest development version of Linux and on 10yo 32bit proprietary Unix systems are interchangeable. Admittedly Unix is really good at preserving file format compatibility, but there is no technical reason why other systems can’t do the same. Source code to cpio, tar, and gzip is freely available!

Apple TimeMachine fails badly in this regard: even a slightly older version of Mac OS can’t do a restore. It is however nice that most of the TimeMachine data is a tree of files which could just be copied to another system.

(4/5) Backup Software should not be Dropped

Sony Ericsson has made me hate them even more by putting the following message on their update web site:

The Backup and Restore app will be overwritten and cannot be used to restore data. Check out Android Market for alternative apps to back up and restore your data, such as MyBackup.

So if you own a Sony Ericsson phone and it is lost, stolen, or completely destroyed and all you have is a backup made by the Sony Ericsson tool then the one thing you absolutely can’t do is to buy a new Sony Ericsson phone to restore the data.

I believe that anyone who releases backup software has an ethical obligation to support restoring to all equivalent systems. How difficult would it be to put a new free app in the Google Market that has as its sole purpose recovering old Sony Ericsson backups onto newer phones? It really can’t be that difficult, so even if they don’t want to waste critical ROM space by putting the feature in all new phones they can make it available to everyone who needs it. When compared to the cost of developing a new Android release for a series of phones the cost of writing such a restore program would be almost nothing.

It is simply mind-boggling that Sony Ericsson go against their own commercial interests in this regard. Surely it would make good business sense to be able to sell replacements for all the lost and broken Sony Ericsson phones, but instead customers who get burned by broken backups are given an incentive to buy a product from any other vendor.

(5/5) The greater the control over data the greater the obligation for protecting it

If you have data stored in a simple and standard manner (EG the /DCIM directory containing MP4 and JPEG files that is on the USB accessible storage in every modern phone) then IMHO it’s quite OK to leave customers to their own devices in terms of backups. Typical users can work out that if they don’t backup their pictures then they risk losing them, and they can work out how to do it.

My Sony Ericsson phones have data stored under /data (settings for Android applications) which is apparently only accessible as root. Sony Ericsson have denied me root access, which prevents me from running backup programs such as Titanium Backup. Therefore I believe that they have a great obligation to provide a way of making a backup of this data and restoring it on a new phone or a phone that has been updated. To just provide phone upgrade instructions which tell me that my phone will be entirely wiped and that I should search the App Market for backup programs is unacceptable.

I believe that there are two ethical options available to Sony Ericsson at this time, one is to make it easy to root phones so that Titanium Backup and similar programs can be used, and the other option is to release a suitable backup program for older phones. Based on experience I don’t expect Sony Ericsson to choose either option.

Now it is also a bad thing for the Android application developers to make it difficult or impossible to back up their data. For example the Wiki for one Android game gives instructions for moving the saved game files to a new phone which start with “root your phone”. The developers of that game should have read the Wiki, realised that rooting a phone for the mundane task of transferring saved game files is totally unreasonable, and developed a better alternative.

The best thing for developers to do is to allow the users to access their own data in the most convenient manner. Then it becomes the user’s responsibility to manage it and they can concentrate on improving their application.

Why Freedom is Important

Installing CyanogenMod on my Galaxy S was painful, but having root access so I can do anything I want is a great benefit. If phone vendors would do the right thing then I could recommend that other people use the vendor release, but it seems that vendors can be expected to act unethically. So I can’t recommend that anyone use an un-modded Android phone at any time. I also can’t recommend ever buying a Sony Ericsson product, not even when it’s really cheap.

Google have done a great thing with their Data Liberation Front [1]. Not only are they providing access to the data they store on our behalf (which is a good thing) but they have a mission statement that demands the same behavior from other companies – they make it an issue of competitive advantage! So while Sony Ericsson and other companies might not see a benefit in making people like me stop hating them, failing to be as effective in marketing as Google is a real issue. Data Liberation is something that should be discussed at board elections of IT companies.

Keep in mind the fact that ethics are not just about doing nice things, they are about establishing expectations of conduct that will be used by people who deal with you in future. Sony Ericsson has shown that I should expect that they will treat the integrity of my data with contempt and I will keep this in mind every time I decline an opportunity to purchase their products. Google has shown that they consider the protection of my data as an important issue and therefore I can be confident when using and recommending their services that I won’t get stuck with data that is locked away.

While Google has demonstrated that corporations can do the right thing, the vast majority of evidence suggests that we should never trust a corporation with anything that we might want to retrieve when it’s not immediately profitable for the corporation. Therefore avoiding commercial services for storing important data is the sensible thing to do.

Related posts:

  1. Galaxy S vs Xperia X10 and Android Network Access Galaxy S Review I’ve just been given an indefinite loan...
  2. Standardising Android Don Marti wrote an amusing post about the lack of...
  3. On Burning Platforms Nokia is in the news for it’s CEO announcing that...

Syndicated 2012-02-07 07:26:04 from etbe - Russell Coker

Reliability of RAID

ZDNet has an insightful article by Robin Harris predicting the demise of RAID-6 due to the probability of read errors [1]. Basically as drives get larger the probability of hitting a read error during reconstruction increases and therefore you need to have more redundancy to deal with this. He suggests that as of 2009 drives were too big for a reasonable person to rely on correct reads from all remaining drives after one drive failed (in the case of RAID-5) and that in 2019 there will be a similar issue with RAID-6.

Of course most systems in the field aren’t using even RAID-6. All the most economical hosting options involve just RAID-1 and RAID-5 is still fairly popular with small servers. With RAID-1 and RAID-5 you have a serious problem when (not if) a disk returns random or outdated data and says that it is correct: you have no way of knowing which of the disks in the set has good data and which has bad data. For RAID-5 it is theoretically possible to reconstruct the data in some situations by determining which disk should have its data discarded to give a result that passes higher level checks (EG fsck or application data consistency), but this is probably only viable in extreme cases (EG one disk returns only corrupt data for all reads).

For the common case of a RAID-1 array if one disk returns a few bad sectors then probably most people will just hope that it doesn’t hit something important. The case of Linux software RAID-1 is of interest to me because that is used by many of my servers.

Robin has also written about some NetApp research into the incidence of read errors which indicates that 8.5% of “consumer” disks had such errors during the 32 month study period [2]. This is a concern as I run enough RAID-1 systems with “consumer” disks that it is very improbable that I’m not getting such errors. So the question is, how can I discover such errors and fix them?

In Debian the mdadm package does a monthly scan of all software RAID devices to try and find such inconsistencies, but it doesn’t send an email to alert the sysadmin! I have filed Debian bug #658701 with a patch to make mdadm send email about this. But this really isn’t going to help a lot as the email will be sent AFTER the kernel has synchronised the data with a 50% chance of overwriting the last copy of good data with the bad data! Also the kernel code doesn’t seem to tell userspace which disk had the wrong data in a 3-disk mirror (and presumably a RAID-6 works in the same way) so even if the data can be corrected I won’t know which disk is failing.
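
For anyone who wants to run a check by hand instead of waiting for the monthly cron job (which uses /usr/share/mdadm/checkarray), the kernel md interface can be driven directly. A sketch assuming the array is md0:

    # start a check, the kernel reads all copies and compares them
    echo check > /sys/block/md0/md/sync_action

    # watch progress
    cat /proc/mdstat

    # non-zero after the check finishes means inconsistencies were found
    cat /sys/block/md0/md/mismatch_cnt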

Another problem with RAID checking is the fact that it will inherently take a long time and in practice can take a lot longer than necessary. For example I run some systems with LVM on RAID-1 on which only a fraction of the VG capacity is used; in one case the kernel will check 2.7TB of RAID even when there’s only 470G in use!

The BTRFS Filesystem

The btrfs Wiki is currently at btrfs.ipv5.de as the kernel.org wikis are apparently still read-only since the compromise [3]. BTRFS is noteworthy for doing checksums on data and metadata and for having internal support for RAID. So if two disks in a BTRFS RAID-1 disagree then the one with valid checksums will be taken as correct!

I’ve just done a quick test of this. I created a filesystem with the command “mkfs.btrfs -m raid1 -d raid1 /dev/vg0/raid?” and copied /dev/urandom to it until it was full. I then used dd to copy /dev/urandom to some parts of /dev/vg0/raidb while reading files from the mounted filesystem – that worked correctly although I was disappointed that it didn’t report any errors; I had hoped that it would read half the data from each device and fix some errors on the fly. Then I ran the command “btrfs scrub start .” and it gave lots of verbose errors in the kernel message log telling me which device had errors and where the errors are. I was a little disappointed that the command “btrfs scrub status .” just gave me a count of the corrected errors and didn’t mention which device had the errors.
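
For anyone who wants to repeat the test it amounts to roughly the following sequence. This is a sketch: the LV names are a guess based on the wildcard above, the mount point is arbitrary, and overwriting part of a device will of course destroy data:

    # create a BTRFS RAID-1 filesystem across two LVs and fill it
    mkfs.btrfs -m raid1 -d raid1 /dev/vg0/raida /dev/vg0/raidb
    mount /dev/vg0/raida /mnt/test
    dd if=/dev/urandom of=/mnt/test/junk bs=1M    # runs until the filesystem is full

    # corrupt part of one device behind the filesystem's back
    dd if=/dev/urandom of=/dev/vg0/raidb bs=1M seek=2000 count=100

    # scrub reads everything, verifies checksums, and rewrites bad
    # copies from the good device
    btrfs scrub start /mnt/test
    btrfs scrub status /mnt/test
    dmesg | tail    # the per-device error details end up in the kernel log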

It seems to me that BTRFS is going to be a much better option than Linux software RAID once it is stable enough to use in production. I am considering upgrading one of my less important servers to Debian/Unstable to test out BTRFS in this configuration.

BTRFS is rumored to have performance problems. I will test this but don’t have time to do so right now. Anyway I’m not always particularly concerned about performance; I have some systems where reliability is important enough to justify a performance loss.

BTRFS and Xen

The system with the 2.7TB RAID-1 is a Xen server and LVM volumes on that RAID are used for the block devices of the Xen DomUs. It seems obvious that I could create a single BTRFS filesystem for such a machine that uses both disks in a RAID-1 configuration and then use files on the BTRFS filesystem for Xen block devices. But that would incur the overhead of having a filesystem within a filesystem. So I am considering using two LVM volume groups, one for each disk. Then for each DomU which does anything disk intensive I can export two LVs, one from each physical disk, and then run BTRFS inside the DomU. The down-side of this is that each DomU will need to scrub the devices and monitor the kernel log for checksum errors. Among other things I will have to back-port the BTRFS tools to CentOS 4.
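
A sketch of what that configuration might look like, with hypothetical VG, LV, and DomU names (it also assumes the DomU kernel and tools have BTRFS support):

    # in Dom0: one VG per physical disk, one LV from each for the DomU
    lvcreate -L 100G -n mailstore vg_disk_a
    lvcreate -L 100G -n mailstore vg_disk_b

    # in the DomU config file:
    #   disk = [ 'phy:/dev/vg_disk_a/mailstore,xvda,w',
    #            'phy:/dev/vg_disk_b/mailstore,xvdb,w' ]

    # inside the DomU: BTRFS does the mirroring and the checksumming
    mkfs.btrfs -m raid1 -d raid1 /dev/xvda /dev/xvdb
    mount /dev/xvda /data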

This will be more difficult to manage than just having an LVM VG running on a RAID-1 array and giving each DomU a couple of LVs for storage.

BTRFS and DRBD

The combination of BTRFS RAID-1 and DRBD is going to be a difficult one. The obvious way of doing it would be to run DRBD over loopback devices that use large files on a BTRFS filesystem. That gives the overhead of a filesystem in a filesystem as well as the DRBD overhead.

It would be nice if BTRFS supported more than two copies of mirrored data. Then instead of DRBD over RAID-1 I could have two servers that each have two devices exported via NBD and BTRFS could store the data on all four devices. With that configuration I could lose an entire server and get a read error without losing any data!

Comparing Risks

I don’t want to use BTRFS in production now because of the risk of bugs. While it’s unlikely to have really serious bugs it’s theoretically possible that a bug could deny access to data until the kernel code is fixed, and it’s also possible (although less likely) that a bug could result in data being overwritten such that it can never be recovered. But for the current configuration (Ext4 on Linux software RAID-1) it’s almost certain that I will lose small amounts of data and it’s most probable that I have silently lost data on many occasions without realising.

Related posts:

  1. Some RAID Issues I just read an interesting paper titled An Analysis of...
  2. ECC RAM is more useful than RAID A common myth in the computer industry seems to be...
  3. Software vs Hardware RAID Should you use software or hardware RAID? Many people claim...

Syndicated 2012-02-05 14:46:36 from etbe - Russell Coker

A Computer Conference on a Cruise Ship

After LCA [1] there was a discussion about possible locations for future conferences, most of the messages in the discussion were jokes or suggestions that don’t seriously apply to LCA. So I’ll add my suggestion for conferences other than LCA.

I’ve previously written generally about the issue of conferences at sea [2]. I don’t think that LCA would be suitable for running at sea because delegates have specific expectations for LCA which are quite different to what a cruise ship can offer, so I don’t think it makes sense to change LCA which is working well as it is. However there are lots of other possible computer conferences which could suit a cruise ship.

Price

Price is a major factor in running a conference, so obviously getting a cheap cruise price is very important. Here is a link for Vacations To Go which shows cruises from the Australia/NZ region which are of at least 5 nights and cost no more than $800 [3]. The cheapest entry at this moment is $609 for 5 nights and the cheapest on a per-night basis is an 8 night cruise for $779. The cheapest cruise currently on offer which allows a conference similar to LCA is 7 nights for $699. The prices should be regarded as rough approximations as some cruises have some mandatory extra fees and the prices are quoted in US dollars and subject to currency fluctuations. Note that those prices are for dual-occupancy cabins, this can be a “double” or a “twin” configuration. Some cruise ships have cabins for 3 or 4 people that are cheaper, but if you have a cabin for a single person then the rate is almost the same as for having two people.

The price for LCA accommodation including breakfast was $78 per night for a single room or $92 for a double room. Then lunch cost a minimum of $10 and for dinner there was $80 for the penguin dinner and probably about $20 for dinner every other night. That gave an overall cost for a 6 night stay (which is probably the minimum for someone who lives further away than Melbourne) in Ballarat of 6*78+6*10+5*20+80==$708. For a double room that would be 6*92+6*10+5*20+2*80==$872.

Even if we don’t count the fact that the Australian dollar is worth more than the US dollar it is obvious that on the basis of accommodation and food two people sharing a twin cabin on a cruise ship could pay LESS than two people in single rooms at the Ballarat University dorms! Now sharing a cabin isn’t so great, but the upside is that cruise ships have excellent food and lots of other entertainment options. I previously reviewed the food on the Dawn Princess and determined that it’s better than the food I would expect to get if I spent the cost of the cruise on dinner at land based restaurants [4].

I have been led to believe that the use of ship conference facilities is typically free for any organisation that books a sufficient number of cabins. So there’s no reason why the conference admission fees should be any greater than for a land based conference.

Advantages

A common problem with conferences is finding suitable dining options. Most people want to eat with other delegates but finding restaurants that have sufficient space and which are conveniently located is difficult at best and often impossible. On a cruise ship everything is within a short walk and the restaurants are big; usually at least one restaurant will hold 500 people. The fact that you have to reserve times for the “Main Dining Room” makes it more difficult to miss one’s colleagues.

Everything on a cruise ship is luxurious.

There are lots of good locations for BoFs, pools, cafes, restaurants, and bars. Basically the ship is filled with comfortable places for groups of people to sit down.

A cruise ship typically has a main theater with more than 700 seats – more than large enough for most conferences I’ve attended. It’s common for the size of a conference to be limited to the size of the main theater that is used, for a cruise ship this will probably be less of a problem than for most other conference venues.

Disadvantages

The first disadvantage of running a computer conference on a cruise ship is the almost total lack of net access. The cost of net access is higher than most delegates would be willing to pay. Probably many delegates would check their email but it wouldn’t be practical for people to download source code, browse Wikipedia, and use the Internet in other ways related to the conference. It would be practical to have mirrors of Wikipedia, the source of several distributions of Linux, and other big things of common interest.

Another possible problem is the fact that you need to book it well in advance to avoid the risk of selling out (there is no option to stay at a different hotel). An established conference with financial backing could just pay to reserve the cabins. But when starting a new conference this could be a problem.

Alcohol is rather expensive on cruise ships. But getting really drunk isn’t compatible with learning about computer science anyway.

Finally the requirement to have at least two people in a cabin for good rates is a serious issue. The upside of this is that people travelling with their SO would find that it works really well (regardless of whether the SO is a delegate or not). But anyone who’s not travelling with their SO and doesn’t want to share with a friend will have to either pay a lot more or skip the conference.

Conclusion

I think that there is good potential for running a computer conference around the Australia/NZ region on a cruise ship. It won’t be overly expensive for delegates and the facilities that are provided are good. The trade-off for solitary travelers, having to share a cabin (or pay more) in exchange for much better food and leisure facilities, will be appreciated by many people (and admittedly hated by some).

Some people won’t appreciate the option of swimming, but even if you consider the cruise ship to be just a floating collection of restaurants and cabins it’s still fairly luxurious and beats the heck out of most conferences I’ve attended.

If you are considering the possibility of running a conference then I think that a cruise ship should be considered. VacationsToGo.com is the best site I’ve found for cheap cruise prices, their large group department has experience handling groups of more than 500 people so I think that anyone who wants to run a new conference in/around Australia should give them a call.

Also cruise ships travel around the world, so the same thing can be done in other countries but at a different time of year. The economic factors will differ by country though. Cruise ships probably aren’t a cheap option for a conference in some other countries.

Related posts:

  1. My First Cruise A few weeks ago I went on my first cruise,...
  2. Cruises It seems that in theory cruises can make for quite...
  3. Creating a Micro Conference The TEDxVolcano The TED conference franchise has been extended to...

Syndicated 2012-02-03 11:17:16 from etbe - Russell Coker

Links January 2012

Cops in Tennessee routinely steal cash from citizens [1]. They are ordered to do so and in some cases their salary is paid from the cash that they take. So they have a good reason to imagine that any large sum of money is drug money and take it.

David Frum wrote an insightful article for NY Mag about the problems with the US Republican Party [2].

TreeHugger.com has an interesting article about eco-friendly features on some modern cruise ships [3].

Dan Walsh describes how to get the RSA SecureID PAM module working on a SE Linux system [4]. It’s interesting that RSA was telling everyone to turn off SE Linux and shipping a program that was falsely marked as needing an executable stack and which uses netstat instead of /dev/urandom for entropy. Really the only way RSA could do worse would be to fall victim to an Advanced Persistent Attack… :-#

The Long Now has an interesting summary of a presentation about archive.org [5]. I never realised the range of things that archive.org stores, I will have to explore that if I find some spare time!

Jonah Lehrer wrote a detailed and informative article about the way that American high school students receive head injuries playing football [6]. He suggests that it might eventually be the end of the game as we know it.

François Marier wrote an informative article about optimising PNG files [7], optipng is apparently the best option at the moment but it doesn’t do everything you might want.

Helen Keeble wrote an interesting review of Twilight [8]. The most noteworthy thing about it IMHO is that she tries to understand teenage girls who like the books and movies. Trying to understand young people is quite rare.

Jon Masters wrote a critique of the concept of citizen journalism and described how he has two subscriptions to the NYT as a way of donating to support quality journalism [9]. The only comment on his post indicates a desire for biased news (such as Fox) which shows the reason why most US media is failing at journalism.

Luis von Ahn gave an interesting TED talk about crowd-sourced translation [10]. He starts by describing CAPTCHAs and the way that his company ReCAPTCHA provides the CAPTCHA service while also using people’s time to digitise books. Then he describes his online translation service and language education system DuoLingo which allows people to learn a second language for free while translating text between languages [11]. One of the benefits of this is that people don’t have to pay to learn a new language and thus poor people can learn other languages – great for people in developing countries that want to learn first-world languages! DuoLingo is in a beta phase at the moment but they are taking some volunteers.

Cory Doctorow wrote an insightful article for Publishers Weekly titled “Copyrights vs Human Rights” [12] which is primarily about SOPA.

Naomi Wolf wrote an insightful article for The Guardian about the “Occupy” movement, among other things the highest levels of the US government are using the DHS as part of the crackdown [13]. Naomi’s claim is that the right-wing and government attacks on the Occupy movement are due to the fact that they want to reform the political process and prevent corruption.

John Bohannon gave an interesting and entertaining TED talk about using dance as part of a presentation [14]. He gave an example of using dancers to illustrate some concepts related to physics and then spoke about the waste of PowerPoint.

Joe Sabia gave an amusing and inspiring TED talk about the technology of storytelling [15]. He gave the presentation with live actions on his iPad to match his words, a difficult task to perform successfully.

Thomas Koch wrote an informative post about some of the issues related to binary distribution of software [16]. I think the problem is even worse than Thomas describes.

Related posts:

  1. Links January 2011 Halla Tomasdottir gave an interesting TED talk about her financial...
  2. Links January 2010 Magnus Larsson gave an interesting TED talk about using bacteria...
  3. Links January 2009 Jennifer 8 Lee gave an interesting TED talk about the...

Syndicated 2012-01-26 01:49:56 from etbe - Russell Coker

SE Linux Status in Debian 2012-01

Since my last SE Linux in Debian status report [1] there have been some significant changes.

Policy

Last year I reported that the policy wasn’t very usable. On the 18th of January I uploaded version 2:2.20110726-2 of the policy packages which fixes many bugs. The policy should now be usable by most people for desktop operations and as a server. Part of the delay was that I wanted to include support for systemd, but as my work on systemd proceeded slowly and others didn’t contribute policy I could use I gave up and just released it. Systemd is still a priority for me and I plan to use it on all my systems when Wheezy is released.

Kernel

Some time between Debian kernel 3.0.0-2 and 3.1.0-1 support for an upstream change to the security module configuration was incorporated. Instead of using selinux=1 on the kernel command line to enable SE Linux support the kernel option is security=selinux. This change allows people to boot with security=tomoyo or security=apparmor if they wish. No support for Smack though.

The kernel silently ignores command line parameters that it doesn’t understand, so there is no harm in having both selinux=1 and security=selinux on both older and newer kernels. So version 0.5.0 of selinux-basics now adds both kernel command-line options to the GRUB configuration when selinux-activate is run. Also when the package is upgraded it will search for selinux=1 in the GRUB configuration and if it’s there it will add security=selinux. This will give users the functionality that they expect: systems which have SE Linux activated will keep running SE Linux after a kernel upgrade or downgrade! Prior to this update of selinux-basics, systems running Debian/Unstable won’t work with SE Linux.
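
The end result is just two extra kernel parameters in the GRUB configuration; something like the following in /etc/default/grub (selinux-activate makes the change for you, this is only a sketch of what it produces):

    # /etc/default/grub
    GRUB_CMDLINE_LINUX="selinux=1 security=selinux"

    # then regenerate /boot/grub/grub.cfg
    update-grub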

As an aside the postinst file for selinux-basics was last changed in 2006 (thanks Erich Schubert). This package is part of the new design of SE Linux in Debian and some bits of it haven’t needed to be changed for 6 years! SE Linux isn’t a new thing, it’s been in production for a long time.

Audit

While the audit daemon isn’t strictly a part of SE Linux (each can be used without the other) it seems that most of the time they are used together (in Debian at least). I have prepared an NMU of the new upstream version of audit and uploaded it to delayed/7. I want to get everything related to SE Linux up to date or at least to versions comparable with Fedora. Also I sent some of the Debian patches for auditd upstream, which should reduce the maintenance effort in future.

Libraries

There have been some NMUs of libraries that are part of SE Linux. Due to a combination of having confidence in the people doing the NMUs and not having much spare time I have let them go through without review. I’m sure that I will notice soon enough if they don’t work, my test systems exercise enough SE Linux functionality that it would be difficult to break things without me noticing.

Play Machine

I am now preparing a new SE Linux “Play Machine” running Debian/Unstable. I wore my Play Machine shirt at LCA so I’ve got to get one going again soon. This is a good exercise of the strict features of SE Linux policy, I’ve found some bugs which need to be fixed. Running Play Machines really helps improve the overall quality of SE Linux.

Related posts:

  1. Status of SE Linux in Debian LCA 2009 This morning I gave a talk at the Security mini-conf...
  2. SE Linux in Debian I have now got a Debian Xen domU running the...
  3. Debian SE Linux Status At the moment I’ve got more time to work on...

Syndicated 2012-01-25 11:36:31 from etbe - Russell Coker

My First Cruise

A few weeks ago I went on my first cruise, from Sydney to Melbourne on the Dawn Princess. VacationsToGo.com (a discount cruise/resort web site) has a review of the Dawn Princess [1], they give it 4 stars out of a possible 6. The 6 star ships seem to have discount rates in excess of $500 per day per person, much more than I would pay.

The per-person rate is based on two people sharing a cabin, it seems that most cabins can be configured as a double bed or twin singles. If there is only one person in a cabin then they pay almost double the normal rate. It seems that most cruise ships have some support for cabins with more than two people (at a discount rate), but the cabins which support that apparently sell out early and don’t seem to be available when booking a cheap last-minute deal over the Internet. So if you want a cheap cruise then you need to have an even number of people in your party.

The cruise I took was two nights and cost $238 per person, it was advertised at something like $220 but then there are extra fees when you book (which seems to be the standard practice).

The Value of Cruises

To book a hotel room that is reasonably comfortable (4 star) in Melbourne or Sydney you need to spend more than $100 per night for a two person room if using Wotif.com. The list price of a 4 star hotel room for two people in a central city area can be well over $300 per night. So the cost for a cruise is in the range of city hotel prices.

The Main Dining Room (MDR) has a quality of food and service that compares well with city restaurants. The food and service in the Dawn Princess MDR wasn’t quite as good as Walter’s Wine Bar (one of my favorite restaurants). But Walter’s costs about $90 for a four course meal. The Dawn Princess MDR has a standard 5 course meal (with a small number of options for each course) and for no extra charge you can order extra serves. When you make it a 7 course meal the value increases. I really doubt that I could find any restaurant in Melbourne or Sydney that would serve a comparable meal for $119.

You could consider a cruise to be either paying for accommodation and getting everything else for free or to be paying for fine dining in the evening and getting everything else for free. Getting both for the price of one (along with entertainment etc) is a great deal!

I can recommend a cruise as a good holiday which is rather cheap if you do it right. That is if you want to spend lots of time swimming and eating quality food.

How Cruise Companies Make Money

There are economies of scale in running a restaurant, so having the MDR packed every night makes it a much more economic operation than a typical restaurant which has quiet nights. But the expenses in providing the services (which involves a crew that is usually almost half the number of passengers) are considerable. Paying $119 per night might cover half the wages of an average crew member but not much more.

The casino is one way that the cruise companies make money. I can understand that someone taking a luxury vacation might feel inclined to play blackjack or something else that seems sophisticated. But playing poker machines on a cruise ship is rather sad – not that I’m complaining, I’m happy for other people to subsidise my holidays!

Alcohol is rather expensive on board. Some cruise companies allow each passenger to take one bottle of wine and some passengers try to smuggle liquor on board. On the forums some passengers report that they budget to spend $1000 per week on alcohol! If I wanted a holiday that involved drinking that much I’d book a hotel at the beach, mix up a thermos full of a good cocktail in my hotel room, and then take my own deck-chair to the beach.

It seems that the cruise companies specialise in extracting extra money from passengers (I don’t think that my experience with the Dawn Princess is unusual in any way). Possibly the people who pay $1000 per night or more for a cruise don’t get the nickel-and-dime treatment, but for affordable cruises I think it’s standard. You have to be in the habit of asking the price whenever something is offered and be aware of social pressure to spend money.

When I boarded the Dawn Princess there was a queue, which I joined as everyone did. It turned out that the queue was to get a lanyard for holding the key-card (which opens the cabin door and is used for payment). After giving me the lanyard they then told me that it cost $7.95 – so I gave it back. Next time I’ll take a lanyard from some computer conference and use it to hold the key-card, it’s handy to have a lanyard but I don’t want to pay $7.95.

Finally some things are free at some times but not at others, fruit juice is free at the breakfast buffet but expensive at the lunch buffet. Coffee at the MDR is expensive but it was being served for free at a cafe on deck.

How to have a Cheap Cruise

VacationsToGo.com is the best discount cruise site I’ve found so far [2]. Unfortunately they don’t support searching on price, average daily price, or on a customised number of days (I can search for 7 days but not 7 or less). For one of the cheaper vessels it seems that anything less than $120 per night is a good deal and there are occasional deals as low as $70 per night.

Princess cruises allows each passenger to bring one bottle of wine on board. If you drink that in your cabin (to avoid corkage fees) then that can save some money on drinks. RumRunnerFlasks.com sells plastic vessels for smuggling liquor on board cruise ships [3]. I wouldn’t use one myself but many travelers recommend them highly.

Chocolate and other snack foods are quite expensive on board and there are no restrictions on bringing your own, so the cheap options are to bring your own snack food or to snack from the buffet (which is usually open 24*7). Non-alcoholic drinks can be expensive but you can bring your own and use the fridge in your cabin to store it, but you have to bring cans or pressurised bottles so it doesn’t look like you are smuggling liquor on board.

Generally try not to pay for anything on board, there’s enough free stuff if you make good choices.

Princess offers free on-board credit (money for buying various stuff on-board) for any cruise that you book while on a cruise. The OBC starts at $25 per person and goes as high as $150 per person depending on how expensive the cruise is. Generally booking cruises while on-board is a bad idea as you can’t do Internet searches. But as Princess apparently doesn’t allow people outside the US to book through a travel agent and as they only require a refundable deposit that is not specific to any particular cruise there seems no down-side. In retrospect I should have given them a $200 deposit on the off chance that I’ll book another cruise with them some time in the next four years.

Princess provide a book of discount vouchers in every cabin, mostly this is a guide to what is most profitable for them – and thus what you should avoid if you want a cheap holiday. But there are some things that could be useful such as a free thermos cup with any cup of coffee – if you buy coffee then you might as well get the free cup. Also they have some free contests that might be worth entering.

Entertainment

It’s standard practice to have theatrical shows on board, some sort of musical is standard and common options include a magic show and comedy (it really depends on which cruise you take). On the Dawn Princess the second seating for dinner started at 8PM (the time apparently varies depending on the cruise schedule) which was the same time as the first show of the evening. I get the impression that this sort of schedule is common so if you want to see two shows in one night then you need to have the early seating for dinner. The cruise that I took lasted two nights and had two shows (a singing/dancing show and a magic show), so it was possible to have the late seating for dinner and still see all the main entertainment – unless you wanted to see one show twice.

From reading the CruiseCritic.com forum [4] I get the impression that the first seating for dinner is the most popular. On some cruises it’s easy to switch from first to second seating but not always possible to switch from second to first. Therefore the best strategy seems to be to book the first seating.

Things to do Before Booking a Cruise

Read the CruiseCritic.com forum for information about almost everything.

Compare prices for a wide variety of cruises to get a feel for what the best deals are. While $100 per night is a great deal for the type of cruise that interests me and is in my region it may not be a good match for the cruises that interest you.

Read overview summaries of cruise lines that operate in your area. Some cruise lines cater for particular age groups and interests and are thus unappealing to some people – EG anyone who doesn’t have children probably won’t be interested in Disney cruises.

Read reviews of the ships, there is usually a great variation between different ships run by one line. One factor is when the ships have been upgraded with recently developed luxury features.

Determine what things need to be booked in advance. Some entertainment options on board support a limited number of people and get booked out early. For example if you want to use the VR golf simulator on the Dawn Princess you should probably check in early and make a reservation as soon as you are on board. The forums are good for determining what needs to be booked early.

Also see my post about booking a cruise and some general discussion of cruise related things [5].

Syndicated 2012-01-08 11:03:26 from etbe - Russell Coker

DRBD Benchmarking

I’ve got some performance problems with a mail server that’s using DRBD so I’ve done some benchmark tests to try and improve things. I used Postal for testing delivery to an LMTP server [1]. The version of Postal I released a few days ago had a bug that made LMTP not work. I’ll release a new version to fix that next time I work on Postal – or when someone sends me a request for LMTP support (so far no-one has asked for LMTP support so I presume that most users don’t mind that it’s not yet working).

The local spool on my test server is managed by Dovecot, the Dovecot delivery agent stores the mail and the Dovecot POP and IMAP servers provide user access. For delivery I’m using the LMTP server I wrote which has been almost ready for GPL release for a couple of years. All I need to write is a command-line parser to support delivery options for different local delivery agents. Currently my LMTP server is hard-coded to run /usr/lib/dovecot/deliver and has its parameters hard-coded too. As an aside if someone would like to contribute some GPL C/C++ code to convert a string like “/usr/lib/dovecot/deliver -e -f %from% -d %to% -n” into something that will populate an argv array for execvp() then that would be really appreciated.
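
To illustrate the sort of thing I’m after, here is a minimal sketch (not the finished code: it only handles whitespace separated arguments with no quoting, and the %from%/%to% placeholder names are just the ones from the example above):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define MAX_ARGS 64

    /* split a command template into an argv array for execvp(), replacing
     * the %from% and %to% placeholders with the given addresses */
    static void build_argv(const char *cmd, const char *from, const char *to,
                           char *argv[])
    {
        static char buf[1024];  /* tokens point into this, so not reentrant */
        int argc = 0;
        char *tok;

        strncpy(buf, cmd, sizeof(buf) - 1);
        buf[sizeof(buf) - 1] = '\0';

        for (tok = strtok(buf, " \t"); tok != NULL && argc < MAX_ARGS - 1;
             tok = strtok(NULL, " \t")) {
            if (strcmp(tok, "%from%") == 0)
                argv[argc++] = (char *)from;
            else if (strcmp(tok, "%to%") == 0)
                argv[argc++] = (char *)to;
            else
                argv[argc++] = tok;
        }
        argv[argc] = NULL;
    }

    int main(void)
    {
        char *argv[MAX_ARGS];

        build_argv("/usr/lib/dovecot/deliver -e -f %from% -d %to% -n",
                   "sender@example.com", "recipient@example.com", argv);
        execvp(argv[0], argv);  /* only returns on error */
        perror("execvp");
        return 1;
    }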

Authentication is to a MySQL server running on a fast P4 system. The MySQL server never came close to its CPU or disk IO capacity, so using a different authentication system probably wouldn’t have given different results. I used MySQL because it’s what I’m using in production. Apart from my LMTP server and the new version of Postal all software involved in the testing is from Debian/Squeeze.

The Tests

All tests were done on a 20G IDE disk. I started testing with a Pentium-4 1.5GHz system with 768M of RAM but then moved to a Pentium-4 2.8GHz system with 1G of RAM when I found CPU time to be a bottleneck with barrier=0. All test results are for the average number of messages delivered per minute for a 19 minute test run where the first minute’s results are discarded. The delivery process used 12 threads to deliver mail.

Configuration                    P4-1.5   P4-2.8
Default Ext4                      1468     1663
Ext4 max_batch_time=30000         1385     1656
Ext4 barrier=0                    1997     2875
Ext4 on DRBD no secondary         1810     2409

When doing the above tests the 1.5GHz system was using 100% CPU time when the filesystem was mounted with barrier=0, about half of which was system time (although I didn’t make notes at the time). So the testing on the 1.5GHz system showed that increasing the Ext4 max_batch_time number doesn’t give a benefit for a single disk, that mounting with barrier=0 gives a significant performance benefit, and that using DRBD in disconnected mode gives a good performance benefit through forcing barrier=0. As an aside I wonder why they didn’t support barriers on DRBD given all the other features that they have for preserving data integrity.
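
For reference, these are all plain Ext4 mount options. The following is a sketch (the /mail mount point is hypothetical, /dev/sda4 is the test partition):

    # disable write barriers: faster, but risks losing data on power failure
    mount -o remount,barrier=0 /dev/sda4 /mail

    # raise the maximum journal commit batching window (value in microseconds)
    mount -o remount,max_batch_time=30000 /dev/sda4 /mail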

The tests with the 2.8GHz system demonstrate the performance benefits of having adequate CPU power, as an aside I hope that Ext4 is optimised for multi-core CPUs because if a 20G IDE disk needs a 2.8GHz P4 then modern RAID arrays probably require more CPU power than a single core can provide.

It’s also interesting to note that a degraded DRBD device (where the secondary has never been enabled) only gives 84% of the performance of /dev/sda4 when mounted with barrier=0.

Configuration                                      p4-2.8
Default Ext4                                        1663
Ext4 max_batch_time=30000                           1656
Ext4 min_batch_time=15000,max_batch_time=30000      1626
Ext4 max_batch_time=0                               1625
Ext4 barrier=0                                      2875
Ext4 on DRBD no secondary                           2409
Ext4 on DRBD connected C                            1575
Ext4 on DRBD connected B                            1428
Ext4 on DRBD connected A                            1284

Of all the options for batch times that I tried it seemed that every change decreased the performance slightly but as the greatest decrease in performance was only slightly more than 2% it doesn’t matter much.

One thing that really surprised me was the test results from different replication protocols. The DRBD replication protocols are documented here [2]. Protocol C is fully synchronous – a write request doesn’t complete until the remote node has it on disk. Protocol B is memory synchronous: the write is complete when it’s on a local disk and in RAM on the other node. Protocol A is fully asynchronous: a write is complete when it’s on a local disk. I had expected protocol A to give the best performance, as it has lower latency for critical write operations, and protocol C to be the worst. My theory is that DRBD has a performance bug for the protocols that the developers don’t recommend.
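
For reference, the replication protocol is selected per resource in drbd.conf. This is only a fragment of a resource section (DRBD 8.3 syntax), with the semantics of each option noted as comments:

    resource r0 {
        # protocol A;   write completes once it is on the local disk
        # protocol B;   write completes once on the local disk and in remote RAM
        # protocol C;   write completes only when it is on disk on both nodes
        protocol C;
        # ... plus the usual on <host> sections for the two nodes
    }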

One other thing I can’t explain is that according to iostat the data partition on the secondary DRBD node had almost 1% more sectors written than the primary and the number of writes was more than 1% greater on the secondary. I had hoped that with protocol A the writes would be combined on the secondary node to give a lower disk IO load.

I filed Debian bug report #654206 about the kernel not exposing the correct value for max_batch_time. The fact that no-one else has reported that bug (which is in kernels from at least 2.6.32 to 3.1.0) is an indication that not many people have found it useful.

Conclusions

When using DRBD use protocol C as it gives better integrity and better performance.

Significant CPU power is apparently required for modern filesystems. The fact that a Maxtor 20G 7200rpm IDE disk [3] can’t be driven at full speed by a 1.5GHz P4 was a surprise to me.

DRBD significantly reduces performance when compared to a plain disk mounted with barrier=0 (for a fair comparison). The best that DRBD could do in my tests was 55% of native performance when connected and 84% of native performance when disconnected.

When comparing a cluster of cheap machines running DRBD on RAID-1 arrays to a single system running RAID-6 with redundant PSUs etc the performance loss from DRBD is a serious problem that can push the economic benefit back towards the single system.

Next I will benchmark DRBD on RAID-1 and test the performance hit of using bitmaps with Linux software RAID-1.

If anyone knows how to make an HTML table look good then please let me know. It seems that the new blog theme that I’m using prevents borders.

Update:

I mentioned my Debian bug report about the mount option and the fact that it’s all on Debian/Squeeze.

Syndicated 2012-01-05 08:31:32 from etbe - Russell Coker

Autism and a Child Beauty Contest

Fenella Wagener wrote an article for the Herald Sun about an Autistic girl who won the “best personality” award from the controversial new Australian children’s beauty pageant [1]. The girl’s mother is complaining that an Autistic girl shouldn’t win a prize for personality and is criticising the pageant organisers.

A beauty contest involves queuing, being quiet, appearing on stage, wearing cosmetics and unusual/uncomfortable clothes. It probably also involves having someone else assist with dressing and applying cosmetics (being touched by another person). These are all things which tend to be difficult or impossible for Autistic kids. So any girl who can get on stage wearing make-up can probably do whatever is required to avoid being obviously excluded from a personality prize. As any such prize has to be largely subjective I don’t think it would ever be possible to prove that someone was the correct choice for the winner, it would merely be possible to prove that some candidates excluded themselves.

But whether the girl deserved to win isn’t the real issue here. I think that beauty pageants should be restricted to adults, merely entering a child in such a contest is bad enough, but making nasty public statements about a child is horrible. If other children made a Facebook page claiming that the girl in question didn’t deserve to win a “best personality” prize it would probably be reported as cyber-bullying. I don’t think that publishing the name or photo of the girl in question is in the “public interest” either. Many news sites that have picked up the story have shown the same lack of journalistic ethics, so now the girl has some high traffic sites linking her name to this story and it seems unlikely that anything good she might do in the near future will get a higher ranking for her name in search engines. So any time she searches for her name on Google (which most people do regularly) she will be reminded that her mother thinks she has some sort of defective personality because she is Autistic.

High school is generally bad for almost everyone on the Autism Spectrum. Presumably any parent who would abuse their child by allowing such an article to be published would also send them to a regular school (as opposed to Home Schooling which is probably the only good option for Autistic kids in Australia). I’m sure that the standard practice at every high school nowadays is that the kids all use Google to discover things to tease each other about. So in a few years the Herald Sun article will probably be the basis of a high school bullying campaign.

The girl in question is only 9, so she’s got another 6 or 7 years before she can legally leave her mother. In Australia 16 is the minimum legal age to live without parents and the police won’t forcibly return “runaway” children who are almost 16.

The Journalistic Code of Ethics

Here is a link to the Australian Media Alliance code of Journalistic Ethics [2]. Section 8 includes “Never exploit a person’s vulnerability or ignorance of media practice“. I think that publishing the name and photograph of a 9yo girl in a way that is likely to lead to bullying in a few years is a clear example of exploiting a vulnerable person.

The code of ethics has a guidance clause which says “Only substantial advancement of the public interest or risk of substantial harm to people allows any standard to be overridden“. Even if it was a proven fact that a beauty pageant was issuing awards to unqualified children there would not be any substantial advancement of the public interest in publishing that.

Beauty Contests are Evil

The Australian has an article about the same beauty contest by Caroline Overington which quotes adolescent and child psychotherapist Collett Smart calling for government intervention [3].

Catherine Manning has written a good article explaining some of the reasons for opposing child beauty pageants [4].

The American Psychological Association has published a report on the Sexualization of Girls [5], they have lots of references to psychological research which gives a variety of reasons for opposing child beauty contests. IMHO each of the reasons alone should be sufficient to convince people that child beauty pageants are bad.

Finally the pictures of contestants who are less than 10yo but made up to look like they are 20+ are rather disturbing.

Syndicated 2012-01-04 06:17:18 from etbe - Russell Coker
