<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Advogato blog for lmb</title>
    <link>http://www.advogato.org/person/lmb/</link>
    <description>Advogato blog for lmb</description>
    <language>en-us</language>
    <generator>mod_virgule</generator>
    <pubDate>Wed, 19 Jun 2013 21:44:03 GMT</pubDate>
    <item>
      <pubDate>Wed, 9 Feb 2011 13:44:38 GMT</pubDate>
      <title>9 Feb 2011</title>
      <link>http://www.advogato.org/person/lmb/diary.html?start=110</link>
      <guid>http://www.advogato.org/person/lmb/diary.html?start=110</guid>
      <description>&lt;p&gt;&lt;strong&gt;There is such a thing as a free lunch!&lt;/strong&gt;&#xD;
&lt;p&gt;A &lt;A HREF="http://lists.linux-ha.org/pipermail/linux-ha-dev/2011-February/018221.html"&gt;current&#xD;
discussion&lt;/a&gt; reminded me that, while certain parts of our&#xD;
cluster stack do have comprehensive regression tests, not&#xD;
all parts do. However, test coverage is crucial: untested&#xD;
code is broken code, and without tests, clean-ups and refactoring&#xD;
become a risky business, which impedes both new features and&#xD;
maintainability.&#xD;
&lt;p&gt;Code that is tested can be cleaned up with confidence;&#xD;
new features can be added safely, trusting that old&#xD;
functionality is not broken. Good tests make for good sleep.&#xD;
&lt;p&gt;We need more, and better, tests for all aspects of our&#xD;
cluster stack; functional and non-functional both. If I had&#xD;
my test-driven way, I'd veto&#xD;
every contribution that came without sufficient tests, but&#xD;
that's a heavy obligation to place on contributors.&#xD;
Providing tests ought to be perceived as a positive task,&#xD;
not as a burden.&#xD;
&lt;p&gt;We need the awareness that meaningful tests are good all&#xD;
by themselves, and that not just feature work is cool. Tests&#xD;
ensure quality, inspire confidence, reduce risk for release&#xD;
managers and contributors, lead to more modular and&#xD;
maintainable code, and allow you to point out errors other&#xD;
contributors make - what more could you ask for?&#xD;
&lt;p&gt;Hence, I am announcing the &lt;strong&gt;&lt;i&gt;Almost&lt;/i&gt; Free Lunch&#xD;
initiative&lt;/strong&gt;: contribute an Open Source, reasonably&#xD;
comprehensive and non-trivial test suite for one of our&#xD;
components, document it so that people can actually run it&#xD;
;-), and &lt;strong&gt;I will invite you to lunch!&lt;/strong&gt; (I&#xD;
would offer to sing songs praising the glory of you,&#xD;
but that would be rather counterproductive.)&#xD;
&lt;p&gt;The obvious place to start is the one that started&#xD;
this discussion: our resource agent test coverage really&#xD;
needs more love. But you will find all projects - Linux-HA,&#xD;
Pacemaker, corosync, OCFS2, GFS2, ... - more than willing to&#xD;
provide you with hints on where our tests can be improved.&#xD;
Contact me, or any of the projects' mailing lists, to learn&#xD;
more.</description>
    </item>
    <item>
      <pubDate>Mon, 18 Oct 2010 13:02:05 GMT</pubDate>
      <title>18 Oct 2010</title>
      <link>http://www.advogato.org/person/lmb/diary.html?start=109</link>
      <guid>http://www.advogato.org/person/lmb/diary.html?start=109</guid>
      <description>&lt;b&gt;Linux Magazin article on Pacemaker, OCFS2 and DRBD&lt;/b&gt;&#xD;
&lt;p&gt;Dear international readers, what follows is a critique of&#xD;
a German-language article published in Linux Magazin.&#xD;
&#xD;
&lt;p&gt;Naturally, I was very pleased to read a setup guide for&#xD;
these projects in issue 11/2010, based on openSUSE no less;&#xD;
these are all topics and projects very close to my heart.&#xD;
&lt;p&gt;However, the article greatly disappointed me on technical grounds.&#xD;
&lt;p&gt;My criticisms in detail:&#xD;
&lt;ul&gt;&#xD;
&lt;li&gt;&lt;p&gt;The article configures an active/passive fail-over for&#xD;
a LAMP stack. In this case OCFS2, just like&#xD;
DRBD's active/active mode, is out of place - DRBD should&#xD;
likewise be run in an active/passive ("Single Primary")&#xD;
configuration.&#xD;
&lt;li&gt;&lt;p&gt;If OCFS2 is used at all, it should in any case be&#xD;
started under the control of Pacemaker and Corosync,&#xD;
and not via init scripts and&#xD;
/etc/fstab. Otherwise, for example, full POSIX&#xD;
locking is not available; furthermore, the&#xD;
configuration of /etc/ocfs2/cluster.conf can be omitted, because&#xD;
this information is taken over automatically from Corosync.&#xD;
&lt;p&gt;The same naturally applies to DRBD: this service, too,&#xD;
should be controlled - and thus also monitored - by Pacemaker.&#xD;
Only in this way is the full functionality of all&#xD;
cluster components, and their interplay, ensured.&#xD;
&lt;li&gt;&lt;p&gt;The article also describes, without discussing the&#xD;
consequences in any form, how to disable&#xD;
the IO fencing mechanism&#xD;
"STONITH". This can lead to data inconsistencies.&#xD;
&lt;li&gt;&lt;p&gt;I was utterly appalled by the "recommended"&#xD;
wrapper&#xD;
that is supposed to make LSB scripts "cluster-capable". Not only&#xD;
does the cluster stack naturally provide a&#xD;
way&#xD;
to integrate LSB scripts (via the resource&#xD;
class "LSB"), but the &lt;A HREF="http://www.medozas.de/cluster.initd"&gt;referenced&#xD;
script&lt;/a&gt; is also fundamentally broken - it does not&#xD;
wait until the service has really been started or stopped,&#xD;
it emits wrong metadata, and the&#xD;
return values&#xD;
of the status and monitor operations are incorrect.&#xD;
&lt;p&gt;And then this defective wrapper script is even&#xD;
used for services for which the cluster stack&#xD;
&lt;i&gt;naturally&lt;/i&gt; ships complete OCF&#xD;
resource agents&#xD;
- namely Apache and MySQL.&#xD;
&lt;li&gt;&lt;p&gt;The cluster configuration could likewise be&#xD;
streamlined&#xD;
by using a resource group instead of three&#xD;
dependencies.&#xD;
&lt;li&gt;&lt;p&gt;Not to mention that if one system fails, the&#xD;
other would in fact not take over as announced in this article,&#xD;
because the &lt;i&gt;no-quorum-policy&lt;/i&gt; is not&#xD;
set.&#xD;
&lt;li&gt;&lt;p&gt;The article also forgoes any discussion of the fundamentals of a&#xD;
redundant&#xD;
system: a redundant network connection, which is&#xD;
absolutely necessary,&#xD;
is neither configured nor even recommended.&#xD;
&lt;li&gt;&lt;p&gt;That brand names - openSUSE, SLE 11 HAE - are misspelled&#xD;
in places is then merely the icing on the cake.&#xD;
&lt;/ul&gt;&#xD;
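&lt;p&gt;To make the points above concrete, here is a minimal sketch in CRM&#xD;
shell syntax of how such an active/passive LAMP setup could look. It is&#xD;
an illustration under assumptions, not a tested configuration: the&#xD;
resource names, the DRBD resource "r0", the device, mount point,&#xD;
file system type and Apache config path are all hypothetical, and a&#xD;
fencing (STONITH) resource matching your hardware still has to be added.&#xD;
Setting no-quorum-policy to "ignore" is assumed here to be what a&#xD;
two-node cluster needs.&#xD;
&lt;pre&gt;&#xD;
primitive drbd-r0 ocf:linbit:drbd \&#xD;
        params drbd_resource="r0" \&#xD;
        op monitor interval="20s" role="Master" \&#xD;
        op monitor interval="30s" role="Slave"&#xD;
ms ms-drbd-r0 drbd-r0 \&#xD;
        meta master-max="1" clone-max="2" notify="true"&#xD;
primitive fs-data ocf:heartbeat:Filesystem \&#xD;
        params device="/dev/drbd0" directory="/srv/data" fstype="ext3"&#xD;
primitive db ocf:heartbeat:mysql&#xD;
primitive web ocf:heartbeat:apache \&#xD;
        params configfile="/etc/apache2/httpd.conf"&#xD;
group lamp fs-data db web&#xD;
colocation lamp-on-drbd inf: lamp ms-drbd-r0:Master&#xD;
order lamp-after-drbd inf: ms-drbd-r0:promote lamp:start&#xD;
property no-quorum-policy="ignore" stonith-enabled="true"&#xD;
&lt;/pre&gt;&#xD;
&lt;p&gt;DRBD runs single-primary under Pacemaker's control, Apache and MySQL&#xD;
use their OCF resource agents instead of a wrapper, and a single group&#xD;
replaces the individual dependencies.&#xD;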
&lt;p&gt;I find it hard to criticize such an article&#xD;
constructively; dear author, dear editor: &lt;strong&gt;this will&#xD;
not do!&lt;/strong&gt;&#xD;
&#xD;
</description>
    </item>
    <item>
      <pubDate>Tue, 10 Aug 2010 20:45:35 GMT</pubDate>
      <title>10 Aug 2010</title>
      <link>http://www.advogato.org/person/lmb/diary.html?start=108</link>
      <guid>http://www.advogato.org/person/lmb/diary.html?start=108</guid>
      <description>&lt;p&gt;&lt;b&gt;On selecting good timeouts&lt;/b&gt;&#xD;
&#xD;
&lt;p&gt;Timeouts are a common design choice or implementation&#xD;
detail in any computer system, but are particularly popular in&#xD;
High-Availability clusters&#xD;
(such as those built with the &lt;A HREF="http://www.novell.com/products/highavailability/"&gt;SUSE&#xD;
Linux High-Availability Extension&lt;/a&gt; and other stacks that are&#xD;
similarly based on corosync and pacemaker).&#xD;
&#xD;
&lt;p&gt;They seem a straightforward way to detect faults: if&#xD;
the task does not complete within N seconds, it is considered failed,&#xD;
and recovery is attempted. (The task could be anything: a network&#xD;
messaging protocol, a database starting under the cluster's control,&#xD;
any IO, and a number of other cases.)&#xD;
&#xD;
&lt;p&gt;However, selecting a good value for the timeout is less&#xD;
straightforward than it may seem; more often than not, timeouts&#xD;
are chosen much too short. This seems to stem from the belief that a fast&#xD;
response to failures is &lt;i&gt;unconditionally&lt;/i&gt; a good thing: the system&#xD;
will perform better if timeouts are shorter. This is not quite true,&#xD;
though.&#xD;
&#xD;
&lt;p&gt;To illustrate, assume two scenarios:&#xD;
&lt;ol&gt;&lt;li&gt;First, that the system has failed in such a way that it&#xD;
will not respond with a failed response to a monitor task&#xD;
immediately,&#xD;
but instead runs indefinitely unless aborted by the&#xD;
timeout.&#xD;
&lt;li&gt;Second, that the system is operating fine, but&#xD;
experiencing a brief&#xD;
period of stress, where responses are delayed, just to the&#xD;
edge of the&#xD;
timeout value.&#xD;
&lt;/ol&gt;&#xD;
&lt;p&gt;Now, let us explore the impact of a timeout that is one&#xD;
second&#xD;
"too long"; and then, one that is one second "too short".&#xD;
&lt;p&gt;For a too long timeout, the failure in the first scenario&#xD;
is detected&#xD;
one second later, adding one second to the recovery time. In&#xD;
the second&#xD;
scenario, no timeout occurs, and the system continues as&#xD;
normal.&#xD;
&lt;p&gt;For the too short timeout, the first scenario is&#xD;
recovered one second&#xD;
faster; the second scenario causes an &lt;i&gt;unnecessary&#xD;
recovery&lt;/i&gt;,&#xD;
probably incurring a &lt;i&gt;real&lt;/i&gt; service outage in the&#xD;
attempt to restart the&#xD;
application, or at least a brief period without service!&#xD;
&#xD;
&lt;p&gt;Another problem arises from how timeouts are often&#xD;
chosen; of course,&#xD;
if they were &lt;i&gt;obviously&lt;/i&gt; too short, administrators would&#xD;
immediately notice, since their system would never get off&#xD;
the ground at&#xD;
all, but immediately start spewing errors. Instead, the&#xD;
timeouts are&#xD;
usually adequate for the &lt;i&gt;tested scenario&lt;/i&gt; (note that&#xD;
you can use&#xD;
the pacemaker monitoring tools to look at the actual runtime of&#xD;
operations); if your test load exceeds the load of your live&#xD;
system,&#xD;
raise your hand - more often than not, it does not.&#xD;
&#xD;
&lt;p&gt;Under a stress/peak load, the system response tends to&#xD;
degrade exponentially; it will not just slow down by ten percent,&#xD;
but by thirty.&#xD;
If this scenario gets treated as a failure, the likelihood&#xD;
that the fail-over system will experience the same level of stress is&#xD;
high; worse, requests may have queued up, and if - due to the&#xD;
stress, remember - the system did not shut down cleanly, an&#xD;
application-internal recovery phase will compound the effect.&#xD;
&#xD;
&lt;p&gt;Monitoring application performance for load-distribution&#xD;
is quite a&#xD;
different task from monitoring application correctness. The&#xD;
former is&#xD;
important, and a performance degradation may also imply&#xD;
violation of&#xD;
service level agreements; however, initiating recovery&#xD;
through restart&#xD;
is unlikely to alleviate the problem. (In a pacemaker&#xD;
cluster, this&#xD;
would best be monitored externally and fed into the utilization&#xD;
constraints of the resources and nodes.)&#xD;
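&lt;p&gt;As a rough sketch of what such utilization constraints look like in CRM&#xD;
shell syntax (the node and resource names, the Apache config path and all&#xD;
capacity figures are made up for illustration; the values are static here,&#xD;
so an external monitor would have to update them):&#xD;
&lt;pre&gt;&#xD;
node node1 utilization memory="16384"&#xD;
node node2 utilization memory="16384"&#xD;
primitive web ocf:heartbeat:apache \&#xD;
        params configfile="/etc/apache2/httpd.conf" \&#xD;
        utilization memory="4096"&#xD;
property placement-strategy="balanced"&#xD;
&lt;/pre&gt;&#xD;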
&#xD;
&lt;p&gt;In summary, a too short timeout is the worse choice;&#xD;
rather, it is&#xD;
safer to make hard timeouts large enough beyond reasonable&#xD;
doubt. Yes,&#xD;
it will slow down the fail-over and recovery slightly, but&#xD;
at least not&#xD;
cause them by mistake.&#xD;
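&lt;p&gt;In a Pacemaker configuration, this means giving each operation a&#xD;
timeout with plenty of headroom over the runtimes you actually observe.&#xD;
A minimal sketch in CRM shell syntax; the resource name and the values are&#xD;
purely illustrative, not recommendations:&#xD;
&lt;pre&gt;&#xD;
primitive db ocf:heartbeat:mysql \&#xD;
        op start timeout="120s" \&#xD;
        op stop timeout="120s" \&#xD;
        op monitor interval="30s" timeout="60s"&#xD;
&lt;/pre&gt;&#xD;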
&#xD;
&lt;p&gt;(For a rather excellent and exhaustive treatment of this&#xD;
subject&#xD;
matter, see &lt;A HREF="http://www.informatik.hu-berlin.de/~wolter/publications/habil.pdf"&gt;K.&#xD;
Wolter, &amp;ldquo;Stochastic Models for Fault Tolerance: Restart,&#xD;
Rejuvenation&#xD;
and Checkpointing,&amp;rdquo; Habilitation Thesis, Humboldt-University,&#xD;
2007&lt;/a&gt;.)&#xD;
</description>
    </item>
    <item>
      <pubDate>Mon, 19 Jul 2010 22:25:42 GMT</pubDate>
      <title>19 Jul 2010</title>
      <link>http://www.advogato.org/person/lmb/diary.html?start=107</link>
      <guid>http://www.advogato.org/person/lmb/diary.html?start=107</guid>
      <description>&lt;p&gt;It has been a while since I took the chance to blog here;&#xD;
the time has been pretty packed with shipping SUSE Linux&#xD;
Enterprise 11 Service-Pack 1's High-Availability Extension&#xD;
(or SLE HA 11 SP1 for short ;-), and supporting the first&#xD;
deployments.&#xD;
&lt;p&gt;It is a good time to look back and review the very&#xD;
awesome new features that the community developed along with&#xD;
us, and that we are shipping as Enterprise-ready now.&#xD;
&lt;p&gt;A feature that I am personally very impressed by is the&#xD;
OCFS2 reflink feature; basically, OCFS2 cracked the hard nut&#xD;
of cluster-wide copy-on-write snapshots, which LVM2 has been&#xD;
trying to crack for years. This allows space-efficient and very&#xD;
fast provisioning of new VMs, snapshots for backup, cloning&#xD;
from templates, cloning from clones, etcetera; it really is&#xD;
amazing.&#xD;
&lt;p&gt;For those of you who prefer a visual, the team from NGN&#xD;
taped &lt;A HREF="http://www.ngn.nl/ngn?waxid=424189285"&gt;a&#xD;
video with me&lt;/a&gt; being interviewed by Sander at Novell's&#xD;
BrainShare in Amsterdam; this&#xD;
is my first video interview ever!&#xD;
&lt;p&gt;In case you would like an audio-only review, Ron and&#xD;
Terry &lt;A HREF="http://www.novell.com/feeds/openaudio/?p=350"&gt;interviewed&#xD;
me for Novell Open Audio&lt;/a&gt; as well.&#xD;
&lt;p&gt;I hope you find them informative - if so, please spread them,&#xD;
and let me know your feedback.</description>
    </item>
    <item>
      <pubDate>Wed, 31 Mar 2010 09:39:17 GMT</pubDate>
      <title>31 Mar 2010</title>
      <link>http://www.advogato.org/person/lmb/diary.html?start=106</link>
      <guid>http://www.advogato.org/person/lmb/diary.html?start=106</guid>
      <description>&lt;p&gt;My colleague &lt;A HREF="http://au.linkedin.com/in/tserong"&gt;Tim&lt;/a&gt; has drawn&#xD;
&lt;A HREF="http://ourobengr.com/stonith-story"&gt;awesome&#xD;
cartoons&lt;/a&gt; to illustrate my last cluster zombie story on&#xD;
&lt;A HREF="http://advogato.org/person/lmb/diary/105.html"&gt;why&#xD;
you need STONITH (node fencing)&lt;/a&gt;. Clusters and the&#xD;
undead, I spot an upcoming theme for my stories ...</description>
    </item>
    <item>
      <pubDate>Tue, 30 Mar 2010 10:50:02 GMT</pubDate>
      <title>30 Mar 2010</title>
      <link>http://www.advogato.org/person/lmb/diary.html?start=105</link>
      <guid>http://www.advogato.org/person/lmb/diary.html?start=105</guid>
      <description>&lt;p&gt;&lt;big&gt;&lt;b&gt;Why you need STONITH&lt;/b&gt;&lt;/big&gt;&#xD;
&#xD;
&lt;p&gt;A very common fallacy when setting up&#xD;
High-Availability&#xD;
clusters - be it on Pacemaker + corosync, Linux-HA, Red Hat&#xD;
Cluster Suite, or something else - is thinking that your setup,&#xD;
despite all the warnings in the documentation or in the&#xD;
logfiles, does not require node fencing.&#xD;
&lt;p&gt;&lt;big&gt;What is node fencing?&lt;/big&gt;&#xD;
&lt;p&gt;Fencing is a mechanism by which the&#xD;
"surviving" nodes in the cluster make sure that the node(s)&#xD;
that have been evicted from the cluster are truly gone. This&#xD;
is also referred to as node isolation, or, in a very&#xD;
descriptive metaphor, STONITH ("Shoot the other node in the&#xD;
head"). This mechanism is not just "fire and forget", but&#xD;
the cluster software will wait for a positive confirmation&#xD;
from it before proceeding with resource recovery.&#xD;
&lt;p&gt;But the node has &lt;i&gt;already&lt;/i&gt; failed, otherwise it would not&#xD;
have been evicted, so why would this be necessary, you ask?&#xD;
&lt;p&gt;The key here is the distinction between&#xD;
&lt;i&gt;appearances&lt;/i&gt; and &lt;i&gt;reality&lt;/i&gt;: a complete loss of&#xD;
communication with a node looks to all other nodes as if the&#xD;
node has disappeared. Since you, like the obedient&#xD;
administrator that you are, have configured redundant&#xD;
network links, the chance for this to happen is really slim,&#xD;
right? But that is not the only possible cause. In fact, the node&#xD;
might still be around, just waiting to come out of a kernel&#xD;
hang, or hiding behind firewall rules, to spew a bunch of&#xD;
corrupted data to your shared state.&#xD;
&lt;p&gt;In short, node fencing/isolation/STONITH ensures the&#xD;
integrity of your shared state by turning a mere, if&#xD;
justified, &lt;i&gt;suspicion&lt;/i&gt; into&#xD;
&lt;b&gt;confirmed reality&lt;/b&gt;.&#xD;
&lt;p&gt;(Pacemaker clusters also use this mechanism for escalated&#xD;
error recovery; if Pacemaker has instructed a node to&#xD;
release a service (by stopping it), but that operation&#xD;
fails, the service is essentially "stuck" on that node. The&#xD;
semantics of the "stop" operation mandate that it must not&#xD;
fail, so this indicates a more fundamental problem on that&#xD;
node. Hence, the default process then would be to stop all&#xD;
other resources on that node, move them elsewhere, and fence&#xD;
the node - rebooting it tends to be rather effective at&#xD;
stopping anything that might have been stuck. This can be&#xD;
disabled per-resource if you don't want some low-priority&#xD;
failure to shift high-priority resources around, though.)&#xD;
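&lt;p&gt;A minimal sketch of such a per-resource override in CRM shell syntax,&#xD;
assuming a hypothetical low-priority resource named "low-prio"; with&#xD;
on-fail="block", a failed stop leaves the resource blocked on its node&#xD;
instead of escalating to fencing:&#xD;
&lt;pre&gt;&#xD;
primitive low-prio ocf:heartbeat:Dummy \&#xD;
        op stop interval="0" timeout="60s" on-fail="block"&#xD;
&lt;/pre&gt;&#xD;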
&lt;p&gt;This is all very technical. So let me tell you a story&#xD;
with several possible endings to illustrate.&#xD;
&lt;p&gt;&lt;big&gt;Story time!&lt;/big&gt;&#xD;
&lt;p&gt;&lt;i&gt;Once upon a time,&lt;/i&gt; three friends were sitting&#xD;
huddled around a fire, peacefully eating their cookies. It&#xD;
was a tough time: the world was out to get them, a zombie&#xD;
infection was spreading, they couldn't trust anyone outside&#xD;
their trusted cluster of friends. They were always watchful&#xD;
and paid attention to each other.&#xD;
&lt;p&gt;Suddenly, one of the three stops responding to the&#xD;
conversation they were having. How do you proceed?&#xD;
&lt;ol&gt;&#xD;
&lt;li&gt;&lt;i&gt;My cluster of friends does not require such a crude&#xD;
mechanism! He'll be careful not to have been infected! If he&#xD;
stops responding, he will simply be dead!&lt;/i&gt; You ignore the&#xD;
problem, but then your former friend revives, spreads his&#xD;
infection to your cookie stack, starts clobbering you with a&#xD;
club to &lt;b&gt;eat your brains&lt;/b&gt;, and his howl gives away your&#xD;
location to all his new friends, who come down on you with&#xD;
the intent of &lt;b&gt;eating your brains&lt;/b&gt;.&#xD;
&lt;li&gt;&lt;i&gt;You use an unloaded gun to shoot your friend - the&#xD;
trigger responds reassuringly.&lt;/i&gt; Your former friend&#xD;
revives, and it is all about &lt;b&gt;eating your brains&lt;/b&gt;&#xD;
again.&#xD;
&lt;li&gt;&lt;i&gt;You kindly tap your friend on the shoulder, and&#xD;
suggest that he please commit suicide.&lt;/i&gt; Your former&#xD;
friend revives, snaps at your tapping hand, and starts&#xD;
&lt;b&gt;eating your brains&lt;/b&gt;.&#xD;
&#xD;
&lt;li&gt;&lt;i&gt;You speak a pre-agreed code word, a tiny bomb&#xD;
goes off in the head of your friend, blows &lt;b&gt;his&lt;/b&gt; brains&#xD;
out, and he drops on the spot.&lt;/i&gt; The grue does not eat&#xD;
you. (In fact, the mechanism monitoring his brain probably&#xD;
has already blown him up, but you speak the code word anyway&#xD;
to make sure.)&#xD;
&#xD;
&lt;li&gt;&lt;i&gt;You take that crude, trusty shotgun and blow&#xD;
&lt;b&gt;his&lt;/b&gt; brains out, aiming away from the stack of&#xD;
cookies.&lt;/i&gt; The grue does not eat you.&#xD;
&lt;/ol&gt;&#xD;
&#xD;
&lt;p&gt;&lt;big&gt;So what?&lt;/big&gt;&#xD;
&lt;p&gt;In order, we have gone through &lt;i&gt;"I do not need&#xD;
STONITH or have disabled it"&lt;/i&gt;, &lt;i&gt;"I used the null&#xD;
mechanism intended only for testing"&lt;/i&gt;, &lt;i&gt;"I used an&#xD;
ssh-based mechanism"&lt;/i&gt;, the recommended &lt;i&gt;"poison-pill&#xD;
mechanism with hardware watchdog support" (such&#xD;
as external/sbd in Pacemaker environments)&lt;/i&gt;, and the&#xD;
time-tested &lt;i&gt;"talk to a network power switch, management&#xD;
board etc. to cut the power"&lt;/i&gt; methods.&#xD;
&lt;p&gt;Pacemaker's escalated error recovery could be likened to&#xD;
&lt;i&gt;your friend telling you that despite his best attempts,&#xD;
his wound has become infected (and he can't bring himself to&#xD;
cut off his hand); he bravely gives away his&#xD;
equipment to you, kneels down, says goodbye, and you blow&#xD;
&lt;b&gt;his&lt;/b&gt; brains out.&lt;/i&gt;&#xD;
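&lt;p&gt;For the recommended poison-pill variant, a minimal sketch in CRM shell&#xD;
syntax might look as follows. The resource name and the device path are&#xD;
placeholders; the shared disk has to be initialized for sbd and the&#xD;
hardware watchdog set up beforehand, which is not shown here:&#xD;
&lt;pre&gt;&#xD;
primitive stonith-sbd stonith:external/sbd \&#xD;
        params sbd_device="/dev/disk/by-id/EXAMPLE-part1"&#xD;
property stonith-enabled="true"&#xD;
&lt;/pre&gt;&#xD;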
&#xD;
&lt;p&gt;Does that drive the point home? How would you like to&#xD;
survive armageddon? Of course, it is always possible that&#xD;
you have a secret liking for becoming a zombie, and&#xD;
crumbling (instead of eating) all your cookies.&#xD;
&lt;p&gt;In this case, talk to your two friends about&#xD;
&lt;b&gt;appropriate&lt;/b&gt; therapy.</description>
    </item>
    <item>
      <pubDate>Thu, 29 Oct 2009 11:02:56 GMT</pubDate>
      <title>29 Oct 2009</title>
      <link>http://www.advogato.org/person/lmb/diary.html?start=104</link>
      <guid>http://www.advogato.org/person/lmb/diary.html?start=104</guid>
      <description>&lt;p&gt;Again a tip on how to write your OpenAIS/Pacemaker&#xD;
configuration in a simpler fashion; this applies to &lt;A HREF="http://www.novell.com/products/highavailability/"&gt;SUSE&#xD;
Linux Enterprise 11 High-Availability Extension&lt;/a&gt; too, of&#xD;
course.&#xD;
&lt;p&gt;For the full cluster functionality with&#xD;
OpenAIS/OCFS2/cLVM2 and an OCFS2 mount on top, you need to&#xD;
configure clones for DLM, O2CB and cLVM2, one to activate the LVM2&#xD;
volume group, and Filesystem resources to mount the file&#xD;
system. Add in all the dependencies needed, and you end up&#xD;
with a configuration pretty much like this (shown in CRM&#xD;
shell syntax, which is already much more concise than the&#xD;
raw XML):&lt;br&gt;&#xD;
&lt;pre&gt;&#xD;
primitive clvm ocf:lvm2:clvmd&#xD;
primitive dlm ocf:pacemaker:controld&#xD;
primitive o2cb ocf:ocfs2:o2cb&#xD;
primitive ocfs2-2 ocf:heartbeat:Filesystem \&#xD;
        params device="/dev/cluster-vg/ocfs2" directory="/ocfs2-2" fstype="ocfs2"&#xD;
primitive vg1 ocf:heartbeat:LVM \&#xD;
        params volgrpname="cluster-vg"&#xD;
clone c-ocfs2-2 ocfs2-2 \&#xD;
        meta target-role="Started" interleave="true"&#xD;
clone clvm-clone clvm \&#xD;
        meta target-role="Started" interleave="true" ordered="true"&#xD;
clone dlm-clone dlm \&#xD;
        meta interleave="true" ordered="true" target-role="Started"&#xD;
clone o2cb-clone o2cb \&#xD;
        meta target-role="Started" interleave="true" ordered="true"&#xD;
clone vg1-clone vg1 \&#xD;
        meta target-role="Started" interleave="true" ordered="true"&#xD;
colocation colo-clvm inf: clvm-clone dlm-clone&#xD;
colocation colo-o2cb inf: o2cb-clone dlm-clone&#xD;
colocation colo-ocfs2-2 inf: c-ocfs2-2 o2cb-clone&#xD;
colocation colo-ocfs2-2-vg1 inf: c-ocfs2-2 vg1-clone&#xD;
colocation colo-vg1 inf: vg1-clone clvm-clone&#xD;
order order-clvm inf: dlm-clone clvm-clone&#xD;
order order-o2cb inf: dlm-clone o2cb-clone&#xD;
order order-ocfs2-2 inf: o2cb-clone c-ocfs2-2&#xD;
order order-ocfs2-2-vg1 inf: vg1-clone c-ocfs2-2&#xD;
order order-vg1 inf: clvm-clone vg1-clone&#xD;
&lt;/pre&gt;&#xD;
That's quite a mouthful, and becomes more cumbersome with every&#xD;
file system you add.&#xD;
&#xD;
&lt;p&gt;However, there is a little-known feature - you can&#xD;
actually clone a resource group:&lt;br&gt;&#xD;
&lt;pre&gt;&#xD;
primitive clvm ocf:lvm2:clvmd&#xD;
primitive dlm ocf:pacemaker:controld&#xD;
primitive o2cb ocf:ocfs2:o2cb&#xD;
primitive ocfs2-2 ocf:heartbeat:Filesystem \&#xD;
        params device="/dev/cluster-vg/ocfs2" directory="/ocfs2-2" fstype="ocfs2"&#xD;
primitive vg1 ocf:heartbeat:LVM \&#xD;
        params volgrpname="cluster-vg"&#xD;
group base-group dlm o2cb clvm vg1 ocfs2-2&#xD;
clone base-clone base-group \&#xD;
	meta interleave="true"&#xD;
&lt;/pre&gt;&#xD;
&#xD;
&lt;p&gt;I think this speaks for itself; &lt;i&gt;20 configuration&#xD;
statements reduced to 7&lt;/i&gt;. You will also find that&#xD;
&lt;code&gt;crm_mon&lt;/code&gt; output is much simpler and shorter,&#xD;
allowing you to see more of the cluster status in one go.&#xD;
</description>
    </item>
    <item>
      <pubDate>Thu, 20 Aug 2009 08:21:59 GMT</pubDate>
      <title>20 Aug 2009</title>
      <link>http://www.advogato.org/person/lmb/diary.html?start=103</link>
      <guid>http://www.advogato.org/person/lmb/diary.html?start=103</guid>
      <description>&lt;p&gt;Today I'd like to briefly introduce a new safety feature&#xD;
in Pacemaker.&#xD;
&lt;p&gt;Many times, we have seen customers and users complain&#xD;
that they thought they had correctly set up their cluster,&#xD;
but then resources were not started elsewhere when they&#xD;
killed one of the nodes. With OCFS2 or clvmd, they would&#xD;
even see access to the filesystem on the surviving nodes&#xD;
block, and processes, including kernel threads, end up in&#xD;
the dreaded "D" state! Surely this must be a bug in the&#xD;
cluster software.&#xD;
&lt;p&gt;Usually, these scenarios escalate fairly quickly, because&#xD;
customers tend to test recovery scenarios only shortly before&#xD;
they want to deploy, or find out after they have already&#xD;
deployed to production.&#xD;
Not a good time for clear thinking.&#xD;
&lt;p&gt;However, most of these scenarios have a common&#xD;
misconfiguration: no fencing defined. Now, fencing is&#xD;
essential to data integrity, in particular with OCFS2, so&#xD;
the cluster refuses to proceed until fencing has completed;&#xD;
the blocking behaviour is actually correct. The system would&#xD;
warn about this at "ERROR" priority in several places.&#xD;
&lt;p&gt;Yet it became clear that something needed to be done;&#xD;
people do not like to read their logfiles, it seems.&#xD;
Inspired by a report by Jo de Baer, I thought it would be&#xD;
more convenient if the resources did not even start in the&#xD;
first place if such a gross misconfiguration was detected,&#xD;
and &lt;A HREF="http://theclusterguy.clusterlabs.org/"&gt;Andrew&lt;/a&gt;&#xD;
agreed.&#xD;
&lt;p&gt;The resulting &lt;A HREF="http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/849aa0d0696d"&gt;patch&lt;/a&gt;&#xD;
is very short, but effective. Such misconfigurations now&#xD;
fail early, without causing the impression that the cluster&#xD;
might actually be working.&#xD;
&lt;p&gt;This certainly does not prevent all errors; it can't&#xD;
directly detect whether fencing is configured properly and&#xD;
actually works, which is too much for a poor policy engine&#xD;
to decide. But we can try to protect some administrators&#xD;
from themselves.&#xD;
&lt;p&gt;(As time progresses, we will perhaps pick more such&#xD;
low-hanging fruit to make the cluster "more obvious" to&#xD;
configure. But still, I would hope that going forward, more&#xD;
administrators would at least try to read and understand the&#xD;
logs - as you can see from the patch, the message was&#xD;
already very clear before, and "ERROR:" messages definitely&#xD;
should&#xD;
catch any administrator's attention.)&#xD;
</description>
    </item>
    <item>
      <pubDate>Mon, 11 May 2009 20:49:16 GMT</pubDate>
      <title>11 May 2009</title>
      <link>http://www.advogato.org/person/lmb/diary.html?start=102</link>
      <guid>http://www.advogato.org/person/lmb/diary.html?start=102</guid>
      <description>&lt;p&gt;It is with the greatest pleasure that I am able to&#xD;
announce that Novell has just posted the &lt;A HREF="http://www.novell.com/documentation/sles11/book_sleha/data/book_sleha.html"&gt;documentation&#xD;
for setting up OpenAIS, Pacemaker, OCFS2, cLVM2, DRBD, based&#xD;
on SUSE Linux Enterprise High-Availability 11&lt;/a&gt; - but&#xD;
equally applicable to other users of this software stack.&#xD;
&lt;p&gt;We understand it is a work in progress, and the up-to-date&#xD;
DocBook sources will be made available under the LGPL too in&#xD;
the very near future in a Mercurial repository, and we hope&#xD;
to turn this into a community project as well, providing the&#xD;
most complete documentation coverage for clustering on Linux&#xD;
one day!</description>
    </item>
    <item>
      <pubDate>Sat, 21 Mar 2009 19:38:02 GMT</pubDate>
      <title>21 Mar 2009</title>
      <link>http://www.advogato.org/person/lmb/diary.html?start=101</link>
      <guid>http://www.advogato.org/person/lmb/diary.html?start=101</guid>
      <description>&lt;ul&gt;&#xD;
&lt;li&gt;So our new test cluster environment is a 16-node HP&#xD;
blade center, which pleases me quite a bit. The blades all&#xD;
have a hardware watchdog card, which of course makes perfect&#xD;
sense for a cluster to use.&#xD;
&lt;li&gt;However, the attempt to set the timeout to 5s was&#xD;
thwarted by the kernel message &lt;blockquote&gt;hpwdt: New value&#xD;
passed in is invalid: 5 seconds.&lt;/blockquote&gt;&#xD;
&lt;li&gt;So I dived into hpwdt.c, to find:&lt;br&gt;&#xD;
&lt;code&gt;&#xD;
static int hpwdt_change_timer(int new_margin)&lt;br&gt;&#xD;
{&lt;br&gt;&#xD;
        /* Arbitrary, can't find the card's limits */&lt;br&gt;&#xD;
        if (new_margin &amp;lt; 30 || new_margin &amp;gt; 600) {&lt;br&gt;&#xD;
                printk(KERN_WARNING&lt;br&gt;&#xD;
                        "hpwdt: New value passed in is invalid: %d seconds.\n",&lt;br&gt;&#xD;
                        new_margin);&lt;br&gt;&#xD;
                return -EINVAL;&lt;br&gt;&#xD;
        }&lt;br&gt;&#xD;
&lt;/code&gt;&#xD;
&lt;li&gt;Okay, that can happen. Sometimes driver writers have to&#xD;
make guesses when the vendor is not cooperative or&#xD;
unavailable. So who wrote the driver? &lt;br&gt;&#xD;
&lt;code&gt;&#xD;
 *      (c) Copyright 2007 Hewlett-Packard Development&#xD;
Company, L.P.&#xD;
&lt;/code&gt;&#xD;
&lt;li&gt;...&#xD;
&lt;/ul&gt;</description>
    </item>
  </channel>
</rss>
