<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Advogato blog for lmb</title>
    <link>http://www.advogato.org/person/lmb/</link>
    <description>Advogato blog for lmb</description>
    <language>en-us</language>
    <generator>mod_virgule</generator>
    <pubDate>Sun, 21 Mar 2010 14:29:58 GMT</pubDate>
    <item>
      <pubDate>Thu, 29 Oct 2009 11:02:56 GMT</pubDate>
      <title>29 Oct 2009</title>
      <link>http://www.advogato.org/person/lmb/diary.html?start=104</link>
      <guid>http://www.advogato.org/person/lmb/diary.html?start=104</guid>
      <description>&lt;p&gt;Again a tip on how to write your OpenAIS/Pacemaker&#xD;
configuration in a simpler fashion; this applies to &lt;A HREF="http://www.novell.com/products/highavailability/"&gt;SUSE&#xD;
Linux Enterprise 11 High-Availability Extension&lt;/a&gt; too, of&#xD;
course.&#xD;
&lt;p&gt;For the full cluster functionality with&#xD;
OpenAIS/OCFS2/cLVM2 and an OCFS2 mount on top, you need to&#xD;
configure DLM, O2CB, cLVM2 clones, one to start the LVM2&#xD;
volume group, and Filesystem resources to mount the file&#xD;
system. Add in all the dependencies needed, and you end up&#xD;
with a configuration pretty much like this (shown in CRM&#xD;
shell syntax, which is already much more concise than the&#xD;
raw XML):&lt;br&gt;&#xD;
&lt;pre&gt;&#xD;
primitive clvm ocf:lvm2:clvmd&#xD;
primitive dlm ocf:pacemaker:controld&#xD;
primitive o2cb ocf:ocfs2:o2cb&#xD;
primitive ocfs2-2 ocf:heartbeat:Filesystem \&#xD;
        params device="/dev/cluster-vg/ocfs2" directory="/ocfs2-2" fstype="ocfs2"&#xD;
primitive vg1 ocf:heartbeat:LVM \&#xD;
        params volgrpname="cluster-vg"&#xD;
clone c-ocfs2-2 ocfs2-2 \&#xD;
        meta target-role="Started" interleave="true"&#xD;
clone clvm-clone clvm \&#xD;
        meta target-role="Started" interleave="true" ordered="true"&#xD;
clone dlm-clone dlm \&#xD;
        meta interleave="true" ordered="true" target-role="Stopped"&#xD;
clone o2cb-clone o2cb \&#xD;
        meta target-role="Started" interleave="true" ordered="true"&#xD;
clone vg1-clone vg1 \&#xD;
        meta target-role="Started" interleave="true" ordered="true"&#xD;
colocation colo-clvm inf: clvm-clone dlm-clone&#xD;
colocation colo-o2cb inf: o2cb-clone dlm-clone&#xD;
colocation colo-ocfs2-2 inf: c-ocfs2-2 o2cb-clone&#xD;
colocation colo-ocfs2-2-vg1 inf: c-ocfs2-2 vg1-clone&#xD;
colocation colo-vg1 inf: vg1-clone clvm-clone&#xD;
order order-clvm inf: dlm-clone clvm-clone&#xD;
order order-o2cb inf: dlm-clone o2cb-clone&#xD;
order order-ocfs2-2 inf: o2cb-clone c-ocfs2-2&#xD;
order order-ocfs2-2-vg1 inf: vg1-clone c-ocfs2-2&#xD;
order order-vg1 inf: clvm-clone vg1-clone&#xD;
&lt;/pre&gt;&#xD;
That's quite a bit, and it becomes more cumbersome with every&#xD;
file system you add.&#xD;
&#xD;
&lt;p&gt;However, there is a little-known feature - you can&#xD;
actually clone a resource group:&lt;br&gt;&#xD;
&lt;pre&gt;&#xD;
primitive clvm ocf:lvm2:clvmd&#xD;
primitive dlm ocf:pacemaker:controld&#xD;
primitive o2cb ocf:ocfs2:o2cb&#xD;
primitive ocfs2-2 ocf:heartbeat:Filesystem \&#xD;
        params device="/dev/cluster-vg/ocfs2" directory="/ocfs2-2" fstype="ocfs2"&#xD;
primitive vg1 ocf:heartbeat:LVM \&#xD;
        params volgrpname="cluster-vg"&#xD;
group base-group dlm o2cb clvm vg1 ocfs2-2&#xD;
clone base-clone base-group \&#xD;
	meta interleave="true"&#xD;
&lt;/pre&gt;&#xD;
&#xD;
&lt;p&gt;I think this speaks for itself; &lt;i&gt;20 lines of&#xD;
configuration reduced&lt;/i&gt;. You will also find that&#xD;
&lt;code&gt;crm_mon&lt;/code&gt; output is much simpler and shorter,&#xD;
allowing you to see more of the cluster status in one go.&#xD;
</description>
    </item>
    <item>
      <pubDate>Thu, 20 Aug 2009 08:21:59 GMT</pubDate>
      <title>20 Aug 2009</title>
      <link>http://www.advogato.org/person/lmb/diary.html?start=103</link>
      <guid>http://www.advogato.org/person/lmb/diary.html?start=103</guid>
      <description>&lt;p&gt;Today I'd like to briefly introduce a new safety feature&#xD;
in Pacemaker.&#xD;
&lt;p&gt;Many times, we have seen customers and users complain&#xD;
that they thought they had set up their cluster correctly,&#xD;
but then resources were not started elsewhere when they&#xD;
killed one of the nodes. With OCFS2 or clvmd, they would&#xD;
even see access to the filesystem on the surviving nodes&#xD;
block, and processes, including kernel threads, end up in&#xD;
the dreaded "D" state! Surely this must be a bug in the&#xD;
cluster software.&#xD;
&lt;p&gt;Usually, these situations escalate fairly quickly,&#xD;
because customers tend to test recovery scenarios only&#xD;
shortly before they want to deploy, or discover the problem&#xD;
after they have already deployed to production.&#xD;
Not a good time for clear thinking.&#xD;
&lt;p&gt;However, most of these scenarios have a common&#xD;
misconfiguration: no fencing defined. Now, fencing is&#xD;
essential to data integrity, in particular with OCFS2, so&#xD;
the cluster refuses to proceed until fencing has completed;&#xD;
the blocking behaviour is actually correct. The system would&#xD;
warn about this at "ERROR" priority in several places.&#xD;
&lt;p&gt;Yet it became clear that something needed to be done;&#xD;
people do not like to read their logfiles, it seems.&#xD;
Inspired by a report by Jo de Baer, I thought it would be&#xD;
more convenient if the resources did not even start in the&#xD;
first place if such a gross misconfiguration was detected,&#xD;
and &lt;A HREF="http://theclusterguy.clusterlabs.org/"&gt;Andrew&lt;/a&gt;&#xD;
agreed.&#xD;
&lt;p&gt;The resulting &lt;A HREF="http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/849aa0d0696d"&gt;patch&lt;/a&gt;&#xD;
is very short, but effective. Such misconfigurations now&#xD;
fail early, without causing the impression that the cluster&#xD;
might actually be working.&#xD;
&lt;p&gt;This certainly does not prevent all errors; it can't&#xD;
directly detect whether fencing is configured properly and&#xD;
actually works, which is too much for a poor policy engine&#xD;
to decide. But we can try to protect some administrators&#xD;
from themselves.&#xD;
&lt;p&gt;(As time progresses, we will perhaps pick more such&#xD;
low-hanging fruit to make the cluster "more obvious" to&#xD;
configure. But still, I would hope that going forward, more&#xD;
administrators would at least try to read and understand the&#xD;
logs - as you can see from the patch, the message was&#xD;
already very clear before, and "ERROR:" messages definitely&#xD;
should catch any administrator's attention.)&#xD;
</description>
    </item>
    <item>
      <pubDate>Mon, 11 May 2009 20:49:16 GMT</pubDate>
      <title>11 May 2009</title>
      <link>http://www.advogato.org/person/lmb/diary.html?start=102</link>
      <guid>http://www.advogato.org/person/lmb/diary.html?start=102</guid>
      <description>&lt;p&gt;It is with the greatest pleasure that I am able to&#xD;
announce that Novell has just posted the &lt;A HREF="http://www.novell.com/documentation/sles11/book_sleha/data/book_sleha.html"&gt;documentation&#xD;
for setting up OpenAIS, Pacemaker, OCFS2, cLVM2, DRBD, based&#xD;
on SUSE Linux Enterprise High-Availability 11&lt;/a&gt; - but&#xD;
equally applicable to other users of this software stack.&#xD;
&lt;p&gt;We understand it is a work in progress, and the up-to-date&#xD;
DocBook sources will be made available under the LGPL too in&#xD;
the very near future in a Mercurial repository, and we hope&#xD;
to turn this into a community project as well, providing the&#xD;
most complete documentation coverage for clustering on Linux&#xD;
one day!</description>
    </item>
    <item>
      <pubDate>Sat, 21 Mar 2009 19:38:02 GMT</pubDate>
      <title>21 Mar 2009</title>
      <link>http://www.advogato.org/person/lmb/diary.html?start=101</link>
      <guid>http://www.advogato.org/person/lmb/diary.html?start=101</guid>
      <description>&lt;ul&gt;&#xD;
&lt;li&gt;So our new test cluster environment is a 16 node HP&#xD;
blade center, which pleases me quite a bit. The blades all&#xD;
have a hardware watchdog card, which of course makes perfect&#xD;
sense for a cluster to use.&#xD;
&lt;li&gt;However, the attempt to set the timeout to 5s was&#xD;
thwarted by the kernel message &lt;blockquote&gt;hpwdt: New value&#xD;
passed in is invalid: 5 seconds.&lt;/blockquote&gt;&#xD;
&lt;li&gt;So I dived into hpwdt.c, to find:&lt;br&gt;&#xD;
&lt;code&gt;&#xD;
static int hpwdt_change_timer(int new_margin)&lt;br&gt;&#xD;
{&lt;br&gt;&#xD;
        /* Arbitrary, can't find the card's limits */&lt;br&gt;&#xD;
        if (new_margin &amp;lt; 30 || new_margin &amp;gt; 600) {&lt;br&gt;&#xD;
                printk(KERN_WARNING&#xD;
                        "hpwdt: New value passed in is invalid: %d seconds.\n",&#xD;
                        new_margin);&lt;br&gt;&#xD;
                return -EINVAL;&lt;br&gt;&#xD;
        }&lt;br&gt;&#xD;
&lt;/code&gt;&#xD;
&lt;li&gt;Okay, that can happen. Sometimes driver writers have to&#xD;
make guesses when the vendor is not cooperative or&#xD;
unavailable. So who wrote the driver? &lt;br&gt;&#xD;
&lt;code&gt;&#xD;
 *      (c) Copyright 2007 Hewlett-Packard Development&#xD;
Company, L.P.&#xD;
&lt;/code&gt;&#xD;
&lt;li&gt;...&#xD;
&lt;/ul&gt;</description>
    </item>
    <item>
      <pubDate>Thu, 25 Dec 2008 21:42:44 GMT</pubDate>
      <title>25 Dec 2008</title>
      <link>http://www.advogato.org/person/lmb/diary.html?start=100</link>
      <guid>http://www.advogato.org/person/lmb/diary.html?start=100</guid>
      <description>&lt;p&gt;I prefer to ignore christmas and the madness they call&#xD;
&lt;i&gt;holidays&lt;/i&gt;, but would like to close the year with a&#xD;
series of three questions, starting today:&#xD;
&lt;ol&gt;&#xD;
&lt;p&gt;&lt;li&gt;What can Open Source (and/or Linux) contribute to&#xD;
making the world a better place? Think of developing nations&#xD;
and the really large issues, as well as the slightly smaller ones.&#xD;
&lt;/ol&gt;&#xD;
&lt;p&gt;Please feel free to e-mail me your answers to lmb at suse&#xD;
dot de, but this is not required to follow this experiment.</description>
    </item>
    <item>
      <pubDate>Wed, 15 Oct 2008 13:10:01 GMT</pubDate>
      <title>15 Oct 2008</title>
      <link>http://www.advogato.org/person/lmb/diary.html?start=99</link>
      <guid>http://www.advogato.org/person/lmb/diary.html?start=99</guid>
      <description>&lt;ul&gt;&#xD;
&lt;li&gt;&lt;A HREF="http://www.heise.de/open/Linux-Kongress-2008--/artikel/117386"&gt;An&#xD;
article by heise open&lt;/a&gt; covers the Linux Kongress, and&#xD;
also my presentation on convergence of cluster stacks, even&#xD;
though they present my message as slightly more tentative&#xD;
than I intended it to be. But maybe I am too optimistic. For&#xD;
what it is worth, &lt;A HREF="http://www.heise.de/open/Linux-Kongress-2008--/zoom/117386/1"&gt;here&#xD;
is a picture of the slide&lt;/a&gt; where I outlined the&#xD;
components in the joint stack, which heise open calls a&#xD;
"good mix from all sources."&#xD;
&lt;li&gt;It is probably important to stress that this is &lt;i&gt;my&lt;/i&gt;&#xD;
understanding of the results and goals, and even though I&#xD;
believe we had good buy-in in the development community,&#xD;
this should &lt;b&gt;not&lt;/b&gt; be understood as a promise or&#xD;
commitment (or lack thereof) by Red Hat or Novell or anyone&#xD;
else to deliver&#xD;
this in the Enterprise distributions in particular, nor that&#xD;
there will be any loss of support for current&#xD;
configurations. If I could speak for both Red Hat and&#xD;
Novell, I would be earning a hell of a lot more money. (Some&#xD;
initial feedback to my blog entry here made me add this&#xD;
paragraph; I did discuss this in the presentation, but it is&#xD;
not captured on the slide shown.)&#xD;
&lt;/ul&gt;</description>
    </item>
    <item>
      <pubDate>Tue, 14 Oct 2008 12:24:30 GMT</pubDate>
      <title>14 Oct 2008</title>
      <link>http://www.advogato.org/person/lmb/diary.html?start=98</link>
      <guid>http://www.advogato.org/person/lmb/diary.html?start=98</guid>
      <description>&lt;ul&gt;&#xD;
&lt;li&gt;Lukas Chaplin of Linux-Lancers.com, a Linux recruiting&#xD;
and placement agency, has &lt;A HREF="http://blog.linux-lancers.com/2008/10/14/interview-with-lmb-about-homeoffice/"&gt;interviewed&#xD;
me about working from a home office.&lt;/a&gt; This is not yet as&#xD;
pervasive elsewhere as in the Open Source environment, which&#xD;
is really a shame.&#xD;
&lt;li&gt;Of course, before going to Lukas you should first check&#xD;
whether Novell &amp;amp; SuSE can offer you a new challenge!&#xD;
&lt;li&gt;&lt;p&gt;It's been a while since I blogged, so I have two&#xD;
conference reports as well, starting with the &lt;A HREF="http://www.gossamer-threads.com/lists/linuxha/pacemaker/51414"&gt;Cluster&#xD;
Developer Summit in Prague, 2008-09-28 - 2008-10-02.&lt;/a&gt;&#xD;
(See the link for Fabio's report.)&#xD;
&lt;p&gt;This Summit was organized by Fabio from Red Hat and&#xD;
hosted by Novell, with attendees from Oracle, Atix, NTT&#xD;
Japan and others, which Lon &lt;A HREF="http://picasaweb.google.com/lhohberger/ClusterSummit2008#5252112876857112866"&gt;captured&#xD;
on this picture.&lt;/a&gt; It is my honest belief that within a&#xD;
year or two, we shall have one single cluster stack on&#xD;
Linux; totally awesome! Amazing how much progress one can&#xD;
make if one is not stuck to one's own old code, but willing&#xD;
to select the best-of-breed.&#xD;
&lt;p&gt;I think we have come a long way in the last ten years;&#xD;
having explored several different paths through concurrent&#xD;
evolution, we are now seeing more and more convergence as&#xD;
there is less and less justification for the redundant&#xD;
effort expended. Dogs, cats, and mice eating together ... It&#xD;
also reinforced my opinion that small, focused developer&#xD;
events can be exceptionally productive.&#xD;
&lt;li&gt;&lt;p&gt;At &lt;A HREF="http://www.linux-kongress.org/2008/program.html"&gt;Linux&#xD;
Kongress 2008 in beautiful Hamburg&lt;/a&gt;, there were many&#xD;
tutorials and sessions where Pacemaker + heartbeat were used&#xD;
to build high-availability clusters. In my own session, I&#xD;
presented the last year or so of development on Pacemaker&#xD;
and heartbeat, and of course summarized the results from the&#xD;
Cluster Developer Summit.&#xD;
&lt;p&gt;I also learned about a neat trick Samba's CTDB plays with&#xD;
TCP to make fail-over faster; of course, thanks to this&#xD;
being Open Source, they were able to contribute it to the&#xD;
community instead of reinventing their own cluster stack.&#xD;
(Haha, just kidding, of course they rolled their own - this&#xD;
&lt;i&gt;is&lt;/i&gt; Open Source after all.) However, it should be&#xD;
possible to copy it and integrate it as a generic function&#xD;
for IP address fail-over. Cool stuff.&#xD;
&lt;p&gt;I also very much enjoyed dinner with James, Jonathan,&#xD;
Andreas, Lars (Ellenberg), and Kay - who lives in Hamburg,&#xD;
but whom I only see at conferences ... Refer to the working&#xD;
from home offices interview!&#xD;
&lt;/ul&gt;</description>
    </item>
    <item>
      <pubDate>Mon, 15 Sep 2008 21:24:49 GMT</pubDate>
      <title>15 Sep 2008</title>
      <link>http://www.advogato.org/person/lmb/diary.html?start=97</link>
      <guid>http://www.advogato.org/person/lmb/diary.html?start=97</guid>
      <description>&lt;ul&gt;&#xD;
&lt;li&gt;Miguel: you can use &lt;code&gt;getsockopt(sockfd, SOL_SOCKET,&#xD;
SO_PEERCRED, cred, &amp;amp;n)&lt;/code&gt; to find out the far side's&#xD;
pid and uid from within the server.&#xD;
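&lt;p&gt;A minimal sketch of the same call from Python, assuming a&#xD;
Linux host and a Unix domain socket (which is where SO_PEERCRED&#xD;
applies). The helper name peer_credentials is made up for this&#xD;
example, and socketpair() merely stands in for a real accepted&#xD;
connection:&#xD;
&lt;pre&gt;
```python
import os
import socket
import struct

# Assumption: Linux, where struct ucred is laid out as pid_t, uid_t, gid_t
# (one signed and two unsigned native ints).
UCRED_FORMAT = "iII"


def peer_credentials(sock):
    """Return (pid, uid, gid) of the process on the far side of a Unix socket.

    peer_credentials is a hypothetical helper name, not a library function.
    """
    size = struct.calcsize(UCRED_FORMAT)
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, size)
    return struct.unpack(UCRED_FORMAT, raw)


if __name__ == "__main__":
    # Both ends of the pair belong to this process, so the peer is ourselves.
    server_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    pid, uid, gid = peer_credentials(server_end)
    print("peer pid", pid, "uid", uid, "gid", gid)
    print("own pid", os.getpid())
    server_end.close()
    client_end.close()
```
&lt;/pre&gt;
&lt;p&gt;In a real server, you would presumably call this on the&#xD;
socket returned by &lt;code&gt;accept()&lt;/code&gt; instead.&#xD;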
&lt;/ul&gt;</description>
    </item>
    <item>
      <pubDate>Sat, 23 Aug 2008 22:19:27 GMT</pubDate>
      <title>23 Aug 2008</title>
      <link>http://www.advogato.org/person/lmb/diary.html?start=96</link>
      <guid>http://www.advogato.org/person/lmb/diary.html?start=96</guid>
      <description>&lt;ul&gt;&#xD;
&lt;li&gt;&lt;p&gt;Hi all, long time no blog. But with the recent&#xD;
announcement of the &lt;A HREF="http://www.linux-kongress.org/2008/program.html"&gt;Linux&#xD;
Kongress 2008 program&lt;/a&gt;, which will happen in my chosen&#xD;
home city Hamburg from 7th to 10th October, I have to share&#xD;
the joy:&#xD;
&lt;p&gt;Not &lt;i&gt;one&lt;/i&gt;, but &lt;b&gt;three&lt;/b&gt; tutorials - both in&#xD;
English and German - explaining how to use Linux-HA with the&#xD;
CRM/Pacemaker as a high-availability cluster environment.&#xD;
&lt;p&gt;Congratulations and thanks to Ralph Dehner, Lars&#xD;
Ellenberg, Joerg Jungermann, Maximilian Wilhelm!&#xD;
&lt;p&gt;Also, a brief talk by myself on the future of HA on&#xD;
Linux, fresh from the Cluster Developer Summit in Prague.&#xD;
&lt;p&gt;All in all, Linux Kongress has a very, very strong&#xD;
program this year, and I look forward to meeting you all in&#xD;
Hamburg - bring your umbrella!&#xD;
&lt;li&gt;&lt;p&gt;On Monday, &lt;A HREF="http://idea.opensuse.org/"&gt;Hack Week&#xD;
2008&lt;/a&gt; begins. I will be working on &lt;A href="http://idea.opensuse.org/content/ideas/linux-haheartbeat-shared-storage-stonith-sbd"&gt;shared&#xD;
storage-based fencing&lt;/a&gt; for heartbeat, and possibly some&#xD;
other projects relating to clustering.&#xD;
&lt;p&gt;I also look forward in particular to the &lt;i&gt;First&#xD;
Penguin Award&lt;/i&gt; candidates: the prize for the most daring&#xD;
failure. Failure is crucial to success; learning where the&#xD;
boundaries of our models and theories are is the foundation&#xD;
of science, and successful design. Only by anticipating and&#xD;
overcoming failure is success possible. If you doubt this&#xD;
for a single moment, read &lt;A HREF="http://press.princeton.edu/titles/8132.html"&gt;Petroski:&#xD;
Success through failure.&lt;/a&gt;&#xD;
&lt;p&gt;As a member of the panel and obsessed with things going&#xD;
wrong, I hope your project contributes to our knowledge; and&#xD;
the most valuable lesson of the whole week just could be&#xD;
learned from showing what does not work. And, there will be&#xD;
a prize too! How good is that?&#xD;
&lt;/ul&gt;</description>
    </item>
    <item>
      <pubDate>Sat, 24 May 2008 14:21:56 GMT</pubDate>
      <title>24 May 2008</title>
      <link>http://www.advogato.org/person/lmb/diary.html?start=95</link>
      <guid>http://www.advogato.org/person/lmb/diary.html?start=95</guid>
      <description>&lt;P&gt;Jozef has posted &lt;A HREF="http://www.novell.com/communities/node/4846/load-balancing-howto-lvs-ldirectord-heartbeat-2"&gt;a&#xD;
very cool solutions article&lt;/a&gt; describing how to build a&#xD;
highly-available load-balancing solution for any TCP-based&#xD;
network service (including mail, web, ftp, etcetera) using&#xD;
entirely Open Source components and of course all included&#xD;
with SUSE Linux Enterprise Server 10 SP2 - Linux-HA, Linux&#xD;
Virtual Server, and ldirectord. Rock on!&#xD;
&lt;p&gt;Of course, you &lt;i&gt;could&lt;/i&gt; buy an expensive appliance&#xD;
instead ...&#xD;
</description>
    </item>
  </channel>
</rss>
