Older blog entries for lmb (starting at number 96)

23 Aug 2008 (updated 23 Aug 2008 at 22:20 UTC)
  • Hi all, long time no blog. But with the recent announcement of the Linux Kongress 2008 program, which will take place in my adopted home city of Hamburg from 7 to 10 October, I have to share the joy:

    Not one but three tutorials, in both English and German, explaining how to use Linux-HA with the CRM/Pacemaker as a high-availability cluster environment.

    Congratulations and thanks to Ralph Dehner, Lars Ellenberg, Joerg Jungermann, Maximilian Wilhelm!

    Also, a brief talk by me on the future of HA on Linux, fresh from the Cluster Developer Summit in Prague.

    All in all, Linux Kongress has a very, very strong program this year, and I look forward to meeting you all in Hamburg - bring your umbrella!

  • On Monday, Hack Week 2008 begins. I will be working on shared-storage-based fencing for heartbeat, and possibly some other projects relating to clustering.

    I particularly look forward to the First Penguin Award candidates: the prize for the most daring failure. Failure is crucial to success; learning where the boundaries of our models and theories lie is the foundation of science and of successful design. Only by anticipating and overcoming failure is success possible. If you doubt this for a single moment, read Petroski's Success through Failure.

    As a panel member obsessed with things going wrong, I hope your project contributes to our knowledge; the most valuable lesson of the whole week could well come from showing what does not work. And there will be a prize, too! How good is that?

Jozef has posted a very cool solutions article describing how to build a highly available load-balancing solution for any TCP-based network service (mail, web, FTP, etc.) using only Open Source components, all of course included with SUSE Linux Enterprise Server 10 SP2: Linux-HA, Linux Virtual Server, and ldirectord. Rock on!

Of course, you could buy an expensive appliance instead ...

Bad Syabas. They manufacture the Popcorn Network Media Tank, and although it clearly runs a Linux variant, it ships with neither the source code nor a written offer to supply it. I have kindly e-mailed their support, asking them to rectify the situation ASAP.

In this post, Alan Robertson discusses cluster stacks. It is interesting, but contains some misleading points:

  • "Linux-HA (with or without OpenAIS) supports the AIS membership APIs." This is not quite correct: the version of the APIs supported is close to ancient, and, worse, membership by itself is rather pointless. Linux-HA as-is does not provide the messaging or any of the other AIS APIs, so membership support alone does not mean that any AIS application could run.

  • "Nevertheless, in an ideal world, all cluster components and cluster-aware applications would sit on top of the same set of communications protocols." Let's keep this one in mind; we're going to need it below!

  • "The Linux-HA CRM function is largely divided between the PE and TE, which are described below." The CRM has been split out of the Linux-HA heartbeat project by its developers; I'm not sure how Alan could fail to mention this, as he has been objecting to it for the last few weeks ;-)

    Technically, the description is not quite right either. The CRM itself is a fairly important component: it elects the transition coordinator, deals with failed nodes, and implements the state transitions at the cluster level. Its components include not only the Policy Engine and the Transitioner; the CIB is also one of the CRM modules.

  • It's interesting how the PE receives the largest share of criticism, while nothing is said about the scalability and performance of the messaging layer itself. Oh well. The PE actually is modular and completes its task in several stages (the original design called for placement first and ordering later, as distinct steps), but the modules are highly interdependent, and in practice this turned out not to be so easy; clear and robust interfaces are very hard to define. For a similar problem, look at how gcc "modularizes" its optimization passes.

    While the PE does perform round-robin load balancing, attempting full resource-cost-based load balancing turns the problem into an exceptionally hard one; we considered this and postponed it. For now, our main goal is to keep services alive and to leave load balancing to some external component that modifies our node weights; that seems fairly modular to me, in fact.

    It's true that we might move towards further modularization (again!) as we come to understand the problem better, but I object to the underlying assumption that we hadn't thought of all that before.

  • "The LRM proxy communicates between the CRM and the LRMs on all the various machines. This function is currently built into the CRM. This architectural decision was based on expedience more than anything else." I wonder how else the CRM's TE is supposed to communicate with the LRMs, as it must to carry out commands and retrieve status, if not through some form of proxy/interface to them?

  • "To support larger clusters this needs to be separated out, made more scalable, and more flexible. This would allow a large number of LRMs to be supported by a small number of LRM proxies." The CRM and its components (TE, CIB) clearly require an interface to the LRMs, so I'm not quite sure how this could be separated out.

    My guess would be that he is referring to the idea of having the CRM manage nodes (virtual or physical) that are not full cluster members, as containers for resources. And, presumably, he is no longer suggesting treating them as virtual cluster members at the membership level! Nice to see he's dropped that idea. Yet, as Alan likes to be given credit when he has come up with something, maybe he should give credit for this as well...? Just thinking.

  • "In large systems, this would probably use the ClusterIP capability to provide load distribution (leveling) across multiple LRM proxies." I have absolutely no idea what this is supposed to suggest.

  • The description of the quorum daemon might suggest that Linux-HA already supports general split-site clusters. As much as I wish it did, this is not true.

    And while quorum in two-node clusters is indeed problematic (because losing contact with one node always leaves a tie), the quorum server is most certainly not needed for two-node clusters, as fencing resolves this problem nicely and has done so for years.

  • "For a variety of reasons, kernel space doesn't have access to user-space cluster communications or membership. As a result, both the DLM and most cluster filesystems implement their own membership and communications."

    This is technically incorrect; OCFS2 has been instrumented to inherit the membership from user space, as has GFS2. (More precisely, their DLMs inherit it.)

    The discussion of case 1 neglects the detail that the "other" membership must also be told not to talk to the other node, just as in case 2; in fact, each membership must be reduced to the common subset. The method described for case 2 is indeed not pretty, as the post concedes, but it would also not work right now, since the necessary mechanisms do not exist. Yet the post claims:

  • "Although Case 2 isn't pretty, it works, and no amount of wishing and hoping is likely to ever make this kind of problem go away in the general case."

    This is quite certainly the most confusing message in this lecture. First, it is wrong today, even for Linux-HA: OCFS2 avoids this by inheriting the Linux-HA membership through the Filesystem resource agent.

    Second, by porting the CRM modules, now called Pacemaker, to run natively on top of openAIS, just as C-LVM2, GFS2, and OCFS2 will, we are finally on track to solve this properly, with everyone using the same membership.

    However, it should be noted that exactly one person has been unhappy about this, and he is now trying to sell it as if it were his idea, rather than something he still opposes. I wonder who that might be?
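The two-node quorum problem mentioned earlier follows directly from majority arithmetic: a partition keeps quorum only if it holds strictly more than half of all votes. The following is a minimal Python sketch assuming one vote per node and a strict-majority rule; it is an illustration only, not Linux-HA's actual quorum code, and the helper name has_quorum is hypothetical.

```python
def has_quorum(active_votes: int, total_votes: int) -> bool:
    """Hypothetical helper: strict-majority quorum, assuming one vote per node."""
    return 2 * active_votes > total_votes


# Two-node cluster, link between the nodes lost: each side holds 1 of 2 votes.
print("two-node split, either side:", has_quorum(1, 2))    # False: a tie
# Three-node cluster losing one node: the surviving pair holds 2 of 3 votes.
print("three-node cluster, pair left:", has_quorum(2, 3))  # True
```

Since neither side of a two-node split can ever win a majority, something other than vote counting has to break the tie; in essence, that is what fencing does, because only the node that succeeds in fencing its peer carries on.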
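The point about case 1, that each membership must be reduced to the common subset, amounts to a set intersection: a node may only take part if every membership layer involved agrees it is a member. A minimal Python sketch follows; the node names and the helper common_membership are hypothetical, and it does not model the harder part, namely making each layer actually stop talking to the excluded nodes.

```python
def common_membership(*views: set[str]) -> set[str]:
    """Hypothetical helper: the nodes that every membership view agrees on."""
    if not views:
        return set()
    common = set(views[0])
    for view in views[1:]:
        common &= view  # drop nodes missing from this layer's view
    return common


# Hypothetical views: the cluster messaging layer still sees node3,
# but the DLM's own membership has already lost contact with it.
messaging_view = {"node1", "node2", "node3"}
dlm_view = {"node1", "node2"}
print(sorted(common_membership(messaging_view, dlm_view)))  # ['node1', 'node2']
```

When every component uses one shared membership, as with the port to openAIS, this reduction becomes unnecessary: there is only a single view to begin with.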

I will further admit that it irks and offends me that Alan talks of the CRM as "our" work (as if he had been much involved in it), explicitly mentions that he started the OCF in 2001, and mentions IBM and Red Hat, yet completely fails to mention the contributions made by many Novell and SUSE engineers, most notably Andrew Beekhof. Oh well.

  • Andrew and I have decided to split the CRM out of the Linux-HA project, for a number of reasons. Not forking, but splitting; we do not intend to duplicate any development, and end users would not notice the difference.

  • We would love to discuss our reasons, but Alan immediately disabled our posting to the Linux-HA mailing lists after we announced our intentions and asked whether someone wanted to take over maintaining the CRM within Heartbeat.

  • I am afraid this only illustrates one of the reasons why the split might be considered necessary.

  • We'd gladly discuss this further, but alas ...

  • Check this out: Heartbeat's Cluster Resource Manager ported to openAIS

  • Ladies and gentlemen, this means nothing less than finally and clearly moving towards converging the major Open Source clustering stacks on Linux. Wow. That I may live to see that!

  • This is great news for the community and for the enterprise vendors as well.

  • Monday. Some kernel caretaking, some HR stuff.
  • Project leads refusing to learn about the SCM they are using. Sigh.
