Jozef has posted a very cool solutions article describing how to build a highly available load-balancing solution for any TCP-based network service (including mail, web, FTP, etc.) using entirely Open Source components, all of which are of course included with SUSE Linux Enterprise Server 10 SP2: Linux-HA, Linux Virtual Server, and ldirectord. Rock on!
Of course, you could buy an expensive appliance instead ...
Technically, the description is not quite right either. The CRM itself is a fairly important component: it elects the transition coordinator, deals with failed nodes, and implements the state transitions at the cluster level. Its components include not only the Policy Engine and the Transitioner; the CIB itself is also part of the CRM modules.
While the PE does perform round-robin load-balancing, attempting full resource cost and load balancing turns the problem into an exceptionally hard one; we considered this, and then postponed it until later. For now, our main goal is to keep services alive, and leave the load balancing to some external component which modifies our node weights; seems fairly modular to me, in fact.
It's true that we might step towards modularization (again!) as we understand the problem more and more, but I object to the underlying assumption that we hadn't thought of all that before.
My guess would be that he is referring to the idea of having the CRM manage nodes (virtual or physical) which are not full cluster members as containers for resources. And, supposedly, no longer suggesting that they be treated as virtual cluster members at the membership level! Nice to see he's dropped that idea. Yet, as Alan likes to give credit when he came up with something, maybe he should give credit for this as well ...? Just thinking.
And while quorum in two-node clusters is indeed problematic (because losing one node always leaves a tie), the quorum server most certainly is not needed for two-node clusters, as fencing resolves this problem nicely, and has done so for years.
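To make the two-node tie concrete, here is a minimal sketch of the arithmetic. It is a toy model and not how Linux-HA implements quorum or fencing; the function names `has_quorum` and `safe_to_take_over` are made up for illustration, and "fenced" simply stands for "the lost node has been confirmed powered off or otherwise isolated".

```python
def has_quorum(live_nodes: int, total_nodes: int) -> bool:
    """Strict-majority quorum: more than half of the configured nodes are alive."""
    return 2 * live_nodes > total_nodes


def safe_to_take_over(live_nodes: int, total_nodes: int, lost_nodes_fenced: bool) -> bool:
    """Toy decision rule (an assumption for illustration only).

    A partition may recover resources if it holds a strict majority, or if
    every node outside the partition has been successfully fenced and so
    cannot still be running those resources.
    """
    return has_quorum(live_nodes, total_nodes) or lost_nodes_fenced


if __name__ == "__main__":
    # Three nodes, one down: the two survivors hold a majority.
    print(has_quorum(2, 3))
    # Two nodes, one down: one of two is a tie, never a majority.
    print(has_quorum(1, 2))
    # Two nodes, one down, peer fenced: the survivor can proceed anyway.
    print(safe_to_take_over(1, 2, lost_nodes_fenced=True))
```

The point of the sketch is only that a single survivor in a two-node cluster can never reach a strict majority, while a successful fence of the peer removes the ambiguity without any third party.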
The discussion of case 1 neglects the detail that the "other" membership also must be told to not talk to the other node, same as case 2; in fact, each membership must be reduced to the common subset. The method described for case 2 indeed is not pretty, and would not work right now (as the mechanisms do not exist), as claimed:
This is quite certainly the most confusing message in this lecture. First, it is wrong today, even for Linux-HA: OCFS2 avoids this by inheriting the Linux-HA membership through the Filesystem resource agent.
Second, by porting the CRM modules - now called Pacemaker - to run natively on top of openAIS, just as C-LVM2, GFS2, and OCFS2 will, we are finally on track to solve this perfectly and have everyone use the same membership.
However, it should be noted that exactly one person has been unhappy about this; he is now trying to sell it as if it were his idea, rather than something he still opposes - I wonder who that might be?
I will further admit that it irks and offends me that Alan talks of the CRM as our work (as if he had been involved much in it), and explicitly mentions how he started the OCF in 2001, mentions IBM and Red Hat, yet completely fails to mention the contributions made by many Novell and SUSE engineers, most notably by Andrew Beekhof. Oh well.