Multiple HA resources based on same service / Heartbeat httpmon RA

PREFACE

During a failback process on an HA cluster based on Heartbeat 3.x, I ran into an issue when moving resources from one node to another. Let me explain why.
There are three nodes in the cluster:

  • Node_Alpha – Apache with VIP 10.10.10.10
  • Node_Beta – Apache with VIP 10.10.10.11
  • Node_Gamma – Apache with VIP 10.10.10.12

Each node should run a web resource based on Apache. Each node can fail over to any other node in the cluster.

CONFIGURATION

Heartbeat is configured not to fail back automatically. To manage the Apache resource I used the ocf:heartbeat:apache RA.
Here is the kind of template I used to configure the web resource and group on each node:
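The listing itself is missing here, so the following is only a sketch of what such a per-node template could look like in crm shell syntax. The helper name render_apache_group, the resource names vip_<node> and web_<node>, the netmask, the Apache config path and the intervals are all assumptions, not values from the original setup. Setting on-fail="standby" on the monitor operation is one way to get a node sent to standby when Apache fails.

```shell
# Sketch: print a crm configuration for one node's VIP + Apache group.
# Names, netmask, config path and intervals are illustrative assumptions.
render_apache_group() {
    node="$1"   # short node name, e.g. alpha
    vip="$2"    # virtual IP, e.g. 10.10.10.10
    cat <<EOF
primitive vip_${node} ocf:heartbeat:IPaddr2 \\
    params ip="${vip}" cidr_netmask="24" \\
    op monitor interval="10s"
primitive web_${node} ocf:heartbeat:apache \\
    params configfile="/etc/apache2/apache2.conf" \\
    op monitor interval="30s" timeout="20s" on-fail="standby"
group gr_apache_${node} vip_${node} web_${node}
EOF
}

# Example: write the definition for Node_Alpha to a file. On a cluster node
# it could then be loaded with:
#   crm configure load update gr_apache_alpha.crm
render_apache_group alpha 10.10.10.10 > gr_apache_alpha.crm
```

Generating the text first and loading it afterwards keeps the three node definitions identical apart from the name and the VIP.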

Note that in this configuration, if Apache fails, the whole node goes to standby. Let's name a group for each node:

  • Node_Alpha – gr_apache_alpha
  • Node_Beta – gr_apache_beta
  • Node_Gamma – gr_apache_gamma

UNEXPECTED ISSUE

For some reason Node_Alpha went down and gr_apache_alpha failed over to Node_Beta. Now we have two web resources running on the same node: gr_apache_alpha and gr_apache_beta. After a while Node_Alpha came back online and was able to run this resource again, so we tried to move gr_apache_alpha back to it.

But gr_apache_beta then failed and Node_Beta went into standby. The cause was that both resources were using the same Apache instance. When gr_apache_alpha was moved back to Node_Alpha, Apache was stopped; the monitor of gr_apache_beta then found Apache not running (failed) and put the node into standby. This will happen every time I try to move a gr_apache_* group away from a node that runs two or more of them. There is a workaround: make all gr_apache_* groups except the one you want to move unmanaged, restart Apache after the move, and then make the remaining gr_apache_* groups managed again. But this workaround cannot guarantee that a human will not make a mistake (for example, forget to unmanage a group).
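To make the workaround steps concrete, here is a rough sketch of them as a shell function, meant to be run on the node that currently hosts several gr_apache_* groups. The function name move_group_safely is hypothetical, and using apachectl to restart Apache is an assumption; substitute whatever service command your distribution provides.

```shell
# Sketch of the manual workaround (hypothetical helper, not a tested tool).
# Usage: move_group_safely <group-to-move> <target-node> <other-group>...
move_group_safely() {
    target_group="$1"
    target_node="$2"
    shift 2

    # 1. Stop the cluster from reacting to the groups that stay behind.
    for grp in "$@"; do
        crm resource unmanage "$grp"
    done

    # 2. Move the group; stopping it also stops the shared Apache instance.
    crm resource move "$target_group" "$target_node"
    rc=$?

    # In practice, wait here until the move has finished (for example by
    # watching crm_mon) before touching Apache.

    # 3. Bring the local Apache back (assumes apachectl is on the PATH).
    apachectl restart

    # 4. Hand the remaining groups back to the cluster.
    for grp in "$@"; do
        crm resource manage "$grp"
    done

    return "$rc"
}

# Example call for the situation described above:
#   move_group_safely gr_apache_alpha Node_Alpha gr_apache_beta
```

Keep in mind that crm resource move leaves a location constraint behind; crm resource unmove removes it once the group has settled.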

SOLUTION

The only solution I see is to take the Apache resource out of the groups and run it as a clone instead:
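The original listing is not available here. A minimal sketch of a cloned Apache resource in crm shell syntax might look like the following; the names apache and cl_apache, the config path and the intervals are assumptions.

```shell
# Sketch: one Apache primitive, cloned so that it runs on every node.
# Resource names, config path and intervals are illustrative assumptions.
cat > apache_clone.crm <<'EOF'
primitive apache ocf:heartbeat:apache \
    params configfile="/etc/apache2/apache2.conf" \
    op monitor interval="30s" timeout="20s"
clone cl_apache apache
EOF
# On a cluster node this could be loaded with:
#   crm configure load update apache_clone.crm
```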

The next step is to define an RA that only monitors a resource and does nothing else. No such RA existed, so I had to write one myself, taking ocf:heartbeat:apache as the base script. You can find it in my Heartbeat resources repository: https://github.com/dotNox/heartbeat_resources . Its name is httpmon (as suggested on the Linux-HA mailing list) :). For monitoring, this RA works almost the same as ocf:heartbeat:apache, but I added the ability to specify a user and password for HTTP authentication (in ocf:heartbeat:apache this is done via an external configuration file, which I consider not very practical). Here are the parameters available at the time of writing this article:

  • url – URL to check
  • http_user – User for HTTP Auth ( if there is any )
  • http_password – Password for HTTP Auth ( if there is any ); should be specified together with http_user, otherwise the default password “password” will be used
  • client – Client to use ( curl, wget or other ); for curl and wget there are predefined client options that are used by default
  • client_opts – Additional client options
  • match – Output match regular expression
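To illustrate how these parameters fit together, here is a small sketch of a monitor-only check in shell. This is not the actual httpmon code; the function names httpmon_monitor_sketch and body_matches are hypothetical, and only the curl case is shown. The exit codes follow the OCF convention of 0 for running and 7 for not running.

```shell
# Sketch of a monitor-only check, NOT the real httpmon implementation.
OCF_SUCCESS=0
OCF_NOT_RUNNING=7

# Succeeds if the page body on stdin matches the extended regex in $1.
body_matches() {
    grep -E -q -- "$1"
}

# Usage: httpmon_monitor_sketch <url> <match> [<http_user> [<http_password>]]
httpmon_monitor_sketch() {
    url="$1"
    match="$2"
    user="${3:-}"
    pass="${4:-password}"   # mirrors the default described in the list above

    if [ -n "$user" ]; then
        body=$(curl -s -f -u "${user}:${pass}" "$url") || return "$OCF_NOT_RUNNING"
    else
        body=$(curl -s -f "$url") || return "$OCF_NOT_RUNNING"
    fi

    # The service counts as running only if the output matches the regex.
    printf '%s\n' "$body" | body_matches "$match" || return "$OCF_NOT_RUNNING"
    return "$OCF_SUCCESS"
}

# Example call (URL and regex are placeholders):
#   httpmon_monitor_sketch http://10.10.10.10/ 'It works'
```

Since the check only reads from the web server, stopping or moving the resource that runs it has no effect on Apache itself.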

The primitive definition template will look like:
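The template itself is missing here; as a hedged sketch, a primitive using the parameters listed above could be written as follows. The provider name "custom" is a placeholder (the real one depends on the directory under resource.d where the RA is installed), and the URL, credentials, regex and intervals are made-up example values.

```shell
# Sketch: httpmon primitive for one node. Provider "custom", the URL, the
# credentials, the regex and the intervals are illustrative assumptions.
cat > web_alpha.crm <<'EOF'
primitive web_alpha ocf:custom:httpmon \
    params url="http://10.10.10.10/" \
        http_user="monitor" http_password="secret" \
        client="curl" match="It works" \
    op monitor interval="30s" timeout="20s"
EOF
# On a cluster node this could be loaded with:
#   crm configure load update web_alpha.crm
```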

And of course the groups, each of which consists of a VIP and a web_xxx resource:
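Again only as a sketch, the three groups could be generated like this. It assumes the VIP and monitor primitives are already defined and named vip_<node> and web_<node>, which are illustrative names rather than the original ones.

```shell
# Sketch: one group per node, each holding that node's VIP and its
# monitor-only web resource. Primitive names are assumptions.
for node in alpha beta gamma; do
    printf 'group gr_apache_%s vip_%s web_%s\n' "$node" "$node" "$node"
done > gr_apache_groups.crm
# On a cluster node this could be loaded with:
#   crm configure load update gr_apache_groups.crm
```

Because each group now holds only a VIP and a monitor-only resource, moving a group no longer stops the Apache instance that the other groups depend on.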

Many Thanks to Nox for this article
