The following is part 1 of a 4-part series covering the installation and configuration of Pacemaker, Corosync, Apache, DRBD and a VMware STONITH agent.
The aim here is to build an active/passive Pacemaker cluster with Apache and DRBD.
Before We Begin
Pacemaker is a sophisticated, feature-rich, and widely deployed cluster resource manager for the Linux platform. At its core, Pacemaker is a distributed finite state machine capable of co-ordinating the startup and recovery of inter-related services across a set of machines.
Pacemaker achieves maximum availability for cluster services (aka resources) by detecting and recovering from node- and resource-level failures, making use of the messaging and membership capabilities provided by the underlying cluster infrastructure (Corosync or Heartbeat).
Pacemaker is a continuation of the CRM (aka v2 resource manager) that was originally developed for Heartbeat but has since become its own project.
We will build a failover (active/passive) cluster, meaning that each service runs on one node at a time and is moved to the other node when its current host fails.
Pacemaker Stack
A Pacemaker stack is built on five core components:
- libQB – core services (logging, IPC, etc),
- Corosync – membership, messaging and quorum,
- Resource agents – a collection of scripts that interact with the underlying services managed by the cluster,
- Fencing agents – a collection of scripts that interact with network power switches and SAN devices to isolate cluster members,
- Pacemaker itself.
As of RHEL 6.5, pcs and pacemaker are fully supported; see the Red Hat High Availability Add-On Reference for more information: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/High_Availability_Add-On_Reference/s1-pacemaker65-70-HAAR.html.
The pcs package provides a command-line tool for configuring and managing the corosync and pacemaker utilities.
RHEL 7 replaced RGManager with Pacemaker for managing cluster resources and recovering from node failures.
Pacemaker Configuration Tools
- crmsh – the original configuration shell for Pacemaker,
- pcs – Pacemaker/Corosync Configuration System, an alternate vision for a full cluster lifecycle configuration shell and web-based GUI.
Note that the original cluster shell (crmsh) is no longer available on RHEL.
The pcs command line interface provides the ability to control and configure corosync and pacemaker.
Linux-HA Best Practice
For resilience, every cluster should have at least two Corosync (read: heartbeat) rings and two fencing devices, to eliminate single points of failure.
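We stick to a single ring in this series, but for reference pcs can request a redundant second ring at cluster creation time simply by listing two comma-separated addresses per node. A minimal sketch, using the cluster name and -cr hostnames defined later in this article plus the hypothetical pcmk01-cr2/pcmk02-cr2 names on a second non-routable heartbeat VLAN (the second-ring names are illustrative only):

[pcmk01]# pcs cluster setup --name test_webcluster pcmk01-cr,pcmk01-cr2 pcmk02-cr,pcmk02-cr2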
Marking
The convention followed in the series is that [ALL]# denotes a command that needs to be run on all cluster nodes.
Software
Software used in the series:
- CentOS Linux release 7.2.1511 (Core)
- pacemaker-1.1.13
- corosync-2.3.4
- pcs-0.9.143
- resource-agents-3.9.5
- fence-agents-vmware-soap-4.0.11
- drbd-8.4
Networking, Firewall and SELinux Configuration
We will build a two-node active/passive cluster using Pacemaker and Corosync.
We have two CentOS 7 virtual machines on VMware, named vm-pcmk01 and vm-pcmk02.
Networking
The following networks will be in use:
- 10.247.50.0/24 – LAN with access to the Internet,
- 172.16.21.0/24 – non-routable cluster heartbeat vlan for Corosync,
- 172.16.22.0/24 – non-routable cluster heartbeat vlan for DRBD.
Hostnames and IPs which we have allocated:
Hostname | LAN IP
pcmk01, vm-pcmk01 | 10.247.50.211
pcmk02, vm-pcmk02 | 10.247.50.212
pcmk-vip (floating cluster resource) | 10.247.50.213

Hostname | Corosync IP
pcmk01-cr | 172.16.21.11
pcmk02-cr | 172.16.21.12

Hostname | DRBD IP
pcmk01-drbd | 172.16.22.11
pcmk02-drbd | 172.16.22.12
The /etc/hosts file entries look as follows:
[ALL]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.247.50.211 pcmk01 vm-pcmk01
10.247.50.212 pcmk02 vm-pcmk02
10.247.50.213 pcmk-vip
172.16.21.11 pcmk01-cr
172.16.21.12 pcmk02-cr
172.16.22.11 pcmk01-drbd
172.16.22.12 pcmk02-drbd
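As an optional sanity check, each of the names above (apart from pcmk-vip, which only comes alive once the cluster manages it) should resolve and answer to ping, since ICMP is accepted by the iptables rules further down:

[ALL]# for h in pcmk01 pcmk02 pcmk01-cr pcmk02-cr pcmk01-drbd pcmk02-drbd; do ping -c1 -W1 $h >/dev/null && echo "$h OK" || echo "$h unreachable"; done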
The network configuration for the first node is shown below; it is the same for the second node apart from the IP addresses listed above.
[pcmk01]# cat /etc/sysconfig/network-scripts/ifcfg-ens192
#LAN
NAME="ens192"
DEVICE="ens192"
TYPE="Ethernet"
BOOTPROTO="none"
DEFROUTE="yes"
PEERDNS="yes"
IPV4_FAILURE_FATAL="yes"
IPV6INIT="no"
ONBOOT="yes"
IPADDR="10.247.50.211"
PREFIX="24"
GATEWAY="10.247.50.1"
DNS1="8.8.8.8"
DNS2="8.8.4.4"

[pcmk01]# cat /etc/sysconfig/network-scripts/ifcfg-ens224
#Corosync ring0
NAME="ens224"
DEVICE="ens224"
TYPE="Ethernet"
BOOTPROTO="none"
DEFROUTE="no"
PEERDNS="no"
IPV4_FAILURE_FATAL="yes"
IPV6INIT="no"
ONBOOT="yes"
IPADDR="172.16.21.11"
PREFIX="24"

[pcmk01]# cat /etc/sysconfig/network-scripts/ifcfg-ens256
#DRBD
NAME="ens256"
DEVICE="ens256"
TYPE="Ethernet"
BOOTPROTO="none"
DEFROUTE="no"
PEERDNS="no"
IPV4_FAILURE_FATAL="yes"
IPV6INIT="no"
ONBOOT="yes"
IPADDR="172.16.22.11"
PREFIX="24"
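Once the ifcfg files are in place, the interfaces can be brought up and the addressing verified; a minimal check on the first node, assuming the interface names shown above:

[pcmk01]# systemctl restart network
[pcmk01]# ip -4 addr show ens224    # expect 172.16.21.11/24
[pcmk01]# ip -4 addr show ens256    # expect 172.16.22.11/24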
Iptables
This article uses the iptables firewall. Note that CentOS 7 uses FirewallD as the default firewall management tool.
Having spent a great amount of time learning iptables (read: IP masquerade NAT postrouting WTH?), we can safely say that we know where all the bodies are buried… We're keen on learning FirewallD one day, though.
The choice is obviously yours; however, here's how to replace the FirewallD service with iptables:
[ALL]# systemctl stop firewalld.service
[ALL]# systemctl mask firewalld.service
[ALL]# systemctl daemon-reload
[ALL]# yum install -y iptables-services
[ALL]# systemctl enable iptables.service
[ALL]# service iptables save
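One thing worth noting: the commands above enable the iptables service for subsequent boots, but they do not start it immediately. If you want the saved rules active straight away rather than after a reboot, also run:

[ALL]# systemctl start iptables.service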
These are the iptables rules that we have in use:
[ALL]# iptables -S
-P INPUT ACCEPT
-P FORWARD ACCEPT
-P OUTPUT ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -s 10.0.0.0/8 -p tcp -m tcp --dport 22 -m state --state NEW -j ACCEPT
-A INPUT -s 10.0.0.0/8 -p tcp -m multiport --dports 80,443 -j ACCEPT
-A INPUT -s 172.16.21.0/24 -d 172.16.21.0/24 -p udp -m multiport --dports 5405 -j ACCEPT
-A INPUT -s 172.16.21.0/24 -d 172.16.21.0/24 -p tcp -m multiport --dports 2224 -j ACCEPT
-A INPUT -s 10.0.0.0/8 -p tcp -m multiport --dports 2224 -j ACCEPT
-A INPUT -s 10.0.0.0/8 -p tcp -m multiport --dports 3121 -j ACCEPT
-A INPUT -s 10.0.0.0/8 -p tcp -m multiport --dports 21064 -j ACCEPT
-A INPUT -s 172.16.22.0/24 -d 172.16.22.0/24 -p tcp -m multiport --dports 7788,7789 -j ACCEPT
-A INPUT -p udp -m multiport --dports 137,138,139,445 -j DROP
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -j LOG --log-prefix "iptables_input "
-A INPUT -j DROP
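For reference, the cluster-related ports opened above follow the standard assignments: 5405/udp for Corosync, 2224/tcp for pcsd, 3121/tcp for Pacemaker Remote, 21064/tcp for DLM, and 7788-7789/tcp for the DRBD resources. Once pcsd is up and running (it is installed in the next section), a quick cross-node check of the 2224 rule could look like this; an HTTP status line in the output means the port is reachable through the firewall:

[pcmk01]# curl -kIs https://pcmk02-cr:2224 | head -n1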
SELinux
SELinux is set to enforcing mode.
Install Pacemaker and Corosync
[ALL]# yum install -y pcs
Installing pcs will pull in pacemaker, corosync and resource-agents as dependencies.
For SELinux management:
[ALL]# yum install -y policycoreutils-python
Set a password for the hacluster user:
[ALL]# echo "passwd" | passwd hacluster --stdin
Start and enable the service:
[ALL]# systemctl start pcsd.service
[ALL]# systemctl enable pcsd.service
Configure Corosync
Authenticate as the hacluster user. Note that we use a dedicated Corosync interface for this.
[pcmk01]# pcs cluster auth pcmk01-cr pcmk02-cr -u hacluster -p passwd
pcmk01-cr: Authorized
pcmk02-cr: Authorized
Authorisation tokens are stored in the file /var/lib/pcsd/tokens.
Generate and synchronise the Corosync configuration:
[pcmk01]# pcs cluster setup --name test_webcluster pcmk01-cr pcmk02-cr
Start the cluster on all nodes:
[pcmk01]# pcs cluster start --all
Optionally, depending on requirements, we can enable cluster services to start on boot:
[ALL]# pcs cluster enable --all
Verify Corosync installation:
[pcmk01]# corosync-cfgtool -s
Printing ring status.
Local node ID 1
RING ID 0
        id      = 172.16.21.11
        status  = ring 0 active with no faults
[pcmk01]# pcs status corosync

Membership information
----------------------
    Nodeid      Votes Name
         1          1 pcmk01-cr (local)
         2          1 pcmk02-cr
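Optionally, membership can also be inspected straight from Corosync's runtime database; the command below should list both node IDs together with their ring0 addresses and a joined status:

[pcmk01]# corosync-cmapctl | grep members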
The Corosync configuration, for future reference:
[pcmk01]# cat /etc/corosync/corosync.conf
totem {
    version: 2
    secauth: off
    cluster_name: test_webcluster
    transport: udpu
}

nodelist {
    node {
        ring0_addr: pcmk01-cr
        nodeid: 1
    }

    node {
        ring0_addr: pcmk02-cr
        nodeid: 2
    }
}

quorum {
    provider: corosync_votequorum
    two_node: 1
}

logging {
    to_syslog: yes
}
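The same configuration can also be printed through pcs, which saves having to cat the file on each node:

[pcmk01]# pcs cluster corosync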
Let us check the cluster status now:
[pcmk01]# pcs status
Cluster name: test_webcluster
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Sat Dec 12 15:24:14 2015
Last change: Sat Dec 12 15:24:08 2015 by hacluster via crmd on pcmk02-cr
Stack: corosync
Current DC: pcmk02-cr (version 1.1.13-a14efad) - partition with quorum
2 nodes and 0 resources configured

Online: [ pcmk01-cr pcmk02-cr ]

Full list of resources:

PCSD Status:
  pcmk01-cr: Online
  pcmk02-cr: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
We can also see the raw (XML) cluster configuration and status by using the following commands:
[pcmk01]# pcs cluster cib
[pcmk01]# cibadmin -Q
If we inspect the raw output, we can see that the Pacemaker configuration XML file contains the following sections:
- <configuration>
- <nodes>
- <resources>
- <constraints>
- <status>
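To look at a single section rather than the entire CIB, cibadmin queries can be scoped; a small example, assuming the --scope option available in this Pacemaker release (the values mirror the section names above):

[pcmk01]# cibadmin --query --scope resources
[pcmk01]# cibadmin --query --scope constraints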
Disable STONITH
Check the list of available STONITH agents; it should be empty:
[pcmk01]# pcs stonith list
Disable STONITH for now, as we don't have any fencing agents installed; we will configure it later in the series:
[pcmk01]# pcs property set stonith-enabled=false
[pcmk01]# crm_verify -LV
In production environments it is vitally important to enable STONITH.
Be advised that the use of stonith-enabled=false is completely inappropriate for a production cluster and may cost you your precious job.
STONITH/fencing should be taken into account in all cluster setups, but especially when running a dual-primary setup, since in that case data is accessible from more than one node at the same time.
Disable Quorum
You may get the following error when using STONITH:
ERROR: "Cannot fence unclean nodes until quorum is attained (or no-quorum-policy is set to ignore)"
Check the quorum property:
[pcmk01]# pcs property list --all|grep quorum
 no-quorum-policy: stop
A cluster has quorum when more than half of the nodes are online. Pacemaker’s default behavior is to stop all resources if the cluster does not have quorum. However, this does not make much sense in a two-node cluster; the cluster will lose quorum if one node fails.
We can tell Pacemaker to ignore quorum by setting the no-quorum-policy:
[pcmk01]# pcs property set no-quorum-policy=ignore
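The change can be confirmed by re-running the earlier property query, which should now report ignore instead of stop:

[pcmk01]# pcs property list --all|grep quorum
 no-quorum-policy: ignore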
We now have a pacemaker cluster installed and running on two nodes.