Ready to know about downtime before your customers?
Status List delivers uptime monitoring and professional hosted status pages for sites of all shapes and sizes.
Trusted by 1000+ companies
Keepalived lets several HAProxy instances share a single IP address, so traffic can fail over from one load balancer to another. With this configuration, we can achieve near-zero downtime even during HAProxy configuration changes, machine updates, and so on. Let’s get this configured and set up for your deployment.
Here’s a rough diagram of what we’re trying to set up.
First, you will need a public IP that we can send your traffic to. It can’t be one that is auto-assigned to a VM or otherwise auto-generated; you need a reserved public IP address. Here are some links to set this up if you are on AWS, Digital Ocean, Azure or GCP.
Next, set up your machines and HAProxy instances. Configure every HAProxy instance identically; you can simply copy and paste your configuration. See our HAProxy Configuration Guide for more details.
On each machine, we’re going to add one additional configuration directive: an HTTP header on all responses, so we can identify which machine is actively serving. Add the following to your frontend section:
http-response add-header X-Haproxy-Instance lb_1 # replace lb_1 with lb_2, lb_3, etc.
Start the HAProxy service and set it to auto-start.
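Once HAProxy is running, you may want to confirm that the header is actually being sent. The following is a minimal sketch using only the Python standard library; the function names are our own and the URL in the comment is a placeholder for one of your machines.

```python
import urllib.request


def instance_from_headers(headers):
    """Return the X-Haproxy-Instance value from a header mapping, or None."""
    for name, value in headers.items():
        if name.lower() == "x-haproxy-instance":
            return value.strip()
    return None


def serving_instance(url, timeout=5):
    """Send a HEAD request to url and report which load balancer answered."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return instance_from_headers(dict(response.headers.items()))


# Example (placeholder URL, not run here):
#   print(serving_instance("http://my-domain.com/"))
```

If the function returns None, the response did not carry the header, which usually means the directive is missing from the frontend that handled the request.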
Let’s install and configure Keepalived on each system. Use your package manager to install the keepalived package (e.g. yum install keepalived). Open the configuration file located at /etc/keepalived/keepalived.conf and edit it to include the following:
# Define the health check first so the instance below can reference it
vrrp_script keepalived_check {
    script "/usr/local/bin/keepalived_check.sh"
    interval 1
    timeout 5
    rise 3
    fall 3
}
vrrp_instance MY_VI {
    state MASTER
    interface eth0 # your public network interface
    lvs_sync_daemon_interface eth1 # your private network interface
    virtual_router_id 32
    priority 200
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass mypassword
    }
    virtual_ipaddress {
        203.0.113.10/32 # <- placeholder; use your public ip, or an ip range
    }
    track_script {
        keepalived_check # attach the health check to this instance
    }
    notify_master /usr/local/bin/keepalived_is_master.sh
}
Let’s explain what’s going on in our configuration example above. There are two blocks, the instance and the script blocks.
The vrrp_instance block creates a high availability group. We can call the group anything we want as long as it matches on our other host. The parameters in this block should match exactly between hosts.
The vrrp_script block creates a health check for the group. Keepalived runs the referenced script and uses its exit code (0 means healthy, anything else means failed) to determine whether this host is available to receive traffic. The script should execute quickly.
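As a rough worked example of how interval, fall and rise translate into timing, the sketch below estimates how long Keepalived needs to act on a state change. The function is our own illustration, and it ignores script run time and VRRP advertisement delay, so treat the numbers as approximate.

```python
def detection_window(interval, consecutive):
    """Approximate (min, max) seconds before a state change is acted on.

    A change that happens just before a check is confirmed after
    (consecutive - 1) further intervals; a change that happens just
    after a check has to wait one extra interval.
    """
    return ((consecutive - 1) * interval, consecutive * interval)


# Values from the vrrp_script block: interval 1, fall 3, rise 3.
print("marked down after", detection_window(1, 3), "seconds")
print("marked up after", detection_window(1, 3), "seconds")
```

With the values used in this guide, a failed HAProxy is marked down roughly 2 to 3 seconds after it stops responding to the check.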
The health check is the most important part of Keepalived. It tells Keepalived whether this host is available to receive traffic or not. This is your failover mechanism.
There are a couple of things we could check. We could check that HAProxy is running, or we could check that HAProxy is functioning correctly. Checking that HAProxy is functioning is more useful than just checking that it’s running: HAProxy could be running but failing all our requests.
Here’s a simple health check to see if HAProxy is running. We’re just checking whether a process named haproxy exists. Here’s an example of how you could set that up:
#!/bin/sh
# edit /usr/local/bin/keepalived_check.sh
# exits 0 (healthy) if a haproxy process exists, non-zero otherwise
pidof haproxy > /dev/null
Here’s a more advanced health check to see if HAProxy is functioning correctly. For this check, we’ll need the HAProxy stats socket enabled. You can learn more about this at our HAProxy stats socket guide.
In this example, we’re checking the “Stopping” field of the show info socket command. This will tell us if HAProxy is in shutdown mode. We could check for other things like memory usage, queue size, or idle percentage as well.
echo "show info" | socat /var/run/haproxy.sock stdio | grep "Stopping: 0"
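If you would rather check more than one field, the same idea can be sketched in Python. This is an illustrative sketch, not a drop-in script: the function names and the 10 percent idle threshold are assumptions of ours, while the socket path, the show info command and the Stopping and Idle_pct fields come from HAProxy itself.

```python
import socket


def parse_show_info(text):
    """Turn the 'Name: value' lines of 'show info' into a dict of strings."""
    info = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            info[name.strip()] = value.strip()
    return info


def is_healthy(info, min_idle_pct=10):
    """Healthy if HAProxy is not stopping and still has some idle time."""
    if info.get("Stopping") != "0":
        return False
    return int(info.get("Idle_pct", "100")) >= min_idle_pct


def query_haproxy(path="/var/run/haproxy.sock", command="show info"):
    """Send one command to the stats socket and return the full reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(2)
        sock.connect(path)
        sock.sendall((command + "\n").encode())
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # HAProxy closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode()


# In a check script you would finish with something like:
#   sys.exit(0 if is_healthy(parse_show_info(query_haproxy())) else 1)
```

Keeping the parsing separate from the socket call makes it easy to test the health rule against captured show info output before wiring it into Keepalived.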
Let’s make sure the Keepalived instances can communicate with each other. We’ll run the Keepalived coordination traffic through our private network and route the client traffic through the public network.
Ensure VRRP traffic is allowed on your private network interface (the same one you used for lvs_sync_daemon_interface). VRRP is IP protocol 112, which is a protocol number rather than a TCP port. Also check that the machines can reach one another on the private network.
If you’re running on a cloud platform like AWS, Digital Ocean, GCP or Azure, we’re going to need some routing configuration. We need to tell our cloud platform to update our routing configuration for our Public IP when Keepalived requests a change.
We can do this by asking Keepalived to run a script when the master is changed. The Keepalived configuration was already set in a previous step. Let’s create the script file now.
Create the file /usr/local/bin/keepalived_is_master.sh with the following content:
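#!/bin/bash
PUBLIC_IP='xxx.xxx.xxx' # your reserved public IP
DROPLET_ID='xxxxxxxx' # placeholder: this machine's ID, required by assign-ip
LB_NAME="lb_1" # replace with the identifier used in this machine's X-Haproxy-Instance header
# assign-ip reads the API token from the DO_TOKEN environment variable; make sure it is set here
# check if this machine is already serving PUBLIC_IP
curl --head -s http://my-domain.com/ | grep -qi "X-Haproxy-Instance: $LB_NAME"
# if the header was not found, the public IP is assigned to another instance: reassign it
if [ $? -ne 0 ]; then
    n=0
    while [ $n -lt 10 ] # retry if assign-ip fails
    do
        python /usr/local/bin/assign-ip "$PUBLIC_IP" "$DROPLET_ID" && break
        n=$((n+1))
        sleep 3
    done
fi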
Now we need a way to tell the cloud platform to update our routing. Create a Python script at /usr/local/bin/assign-ip to do this. Here’s an example for Digital Ocean; your cloud provider may have a different SDK for this:
#!/usr/bin/python
import os
import sys
import requests
import json

api_base = 'https://api.digitalocean.com/v2'


def usage():
    print('{0} [Floating IP] [Droplet ID]'.format(sys.argv[0]))
    print('\nYour DigitalOcean API token must be in the "DO_TOKEN"'
          ' environmental variable.')


def main(floating_ip, droplet_id):
    # Ask the API to assign the floating IP to the given droplet
    payload = {'type': 'assign', 'droplet_id': droplet_id}
    headers = {'Authorization': 'Bearer {0}'.format(os.environ['DO_TOKEN']),
               'Content-type': 'application/json'}
    url = api_base + "/floating_ips/{0}/actions".format(floating_ip)
    r = requests.post(url, headers=headers, data=json.dumps(payload))
    resp = r.json()
    if 'message' in resp:
        # A "message" field means the API rejected the request
        print('{0}: {1}'.format(resp['id'], resp['message']))
        sys.exit(1)
    else:
        print('Moving IP address: {0}'.format(resp['action']['status']))


if __name__ == "__main__":
    if 'DO_TOKEN' not in os.environ or not len(sys.argv) > 2:
        usage()
        sys.exit(1)  # non-zero so callers can tell the call failed
    main(sys.argv[1], sys.argv[2])
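The script expects a droplet ID as its second argument. On Digital Ocean, a droplet can look up its own ID through the metadata service at 169.254.169.254. The sketch below shows one way to do that; the function names are our own, and the fetch parameter exists only so the lookup can be exercised without network access.

```python
import urllib.request

METADATA_ID_URL = "http://169.254.169.254/metadata/v1/id"


def fetch_text(url, timeout=2):
    """Fetch a URL and return the body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode()


def current_droplet_id(fetch=fetch_text):
    """Return this droplet's numeric ID as reported by the metadata service."""
    droplet_id = fetch(METADATA_ID_URL).strip()
    if not droplet_id.isdigit():
        raise ValueError("unexpected metadata response: {0!r}".format(droplet_id))
    return droplet_id
```

The metadata address is only reachable from inside a droplet, so this lookup will time out anywhere else.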
Start your Keepalived services and set them to auto-start. Within a few seconds you should see one of your machines take over the public IP address. From now on, Keepalived will control which machine receives the network traffic. If one of your HAProxy instances fails its health check, Keepalived will update your network configuration and route the traffic to a failover instance.
You can test this by locating your master machine and stopping its HAProxy instance. Keepalived should detect the failure within about 3 seconds (three failed checks, one second apart), and you should then see your routing configuration update to a failover instance.
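To watch a failover as it happens, you can poll the public address and record each time the answering instance changes. This is a small sketch with names of our own choosing; it relies on the X-Haproxy-Instance header added earlier, and the URL you pass in is whatever points at your public IP.

```python
import time
import urllib.request


def serving_instance(url, timeout=2):
    """Return the X-Haproxy-Instance header for url, or None on any error."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.headers.get("X-Haproxy-Instance")
    except OSError:  # covers connection errors, timeouts and HTTP errors
        return None


def watch(url, probe=serving_instance, checks=30, delay=1.0, sleep=time.sleep):
    """Poll url and return a list of (check_number, instance) per change."""
    changes = []
    last = object()  # sentinel so the first result is always recorded
    for i in range(checks):
        current = probe(url)
        if current != last:
            changes.append((i, current))
            last = current
        sleep(delay)
    return changes
```

Start watch in one terminal, stop HAProxy on the master in another, and the returned list should show the instance name switching, possibly with a short run of None while the IP is being reassigned.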
© Status List 2024