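1. Background
I recently built a production scheduling system for a client in a traditional industry. The client runs it in a self-hosted environment rather than on a purchased cloud service, and that caused a few problems.
The client bought several high-performance physical machines and set up virtual machines on them. Virtual machines are all we get, though: there are no infrastructure services such as load balancing or databases. To solve the load balancing problem we installed Nginx ourselves. For robustness and performance, the two virtual machines running Nginx (Ubuntu 18.04.3 LTS, 172.16.45.22 and 172.16.45.26) are deployed on two different physical machines (172.16.45.11 and 172.16.45.12), and DNS resolves to both Nginx hosts.
The problem is this: the client requires that the system stay available when any single physical machine fails, and the acceptance test is to unplug the network cable of a physical machine. When that happens, one of the two Nginx hosts returned by DNS is unreachable, so some requests are bound to fail.
After some thought, the solution we settled on is to use KeepAlived for high availability: the two Nginx hosts jointly maintain one virtual IP (172.16.45.50), and DNS resolves to that virtual IP.
This post briefly records how KeepAlived was installed, the problems I ran into, and how they were solved.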
2. Installing KeepAlived
Installing KeepAlived on Ubuntu is very simple:
    apt install keepalived
The installed version is 1.3.9. It is not the latest, but it should be good enough.
3. Configuration file
The configuration file is located at /etc/keepalived/keepalived.conf. If the file does not exist, KeepAlived will not start, so create it yourself.
    ! Configuration File for keepalived
    # Global definitions configuration block
    global_defs {
        # String identifying the machine (doesn't have to be hostname).
        # (default: local host name)
        router_id server01
    }

    # A VRRP Instance is the VRRP protocol key feature. It defines and
    # configures VRRP behaviour to run on a specific interface. Each VRRP
    # Instance is related to a unique interface.
    vrrp_instance VI_1 {
        # Initial state, MASTER|BACKUP
        # As soon as the other machine(s) come up,
        # an election will be held and the machine
        # with the highest priority will become MASTER.
        # So the entry here doesn't matter a whole lot.
        state MASTER
        # interface for inside_network, bound by vrrp
        interface eth0
        # arbitrary unique number from 1 to 255
        # used to differentiate multiple instances of vrrpd
        # running on the same NIC (and hence same socket).
        virtual_router_id 50
        # for electing MASTER, highest priority wins.
        # to be MASTER, make this 50 more than on other machines.
        priority 100
        # VRRP Advert interval in seconds (e.g. 0.92) (use default)
        advert_int 1
        # Note: authentication was removed from the VRRPv2 specification by
        # RFC3768 in 2004.
        # Use of this option is non-compliant and can cause problems; avoid
        # using if possible, except when using unicast, where it can be helpful.
        authentication {
            # PASS|AH
            # PASS - Simple password (suggested)
            # AH - IPSEC (not recommended)
            auth_type PASS
            # Password for accessing vrrpd.
            # should be the same on all machines.
            # Only the first eight (8) characters are used, so the value
            # below is truncated (see "Truncating auth_pass" in the log).
            auth_pass Password@123
        }
        # addresses add|del on change to MASTER, to BACKUP.
        # With the same entries on other machines,
        # the opposite transition will be occurring.
        # For virtual_ipaddress, virtual_ipaddress_excluded,
        # virtual_routes and virtual_rules most of the options
        # match the options of the command ip address/route/rule add.
        # The track_group option only applies to static addresses/routes/rules.
        # no_track is specific to keepalived and means that the
        # vrrp_instance will not transition out of master state
        # if the address/route/rule is deleted and the address/route/rule
        # will not be reinstated until the vrrp instance next transitions
        # to master.
        # <LABEL>: is optional and creates a name for the alias.
        #          For compatibility with "ifconfig", it should
        #          be of the form <realdev>:<anytext>, for example
        #          eth0:1 for an alias on eth0.
        # <SCOPE>: ("site"|"link"|"host"|"nowhere"|"global")
        virtual_ipaddress {
            172.16.45.50
        }
    }
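The comments in the file explain the main configuration parameters. A few key ones:
- state sets this host's initial state in KeepAlived, either MASTER or BACKUP. The virtual IP is bound on the MASTER.
- interface specifies the network interface to bind to.
- vrrp_instance defines a VRRP instance (a virtual router), which is the basic unit KeepAlived works with.
- virtual_ipaddress specifies the virtual IP to bind. The virtual IP floats between the MASTER and the BACKUP depending on network conditions.
After editing the configuration file, restart the service with service keepalived restart.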
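The file shown above is the one for the MASTER node (server01). The BACKUP node needs the same file with a few values changed. As a rough sketch, the helper below rewrites a MASTER configuration read from stdin into a BACKUP one. The name to_backup_conf is made up for this example. The values server02 and priority 50 are taken from the BACKUP node's log later in this post, and state BACKUP is my assumption about how that node was configured.

```shell
#!/bin/sh
# Hypothetical helper: read a MASTER keepalived.conf on stdin and print
# the matching BACKUP variant on stdout.
# Only real directive lines are touched: comment lines start with "#",
# so their first field never equals a directive name.
to_backup_conf() {
    awk '
        $1 == "router_id" && $2 == "server01" { sub(/server01/, "server02") }
        $1 == "state"     && $2 == "MASTER"   { sub(/MASTER/, "BACKUP") }
        $1 == "priority"  && $2 == "100"      { sub(/100/, "50") }
        { print }
    '
}
```

It could be used as to_backup_conf < master.conf > backup.conf (both file names are placeholders), after which the result is copied to /etc/keepalived/keepalived.conf on server02. Everything else, including virtual_router_id, the authentication block, and the virtual IP, has to stay identical on both nodes.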
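4. Verification
Use ip addr show eth0 to list the IP addresses on the interface. By default, only the MASTER has the additional virtual IP.
Stop KeepAlived on the master node with service keepalived stop, and the virtual IP shows up on the BACKUP node. Start KeepAlived on the MASTER again, and the virtual IP moves back to the MASTER node.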
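This check is easy to script. A minimal sketch, assuming the interface is eth0 and that ip prints each IPv4 address in the form inet <address>/<prefix>; has_vip is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given IPv4 address appears in the
# `ip addr` output supplied on stdin.
# $1: address to look for, e.g. the virtual IP 172.16.45.50
has_vip() {
    # -F matches the text literally, so the dots are not regex wildcards.
    # The trailing "/" keeps 172.16.45.5 from matching 172.16.45.50.
    grep -qF "inet $1/"
}
```

On either node it could be run as ip -4 addr show dev eth0 | has_vip 172.16.45.50 && echo "this node holds the virtual IP". Running it on both nodes before and after stopping KeepAlived shows where the virtual IP currently lives.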
The logs are as follows.
MASTER node:
    Jan 09 16:56:18 server01 systemd[1]: Stopping Keepalive Daemon (LVS and VRRP)...
    Jan 09 16:56:18 server01 Keepalived[2360]: Stopping
    Jan 09 16:56:18 server01 Keepalived_vrrp[2367]: VRRP_Instance(VI_1) sent 0 priority
    Jan 09 16:56:18 server01 Keepalived_healthcheckers[2366]: Stopped
    Jan 09 16:56:19 server01 Keepalived_vrrp[2367]: Stopped
    Jan 09 16:56:19 server01 Keepalived[2360]: Stopped Keepalived v1.3.9 (10/21,2017)
    Jan 09 16:56:19 server01 systemd[1]: Stopped Keepalive Daemon (LVS and VRRP).
    Jan 09 16:56:19 server01 systemd[1]: Starting Keepalive Daemon (LVS and VRRP)...
    Jan 09 16:56:19 server01 Keepalived[2408]: Starting Keepalived v1.3.9 (10/21,2017)
    Jan 09 16:56:19 server01 Keepalived[2408]: Opening file '/etc/keepalived/keepalived.conf'.
    Jan 09 16:56:19 server01 systemd[1]: Started Keepalive Daemon (LVS and VRRP).
    Jan 09 16:56:19 server01 Keepalived[2412]: Starting Healthcheck child process, pid=2416
    Jan 09 16:56:19 server01 Keepalived_healthcheckers[2416]: Opening file '/etc/keepalived/keepalived.conf'.
    Jan 09 16:56:19 server01 Keepalived[2412]: Starting VRRP child process, pid=2418
    Jan 09 16:56:19 server01 Keepalived_vrrp[2418]: Registering Kernel netlink reflector
    Jan 09 16:56:19 server01 Keepalived_vrrp[2418]: Registering Kernel netlink command channel
    Jan 09 16:56:19 server01 Keepalived_vrrp[2418]: Registering gratuitous ARP shared channel
    Jan 09 16:56:19 server01 Keepalived_vrrp[2418]: Opening file '/etc/keepalived/keepalived.conf'.
    Jan 09 16:56:19 server01 Keepalived_vrrp[2418]: Using LinkWatch kernel netlink reflector...
    Jan 09 16:56:19 server01 Keepalived_vrrp[2418]: VRRP_Instance(VI_1) Transition to MASTER STATE
    Jan 09 16:56:20 server01 Keepalived_vrrp[2418]: VRRP_Instance(VI_1) Entering MASTER STATE
BACKUP node:
    Jan 09 16:55:16 server02 systemd[1]: Starting Keepalive Daemon (LVS and VRRP)...
    Jan 09 16:55:16 server02 Keepalived[96635]: Starting Keepalived v1.3.9 (10/21,2017)
    Jan 09 16:55:16 server02 Keepalived[96635]: Opening file '/etc/keepalived/keepalived.conf'.
    Jan 09 16:55:16 server02 systemd[1]: Started Keepalive Daemon (LVS and VRRP).
    Jan 09 16:55:16 server02 Keepalived[96644]: Starting Healthcheck child process, pid=96650
    Jan 09 16:55:16 server02 Keepalived_healthcheckers[96650]: Opening file '/etc/keepalived/keepalived.conf'.
    Jan 09 16:55:16 server02 Keepalived[96644]: Starting VRRP child process, pid=96651
    Jan 09 16:55:16 server02 Keepalived_vrrp[96651]: Registering Kernel netlink reflector
    Jan 09 16:55:16 server02 Keepalived_vrrp[96651]: Registering Kernel netlink command channel
    Jan 09 16:55:16 server02 Keepalived_vrrp[96651]: Registering gratuitous ARP shared channel
    Jan 09 16:55:16 server02 Keepalived_vrrp[96651]: Opening file '/etc/keepalived/keepalived.conf'.
    Jan 09 16:55:16 server02 Keepalived_vrrp[96651]: Truncating auth_pass to 8 characters
    Jan 09 16:55:16 server02 Keepalived_vrrp[96651]: Using LinkWatch kernel netlink reflector...
    Jan 09 16:55:16 server02 Keepalived_vrrp[96651]: VRRP_Instance(VI_1) Entering BACKUP STATE
    Jan 09 16:56:18 server02 Keepalived_vrrp[96651]: VRRP_Instance(VI_1) Transition to MASTER STATE
    Jan 09 16:56:19 server02 Keepalived_vrrp[96651]: VRRP_Instance(VI_1) Entering MASTER STATE
    Jan 09 16:56:19 server02 Keepalived_vrrp[96651]: VRRP_Instance(VI_1) Received advert with higher priority 100, ours 50
    Jan 09 16:56:19 server02 Keepalived_vrrp[96651]: VRRP_Instance(VI_1) Entering BACKUP STATE
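When a failover test produces a lot of log output, it can help to keep only the state changes. A small sketch, assuming the log lines have the same format as the ones shown here; vrrp_transitions is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: from keepalived log lines on stdin, keep only the
# lines where a VRRP instance actually enters MASTER or BACKUP state.
vrrp_transitions() {
    grep -E 'VRRP_Instance\([^)]+\) Entering (MASTER|BACKUP) STATE'
}
```

Since the logs in this post come from systemd, something like journalctl -u keepalived | vrrp_transitions should work, assuming the unit is named keepalived. Applied to the BACKUP log, it leaves the three "Entering ... STATE" lines, which show server02 starting as BACKUP, taking over as MASTER at 16:56:19, and stepping back to BACKUP once server01 advertises its higher priority.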
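5. Remaining issue
With this setup, only one Nginx actually serves traffic. The other has no virtual IP bound and is never reached, so two Nginx hosts of equal standing have effectively been turned into a master/standby pair.
One workable solution is to deploy KeepAlived in dual-master mode: configure two vrrp_instance blocks, with the two hosts taking opposite MASTER/BACKUP roles in each.
For example, in the first vrrp_instance server01 is the MASTER and server02 is the BACKUP; in the second vrrp_instance server01 is the BACKUP and server02 is the MASTER. Under normal conditions the two virtual IPs point to server01 and server02 respectively, which achieves load balancing, provided DNS resolves to both virtual IPs. When something goes wrong, for example server02 fails, server01 becomes the MASTER of both vrrp_instance blocks and takes over all network traffic, which gives high availability. I won't paste the detailed configuration here; see the references.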
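For orientation, here is a rough sketch of what the dual-master layout could look like. It is not the configuration from the references, only an illustration of the mirrored roles. The second virtual IP 172.16.45.51, the instance name VI_2, and virtual_router_id 51 are made-up values, and render_instance and render_dual_master are hypothetical helper names. The authentication block from section 3 is left out for brevity.

```shell
#!/bin/sh
# Hypothetical helper: print one vrrp_instance block.
# $1 name, $2 initial state, $3 virtual_router_id, $4 priority, $5 virtual IP
render_instance() {
    cat <<EOF
vrrp_instance $1 {
    state $2
    interface eth0
    virtual_router_id $3
    priority $4
    advert_int 1
    virtual_ipaddress {
        $5
    }
}
EOF
}

# Print both instances for one node. The roles are mirrored between nodes.
# $1: "server01" or "server02"
render_dual_master() {
    case "$1" in
        server01)
            render_instance VI_1 MASTER 50 100 172.16.45.50
            render_instance VI_2 BACKUP 51 50 172.16.45.51   # assumed second VIP
            ;;
        server02)
            render_instance VI_1 BACKUP 50 50 172.16.45.50
            render_instance VI_2 MASTER 51 100 172.16.45.51  # assumed second VIP
            ;;
        *)
            echo "unknown node: $1" >&2
            return 1
            ;;
    esac
}
```

Calling render_dual_master server01 prints the two blocks for server01; the output would go into /etc/keepalived/keepalived.conf after the global_defs block. Each instance needs its own virtual_router_id, otherwise the two instances would be treated as the same virtual router on the shared interface.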
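Appendix: References
- KeepAlived Homepage
- KeepAlived Documentation
- VRRP principles and analysis (VRRP原理和分析)
- How Keepalived works (Keepalived原理)
- KeepAlived introduction
- How to Setup IP Failover with KeepAlived on Ubuntu & Debian
- How to setup a highly available load balancer with keepalived and HAProxy on Ubuntu 18.04
- Linux Virtual Server technology (Linux Virtual Server技术)
- Nginx + keepalived high-availability hot standby, master/standby and dual-master modes (Nginx+keepalived 高可用双机热备(主从模式/双主模式))
- Nginx: building a high-availability cluster with keepalived + Nginx (Nginx-keepalived+Nginx实现高可用集群)
- Keepalived: Keepalived + Nginx for highly available web load balancing (Keepalived之——Keepalived + Nginx 实现高可用 Web 负载均衡)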