
Building an LVS + Keepalived + MFS + Nagios Architecture


Author: 坏男孩    Blog: https://www.doczj.com/doc/6212292532.html,/

MSN: hahazhu0634@https://www.doczj.com/doc/6212292532.html,

[Preface] I have read many architecture articles, but most of them cover only one small piece of some part of an architecture. After thinking about it for quite a while, I wanted to pull several of those small pieces together into something that looks like a "slightly bigger piece", and that is how this article came about. You could of course say that this article is itself only a piece of some larger architecture, but I still think it is worth reading and learning from; at least that is how I see it. The article will probably stay in a "to be revised" state, because it is not perfect and will be updated as my testing continues; once my test machine goes offline, however, I will no longer be able to provide updates!

The article consists of four major parts, and each part may consist of several smaller sections. The four parts are:

(I) Front end
  (1) LVS + Keepalived
  (2) RealServer
  (3) Basic functional tests
(II) Data storage
  (1) Installing the master (metadata) server
  (2) Installing the metadata logger server
  (3) Installing the chunk (data storage) servers
  (4) Installing the client
  (5) Basic functional tests
(III) Monitoring
  (1) Installing the Nagios software
  (2) Testing Nagios alerting methods (email, SMS, MSN)
  (3) Adding monitored elements to Nagios
(IV) Overall tests
  (1) LVS failover
  (2) LVS fault isolation
  (3) LVS scaling tests (simply put: adding and removing real servers)
  (4) Name-based virtual host tests
  (5) Making the Backup director do work as well
  (6) What if I have only one director: how do I use keepalived then?
  (7) Client mount restriction tests
  (8) Dynamically adding servers or disks
  (9) The garbage collection (trash) mechanism
  (10) Master server recovery test
  (11) MFS high availability test

Note: everything in this article was written after actually testing it, and it is not aimed at complete beginners.

(I) Front end

(1) Installing LVS + Keepalived

1. Download the software packages

#mkdir /soft
#cd /soft
#wget https://www.doczj.com/doc/6212292532.html,/software/kernel-2.6/ipvsadm-1.24.tar.gz
#wget https://www.doczj.com/doc/6212292532.html,/software/keepalived-1.1.15.tar.gz

2. Install LVS and Keepalived

#lsmod |grep ip_vs
#uname -r
2.6.18-92.el5
# ln -s /usr/src/kernels/2.6.18-92.el5-i686/* /usr/src/linux/
#tar zxvf ipvsadm-1.24.tar.gz
#cd ipvsadm-1.24
#make && make install
#find / -name ipvsadm
# see where ipvsadm was installed
#tar zxvf keepalived-1.1.15.tar.gz
#cd keepalived-1.1.15
#./configure && make && make install
#find / -name keepalived
# see where keepalived was installed
#cp /usr/local/etc/rc.d/init.d/keepalived /etc/rc.d/init.d/
#cp /usr/local/etc/sysconfig/keepalived /etc/sysconfig/
#mkdir /etc/keepalived
#cp /usr/local/etc/keepalived/keepalived.conf /etc/keepalived/
#cp /usr/local/sbin/keepalived /usr/sbin/
# chkconfig --level 345 keepalived on
#service keepalived start|stop
# turn keepalived into a system service to make it easier to manage

Note: run both steps above on the master AND the backup!

Edit /etc/keepalived/keepalived.conf on the Master:

! Configuration File for keepalived
global_defs {
   router_id LVS_BADBOY1
}
# VIP1
vrrp_sync_group VDR {
    group {
        VI_cache
    }
}
vrrp_instance VI_cache {
    state MASTER
    interface eth0
    lvs_sync_daemon_interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        172.11.1.61
    }
}
virtual_server 172.11.1.61 80 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    # persistence_timeout 60
    protocol TCP
    real_server 172.11.1.57 80 {
        TCP_CHECK {
            connect_timeout 10
            nb_get_retry 3
            delay_before_retry 3
            connect_port 80
        }
    }
    real_server 172.11.1.58 80 {
        TCP_CHECK {
            connect_timeout 10
            nb_get_retry 3
            delay_before_retry 3
            connect_port 80
        }
    }
}

Edit /etc/keepalived/keepalived.conf on the Backup:

! Configuration File for keepalived
global_defs {
   router_id LVS_BADBOY2
}
# VIP1
vrrp_sync_group VDR {
    group {
        VI_cache
    }
}
vrrp_instance VI_cache {
    state BACKUP
    interface eth0
    lvs_sync_daemon_interface eth0
    virtual_router_id 51
    priority 90
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass 1111
    }
    virtual_ipaddress {
        172.11.1.61
    }
}
virtual_server 172.11.1.61 80 {
    delay_loop 6
    lb_algo rr
    lb_kind DR
    #persistence_timeout 60
    protocol TCP
    real_server 172.11.1.57 80 {
        #weight 3
        TCP_CHECK {
            connect_timeout 10
            nb_get_retry 3
            delay_before_retry 3
            connect_port 80
        }
    }
    real_server 172.11.1.58 80 {
        #weight 3
        TCP_CHECK {
            connect_timeout 10
            nb_get_retry 3
            delay_before_retry 3
            connect_port 80
        }
    }
}
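Once both files are in place, a quick way to confirm that only the MASTER is sending VRRP advertisements for virtual_router_id 51 is to watch the VRRP traffic on either director (a check of my own, not part of the original article; VRRP is IP protocol 112):

# tcpdump -i eth0 -nn 'ip proto 112'

You should see periodic advertisements coming only from the MASTER (priority 100). If the BACKUP advertises as well, the two directors cannot see each other's VRRP packets and both will try to hold the VIP.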

1. Install Apache (the simplest possible install, only for the purposes of the overall test):
#yum -y install httpd

2. On both realservers, simply run the lvs_rs.sh start script (scripts like this are all over the Internet; I copied mine too, haha!):
#mkdir -p /scripts/shell
#more /scripts/shell/lvs_rs.sh

#!/bin/bash
# description: start realserver
VIP=172.11.1.61
. /etc/rc.d/init.d/functions
case "$1" in
start)
    echo "start LVS of RealServer"
    /sbin/ifconfig lo:0 $VIP broadcast $VIP netmask 255.255.255.255 up
    echo "1" >/proc/sys/net/ipv4/conf/lo/arp_ignore
    echo "2" >/proc/sys/net/ipv4/conf/lo/arp_announce
    echo "1" >/proc/sys/net/ipv4/conf/all/arp_ignore
    echo "2" >/proc/sys/net/ipv4/conf/all/arp_announce
    ;;
stop)
    /sbin/ifconfig lo:0 down
    echo "close LVS Directorserver"
    echo "0" >/proc/sys/net/ipv4/conf/lo/arp_ignore
    echo "0" >/proc/sys/net/ipv4/conf/lo/arp_announce
    echo "0" >/proc/sys/net/ipv4/conf/all/arp_ignore
    echo "0" >/proc/sys/net/ipv4/conf/all/arp_announce
    ;;
*)
    echo "Usage: $0 {start|stop}"
    exit 1
esac

(2) Testing (these are the basic functional tests!) Start the services.

(A) On the Master director:

#service keepalived start
Check that it started properly in one of the following three ways.

Method one:
# ps aux|grep keepalived|grep -v "grep"
root  3265  0.0  0.0  4824  620 ?   Ss  15:19  0:00 keepalived -D
root  3267  0.0  0.1  4868 1368 ?   S   15:19  0:00 keepalived -D
root  3268  0.0  0.0  4868  988 ?   S   15:19  0:00 keepalived -D
If these three processes are not there, something is wrong and needs checking!

Method two:
# ip addr list
1: lo: mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 54:52:00:60:4b:2a brd ff:ff:ff:ff:ff:ff
    inet 172.11.1.63/24 brd 172.11.1.255 scope global eth0
    inet 172.11.1.61/32 scope global eth0
    inet6 fe80::5652:ff:fe60:4b2a/64 scope link valid_lft forever preferred_lft forever
3: sit0: mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0
If the VIP is not there, something is wrong and needs checking!

Method three:
#more /var/log/messages
Aug 19 15:22:27 centos-server Keepalived: Starting Keepalived v1.1.15 (08/09,2010) Aug 19 15:22:27 centos-server Keepalived_healthcheckers: Using MII-BMSR NIC polling thread... Aug 19 15:22:27 centos-server Keepalived_healthcheckers: Netlink reflector reports IP 172.11.1.63 added Aug 19 15:22:27

centos-server Keepalived_healthcheckers: Registering Kernel netlink reflector Aug 19 15:22:27 centos-server Keepalived_healthcheckers: Registering Kernel netlink command channel Aug 19 15:22:27 centos-server Keepalived: Starting Healthcheck child process, pid=3306 Aug 19 15:22:27 centos-server Keepalived: Starting VRRP child process, pid=3308 Aug 19 15:22:27 centos-server Keepalived_healthcheckers: Opening file

'/etc/keepalived/keepalived.conf'. Aug 19 15:22:27 centos-server Keepalived_vrrp: Using MII-BMSR NIC polling thread... Aug 19 15:22:27 centos-server Keepalived_healthcheckers: Configuration is using : 10657

Bytes Aug 19 15:22:27 centos-server Keepalived_vrrp: Netlink reflector reports IP 172.11.1.63 added Aug 19 15:22:27 centos-server

Keepalived_vrrp: Registering Kernel netlink reflector Aug 19 15:22:27 centos-server Keepalived_vrrp: Registering Kernel netlink command channel Aug 19 15:22:27 centos-server Keepalived_vrrp: Registering gratutious ARP shared channel Aug 19 15:22:27 centos-server Keepalived_healthcheckers: Activating healtchecker for service [172.11.1.57:80] Aug 19 15:22:27 centos-server Keepalived_vrrp: Opening file '/etc/keepalived/keepalived.conf'. Aug 19 15:22:27 centos-server Keepalived_healthcheckers: Activating healtchecker for service [172.11.1.58:80] Aug 19 15:22:27 centos-server Keepalived_vrrp: Configuration is using : 37201 Bytes Aug 19 15:22:27 centos-server Keepalived_vrrp: VRRP sockpool: [ifindex(2), proto(112), fd(8,9)] Aug 19 15:22:28 centos-server Keepalived_vrrp: VRRP_Instance(VI_cache) Transition to MASTER STATE Aug 19 15:22:29 centos-server Keepalived_vrrp: VRRP_Instance(VI_cache) Entering MASTER STATE Aug 19 15:22:29

centos-server Keepalived_vrrp: VRRP_Instance(VI_cache) setting protocol VIPs. Aug 19 15:22:29 centos-server Keepalived_vrrp:

VRRP_Instance(VI_cache) Sending gratuitous ARPs on eth0 for 172.11.1.61 Aug 19 15:22:29 centos-server avahi-daemon[1843]: Registering new address record for 172.11.1.61 on eth0. Aug 19 15:22:29 centos-server Keepalived_vrrp: VRRP_Group(VDR) Syncing instances to MASTER state Aug 19 15:22:29 centos-server Keepalived_vrrp: Netlink reflector reports IP 172.11.1.61 added Aug 19 15:22:29 centos-server

Keepalived_healthcheckers: Netlink reflector reports IP 172.11.1.61 added

No errors occurred, and the VRRP instance has entered the MASTER state!

(B) On the Backup director:

#service keepalived start
Check that it started properly in one of the following two ways.

Method one:
# ps aux|grep keepalived|grep -v "grep"
root  2228  0.0  0.0  4824  612 ?   Ss  15:28  0:00 keepalived -D
root  2230  0.0  0.1  4868 1360 ?   S   15:28  0:00 keepalived -D
root  2231  0.0  0.0  4868  980 ?   S   15:28  0:00 keepalived -D

Method two:
#more /var/log/messages
Aug 19 15:28:25 centos-server Keepalived: Starting Keepalived v1.1.15 (08/09,2010) Aug 19 15:28:25 centos-server Keepalived: Starting Healthcheck child process, pid=2230 Aug 19 15:28:25 centos-server Keepalived: Starting VRRP child process, pid=2231 Aug 19 15:28:25 centos-server Keepalived_healthcheckers: Using MII-BMSR NIC polling thread... Aug 19 15:28:25 centos-server Keepalived_healthcheckers: Netlink reflector reports IP 172.11.1.62 added Aug 19 15:28:25 centos-server Keepalived_healthcheckers: Registering Kernel netlink reflector Aug 19 15:28:25 centos-server Keepalived_healthcheckers: Registering Kernel netlink command channel Aug 19 15:28:25 centos-server Keepalived_vrrp: Using MII-BMSR NIC polling thread... Aug 19 15:28:25 centos-server Keepalived_vrrp: Netlink reflector reports IP 172.11.1.62 added Aug 19 15:28:25 centos-server

Keepalived_vrrp: Registering Kernel netlink reflector Aug 19 15:28:25 centos-server Keepalived_vrrp: Registering Kernel netlink command channel Aug 19 15:28:25 centos-server Keepalived_vrrp: Registering gratutious ARP shared channel Aug 19 15:28:25 centos-server Keepalived_healthcheckers: Opening file

'/etc/keepalived/keepalived.conf'. Aug 19 15:28:25 centos-server Keepalived_healthcheckers: Configuration is using : 10655 Bytes Aug 19 15:28:25 centos-server Keepalived_vrrp: Opening file

'/etc/keepalived/keepalived.conf'. Aug 19 15:28:25 centos-server Keepalived_healthcheckers: Activating healtchecker for service [172.11.1.57:80] Aug 19 15:28:25 centos-server Keepalived_vrrp: Configuration is using : 37199 Bytes Aug 19 15:28:25 centos-server Keepalived_healthcheckers: Activating healtchecker for service [172.11.1.58:80] Aug 19 15:28:25 centos-server Keepalived_vrrp: VRRP_Instance(VI_cache) Entering BACKUP STATE Aug 19 15:28:25

centos-server Keepalived_vrrp: VRRP sockpool: [ifindex(2), proto(112), fd(8,9)]

No errors occurred, and the VRRP instance has entered the BACKUP state!

Note: why did I not run ip addr list here? Because this is the Backup node: the VIP will only show up in ip addr list on the Backup after the Master goes down. Keep that in mind!

(C) On the RealServers

#/scripts/shell/lvs_rs.sh start
Check that the RS side came up properly as follows:

#ip addr list
1: lo: mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet 172.11.1.61/32 brd 172.11.1.61 scope global lo:0
    inet6 ::1/128 scope host valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:16:36:56:64:da brd ff:ff:ff:ff:ff:ff
    inet 172.11.1.57/24 brd 172.11.1.255 scope global eth0
    inet6 fe80::216:36ff:fe56:64da/64 scope link valid_lft forever preferred_lft forever
3: sit0: mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0
If you cannot see the VIP here, check the setup (although the realserver side rarely has problems)!

Change the default pages so you can tell the servers apart.
Edit index.html on web1 (172.11.1.57):
#more /var/www/html/index.html
This is 172.11.1.57
Edit index.html on web2 (172.11.1.58):
#more /var/www/html/index.html
This is 172.11.1.58
#/usr/sbin/apachectl start
Check:
#netstat -an|grep 80
tcp        0      0 :::80      :::*      LISTEN

Note: turn off the firewalls on the master, the backup, and the web servers (service iptables stop). Reason: if something goes wrong, this narrows down where to look. Now everything is in place; all that is left is to look at the results, haha.

(1) http://172.11.1.57 <= does it show "This is 172.11.1.57"?

(2) http://172.11.1.58 <= does it show "This is 172.11.1.58"?

(3) http://172.11.1.61 <= when you refresh with F5, does it alternate between the results of (1) and (2)? While doing step (3) you can also watch on the Master:
# watch "ipvsadm -L -n"
Every 2.0s: ipvsadm -L -n                    Thu Aug 19 15:53:17 2010
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.11.1.61:80 rr
  -> 172.11.1.58:80               Route   1      0          0
  -> 172.11.1.57:80               Route   1      0          0

Note: I configured keepalived with the rr scheduling algorithm; you can switch to another algorithm (the most common ones are wlc and rr). I chose rr here only so that the results are easier to observe. Refreshing by hand is slow and does not really show the rr algorithm at work, so I wrote a small client in Python, http_request.py:

#!/usr/bin/env python
import urllib
import threading            # import the threading module

class MyThread(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
    def run(self):
        doc = urllib.urlopen("http://172.11.1.61/index.html").read()    # fetch the page
        print doc,

if __name__ == '__main__':
    for i in range(10000):
        obj = MyThread()    # create a thread object
        obj.start()         # start the thread
        obj.join()          # wait for the thread to finish

After 10000 requests have gone through, you can see that each realserver handled 5000 of them.
python http_request.py output:
......
This is 172.11.1.58
This is 172.11.1.57
This is 172.11.1.58
This is 172.11.1.57
(and so on, alternating)
......
Watch on the master: watch "ipvsadm -L -n"
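If you prefer not to use Python, a rough equivalent of my own (using the same VIP and page as above) is a small curl loop that counts which backend answered:

for i in $(seq 1 100); do curl -s http://172.11.1.61/index.html; done | sort | uniq -c

With the rr scheduler the two counts should come out almost exactly even.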

The final result:
Every 2.0s: ipvsadm -L -n                    Thu Aug 19 16:10:25 2010
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.11.1.61:80 rr
  -> 172.11.1.58:80               Route   1      0          5000
  -> 172.11.1.57:80               Route   1      0          5000

All of the tests so far were done with the firewall disabled; now let's add the firewall back in and look again.

Although teacher sery advises against running iptables on the director (so as not to put extra load on it; it is busy enough already), this is a simple, small architecture, so I will give security some consideration here. You can treat this part as optional depending on your own situation, but security deserves at least a little attention.

more /scripts/shell/start_firewall.sh
#!/bin/bash
echo "Start Firewall"
/sbin/iptables -F
/sbin/depmod -a
/sbin/modprobe ip_tables
/sbin/modprobe ip_conntrack
/sbin/iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
/sbin/iptables -A INPUT -i lo -j ACCEPT
/sbin/iptables -P INPUT DROP
/sbin/iptables -A INPUT -s 172.11.1.50 -p tcp --dport 22 -j ACCEPT
/sbin/iptables -A INPUT -p tcp --dport 80 -j ACCEPT
/sbin/iptables -A INPUT -s 172.11.1.62 -j ACCEPT    # the Master/Backup peer IP
echo "Start Success!"
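This listing is the Master's version; on the Backup the peer rule has to point at the Master's address instead. A hedged sketch of the Backup-side difference (172.11.1.63 is the Master's IP shown earlier; the explicit VRRP rule is my own addition, not from the original script):

/sbin/iptables -A INPUT -s 172.11.1.63 -j ACCEPT           # on the Backup: trust the Master's IP
# or, if you prefer something narrower, allow only the VRRP advertisements (IP protocol 112):
/sbin/iptables -A INPUT -p 112 -d 224.0.0.18 -j ACCEPT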

Start the script above on the Master and the Backup, then test again: http://172.11.1.61 <= when you refresh with F5, does it still alternate between the results of steps (1) and (2)? If not, fix your firewall rules! Now let's look at the RealServer firewall.

more /scripts/shell/start_firewall.sh
#!/bin/bash
echo "Start Firewall"
/sbin/iptables -F
/sbin/depmod -a
/sbin/modprobe ip_tables
/sbin/modprobe ip_conntrack
/sbin/iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
/sbin/iptables -A INPUT -i lo -j ACCEPT
/sbin/iptables -P INPUT DROP
/sbin/iptables -A INPUT -s 172.11.1.50 -p tcp --dport 22 -j ACCEPT
/sbin/iptables -A INPUT -p tcp --dport 80 -j ACCEPT

Start this script on the RealServers, then test again!
(1) http://172.11.1.57 <= does it show "This is 172.11.1.57"?
(2) http://172.11.1.58 <= does it show "This is 172.11.1.58"?
(3) http://172.11.1.61 <= when you refresh with F5, does it alternate between the results of (1) and (2)?

At this point the first major part is done, including the "basic functional tests". The stronger tests, such as failover and fault isolation, will appear in part four, "Overall tests".

Happy now? (Well, do not celebrate just yet; I forgot to mention that you should give the firewall scripts above execute permission (chmod +x) and add them to /etc/rc.local!)

(II) Data storage

(1) Installing the master (metadata) server

1. Download the software packages

#mkdir /soft

#cd /soft

#wget http://pro.hit.gemius.pl/hitredir/id=nArgjQgdG2UjKQPqMyeQfMV2Ld9NnqbE9V_o2mU7ODv.17/url=https://www.doczj.com/doc/6212292532.html,/tl_files/mfscode/mfs-1.6.15.tar.gz
#groupadd mfs
#useradd -g mfs mfs

#tar zxvf mfs-1.6.15.tar.gz

#cd mfs-1.6.15

# ./configure --prefix=/usr/local/mfs --with-default-user=mfs \
  --with-default-group=mfs --disable-mfschunkserver --disable-mfsmount
#make;make install

#cp /usr/local/mfs/etc/mfsmaster.cfg.dist /usr/local/mfs/etc/mfsmaster.cfg

#cp /usr/local/mfs/etc/mfsexports.cfg.dist /usr/local/mfs/etc/mfsexports.cfg

#chown -R mfs:mfs /usr/local/mfs/var/mfs

#cp /usr/local/mfs/var/mfs/metadata.mfs.empty /usr/local/mfs/var/mfs/metadata.mfs
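mfsexports.cfg controls which clients may mount what. The stock file ships with a permissive entry; if you want to restrict mounts to this LAN (this ties in with the "client mount restriction test" planned for part four), a line like the following works. The subnet is my own example, not from the original article:

# more /usr/local/mfs/etc/mfsexports.cfg
172.11.1.0/24   /   rw,alldirs,maproot=0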

(2) Installing the metadata logger server

Download the software packages

#mkdir /soft

#cd /soft

#wget http://pro.hit.gemius.pl/hitredir/id=nArgjQgdG2UjKQPqMyeQfMV2Ld9NnqbE9V_o2mU7ODv.17/url=https://www.doczj.com/doc/6212292532.html,/tl_files/mfscode/mfs-1.6.15.tar.gz

#groupadd mfs
#useradd -g mfs mfs

#tar zxvf mfs-1.6.15.tar.gz

#cd mfs-1.6.15

#./configure --prefix=/usr/local/mfs --with-default-user=mfs \
  --with-default-group=mfs --disable-mfschunkserver --disable-mfsmount
#make;make install

#cp /usr/local/mfs/etc/mfs/mfsmetalogger.cfg.dist /usr/local/mfs/etc/mfs/mfsmetalogger.cfg

#vi /etc/hosts
172.11.1.44   master
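The metalogger locates the master through the MASTER_HOST setting in mfsmetalogger.cfg, so it should match the hosts entry above (or simply be the master's IP). A quick check, my own suggestion rather than a step from the original, assuming the install prefix used above:

# grep MASTER_HOST /usr/local/mfs/etc/mfs/mfsmetalogger.cfg
MASTER_HOST = master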

(3) Installing the chunk (data storage) servers

Download the software packages

#mkdir /soft

#cd /soft

#wget http://pro.hit.gemius.pl/hitredir/id=nArgjQgdG2UjKQPqMyeQfMV2Ld9NnqbE9V_o2mU7ODv.17/url=https://www.doczj.com/doc/6212292532.html,/tl_files/mfscode/mfs-1.6.15.tar.gz

#groupadd mfs

#useradd -g mfs mfs

#tar zxvf mfs-1.6.15.tar.gz

#cd mfs-1.6.15

# ./configure --prefix=/usr/local/mfs --with-default-user=mfs \
  --with-default-group=mfs --disable-mfsmaster

#make;make install

#cp /usr/local/mfs/etc/mfschunkserver.cfg.dist /usr/local/mfs/etc/mfschunkserver.cfg

#cp /usr/local/mfs/etc/mfshdd.cfg.dist /usr/local/mfs/etc/mfshdd.cfg
Here I added a separate 10 GB disk under KVM and mounted it on /data:

#mount /dev/hdb /data

#chown -R mfs:mfs /data

(1) Edit mfshdd.cfg

#more /usr/local/mfs/etc/mfshdd.cfg
/data

(2) Edit mfschunkserver.cfg

# more /usr/local/mfs/etc/mfschunkserver.cfg|grep MASTER_HOST
MASTER_HOST = 172.11.1.44    <= just point this at the master server's IP
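mfshdd.cfg takes one storage path per line, so adding a disk to a chunkserver later (the "dynamically adding servers or disks" item in part four) is just a matter of mounting it, handing it to the mfs user, and appending another line. A hedged sketch, with /dev/hdc and /data2 as made-up example names:

mount /dev/hdc /data2
chown -R mfs:mfs /data2
echo "/data2" >> /usr/local/mfs/etc/mfshdd.cfg
# then restart mfschunkserver so it picks up the new path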

(4) Installing the client

(I) To mount a MooseFS distributed file system, the client machine must have the FUSE package installed (fuse version at least 2.6; a version above 2.7.2 is recommended). If your system does not already have fuse, you have to install it yourself. A common way is to build it from source; the source can be obtained from https://www.doczj.com/doc/6212292532.html,/projects/fuse/:
#cd /usr/src

#tar -zxvf fuse-2.7.2.tar.gz

#cd fuse-2.7.2
#./configure

#make

#make install

(*) Note: I originally used version 2.8.4, but loading the fuse module always failed with a "module not found" error. A quick search showed many people hitting the same thing, and the fix is exactly what others suggested: fall back to a slightly older version.

(II) Installing the MFS client

1. Edit /etc/profile, append the line below, and then run source /etc/profile to make the change take effect. If you skip this step, then later, when installing MFS and running ./configure --enable-mfsmount, you may get the error "checking for FUSE... no configure: error: mfsmount build was forced, but fuse development package is not installed" and the MFS client will not build. (The exact line to append is missing from the source text; see the note below.)
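The line itself was lost from the text I am working from. On most systems it is the pkg-config search path, so that ./configure can find the fuse development files installed under /usr/local; a hedged guess at what belongs here:

export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:$PKG_CONFIG_PATH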

2. Unpack the sources: tar zxvf mfs-1.6.11.tar.gz

3. Change into the directory: cd mfs-1.6.11

4. Create the user: useradd mfs -s /sbin/nologin

5. ./configure --prefix=/usr/local/mfs --with-default-user=mfs --with-default-group=mfs --disable-mfsmaster --disable-mfschunkserver

6. Compile and install: make ; make install

(5) Basic operations

Prerequisite: continue only if the three roles above installed without any problems. If something did go wrong, fix the installation issue before going on; I cannot offer much troubleshooting here, because I ran into very few installation problems myself. Feel free to fill that part in...

(I) Start the master server
/usr/local/mfs/sbin/mfsmaster start

/usr/local/mfs/sbin/mfscgiserv start

Check:

Check the process:
# ps aux|grep mfsmaster|grep -v "grep"

mfs 2005 0.0 3.3 42532 34572 ? S< 02:09 0:00 mfsmaster start

Check the listening ports:
#netstat -an|more

Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address      Foreign Address    State
tcp        0      0 0.0.0.0:9419       0.0.0.0:*          LISTEN
tcp        0      0 0.0.0.0:9420       0.0.0.0:*          LISTEN
tcp        0      0 0.0.0.0:9421       0.0.0.0:*          LISTEN
tcp        0      0 0.0.0.0:9425       0.0.0.0:*          LISTEN

Here is a little extra background (from mfsmaster.cfg):
MATOML_LISTEN_PORT = 9419    the port the metaloggers connect to (default 9419)
MATOCS_LISTEN_PORT = 9420    the port the chunkservers connect to (default 9420)
MATOCU_LISTEN_PORT = 9421    the port clients connect to when mounting (default 9421)
You may be wondering what port 9425 is. Good question; it means you ran mfscgiserv start without looking at its output! Here it is again:

#/usr/local/mfs/sbin/mfscgiserv start
starting simple cgi server (host: any , port: 9425 , rootpath: /usr/share/mfscgi)

See it now?

mfscgiserv is a small web server written in Python; it listens on port 9425 and is started with /usr/local/mfs/sbin/mfscgiserv. Through a browser you get a full view of all client mounts, the chunkservers and the master server, client operations, and so on; it is an excellent tool. From any machine with a browser you can open http://<master server ip>:9425. So let's take a look at it through the web interface:

http://172.11.1.44:9425

Because the chunkservers have not been started yet, nothing shows up under the Servers tab. With this tool in hand, the following steps will be a bit easier to follow. Stopping the master server: stopping the master safely is essential, so avoid kill. Use mfsmaster -s to stop it safely. If you did use kill, there is still a way out; see "I stopped the master with kill, what should I do?" in the overall tests below.

(II) Start the metadata logger server

/usr/local/mfs/sbin/mfsmetalogger start

Check:

Check the process:

# ps aux|grep mfs|grep -v "grep"

mfs 15707 0.0 0.0 1924 608 ? S< 03:27 0:00

/usr/local/mfs/sbin/mfsmetalogger start

Check the connection to the master:

# netstat -an|grep 9419

tcp 0 0 172.11.1.20:41627 172.11.1.44:9419 ESTABLISHED

Check the master server's log:

If the metadata logger has not been started, the master's log will contain a line like:
Sep 4 11:31:00 centos-mfs mfsmaster[2539]: no meta loggers connected !!!

Check through the web interface.

(III) Start the chunkservers

mount /dev/hdb /data
/usr/local/mfs/sbin/mfschunkserver start

Check:

Check the process:
# ps aux|grep mfschunkserver|grep -v grep
mfs 2213 0.1 0.2 27536 2400 ? S   ...   mfschunkserver start

Check the connections:
[root@centos-server ~]# netstat -an
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address          Foreign Address        State
tcp        0      0 0.0.0.0:9422           0.0.0.0:*              LISTEN
tcp        0      0 172.11.1.22:47279      172.11.1.44:9420       ESTABLISHED

Port 9422 here is the listening port used for connections from the other chunkservers, usually for data replication. You can also see that this chunkserver is already "talking" to the master server (the ESTABLISHED line above).

[Figure: the same connection as seen in the MFS CGI web interface; screenshot not reproduced here]

OK, now start everything up!

Stopping a chunkserver:

To stop mfschunkserver safely, use mfschunkserver -s.

(IV) Mounting on the client

mkdir /data

modprobe fuse

/usr/local/mfs/bin/mfsmount /data -H 172.11.1.44

Check whether the mount succeeded by looking at the disk usage with df:

#df -h

Filesystem            Size  Used Avail Use% Mounted on
/dev/hda3             7.7G  1.4G  5.9G  20% /
/dev/hda1              99M   11M   83M  12% /boot
tmpfs                 506M     0  506M   0% /dev/shm
mfs#172.11.1.44:9421   27G  832K   27G   1% /data

Check with the mount command:

#mount

/dev/hda3 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
/dev/hda1 on /boot type ext3 (rw)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
mfs#172.11.1.44:9421 on /data type fuse (rw,nosuid,nodev,allow_other,default_permissions)

Good. Now try creating, deleting, and modifying files in this directory to confirm everything is OK. To unmount the file system, the ordinary Linux umount command is enough, for example:
#umount /data
If you see the following:
# umount /data
umount: /data: device is busy
umount: /data: device is busy
then something on this client is still using the file system; find out which command is using it and exit it first. Try not to force the unmount.
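To find out what is holding the mount busy, either of these standard tools works (my own addition, not in the original text):

# fuser -m /data      # PIDs of processes using the mounted file system
# lsof /data          # the same information in more detail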

Note: all of the operations above were done with the firewall disabled.

Now let's try the same operations with the firewall on. Add and start the master server's firewall:

start_firewall.sh

#!/bin/bash
echo "Start Firewall"
/sbin/iptables -F
/sbin/depmod -a
/sbin/modprobe ip_tables
/sbin/modprobe ip_conntrack
/sbin/iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
/sbin/iptables -A INPUT -i lo -j ACCEPT
/sbin/iptables -A INPUT -s 172.11.1.50 -p tcp --dport 22 -j ACCEPT
/sbin/iptables -A INPUT -s 172.11.1.21 -p tcp --dport 9420 -j ACCEPT
/sbin/iptables -A INPUT -s 172.11.1.22 -p tcp --dport 9420 -j ACCEPT
/sbin/iptables -A INPUT -s 172.11.1.23 -p tcp --dport 9420 -j ACCEPT
/sbin/iptables -A INPUT -s 172.11.1.24 -p tcp --dport 9420 -j ACCEPT
/sbin/iptables -A INPUT -s 172.11.1.57 -p tcp --dport 9421 -j ACCEPT
/sbin/iptables -A INPUT -s 172.11.1.58 -p tcp --dport 9421 -j ACCEPT
/sbin/iptables -A INPUT -s 172.11.1.50 -p tcp --dport 9425 -j ACCEPT
/sbin/iptables -A INPUT -s 172.11.1.20 -p tcp --dport 9419 -j ACCEPT
/sbin/iptables -P INPUT DROP

(9425 is the port of the GUI tool, reachable from a browser)

Test again and make sure everything above still works; if it does, continue!

Add and start the chunkservers' firewall:

start_firewall.sh

#!/bin/bash
echo "Start Firewall"
/sbin/iptables -F
/sbin/depmod -a
/sbin/modprobe ip_tables
/sbin/modprobe ip_conntrack
/sbin/iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
/sbin/iptables -A INPUT -i lo -j ACCEPT
/sbin/iptables -A INPUT -s 172.11.1.50 -p tcp --dport 22 -j ACCEPT
/sbin/iptables -A INPUT -s 172.11.1.23 -p tcp --dport 9422 -j ACCEPT
/sbin/iptables -A INPUT -s 172.11.1.22 -p tcp --dport 9422 -j ACCEPT
/sbin/iptables -A INPUT -s 172.11.1.24 -p tcp --dport 9422 -j ACCEPT
/sbin/iptables -A INPUT -s 172.11.1.44 -j ACCEPT
/sbin/iptables -A INPUT -s 172.11.1.57 -j ACCEPT
/sbin/iptables -A INPUT -s 172.11.1.58 -j ACCEPT
/sbin/iptables -P INPUT DROP

This is the firewall for only one mfs chunkserver; the others differ only in which source IPs are allowed on port 9422.

Finally, append both the firewall startup and the service startup to /etc/rc.local.

(III) Monitoring

Installing Nagios and the plugins on the monitoring host

(1) Prepare the packages. Make sure you have root on the machine before installing, and make sure the following are already installed on your Linux system before you continue: Apache, the gcc compiler, and the GD library with its development headers. They can all be installed with yum:
yum -y install httpd gcc glibc glibc-common gd gd-devel

(2) Create the nagios account:
/usr/sbin/useradd nagios && passwd nagios
Create a group named nagcmd, used for running external commands from the web interface, and add both the nagios user and the apache user to it:
/usr/sbin/groupadd nagcmd

/usr/sbin/usermod -G nagcmd nagios
/usr/sbin/usermod -G nagcmd apache

(3) Download Nagios, the plugins, and nrpe (see https://www.doczj.com/doc/6212292532.html,/download/):
cd /soft
wget https://www.doczj.com/doc/6212292532.html,/sourceforge/nagios/nagios-3.0.6.tar.gz
wget https://www.doczj.com/doc/6212292532.html,/sourceforge/nagios/nrpe-2.12.tar.gz
wget https://www.doczj.com/doc/6212292532.html,/sourceforge/nagiosplug/nagios-plugins-1.4.15.tar.gz

(4) Build and install Nagios:
cd /soft
tar zxvf nagios-3.0.6.tar.gz
cd nagios-3.0.6
./configure --with-command-group=nagcmd --prefix=/usr/local/nagios
make all
make install
make install-init
make install-config
make install-commandmode

Verify that the program installed correctly: change into the install path (here /usr/local/nagios) and check that the five directories etc, bin, sbin, share and var exist; if they do, the installation succeeded.

cd /soft

tar zxvf nagios-plugins-1.4.15.tar.gz

cd nagios-plugins-1.4.15

./configure --with-nagios-user=nagios --with-nagios-group=nagios --prefix=/usr/local/nagios

make && make install

Verify: ls /usr/local/nagios/libexec lists the installed plugin files; all the plugins live in this libexec directory.

(5) Build and install nrpe

tar -zxvf nrpe-2.12.tar.gz

cd nrpe-2.12

./configure

make all

make install-plugin

make install-daemon

make install-daemon-config

/usr/local/nagios/libexec/check_nrpe -H localhost

should return the current NRPE version:

# /usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.12

which means check_nrpe can talk to the local nrpe daemon.

(6) Configure the web interface

Method one: simply run make install-webconf while installing Nagios, then create a user named monitor for logging in to the Nagios web interface. Write down the password you set; you will need it shortly.

/usr/bin/htpasswd -c /usr/local/nagios/etc/htpasswd.users monitor
Restart Apache to make the settings take effect:

service httpd restart

Method two: append the following to the end of httpd.conf:

ScriptAlias /nagios/cgi-bin "/usr/local/nagios/sbin"
<Directory "/usr/local/nagios/sbin">
    Options ExecCGI
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthName "Nagios Access"
    AuthType Basic
    # the file used to authenticate access to this directory
    AuthUserFile /usr/local/nagios/etc/htpasswd
    Require valid-user
</Directory>

Alias /nagios "/usr/local/nagios/share"
<Directory "/usr/local/nagios/share">
    Options None
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthName "Nagios Access"
    AuthType Basic
    # the file used to authenticate access to this directory
    AuthUserFile /usr/local/nagios/etc/htpasswd
    Require valid-user
</Directory>

htpasswd -c /usr/local/nagios/etc/htpasswd monitor
New password: (enter 12345)
Re-type new password: (enter the password again)
Adding password for user monitor

Look at the contents of the authentication file:
less /usr/local/nagios/etc/htpasswd

monitor:OmWGEsBnoGpIc
The first half is the user name monitor, and what follows is the encrypted password.

(7) Start Nagios. Add Nagios to the service list so that it starts automatically at boot:

chkconfig --add nagios

chkconfig nagios on

Verify the sample Nagios configuration:

/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
If it reports no errors, start the Nagios service:

service nagios start

(8) Adjust the SELinux settings

# more /etc/selinux/config

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#       enforcing - SELinux security policy is enforced.
#       permissive - SELinux prints warnings instead of enforcing.
#       disabled - SELinux is fully disabled.
SELINUX=disabled
# SELINUXTYPE= type of policy in use. Possible values are:
#       targeted - Only targeted network daemons are protected.
#       strict - Full SELinux protection.
SELINUXTYPE=targeted

After changing this file, a reboot is required.
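If you would rather not reboot right away, a commonly used stopgap (my own note, not from the original) is to switch SELinux off at runtime and let the config file above make it permanent on the next boot:

# getenforce
Enforcing
# setenforce 0    # takes effect immediately; /etc/selinux/config covers future boots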

Installing nrpe and the plugins on the monitored hosts

(1) Add the user: useradd nagios

(2) Download nrpe and the plugin packages:
cd /soft

wget https://www.doczj.com/doc/6212292532.html,/sourceforge/nagios/nrpe-2.12.tar.gz
wget https://www.doczj.com/doc/6212292532.html,/sourceforge/nagiosplug/nagios-plugins-1.4.15.tar.gz

(3) Install the Nagios plugins and nrpe:
tar -zxvf nagios-plugins-1.4.15.tar.gz
cd nagios-plugins-1.4.15
./configure
make
make install
chown nagios.nagios /usr/local/nagios
chown -R nagios.nagios /usr/local/nagios/libexec
tar -zxvf nrpe-2.12.tar.gz
cd nrpe-2.12
./configure --enable-command-args
make all
make install-plugin
make install-daemon
make install-daemon-config

/usr/local/nagios/libexec/check_nrpe -H localhost should return the current NRPE version:
# /usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.12

which means check_nrpe can talk to the local nrpe daemon.

Note: so that the later steps go smoothly, make sure the local firewall opens port 5666 so that the external monitoring host can reach it. Run /usr/local/nagios/libexec/check_nrpe -h to see the usage of this command; the form is check_nrpe -H <monitored host> -c <monitoring command to run>.

Note: the command given after -c must be one defined in nrpe.cfg; the NRPE daemon only runs commands defined in nrpe.cfg (an example definition is shown further below in the "process monitoring" part). Start NRPE on the monitored host:

/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

(2) Starting the services

Starting the services on the monitoring host

(1) Modify the Nagios configuration. Edit the following lines in

/usr/local/nagios/etc/cgi.cfg:

default_user_name=monitor
authorized_for_system_information=monitor
authorized_for_configuration_information=monitor
authorized_for_system_commands=monitor
authorized_for_all_services=monitor
authorized_for_all_hosts=monitor
authorized_for_all_service_commands=monitor
authorized_for_all_host_commands=monitor

Note: monitor is the authentication user I added earlier.

(2) Comment out the following lines in /usr/local/nagios/etc/nagios.cfg:

#cfg_file=/usr/local/nagios/etc/objects/commands.cfg

#cfg_file=/usr/local/nagios/etc/objects/contacts.cfg

#cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg

#cfg_file=/usr/local/nagios/etc/objects/templates.cfg

#cfg_file=/usr/local/nagios/etc/objects/localhost.cfg

Add the following line:

cfg_dir=/usr/local/nagios/etc/mfs

Create the directory and copy the files:

[root@nagios etc]#mkdir /usr/local/nagios/etc/mfs

[root@nagios etc]# cp objects/commands.cfg mfs/

[root@nagios etc]# cp objects/timeperiods.cfg mfs/

[root@nagios etc]# cp objects/contacts.cfg mfs/

[root@nagios etc]# cp objects/localhost.cfg mfs/mfs_cluster.cfg

[root@nagios etc]# cp objects/templates.cfg mfs/

(3) Validate the configuration, start Nagios, and test

[root@nagios ~]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
Nagios 3.0.6
Copyright (c) 1999-2008 Ethan Galstad (https://www.doczj.com/doc/6212292532.html,)
Last Modified: 12-01-2008
License: GPL
Reading configuration data...
Running pre-flight check on configuration data...
Checking services...                Checked 8 services.
Checking hosts...                   Checked 1 hosts.
Checking host groups...             Checked 1 host groups.
Checking service groups...          Checked 0 service groups.
Checking contacts...                Checked 1 contacts.
Checking contact groups...          Checked 1 contact groups.
Checking service escalations...     Checked 0 service escalations.
Checking service dependencies...    Checked 0 service dependencies.
Checking host escalations...        Checked 0 host escalations.
Checking host dependencies...       Checked 0 host dependencies.
Checking commands...                Checked 24 commands.
Checking time periods...            Checked 5 time periods.
Checking for circular paths between hosts...
Checking for circular host and service dependencies...
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check

Start it:
[root@nagios ~]# service nagios start
Starting nagios: done.

Check it in the following ways:

Method one:

# ps aux|grep nagios|grep -v grep
avahi     1823  0.0  0.1  2660 1348 ?   Ss   Sep13  0:00 avahi-daemon: running [nagios.local]
nagios    3042  0.0  0.1 12980 1244 ?   Ssl  Sep13  0:15 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg
nagios    5316  0.5  0.0 13544  900 ?   Ssl  15:46  0:00 /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg

Method two: open http://172.11.1.19/nagios, log in, and enter the credentials. If you can browse the pages normally, Nagios started without problems.

Starting the service on the monitored hosts

Modify nrpe.cfg:

# ALLOWED HOST ADDRESSES

# This is an optional comma-delimited list of IP address or hostnames # that are allowed to talk to the NRPE daemon.

# # Note: The daemon only does rudimentary checking of the client's IP # address. I would highly recommend adding entries in your

/etc/hosts.allow # file to allow only the specified host to connect to the port

# you are running this daemon on.

# NOTE: This option is ignored if NRPE is running under either inetd or xinetd
allowed_hosts=127.0.0.1,172.11.1.19    <= add the monitoring host's IP

(2) Add a firewall rule:

/sbin/iptables -A INPUT -s 172.11.1.19 -p tcp --dport 5666 -j ACCEPT
Test from the monitoring host:

# /usr/local/nagios/libexec/check_nrpe -H 172.11.1.57
NRPE v2.12

(3) Start nrpe and test:

/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
more /var/log/messages
Sep 25 00:22:04 centos-server nrpe[23290]: Starting up daemon
Sep 25 00:22:04 centos-server nrpe[23290]: Listening for connections on port 5666
Sep 25 00:22:04 centos-server nrpe[23290]: Allowing connections from: 127.0.0.1,172.11.1.19

Testing the Nagios alerting methods (email, SMS, MSN)

Email alerting (here we simply use the sendmail service to send mail).

Step one, check that the mail port is alive:
[root@nagios conf.d]# netstat -an|grep ':25\>'
tcp        0      0 127.0.0.1:25      0.0.0.0:*      LISTEN

Step two, send a test mail:

[root@nagios conf.d]# mail -s "This is a test email" someone@https://www.doczj.com/doc/6212292532.html,
hello,this is the content of mail.
welcome to https://www.doczj.com/doc/6212292532.html,
The first line is the command: -s sets the subject, and the address after it is the recipient. After pressing Enter you are placed in the mail body, where you can type any text, such as the two lines above. When the body is finished, press CTRL+D to end input; you are then asked for a Cc address, and if there is none, just press Enter and the mail is sent. Reference: https://www.doczj.com/doc/6212292532.html,/article/317.html

The remaining steps will show up later, when we add the monitored elements.

Read on...

SMS alerting: see https://www.doczj.com/doc/6212292532.html,/115934/115091

MSN alerting. Prerequisites:

(1) Apache needs php, openssl and openssl-devel (I did this test under nginx). Download https://www.doczj.com/doc/6212292532.html,/dev/php/sendMsg.zip and simplify index.php to:

<?php
include('sendMsg.php');
$sendMsg = new sendMsg();
$sendMsg->login('mweb@https://www.doczj.com/doc/6212292532.html,', 'PASSWORD');
$sendMsg->createSession('hahazhu0634@https://www.doczj.com/doc/6212292532.html,');
$sendMsg->sendMessage($_GET['message'], 'Times New Roman', 'FF0000');
?>

Test it:
curl http://nginx_ip/msn/index.php?message="This is test for msn"

[Figure: the MSN message being received; screenshot not reproduced here]

How to hook this into Nagios alerting is something I am sure you can work out yourselves. (For reference only.)

Adding monitored elements to Nagios. There are many things we want to monitor, such as ports, disk utilization, CPU/memory utilization, processes, and so on. By role:

Role                   Monitored elements
Keepalived-Master      port (80), service health check, processes
Keepalived-Backup      processes
RealServer             disk space utilization, port, processes
Mfs-Master             port, processes
Mfs-Backup             port, processes
Mfs-Chunk              port, processes

(1) Port monitoring: just use check_tcp.

(2) Service health check: create a monitoring file on the Realservers named mfs_monitor.html:
touch /data/https://www.doczj.com/doc/6212292532.html,/mfs_monitor.html
Then we write a small program that reads this file to judge whether the overall service stack is healthy:
curl -o /dev/null -s -w %{http_code} http://172.11.1.61/mfs_monitor.html
If this returns the status code 200, everything is fine; otherwise the distributed file system behind it has a problem and needs checking. check_mfs_arch.sh:

#!/usr/bin/env bash
httpstatus=`/usr/bin/curl -o /dev/null -s -w %{http_code} http://172.11.1.61/mfs_monitor.html`
if [ x$httpstatus == "x200" ]; then
    echo "OK"
    exit 0
else
    echo "Please Check Mfs Architecture!"
    exit 2
fi
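Before wiring the plugin into Nagios, it is worth running it once by hand and checking the exit status: 0 means OK and 2 means CRITICAL, following the Nagios plugin convention. A quick check of my own, assuming the script sits in the current directory:

# sh check_mfs_arch.sh ; echo $?
OK
0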

(3) Process monitoring: nrpe plus check_own_procs.sh can be used. check_own_procs.sh (this script is deployed on the monitored host):

#!/usr/bin/env bash
keepalived_nums=`ps aux|grep -v 'grep'|grep 'keepalived -D'|wc -l`
if [ x$keepalived_nums == 'x3' ]; then
    echo 'OK, 3 keepalived processes!'
    exit 0
else
    echo 'Fail, check keepalived on the Master!'
    exit 2
fi

Add the command to the monitored host's nrpe.cfg:

command[check_process]=/usr/local/nagios/libexec/check_own_procs.sh

On the monitoring host, add to commands.cfg:

define command{
    command_name    check_keepalived_process
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

Then add to mfs_cluster.cfg:

define service{
    use                     mfs-service
    host_name               mfs-keepalived-Master
    service_description     Check Keepalived Processes
    check_command           check_keepalived_process!check_process
}

The other checks are added in the same way. I have only described a few simple ones; for adding checks in bulk, see https://www.doczj.com/doc/6212292532.html,/115934/116336

I only set up monitoring for the Keepalived-Master host; since this is a learning exercise I did not add everything. I am sure you can come up with more elements to monitor than the ones I listed, and monitor them better. Everything here is just meant to get the ball rolling...

(IV) Overall tests

(1) LVS failover. When the primary load balancer (Master) stops working, can the backup load balancer (Backup) automatically take over the job of distributing requests? Is the takeover instantaneous? And when the Master recovers, does it become the Master again?

Answer:

The takeover is automatic.

It is not instantaneous, because the two nodes need some mechanism for deciding whether the other side is alive, and that necessarily takes time; requests that arrive inside that window will fail. Fortunately the window is very short (the takeover time is governed by advert_int in keepalived.conf).

When the Master recovers, the Backup automatically hands the primary role back to it. This is decided by the priority values in keepalived.conf, which is why the primary balancer's priority must be higher than the backup's.

Once the keepalived process on the primary balancer (MASTER) has been confirmed dead, let's look at how the backup behaves. We observe four things: whether ip addr shows the VIP bound to the backup's designated network interface, the output of ipvsadm, the system log, and finally whether the service is still being served normally.

# ip addr list

1: lo: mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 54:52:00:60:4b:2a brd ff:ff:ff:ff:ff:ff
    inet 172.11.1.62/24 brd 172.11.1.255 scope global eth0
    inet6 fe80::5652:ff:fe60:4b2a/64 scope link valid_lft forever preferred_lft forever
3: sit0: mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0
This is the ip addr list output before the takeover; here it is after the takeover:

#ip addr list

1: lo: mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host valid_lft forever preferred_lft forever
2: eth0: mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 54:52:00:60:4b:2a brd ff:ff:ff:ff:ff:ff
    inet 172.11.1.62/24 brd 172.11.1.255 scope global eth0
    inet 172.11.1.61/32 scope global eth0
    inet6 fe80::5652:ff:fe60:4b2a/64 scope link valid_lft forever preferred_lft forever
3: sit0: mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0

# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn

This is the ipvsadm -L -n output before the takeover; here it is after the takeover:
# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  172.11.1.61:80 rr
  -> 172.11.1.58:80               Route   1      0          0
  -> 172.11.1.57:80               Route   1      0          0

more /var/log/messages Sep 10 08:21:02 centos-server Keepalived: Starting Keepalived v1.1.15 (08/09,2010) Sep 10 08:21:02 centos-server Keepalived_healthcheckers: Using MII-BMSR NIC polling thread... Sep 10 08:21:02 centos-server Keepalived_healthcheckers: Netlink reflector reports IP 172.11.1.62 added Sep 10 08:21:02 centos-server

Keepalived_healthcheckers: Registering Kernel netlink reflector Sep 10 08:21:02 centos-server Keepalived_healthcheckers: Registering Kernel netlink command channel Sep 10 08:21:02 centos-server Keepalived: Starting Healthcheck child process, pid=2184 Sep 10 08:21:02

centos-server Keepalived_healthcheckers: Opening file

'/etc/keepalived/keepalived.conf'. Sep 10 08:21:02 centos-server Keepalived_healthcheckers: Configuration is using : 10655 Bytes Sep 10 08:21:02 centos-server Keepalived_vrrp: Using MII-BMSR NIC polling thread... Sep 10 08:21:02 centos-server Keepalived_vrrp: Netlink reflector reports IP 172.11.1.62 added Sep 10 08:21:02 centos-server Keepalived_vrrp: Registering Kernel netlink reflector Sep 10 08:21:02 centos-server Keepalived_vrrp: Registering Kernel netlink command channel Sep 10 08:21:02 centos-server Keepalived: Starting VRRP child process, pid=2186 Sep 10 08:21:02 centos-server Keepalived_vrrp: Registering gratutious ARP shared channel Sep 10 08:21:02 centos-server Keepalived_healthcheckers: Activating healtchecker for service

[172.11.1.57:80] Sep 10 08:21:02 centos-server Keepalived_vrrp: Opening file '/etc/keepalived/keepalived.conf'. Sep 10 08:21:02 centos-server Keepalived_healthcheckers: Activating healtchecker for service

[172.11.1.58:80] Sep 10 08:21:02 centos-server Keepalived_vrrp: Configuration is using : 37199 Bytes Sep 10 08:21:02 centos-server Keepalived_vrrp: VRRP_Instance(VI_cache) Entering BACKUP STATE Sep 10 08:21:02 centos-server Keepalived_vrrp: VRRP sockpool: [ifindex(2), proto(112), fd(8,9)]

This is the system log before the takeover; here it is after the takeover: Sep 10 08:21:02 centos-server Keepalived: Starting Keepalived v1.1.15 (08/09,2010) Sep 10 08:21:02 centos-server Keepalived_healthcheckers: Using MII-BMSR NIC polling thread... Sep 10 08:21:02 centos-server Keepalived_healthcheckers: Netlink reflector reports IP 172.11.1.62 added Sep 10 08:21:02 centos-server Keepalived_healthcheckers: Registering Kernel netlink reflector Sep 10 08:21:02 centos-server Keepalived_healthcheckers: Registering Kernel netlink command channel Sep 10 08:21:02 centos-server Keepalived: Starting Healthcheck child process, pid=2184 Sep 10 08:21:02 centos-server
