Symptoms

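When only two sentinels remain and the master of a Redis Sentinel deployment goes down, the business components cannot switch to the new master.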

Cause

The Redis sentinels still treated the old master as the master and never triggered a failover.

Locating the cause

Sentinel cluster deployment:

1 master, 1 replica, 3 sentinels

Initial sentinel configuration:

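The setting that matters here is the quorum in sentinel monitor, which is 2.
It means two sentinels must agree that the master is unreachable: only when two sentinels have each marked the master as subjectively down (sdown) is it marked as objectively down (odown).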
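
To make the quorum field concrete, here is a minimal Python sketch that parses a sentinel monitor directive, whose documented form is sentinel monitor <master-name> <ip> <port> <quorum>. The helper name parse_sentinel_monitor, the master name mymaster, and the address 127.0.0.1:6379 are assumptions for illustration; the actual values from this deployment are not shown in the post.

```python
from typing import NamedTuple


class MonitorDirective(NamedTuple):
    master_name: str
    host: str
    port: int
    quorum: int  # number of sentinels that must agree on sdown


def parse_sentinel_monitor(line: str) -> MonitorDirective:
    """Parse 'sentinel monitor <master-name> <ip> <port> <quorum>'."""
    parts = line.split()
    if len(parts) != 6 or [p.lower() for p in parts[:2]] != ["sentinel", "monitor"]:
        raise ValueError(f"not a sentinel monitor directive: {line!r}")
    _, _, name, host, port, quorum = parts
    return MonitorDirective(name, host, int(port), int(quorum))


# Hypothetical directive: name and address are placeholders, only the
# quorum of 2 comes from the post.
directive = parse_sentinel_monitor("sentinel monitor mymaster 127.0.0.1 6379 2")
print(directive.quorum)  # 2
```

The last field is the only one this incident depends on: with a quorum of 2, a single sentinel's opinion is never enough to mark the master as odown.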

Troubleshooting process

1. Start the sentinel in the background with its output redirected to a log file, then check the log:

/usr/bin/redis-sentinel /etc/redis-sentinel.conf > /data/thirdAssembly/redis/log/17000/17000.log 2>&1 &

2. The log showed that the state of the sentinel on machine 139 could not be determined correctly.

Connecting to the sentinel port on 139 with telnet returned the following error:

-DENIED Redis is running in protected mode because protected mode is enabled, no bind address was specified, no authentication password is requested to clients. In this mode connections are only accepted from the loopback interface. If you want to connect from external computers to Redis you may adopt one of the following solutions: 1) Just disable protected mode sending the command 'CONFIG SET protected-mode no' from the loopback interface by connecting to Redis from the same host the server is running, however MAKE SURE Redis is not publicly accessible from internet if you do so. Use CONFIG REWRITE to make this change permanent. 2) Alternatively you can just disable the protected mode by editing the Redis configuration file, and setting the protected mode option to 'no', and then restarting the server. 3) If you started the server manually just for testing, restart it with the '--protected-mode no' option. 4) Setup a bind address or an authentication password. NOTE: You only need to do one of the above things in order for the server to start accepting connections from the outside.
Connection closed by foreign host.

Analysis:

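This means protected mode is enabled on 139. With protected mode on, and with no bind address and no password configured, only connections from the local machine (the loopback interface) are accepted.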
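
The telnet check can also be scripted. The following is a minimal Python sketch using only the standard socket module. The function names classify_reply and probe_sentinel are hypothetical, and the host and port must be supplied by the caller. Reading before writing is an assumption based on the telnet session above, where the rejection arrived without any command being sent.

```python
import socket


def classify_reply(reply: bytes) -> str:
    """Map the first bytes a Redis or Sentinel port sends back to a verdict."""
    if reply.startswith(b"-DENIED"):
        return "protected-mode"  # the rejection shown in the telnet output
    if reply.startswith(b"+PONG"):
        return "ok"
    if reply.startswith(b"-NOAUTH"):
        return "auth-required"
    return "unknown"


def probe_sentinel(host: str, port: int, timeout: float = 2.0) -> str:
    """Connect the way a peer sentinel would and report how the port answers."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # A protected-mode instance sends -DENIED and closes the connection
        # on its own, so try to read before sending anything.
        try:
            greeting = sock.recv(4096)
        except socket.timeout:
            greeting = b""  # a healthy instance stays silent until asked
        if greeting:
            return classify_reply(greeting)
        sock.sendall(b"PING\r\n")  # inline command, normally answered with +PONG
        return classify_reply(sock.recv(4096))
```

Run it from the host of another sentinel rather than from 139 itself, because loopback connections are still accepted in protected mode and would hide the problem.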

Root cause:

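The configuration requires at least 2 sentinels to consider the master subjectively down. In this incident one sentinel was down, and of the remaining two, one had protected mode enabled, so the other could not obtain its subjective view of the master. The quorum was therefore never reached, and no master-replica failover was triggered.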
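
The reasoning above can be written down as a toy model in Python. This is a simplified sketch of the explanation in this post, not Sentinel's actual implementation: it assumes every live sentinel already sees the master as sdown and models only whether the observing sentinel can reach its peers. The class, function names, and sentinel labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sentinel:
    name: str
    alive: bool = True
    accepts_remote: bool = True  # False when protected mode rejects peers


def sdown_votes_seen_by(observer: Sentinel, sentinels: List[Sentinel]) -> int:
    """Count the sdown opinions the observer can collect, its own included."""
    if not observer.alive:
        return 0
    votes = 1  # the observer's own opinion
    for peer in sentinels:
        if peer is not observer and peer.alive and peer.accepts_remote:
            votes += 1
    return votes


def can_mark_odown(observer: Sentinel, sentinels: List[Sentinel], quorum: int) -> bool:
    """True when the observer sees at least `quorum` sdown opinions."""
    return sdown_votes_seen_by(observer, sentinels) >= quorum


# The incident: one sentinel down, the one on 139 rejecting remote peers.
down = Sentinel("sentinel-down", alive=False)
on_139 = Sentinel("sentinel-139", accepts_remote=False)
other = Sentinel("sentinel-other")
cluster = [down, on_139, other]

print(can_mark_odown(other, cluster, quorum=2))  # False

# After setting protected-mode no on 139, its opinion becomes reachable.
on_139.accepts_remote = True
print(can_mark_odown(other, cluster, quorum=2))  # True
```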

Recovery

Add the following line to the sentinel configuration on machine 139:

protected-mode no

Tried again, and the failover worked.