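Overview
- Sentinel: the concrete implementation of master-slave failover
- Distributed
- Each sentinel is itself a Redis server, but it does not serve data
- Usually deployed in an odd number
Three phases
- Monitoring: synchronize state information
- Notification: keep the nodes connected
- Failover
  - Detect the problem
  - Elect a leader sentinel
  - Choose the new master
  - The new master takes over, the other slaves switch to it, and the original master reconnects as a slave once it recovers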
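Monitoring phase
Synchronizes state information across all sentinels and collects information about the master and the slaves.
- Get the state of each sentinel (online or not); when a new sentinel comes online, every sentinel refreshes its SentinelState:sentinels
- Get the state of the master
  - master attributes
    - runid
    - role:master
    - details of each slave
- Get the state of every slave (based on the slave information reported by the master)
  - slave attributes
    - runid
    - role:slave
    - master_host, master_port
    - offset ...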
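The attributes listed above are the kind of `key:value` fields that appear in an `INFO replication` reply. As a minimal sketch (not how Sentinel itself is implemented), the following parses such text and pulls the slave addresses out of the master's `slaveN:` entries. The sample text and the helper names `parse_info` and `replicas_from_master_info` are illustrative assumptions.

```python
def parse_info(text):
    """Parse INFO-style 'key:value' lines into a dict, skipping '#' section headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def replicas_from_master_info(fields):
    """Extract (ip, port) pairs from the master's slaveN:ip=...,port=... entries."""
    replicas = []
    for key, value in fields.items():
        # Only keys such as slave0, slave1; not slave_repl_offset and similar.
        if key.startswith("slave") and key[5:].isdigit():
            attrs = dict(item.split("=", 1) for item in value.split(","))
            replicas.append((attrs["ip"], int(attrs["port"])))
    return replicas


# Illustrative text shaped like a master's INFO replication reply (values are made up).
sample = """# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6379,state=online,offset=20931,lag=0
slave1:ip=127.0.0.1,port=6381,state=online,offset=20931,lag=1
master_repl_offset:20931
"""

info = parse_info(sample)
print(info["role"])
print(replicas_from_master_info(info))
```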
Notification phase
- Sentinels monitor the state of the master and the slaves
- Sentinels synchronize with one another
  - via publish/subscribe
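To make the publish/subscribe exchange more concrete, here is a small sketch that parses one hello message. It assumes the eight-field, comma-separated layout that Redis Sentinel publishes on the `__sentinel__:hello` channel (sentinel ip, port, runid, current epoch, then master name, ip, port, config epoch); check this against your Redis version. The `Hello` type and `parse_hello` function are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class Hello(NamedTuple):
    sentinel_ip: str
    sentinel_port: int
    sentinel_runid: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


def parse_hello(payload: str) -> Hello:
    """Split a hello payload into its eight fields (assumed layout, see lead-in)."""
    parts = payload.split(",")
    if len(parts) != 8:
        raise ValueError("expected 8 comma-separated fields, got %d" % len(parts))
    ip, port, runid, epoch, name, m_ip, m_port, m_epoch = parts
    return Hello(ip, int(port), runid, int(epoch), name, m_ip, int(m_port), int(m_epoch))


# Illustrative payload built from values that appear in this lab.
msg = "127.0.0.1,26380,16218e9672c6712459abb8ca9c5b221d50efc713,1,mymaster,127.0.0.1,6379,1"
hello = parse_hello(msg)
print(hello.sentinel_port, hello.master_name, hello.master_port)
```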
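Failover phase
1. Detect that the master is down
- A single sentinel finds the master down and sets SentinelRedisInstance:master:flags to SRI_S_DOWN; this is called subjectively down (sdown)
- The other sentinels keep probing the master as well; once enough of them agree that it is down (at least the quorum given as the last argument of sentinel monitor), the flag SRI_O_DOWN is set in SentinelRedisInstance:master:flags; this is called objectively down (odown)
2. Elect the sentinel that will run the failover
- Voting
  - Each sentinel broadcasts its ip, port, election round (epoch), and runid
  - A receiver votes for the first request it receives
  - Several rounds may be needed until a leader is elected
3. The leader sentinel chooses the candidate master
- Filter the server list for candidates
  - keep those that are online
  - drop those that respond slowly
  - drop those that have been disconnected from the original master for a long time
- Order of preference
  - priority (lower replica-priority value first)
  - offset (larger replication offset first)
  - runid (smaller runid first)
- Send commands (sentinel -> slave)
  - send slaveof no one to the new master
  - send slaveof <new master IP> <port> to the other slaves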
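The candidate selection in step 3 can be sketched as a filter followed by a sort. This is only an illustration of the ordering described above: prefer a lower priority value, then a larger replication offset, then the lexicographically smaller runid. The `Replica` type, the two health fields, and the threshold values are hypothetical, and treating priority 0 as "never promote" follows Redis's replica-priority convention rather than anything stated in these notes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    runid: str
    priority: int             # replica-priority: lower is preferred, 0 means never promote
    offset: int               # replication offset: larger means more up to date
    online: bool
    ms_since_last_reply: int  # hypothetical health metric
    ms_master_link_down: int  # hypothetical: time disconnected from the old master


def pick_new_master(replicas: List[Replica],
                    max_reply_ms: int = 5000,        # hypothetical threshold
                    max_link_down_ms: int = 300000,  # hypothetical threshold
                    ) -> Optional[Replica]:
    """Filter out unhealthy replicas, then order by (priority, -offset, runid)."""
    candidates = [
        r for r in replicas
        if r.online
        and r.priority > 0
        and r.ms_since_last_reply <= max_reply_ms
        and r.ms_master_link_down <= max_link_down_ms
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda r: (r.priority, -r.offset, r.runid))


replicas = [
    Replica("bbb", 100, 20931, True, 100, 0),
    Replica("aaa", 100, 20931, True, 100, 0),   # same priority and offset, smaller runid
    Replica("ccc", 100, 99999, False, 100, 0),  # offline, filtered out
    Replica("ddd", 100, 500, True, 100, 0),     # lagging behind
]
chosen = pick_new_master(replicas)
print(chosen.runid if chosen else None)  # prints "aaa"
```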
Lab
Environment setup
Role | Name:port | Port |
---|---|---|
master | master:6380 | 6380 |
slave 1 | slave1:6379 | 6379 |
slave 2 | slave2:6381 | 6381 |
sentinel 1 | sentinel01:26379 | 26379 |
sentinel 2 | sentinel02:26380 | 26380 |
sentinel 3 | sentinel03:26381 | 26381 |
Configuration (conf)
master
port 6380
daemonize no
dir /root/redis-6.0.6/data
slave
daemonize no
dir /root/redis-6.0.6/data
#logfile "6380.log"
`slaveof 127.0.0.1 6380`
sentinel
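- Reference for the configuration: the sentinel.conf that ships in the extracted archive
- cat sentinel.conf | grep -v "#" | grep -v "^$" > sentinel-26379.conf can be used to strip comments and blank lines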
`port 26379`
daemonize no
pidfile "/var/run/redis-sentinel.pid"
#logfile ""
dir "/root/redis-6.0.6/data"
sentinel down-after-milliseconds mymaster 30000
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes
`sentinel monitor mymaster 127.0.0.1 6380 2`# the trailing 2 is the quorum: once 2 sentinels judge the master down, it is marked objectively down
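The two thresholds in this configuration, down-after-milliseconds (30000) and the quorum (2), drive the subjective and objective down decisions. The sketch below shows only that arithmetic under simplifying assumptions; the function names and the idea of passing in a list of per-sentinel verdicts are hypothetical and do not mirror Sentinel's internal code.

```python
DOWN_AFTER_MS = 30000  # from: sentinel down-after-milliseconds mymaster 30000
QUORUM = 2             # from: sentinel monitor mymaster 127.0.0.1 6380 2


def is_subjectively_down(ms_since_last_valid_reply, down_after_ms=DOWN_AFTER_MS):
    """One sentinel's own view: no valid reply for longer than the threshold."""
    return ms_since_last_valid_reply > down_after_ms


def is_objectively_down(sdown_reports, quorum=QUORUM):
    """Collective view: at least `quorum` sentinels report the master as down.

    sdown_reports holds one boolean per sentinel, the local one included.
    """
    return sum(1 for down in sdown_reports if down) >= quorum


print(is_subjectively_down(31000))               # True: silent for more than 30 s
print(is_objectively_down([True, False, False])) # False: 1 of 3, quorum is 2
print(is_objectively_down([True, True, True]))   # True: 3 of 3, quorum is 2
```

The last call corresponds to a log entry such as `#quorum 3/2`: three sentinels agree while only two are required.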
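Start Redis
master
slave
Start sentinel
Once a sentinel is up, it rewrites its conf file (the copy below appears to have been captured after the failover shown later, since it already names 6379 as the master)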
port 26379
daemonize no
pidfile "/var/run/redis-sentinel.pid"
logfile ""
dir "/tmp"
sentinel myid 522d51d7ebd6c58a881da63226e8bc1b16f0917e
sentinel deny-scripts-reconfig yes
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel config-epoch mymaster 1
sentinel leader-epoch mymaster 1
#Generated by CONFIG REWRITE
protected-mode no
user default on nopass ~* +@all
`sentinel known-replica mymaster 127.0.0.1 6380`
`sentinel known-replica mymaster 127.0.0.1 6381`
`sentinel known-sentinel mymaster 127.0.0.1 26380 16218e9672c6712459abb8ca9c5b221d50efc713`
`sentinel known-sentinel mymaster 127.0.0.1 26381 9a9f3d1ee0289eb65bd72f775b13b5ad3fcad7bd`
As the highlighted lines show, the rewritten conf now records the slaves and the other sentinels
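To check what a sentinel has discovered, the `known-replica` and `known-sentinel` lines can be read back out of the rewritten file. The following is a minimal sketch that assumes the line layout shown above (`sentinel known-replica <master> <ip> <port>` and `sentinel known-sentinel <master> <ip> <port> <runid>`); `discovered_peers` is a hypothetical helper name.

```python
def discovered_peers(conf_text):
    """Return (replicas, sentinels) found in a rewritten sentinel conf.

    replicas  -> list of (ip, port)
    sentinels -> list of (ip, port, runid)
    """
    replicas, sentinels = [], []
    for line in conf_text.splitlines():
        tokens = line.split()
        if len(tokens) < 5 or tokens[0] != "sentinel":
            continue
        if tokens[1] == "known-replica":
            replicas.append((tokens[3], int(tokens[4])))
        elif tokens[1] == "known-sentinel" and len(tokens) >= 6:
            sentinels.append((tokens[3], int(tokens[4]), tokens[5]))
    return replicas, sentinels


# Lines taken from the rewritten conf in this lab.
conf = """sentinel monitor mymaster 127.0.0.1 6379 2
sentinel known-replica mymaster 127.0.0.1 6380
sentinel known-replica mymaster 127.0.0.1 6381
sentinel known-sentinel mymaster 127.0.0.1 26380 16218e9672c6712459abb8ca9c5b221d50efc713
sentinel known-sentinel mymaster 127.0.0.1 26381 9a9f3d1ee0289eb65bd72f775b13b5ad3fcad7bd
"""

replicas_found, sentinels_found = discovered_peers(conf)
print(replicas_found)
print([(ip, port) for ip, port, _ in sentinels_found])
```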
Output of info sentinel as seen on sentinel 01
master -> down
As seen from the sentinel:
32108:X 21 Nov 2021 05:52:45.808 # `+sdown` master mymaster 127.0.0.1 6380`#subjectively down`
32108:X 21 Nov 2021 05:52:45.893 # +new-epoch 1
32108:X 21 Nov 2021 05:52:45.893 # +vote-for-leader 16218e9672c6712459abb8ca9c5b221d50efc713 1
32108:X 21 Nov 2021 05:52:46.913 # `+odown` master mymaster 127.0.0.1 6380 #quorum 3/2 `#objectively down`
32108:X 21 Nov 2021 05:52:46.913 # Next failover delay: I will not start a failover before Sun Nov 21 05:58:46 2021
32108:X 21 Nov 2021 05:52:47.173 # +config-update-from sentinel 16218e9672c6712459abb8ca9c5b221d50efc713 127.0.0.1 26380 @ mymaster 127.0.0.1 6380
32108:X 21 Nov 2021 05:52:47.173 # `+switch-master mymaster 127.0.0.1 6380 127.0.0.1 6379#6379 is chosen as the new master`
32108:X 21 Nov 2021 05:52:47.173 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
32108:X 21 Nov 2021 05:52:47.173 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
32108:X 21 Nov 2021 05:53:17.196 # +sdown slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6379
State change complete:
- New master: 6379
- 6380 and 6381 are registered to 6379 as slaves
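The same conclusion can be read out of the sentinel log mechanically. The sketch below assumes the line shape seen in this log (`<pid>:X <date> <time> <#|*> <+event> <args...>`) and that `+switch-master` carries the master name, old ip, old port, new ip, and new port, as in the entry above. The regular expression and the helper names are illustrative assumptions.

```python
import re

# <pid>:X <day> <month> <year> <time> <#|*> <+event or -event> <args>
EVENT_RE = re.compile(r"^\d+:X \d+ \w+ \d+ [\d:.]+ [#*] ([+-][a-z-]+) (.*)$")


def parse_events(log_text):
    """Return a list of (event_name, args) for lines that carry a +event/-event."""
    events = []
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match:
            events.append((match.group(1), match.group(2).split()))
    return events


def new_master(events):
    """Return (ip, port) of the new master from the first +switch-master event."""
    for name, args in events:
        if name == "+switch-master" and len(args) >= 5:
            return args[3], int(args[4])
    return None


# Entries copied from the log in this lab, without the inline annotations.
log = """32108:X 21 Nov 2021 05:52:45.808 # +sdown master mymaster 127.0.0.1 6380
32108:X 21 Nov 2021 05:52:46.913 # +odown master mymaster 127.0.0.1 6380 #quorum 3/2
32108:X 21 Nov 2021 05:52:46.913 # Next failover delay: I will not start a failover before Sun Nov 21 05:58:46 2021
32108:X 21 Nov 2021 05:52:47.173 # +switch-master mymaster 127.0.0.1 6380 127.0.0.1 6379
32108:X 21 Nov 2021 05:52:47.173 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6379
"""

events = parse_events(log)
print([name for name, _ in events])
print(new_master(events))
```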
6380 down -> running
The sentinel updates the state of 6380
Seen from the cli client on 6380:
#Replication
`role:slave`
master_host:127.0.0.1
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:20931
slave_priority:100
slave_read_only:1
connected_slaves:0
master_replid:3317b9be56ad63dcff8a416e9c485b477b1bf176
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:20931
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:18767
repl_backlog_histlen:2165
It has become a slave