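Environment: KubeSphere 2.1.1
Deployment: 3 masters, multiple workers
Symptom: Today one of the masters failed, which caused its virtual machine to restart. After the restart, logging in to the KubeSphere console returns Internal Server Error.
# Checking for abnormal pods: so far only ks-apigateway appears unable to start, as shown below: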
[root@smyk8s-master-01 admin]# kubectl get po -n kubesphere-system
NAME READY STATUS RESTARTS AGE
etcd-f988bdb6f-sg695 1/1 Running 1 134d
ks-account-7478f89875-8wpnd 1/1 Running 0 135m
ks-account-7478f89875-9l2nd 1/1 Running 0 135m
ks-account-7478f89875-wcflg 1/1 Running 0 134m
ks-apigateway-5664c4b76f-4rqd2 1/1 Running 0 134d
ks-apigateway-5664c4b76f-lxkkc 1/1 Running 0 134d
ks-apigateway-77fbd6ff4d-66wtb 0/1 CrashLoopBackOff 7 11m
ks-apigateway-77fbd6ff4d-wm5nc 0/1 CrashLoopBackOff 10 26m
ks-apiserver-659746cf7d-8x4gr 1/1 Running 0 15d
ks-apiserver-659746cf7d-c9bcm 1/1 Running 0 178m
ks-apiserver-659746cf7d-fwnfp 1/1 Running 0 15d
ks-console-bf7975d87-5lpx5 1/1 Running 0 134m
ks-console-bf7975d87-8zs77 1/1 Running 0 135m
ks-console-bf7975d87-ps7tl 1/1 Running 0 135m
ks-controller-manager-d4788677-8drsm 1/1 Running 0 134d
ks-controller-manager-d4788677-9s7dt 1/1 Running 0 178m
ks-controller-manager-d4788677-nhgxt 1/1 Running 0 134d
ks-installer-669b4ff46-rj2g2 1/1 Running 0 134d
minio-8cd46c8d9-7rfx7 1/1 Running 0 168m
mysql-b5597d996-ndmbl 1/1 Running 1 134d
openldap-0 1/1 Running 0 166m
openldap-1 1/1 Running 3 262d
redis-ha-haproxy-f5747f989-gns98 1/1 Running 0 148m
redis-ha-haproxy-f5747f989-t2d6s 1/1 Running 0 147m
redis-ha-haproxy-f5747f989-xs6qj 1/1 Running 0 148m
redis-ha-server-0 2/2 Running 0 140m
redis-ha-server-1 2/2 Running 0 140m
redis-ha-server-2 2/2 Running 0 140m
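# Looking further at the gateway logs, it initially looks like Redis cannot be reached, but the Redis pods are running, and the connection still fails after restarting the Redis deployment.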
[root@smyk8s-master-01 admin]# kubectl -n kubesphere-system logs ks-apigateway-77fbd6ff4d-66wtb
[DEV NOTICE] Registered directive 'authenticate' before 'jwt'
[DEV NOTICE] Registered directive 'authentication' before 'jwt'
[DEV NOTICE] Registered directive 'swagger' before 'jwt'
Activating privacy features... done.
2020/09/09 07:05:22 [INFO][cache:0xc0000c76d0] Started certificate maintenance routine
E0909 07:05:22.770468 1 redis.go:51] unable to reach redis hostEOF
2020/09/09 07:05:22 EOF
# Looking further at the Redis logs, they show errors (Sentinel election failed?):
#kubectl -n kubesphere-system logs redis-ha-server-0 redis
1:S 09 Sep 2020 07:09:18.517 * Master replied to PING, replication can continue...
1:S 09 Sep 2020 07:09:18.518 * Trying a partial resynchronization (request 2fcd515ac3f837eaffa39a657bca481bb454a5ca:4576914547).
1:S 09 Sep 2020 07:09:18.519 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 09 Sep 2020 07:09:19.519 * Connecting to MASTER 10.223.43.48:6379
1:S 09 Sep 2020 07:09:19.519 * MASTER <-> REPLICA sync started
1:S 09 Sep 2020 07:09:19.520 * Non blocking connect for SYNC fired the event.
1:S 09 Sep 2020 07:09:19.521 * Master replied to PING, replication can continue...
1:S 09 Sep 2020 07:09:19.522 * Trying a partial resynchronization (request 2fcd515ac3f837eaffa39a657bca481bb454a5ca:4576914547).
1:S 09 Sep 2020 07:09:19.522 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 09 Sep 2020 07:09:20.521 * Connecting to MASTER 10.223.43.48:6379
1:S 09 Sep 2020 07:09:20.521 * MASTER <-> REPLICA sync started
1:S 09 Sep 2020 07:09:20.522 * Non blocking connect for SYNC fired the event.
1:S 09 Sep 2020 07:09:20.523 * Master replied to PING, replication can continue...
1:S 09 Sep 2020 07:09:20.523 * Trying a partial resynchronization (request 2fcd515ac3f837eaffa39a657bca481bb454a5ca:4576914547).
1:S 09 Sep 2020 07:09:20.524 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 09 Sep 2020 07:09:21.528 * Connecting to MASTER 10.223.43.48:6379
1:S 09 Sep 2020 07:09:21.528 * MASTER <-> REPLICA sync started
1:S 09 Sep 2020 07:09:21.529 * Non blocking connect for SYNC fired the event.
1:S 09 Sep 2020 07:09:21.529 * Master replied to PING, replication can continue...
1:S 09 Sep 2020 07:09:21.531 * Trying a partial resynchronization (request 2fcd515ac3f837eaffa39a657bca481bb454a5ca:4576914547).
1:S 09 Sep 2020 07:09:21.531 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
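The `-NOMASTERLINK` reply above appears to indicate that the node at 10.223.43.48 is itself a replica whose own master link is down. One way to narrow this down is to ask each Redis pod for its replication role, and to ask Sentinel which address it currently considers the master. The following is only a sketch under assumptions: the container name `redis` comes from the logs command used above, while the Sentinel container name `sentinel`, port 26379, and the master group name `mymaster` are guesses based on common redis-ha chart defaults and may differ in this installation. It also assumes Redis runs without a password.

```shell
#!/bin/sh
# Sketch only: helper functions for inspecting the redis-ha pods.
NS=kubesphere-system
SENTINEL_CONTAINER=sentinel   # assumption: name of the Sentinel container
SENTINEL_PORT=26379           # assumption: default Sentinel port
MASTER_GROUP=mymaster         # assumption: Sentinel master group name

# Read `INFO replication` output on stdin and print the fields of interest.
parse_replication() {
  tr -d '\r' | awk -F: '
    $1 == "role"               { role = $2 }
    $1 == "master_host"        { host = $2 }
    $1 == "master_link_status" { link = $2 }
    END { printf "role=%s master_host=%s link=%s\n", role, host, link }'
}

# Read the two-line reply (ip, port) of SENTINEL get-master-addr-by-name
# on stdin and print it as ip:port, or "unknown" if the reply is empty.
parse_master_addr() {
  tr -d '\r' | awk '
    NF { a[++n] = $0 }
    END { if (n >= 2) print a[1] ":" a[2]; else print "unknown" }'
}

# Replication role of one pod; the pod name is the first argument.
redis_role() {
  kubectl -n "$NS" exec "$1" -c redis -- redis-cli info replication |
    parse_replication
}

# Master address as seen by the Sentinel running in the given pod.
sentinel_master() {
  kubectl -n "$NS" exec "$1" -c "$SENTINEL_CONTAINER" -- \
    redis-cli -p "$SENTINEL_PORT" sentinel get-master-addr-by-name "$MASTER_GROUP" |
    parse_master_addr
}

# Usage (run manually against the cluster):
#   for i in 0 1 2; do
#     p="redis-ha-server-$i"
#     echo "$p: $(redis_role "$p") sentinel_master=$(sentinel_master "$p")"
#   done
```

If all three pods report `role=slave`, or the Sentinels point at an address that no longer belongs to any of the current pods, that would be consistent with the failed-election guess above.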
#kubectl -n kubesphere-system logs -f redis-ha-haproxy-5d9d9c9d55-x5r8z
[NOTICE] 252/094142 (1) : New worker #1 (6) forked
[WARNING] 252/094143 (6) : Server check_if_redis_is_master_0/R0 is DOWN, reason: Layer7 timeout, info: " at step 5 of tcp-check (expect string '10.223.2.232')", check duration: 1000ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 252/094143 (6) : Server check_if_redis_is_master_0/R1 is DOWN, reason: Layer7 timeout, info: " at step 5 of tcp-check (expect string '10.223.2.232')", check duration: 1000ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 252/094143 (6) : Server check_if_redis_is_master_0/R2 is DOWN, reason: Layer7 timeout, info: " at step 5 of tcp-check (expect string '10.223.2.232')", check duration: 1000ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 252/094143 (6) : backend 'check_if_redis_is_master_0' has no server available!
[WARNING] 252/094143 (6) : Server check_if_redis_is_master_1/R0 is DOWN, reason: Layer7 timeout, info: " at step 5 of tcp-check (expect string '10.223.31.54')", check duration: 1000ms. 2 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 252/094143 (6) : Server check_if_redis_is_master_1/R1 is DOWN, reason: Layer7 timeout, info: " at step 5 of tcp-check (expect string '10.223.31.54')", check duration: 1000ms. 1 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 252/094143 (6) : Server check_if_redis_is_master_1/R2 is DOWN, reason: Layer7 timeout, info: " at step 5 of tcp-check (expect string '10.223.31.54')", check duration: 1000ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 252/094143 (6) : backend 'check_if_redis_is_master_1' has no server available!
[WARNING] 252/094143 (6) : Server bk_redis_master/R0 is DOWN, reason: Layer7 timeout, info: " at step 5 of tcp-check (expect string 'role:master')", check duration: 1001ms. 2 active and 0 backup servers left.0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 252/094144 (6) : Server bk_redis_master/R1 is DOWN, reason: Layer7 timeout, info: " at step 5 of tcp-check (expect string 'role:master')", check duration: 1000ms. 1 active and 0 backup servers left.0 sessions active, 0 requeued, 0 remaining in queue.
[WARNING] 252/094144 (6) : Server bk_redis_master/R2 is DOWN, reason: Layer7 timeout, info: " at step 5 of tcp-check (expect string 'role:master')", check duration: 1001ms. 0 active and 0 backup servers left.0 sessions active, 0 requeued, 0 remaining in queue.
[ALERT] 252/094144 (6) : backend 'bk_redis_master' has no server available!
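The HAProxy output above is repetitive, so a small filter can condense it to one line per backend: which servers were marked DOWN and which string the health check was waiting for. This is a hypothetical helper written for this post, not part of HAProxy or KubeSphere, and it assumes the log lines keep the exact `Server <backend>/<server> is DOWN ... (expect string '<value>')` shape shown above.

```shell
#!/bin/sh
# Sketch only: condense HAProxy "is DOWN" warnings read from stdin.
summarize_haproxy_down() {
  # Step 1: reduce each warning to "<backend> <server> <expected string>".
  sed -n "s/.*Server \([^/]*\)\/\([^ ]*\) is DOWN.*expect string '\([^']*\)'.*/\1 \2 \3/p" |
    # Step 2: group servers by backend, keeping first-seen order.
    awk '
      {
        if (!($1 in seen)) { seen[$1] = 1; order[++n] = $1 }
        down[$1] = down[$1] "," $2
        want[$1] = $3
      }
      END {
        for (i = 1; i <= n; i++) {
          b = order[i]
          printf "%s: down=%s expect=%s\n", b, substr(down[b], 2), want[b]
        }
      }'
}

# Usage (run manually against the cluster):
#   kubectl -n kubesphere-system logs redis-ha-haproxy-5d9d9c9d55-x5r8z | summarize_haproxy_down
```

In the output above, the expected addresses (10.223.2.232 and 10.223.31.54) differ from the address the replica is trying to sync with (10.223.43.48), which may be worth comparing against the current pod IPs from `kubectl get po -o wide`.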
Could anyone advise how to resolve this Redis problem?