Operating system information
Virtual machines, CentOS 7.9, 32 CPU cores / 60 GB memory
Kubernetes version information
kubectl version
Command output:
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.9", GitCommit:"4fb7ed12476d57b8437ada90b4f93b17ffaeed99", GitTreeState:"clean", BuildDate:"2020-07-15T16:18:16Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.9", GitCommit:"4fb7ed12476d57b8437ada90b4f93b17ffaeed99", GitTreeState:"clean", BuildDate:"2020-07-15T16:10:45Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Container runtime
docker version
Command output:
Client: Docker Engine - Community
Version: 20.10.12
API version: 1.41
Go version: go1.16.12
Git commit: e91ed57
Built: Mon Dec 13 11:45:41 2021
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.12
API version: 1.41 (minimum version 1.12)
Go version: go1.16.12
Git commit: 459d0df
Built: Mon Dec 13 11:44:05 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.4.12
GitCommit: 7b11cfaabd73bb80907dd23182b9347b4245eb5d
runc:
Version: 1.0.2
GitCommit: v1.0.2-0-g52b36a2
docker-init:
Version: 0.19.0
GitCommit: de40ad0
crictl version
-bash: crictl: command not found
nerdctl version
-bash: nerdctl: command not found
KubeSphere version information
v3.0.0, installed offline with kk (KubeKey).
What is the problem
I created a new cluster with 3 master nodes and 2 worker nodes:
- master-1
- master-2
- master-3
- worker-1
- worker-2
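The masters are load-balanced with HAProxy + Keepalived.
However, if master-1 goes down, the KubeSphere web console can no longer be reached, whereas if master-2 or master-3 goes down, the console is still accessible.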
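HAProxy in TCP mode with a plain health check typically decides whether a backend is up by opening a TCP connection to it. The sketch below imitates that kind of layer-4 check against each master's kube-apiserver port. Assumptions: kube-apiserver listens on the default port 6443, and the master-2/master-3 addresses are hypothetical placeholders (only master-1's IP, 10.0.40.197, appears in this post).

```python
import socket

# Hypothetical backend list: only 10.0.40.197 (master-1) comes from the post.
# The master-2/master-3 addresses are placeholders; 6443 is the default kube-apiserver port.
MASTERS = {
    "master-1": ("10.0.40.197", 6443),
    "master-2": ("10.0.40.198", 6443),  # placeholder, replace with the real IP
    "master-3": ("10.0.40.199", 6443),  # placeholder, replace with the real IP
}


def tcp_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection can be opened, like a layer-4 health check."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and "no route to host".
        return False


def check_all(backends: dict) -> dict:
    """Probe every backend and map its name to True (up) or False (down)."""
    return {name: tcp_check(host, port) for name, (host, port) in backends.items()}


if __name__ == "__main__":
    for name, up in check_all(MASTERS).items():
        print(f"{name}: {'UP' if up else 'DOWN'}")
```

Running it from master-2 or master-3 while master-1 is down shows whether the remaining API servers are reachable at all, independent of the load balancer.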
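The cluster was originally installed by running the kk command on master-1. After master-1 went down, I checked the etcd status on one of the other masters with systemctl status etcd:
(10.0.40.197 is master-1's IP.)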
● etcd.service - etcd docker wrapper
Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2023-03-23 10:15:15 CST; 12min ago
Main PID: 10143 (etcd)
Tasks: 10
Memory: 19.6M
CGroup: /system.slice/etcd.service
├─10143 /bin/bash /usr/local/bin/etcd
└─10145 /usr/bin/docker run --restart=on-failure:5 --env-file=/etc/etcd.env --net=host -v /etc/ssl/certs:/etc/ssl/certs:ro -v /etc/ssl/etcd/ssl:/etc/ssl/etcd/ssl:ro -v /var/lib/etcd:/var/lib/etcd:rw --memory=512M --blkio-weight=1000 --name=etcd2 dockerhub.kubekey.loca...
Mar 23 10:27:46 ks-node-4 etcd[10143]: 2023-03-23 02:27:46.231768 W | etcdserver: failed to reach the peerURL(https://10.0.40.197:2380) of member aacc57ad2318b2b1 (Get https://10.0.40.197:2380/version: dial tcp 10.0.40.197:2380: connect: no route to host)
Mar 23 10:27:46 ks-node-4 etcd[10143]: 2023-03-23 02:27:46.231788 W | etcdserver: cannot get the version of member aacc57ad2318b2b1 (Get https://10.0.40.197:2380/version: dial tcp 10.0.40.197:2380: connect: no route to host)
Mar 23 10:27:50 ks-node-4 etcd[10143]: 2023-03-23 02:27:50.640410 W | rafthttp: health check for peer aacc57ad2318b2b1 could not connect: dial tcp 10.0.40.197:2380: connect: no route to host (prober "ROUND_TRIPPER_SNAPSHOT")
Mar 23 10:27:50 ks-node-4 etcd[10143]: 2023-03-23 02:27:50.640428 W | rafthttp: health check for peer aacc57ad2318b2b1 could not connect: dial tcp 10.0.40.197:2380: connect: no route to host (prober "ROUND_TRIPPER_RAFT_MESSAGE")
Mar 23 10:27:52 ks-node-4 etcd[10143]: 2023-03-23 02:27:52.243686 W | etcdserver: failed to reach the peerURL(https://10.0.40.197:2380) of member aacc57ad2318b2b1 (Get https://10.0.40.197:2380/version: dial tcp 10.0.40.197:2380: connect: no route to host)
Mar 23 10:27:52 ks-node-4 etcd[10143]: 2023-03-23 02:27:52.243702 W | etcdserver: cannot get the version of member aacc57ad2318b2b1 (Get https://10.0.40.197:2380/version: dial tcp 10.0.40.197:2380: connect: no route to host)
Mar 23 10:27:55 ks-node-4 etcd[10143]: 2023-03-23 02:27:55.640727 W | rafthttp: health check for peer aacc57ad2318b2b1 could not connect: dial tcp 10.0.40.197:2380: connect: no route to host (prober "ROUND_TRIPPER_RAFT_MESSAGE")
Mar 23 10:27:55 ks-node-4 etcd[10143]: 2023-03-23 02:27:55.640771 W | rafthttp: health check for peer aacc57ad2318b2b1 could not connect: dial tcp 10.0.40.197:2380: connect: no route to host (prober "ROUND_TRIPPER_SNAPSHOT")
Mar 23 10:27:58 ks-node-4 etcd[10143]: 2023-03-23 02:27:58.255969 W | etcdserver: failed to reach the peerURL(https://10.0.40.197:2380) of member aacc57ad2318b2b1 (Get https://10.0.40.197:2380/version: dial tcp 10.0.40.197:2380: connect: no route to host)
Mar 23 10:27:58 ks-node-4 etcd[10143]: 2023-03-23 02:27:58.255989 W | etcdserver: cannot get the version of member aacc57ad2318b2b1 (Get https://10.0.40.197:2380/version: dial tcp 10.0.40.197:2380: connect: no route to host)
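The warnings above come from a surviving etcd member that can no longer reach the peer at 10.0.40.197:2380, while its own service is still active (running). etcd stays available as long as a majority (quorum) of members, floor(n/2) + 1, is reachable. A small sketch of that arithmetic, assuming one etcd member per master (3 members in total, which this post does not state explicitly):

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with `members` voting members."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while a quorum is still reachable."""
    return members - quorum(members)


def has_quorum(members: int, down: int) -> bool:
    """True if the members still running form a majority."""
    return members - down >= quorum(members)


if __name__ == "__main__":
    # Assumption: one etcd member per master, so 3 members in total.
    n = 3
    print(f"members={n} quorum={quorum(n)} tolerates={fault_tolerance(n)} failure(s)")
    print("quorum with master-1 down:", has_quorum(n, down=1))
```

Under that assumption, losing master-1 alone should still leave etcd with a quorum, which is consistent with the surviving member reporting active (running) above.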
Running kubectl get nodes on the other nodes returns:
Unable to connect to the server: EOF
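This kubectl error typically appears when a TCP connection is accepted but closed before the API server answers, for example by a proxy with no healthy backend. One thing worth checking is which endpoint the kubeconfig on each master targets: if it points at master-1 rather than the HAProxy/Keepalived virtual IP, kubectl on the other masters would fail once master-1 is gone. A minimal sketch that lists the server entries of a kubeconfig, assuming the default path ~/.kube/config and the usual "server: https://host:port" layout; it is a plain line scan, not a full YAML parser.

```python
import re
from pathlib import Path

# Matches lines such as "    server: https://10.0.40.197:6443" in a kubeconfig.
SERVER_RE = re.compile(r"^\s*server:\s*(\S+)\s*$")


def kubeconfig_servers(text: str) -> list:
    """Return every cluster server URL found in kubeconfig text (simple line scan)."""
    servers = []
    for line in text.splitlines():
        match = SERVER_RE.match(line)
        if match:
            # Drop optional YAML quotes around the URL.
            servers.append(match.group(1).strip("\"'"))
    return servers


if __name__ == "__main__":
    # Assumption: kubectl uses the default kubeconfig location.
    path = Path.home() / ".kube" / "config"
    for url in kubeconfig_servers(path.read_text()):
        print(url)
```

If the URL uses a hostname rather than an IP, it is also worth checking what that name resolves to on each master (for example via /etc/hosts).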
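At this point, services that were already running keep working. However, if the server hosting a service also goes down, the service is not moved to another node (I waited 10 minutes) and stays inaccessible. My guess is that only master-1 is doing the scheduling, and master-2/master-3 are effectively useless in this situation.
If master-2 or master-3 goes down instead, the whole cluster works normally.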
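For comparison, when a node stops reporting, kube-controller-manager marks its Ready condition Unknown after --node-monitor-grace-period (40s by default), and with taint-based eviction the pods are evicted once their unreachable toleration expires (300s by default, added by the DefaultTolerationSeconds admission plugin). A rough sketch of that timeline, assuming default flags on Kubernetes 1.17 and pods without custom tolerations:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvictionTimings:
    """Timings in seconds; the values below are assumed upstream defaults."""

    # kube-controller-manager --node-monitor-grace-period: how long a silent node
    # keeps its last status before the Ready condition becomes Unknown.
    node_monitor_grace_period: int = 40
    # kube-apiserver --default-unreachable-toleration-seconds: toleration added to
    # pods by the DefaultTolerationSeconds admission plugin for the unreachable taint.
    unreachable_toleration: int = 300


def expected_eviction_seconds(t: Optional[EvictionTimings] = None) -> int:
    """Approximate time from a node going silent until its pods are evicted."""
    t = t or EvictionTimings()
    return t.node_monitor_grace_period + t.unreachable_toleration


if __name__ == "__main__":
    secs = expected_eviction_seconds()
    print(f"pods should be evicted roughly {secs}s (~{secs / 60:.1f} min) after the node goes silent")
```

Eviction and rescheduling are carried out by kube-controller-manager and kube-scheduler through the API server, so if nothing moves well past this window, those components may not be reaching a working API server.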