Operating system information
QingCloud virtual machines, CentOS 7.8
3 masters: 8 cores / 32 GB each
4 workers: 4 cores / 16 GB each, except one with 8 cores / 32 GB
Kubernetes version information
Kubernetes 1.22.1, KubeSphere 3.2.1
Container runtime
Docker version: 20.10.12
KubeSphere version information
3.2.1, installed online with kk (all-in-one)
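What is the problem
The cluster had been running day to day without issues, then I suddenly found that pods could no longer be scheduled. Whether they are created by a pipeline or by Helm, they all stay in Pending.
I checked the nodes: CPU, memory, and pod count are all within limits. I also added a new 8-core / 32 GB worker node, but pods still cannot be scheduled.
Describing those pods shows no events.
So I checked the following pod in `kube-system`:
pod/kube-scheduler-xxx (xxx is my master hostname)
and found these errors: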
E1230 20:46:59.185386 1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: Get "https://172.16.0.2:6443/apis/storage.k8s.io/v1beta1/csistoragecapacities?limit=500&resourceVersion=0": dial tcp 172.16.0.2:6443: connect: connection refused
E1230 20:47:04.597878 1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://172.16.0.2:6443/api/v1/nodes?limit=500&resourceVersion=0": dial tcp 172.16.0.2:6443: connect: connection refused
E1230 20:47:04.758842 1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.ReplicationController: failed to list *v1.ReplicationController: Get "https://172.16.0.2:6443/api/v1/replicationcontrollers?limit=500&resourceVersion=0": dial tcp 172.16.0.2:6443: connect: connection refused
E1230 20:47:05.012290 1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.StatefulSet: failed to list *v1.StatefulSet: Get "https://172.16.0.2:6443/apis/apps/v1/statefulsets?limit=500&resourceVersion=0": dial tcp 172.16.0.2:6443: connect: connection refused
E1230 20:47:06.259581 1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.PersistentVolume: failed to list *v1.PersistentVolume: Get "https://172.16.0.2:6443/api/v1/persistentvolumes?limit=500&resourceVersion=0": dial tcp 172.16.0.2:6443: connect: connection refused
E1230 20:47:06.426982 1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.PersistentVolumeClaim: failed to list *v1.PersistentVolumeClaim: Get "https://172.16.0.2:6443/api/v1/persistentvolumeclaims?limit=500&resourceVersion=0": dial tcp 172.16.0.2:6443: connect: connection refused
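To see at a glance whether every failure points at the same endpoint, the scheduler log can be grouped by target address and error reason. This is only a minimal sketch: `PATTERN`, `SAMPLE`, and `summarize` are hypothetical names I made up, and the regular expression assumes the reflector line format shown above.

```python
import re
from collections import defaultdict

# Extracts the resource kind, the dialed host:port, and the error reason
# from a client-go reflector error line (format assumed from the log above).
PATTERN = re.compile(
    r"Failed to watch \*(?P<kind>[\w.]+):.*?"
    r"dial tcp (?P<addr>[\d.]+:\d+): (?P<reason>.+)$"
)

# Two of the lines from the scheduler log, copied verbatim.
SAMPLE = [
    'E1230 20:46:59.185386 1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: '
    'Failed to watch *v1beta1.CSIStorageCapacity: failed to list *v1beta1.CSIStorageCapacity: '
    'Get "https://172.16.0.2:6443/apis/storage.k8s.io/v1beta1/csistoragecapacities?limit=500&resourceVersion=0": '
    'dial tcp 172.16.0.2:6443: connect: connection refused',
    'E1230 20:47:04.597878 1 reflector.go:138] k8s.io/client-go/informers/factory.go:134: '
    'Failed to watch *v1.Node: failed to list *v1.Node: '
    'Get "https://172.16.0.2:6443/api/v1/nodes?limit=500&resourceVersion=0": '
    'dial tcp 172.16.0.2:6443: connect: connection refused',
]


def summarize(lines):
    """Group the failing resource kinds by (address, reason)."""
    groups = defaultdict(list)
    for line in lines:
        match = PATTERN.search(line)
        if match:
            key = (match["addr"], match["reason"].strip())
            groups[key].append(match["kind"])
    return dict(groups)


if __name__ == "__main__":
    for (addr, reason), kinds in summarize(SAMPLE).items():
        print(f"{addr} ({reason}): {', '.join(kinds)}")
```

If every kind ends up under a single key, the scheduler is failing to reach one API server address rather than having trouble with individual resources.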
What is the best way to fix this?
Update: I checked `kubectl get componentstatus`:
scheduler Unhealthy Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused
controller-manager Healthy ok
etcd-2 Healthy {"health":"true"}
etcd-0 Healthy {"health":"true"}
etcd-1 Healthy {"health":"true"}
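Both errors are plain TCP connection refusals, so one way to narrow this down is to probe the two ports directly from the master node. This is a minimal sketch, not something I have run against this cluster: `can_connect` and `ENDPOINTS` are hypothetical names, and the addresses are simply the ones that appear in the scheduler log and the componentstatus output above.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


# Assumed endpoints: the API server address from the scheduler log and the
# scheduler health port from the componentstatus output.
ENDPOINTS = [("172.16.0.2", 6443), ("127.0.0.1", 10251)]

if __name__ == "__main__":
    for host, port in ENDPOINTS:
        state = "open" if can_connect(host, port) else "refused or unreachable"
        print(f"{host}:{port} -> {state}")
```

A closed port here only shows that nothing is accepting connections at that address. It does not by itself say why the process is not listening.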