Operating system
Virtual machines, CentOS 7.9, 8 CPU cores / 16 GB RAM
Kubernetes version
[root@kubesphere1 ~]# kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.15", GitCommit:"b84cb8ab29366daa1bba65bc67f54de2f6c34848", GitTreeState:"clean", BuildDate:"2022-12-08T10:49:13Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.15", GitCommit:"b84cb8ab29366daa1bba65bc67f54de2f6c34848", GitTreeState:"clean", BuildDate:"2022-12-08T10:42:57Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}
Container runtime
Docker 24.0.6 (taken from the CONTAINER-RUNTIME column of the kubectl get node -o wide output below; docker version output was not provided)
KubeSphere version
KubeSphere version: v3.4.1
Offline installation on Linux using kk.
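What is the problem
Error 1: failed, reason: getaddrinfo EAI_AGAIN ks-apiserver.kubesphere-system
I searched for similar issues. In most of them the cause was CoreDNS being missing, which breaks the connection from the console to ks-apiserver. In my cluster, however, all CoreDNS pods are Running.
Error 2: the console web UI shows "500 Internal Server Error"
Request URL:
http://10.210.10.231:30880/kapis/resources.kubesphere.io/v1alpha3/namespaces?labelSelector=%21kubesphere.io%2Fdevopsproject&sortBy=createTime&limit=10
ks-apiserver error: error: dial tcp: lookup redis.kubesphere-system.svc: i/o timeout
ks-controller-manager error: LDAP Result Code 200 "Network Error": dial tcp: lookup openldap.kubesphere-system.svc on 169.254.25.10:53: read (truncated here; the full message is in the ks-controller-manager log below)
The ks-apiserver and ks-controller-manager errors come from pods on node kubesphere3; the corresponding pods on kubesphere1 and kubesphere2 run normally.
Part of my troubleshooting record follows: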
# Environment
[root@kubesphere1 ~]# kubectl get node -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
kubesphere1 Ready control-plane,master,worker 6d19h v1.23.15 10.210.10.231 <none> CentOS Linux 7 (Core) 3.10.0-1160.el7.x86_64 docker://24.0.6
kubesphere2 Ready control-plane,master,worker 6d19h v1.23.15 10.210.10.232 <none> CentOS Linux 7 (Core) 3.10.0-1160.el7.x86_64 docker://24.0.6
kubesphere3 Ready control-plane,master,worker 6d19h v1.23.15 10.210.10.233 <none> CentOS Linux 7 (Core) 3.10.0-1160.el7.x86_64 docker://24.0.6
[root@kubesphere1 ~]# kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
kubesphere1 664m 8% 9311Mi 63%
kubesphere2 728m 9% 9423Mi 64%
kubesphere3 680m 8% 6391Mi 43%
# ks-apiserver and ks-controller-manager on node kubesphere3 are unhealthy; their logs show timeouts when connecting to redis and LDAP.
[root@kubesphere1 ~]# kubectl get pod -n kubesphere-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
ks-apiserver-6f65d5c8b-cb8pc 1/1 Running 7 (14h ago) 14h 10.233.124.92 kubesphere2 <none> <none>
ks-apiserver-6f65d5c8b-g2748 1/1 Running 20 (14h ago) 6d18h 10.233.107.141 kubesphere1 <none> <none>
ks-apiserver-6f65d5c8b-gtgtt 0/1 CrashLoopBackOff 7 (2m2s ago) 14m 10.233.76.136 kubesphere3 <none> <none>
ks-console-844747bfd6-6rgbg 1/1 Running 4 (14h ago) 6d18h 10.233.107.140 kubesphere1 <none> <none>
ks-console-844747bfd6-72xns 1/1 Running 4 (14h ago) 6d18h 10.233.76.110 kubesphere3 <none> <none>
ks-console-844747bfd6-xq6nd 1/1 Running 2 (14h ago) 5d19h 10.233.124.58 kubesphere2 <none> <none>
ks-controller-manager-56758c7878-6dn6v 1/1 Running 16 (14h ago) 6d18h 10.233.107.127 kubesphere1 <none> <none>
ks-controller-manager-56758c7878-plt7d 0/1 CrashLoopBackOff 6 (3m22s ago) 14m 10.233.76.135 kubesphere3 <none> <none>
ks-controller-manager-56758c7878-stxcf 1/1 Running 6 (14h ago) 14h 10.233.124.85 kubesphere2 <none> <none>
ks-installer-b7b88cb58-67828 1/1 Running 5 (14h ago) 6d18h 10.233.107.132 kubesphere1 <none> <none>
minio-676f77b998-47jk9 1/1 Running 2 (14h ago) 5d19h 10.233.124.69 kubesphere2 <none> <none>
openldap-0 1/1 Running 5 (14h ago) 6d12h 10.233.76.111 kubesphere3 <none> <none>
openldap-1 1/1 Running 5 (14h ago) 6d12h 10.233.107.145 kubesphere1 <none> <none>
openpitrix-import-job-kkrmk 0/1 Completed 0 14h 10.233.124.95 kubesphere2 <none> <none>
redis-54b56679bd-5l7p8 1/1 Running 2 (14h ago) 5d19h 10.233.124.68 kubesphere2 <none> <none>
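With three replicas of most components, the failing pods are easier to spot when the listing is reduced to the pods that are not fully ready, keyed by node. A minimal sketch, assuming the input is the text of `kubectl get pod -o wide` for a single namespace in the column layout shown above; the function name `not_ready_by_node` is hypothetical:

```shell
# Read `kubectl get pod -o wide` text (single namespace) on stdin and print
# "<node> <pod> <status>" for every pod whose READY count is incomplete.
# Assumes the -o wide layout above, where NODE is the third field from the end
# (the RESTARTS column may contain spaces, so fields are not counted from the left).
not_ready_by_node() {
  awk '$1 != "NAME" && $3 != "Completed" {
         split($2, ready, "/")            # "0/1" -> ready[1]=0, ready[2]=1
         if (ready[1] != ready[2]) print $(NF-2), $1, $3
       }' | sort
}
```

It could be used as `kubectl get pod -n kubesphere-system -o wide | not_ready_by_node`. Applied to the listing above, it would keep only the two CrashLoopBackOff pods, both scheduled on kubesphere3.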
[root@kubesphere1 ~]# kubectl logs -fn kubesphere-system ks-apiserver-6f65d5c8b-gtgtt
W0821 04:36:20.187645 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
W0821 04:36:20.189389 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
E0821 04:36:25.198973 1 cache.go:69] failed to create cache, error: dial tcp: lookup redis.kubesphere-system.svc: i/o timeout
E0821 04:36:25.199042 1 run.go:74] "command failed" err="failed to create cache, error: dial tcp: lookup redis.kubesphere-system.svc: i/o timeout"
[root@kubesphere1 ~]# kubectl logs -fn kubesphere-system ks-apiserver ks-controller-manager-56758c7878-plt7d
Error from server (NotFound): pods "ks-apiserver" not found
[root@kubesphere1 ~]# kubectl logs -fn kubesphere-system ks-controller-manager-56758c7878-plt7d
W0821 04:34:28.181726 1 client_config.go:618] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0821 04:34:28.183376 1 server.go:197] setting up manager
I0821 04:34:28.227481 1 listener.go:44] "controller-runtime/metrics: Metrics server is starting to listen" addr=":8080"
F0821 04:35:05.244700 1 server.go:219] unable to register controllers to the manager: failed to connect to ldap service, please check ldap status, error: factory is not able to fill the pool: LDAP Result Code 200 "Network Error": dial tcp: lookup openldap.kubesphere-system.svc on 169.254.25.10:53: read udp 10.233.76.135:42640->169.254.25.10:53: i/o timeout
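The fatal line above already names the resolver that timed out (169.254.25.10:53) and the pod IP that sent the query (10.233.76.135). When several pods crash-loop, pulling those fields out of the logs shows quickly whether every failure points at the same resolver. A minimal sketch, assuming resolver errors in the same format as the line above; `dns_timeouts` is a hypothetical helper name:

```shell
# Read log text on stdin and print "<name> <resolver-ip> <client-ip>"
# for each resolver timeout of the form
#   lookup NAME on SERVER:53: read udp CLIENT:PORT->SERVER:53: i/o timeout
# Lines without the "on SERVER:53" part (such as the ks-apiserver one) are skipped.
dns_timeouts() {
  sed -n 's|.*lookup \([^ ]*\) on \([0-9.]*\):53: read udp \([0-9.]*\):[0-9]*->.*i/o timeout.*|\1 \2 \3|p' | sort -u
}
```

It could be used as `kubectl logs -n kubesphere-system ks-controller-manager-56758c7878-plt7d | dns_timeouts`. If every line reports the same resolver IP and all client IPs belong to pods on one node, the problem is more likely the DNS path on that node than redis or openldap themselves.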
# Test: ks-console, which also runs on node kubesphere3, cannot reach redis or openldap
[root@kubesphere1 ~]# kubectl exec -itn kubesphere-system ks-console-844747bfd6-72xns sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/opt/kubesphere/console $ ping openldap.kubesphere-system.svc
^C
/opt/kubesphere/console $ ping redis.kubesphere-system.svc
^C
/opt/kubesphere/console $ cat /etc/resolv.conf
nameserver 169.254.25.10
search kubesphere-system.svc.cluster.local svc.cluster.local cluster.local test.com
options ndots:5
/opt/kubesphere/console $ ping 169.254.25.10
PING 169.254.25.10 (169.254.25.10): 56 data bytes
64 bytes from 169.254.25.10: seq=0 ttl=42 time=0.091 ms
^C
--- 169.254.25.10 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.091/0.091/0.091 ms
/opt/kubesphere/console $ exit
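A successful ping to 169.254.25.10 only shows that the address answers ICMP; it does not show that anything answers DNS queries on UDP port 53 there. It also helps to know which names the resolver actually tries: with `options ndots:5`, a name with fewer than five dots, such as redis.kubesphere-system.svc, is first tried with each `search` suffix appended and only then as written. A minimal sketch of that expansion, following the conventional resolv.conf search rules (individual libc implementations may differ in detail); `expand_search` is a hypothetical helper name:

```shell
# Print the candidate names a resolv.conf-style resolver would try for NAME,
# in order. Usage: expand_search NAME [RESOLV_CONF]
expand_search() {
  awk -v name="$1" '
    BEGIN { ndots = 1 }                        # conventional default when unset
    $1 == "search" {
      n = 0
      for (i = 2; i <= NF; i++) { dom[++n] = $i }
    }
    $1 == "options" {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^ndots:/) { split($i, kv, ":"); ndots = kv[2] + 0 }
      }
    }
    END {
      if (name ~ /\.$/) { print name; exit }   # trailing dot: already absolute
      dots = gsub(/\./, ".", name)             # count the dots in NAME
      if (dots >= ndots) { print name }        # enough dots: literal name first
      for (i = 1; i <= n; i++) { print name "." dom[i] }
      if (dots < ndots) { print name }         # otherwise literal name last
    }' "${2:-/etc/resolv.conf}"
}
```

Run inside the pod as `expand_search redis.kubesphere-system.svc`, it lists the suffixed candidates first, including redis.kubesphere-system.svc.cluster.local. To check whether the resolver itself answers, a query can be sent to it directly, for example `nslookup redis.kubesphere-system.svc.cluster.local 169.254.25.10`, assuming the container image ships an nslookup binary.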
# redis, openldap, and coredns are all Running
[root@kubesphere1 ~]# kubectl get pod -A | grep "redis\|openldap\|coredns"
argocd devops-argocd-redis-99c6d77c5-k4cs5 1/1 Running 2 (15h ago) 6d13h
kube-system coredns-86688d9f48-cxhcl 1/1 Running 0 15h
kube-system coredns-86688d9f48-xb6jf 1/1 Running 0 15h
kubesphere-system openldap-0 1/1 Running 5 (15h ago) 6d13h
kubesphere-system openldap-1 1/1 Running 5 (15h ago) 6d13h
kubesphere-system redis-54b56679bd-5l7p8 1/1 Running 2 (15h ago) 5d20h
# Test: a pod on another node can reach redis
[root@kubesphere1 ~]# kubectl exec -itn kubesphere-system ks-console-844747bfd6-6rgbg sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/opt/kubesphere/console $ ping redis.kubesphere-system.svc
PING redis.kubesphere-system.svc (10.233.10.14): 56 data bytes
64 bytes from 10.233.10.14: seq=0 ttl=42 time=0.077 ms
^C
--- redis.kubesphere-system.svc ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.077/0.077/0.077 ms
/opt/kubesphere/console $ exit
# calico status looks normal
[root@kubesphere1 ~]# kubectl get pod -A | grep calico
kube-system calico-kube-controllers-7f5795c4bc-c8bhp 1/1 Running 4 (14h ago) 6d18h
kube-system calico-node-4p9ls 1/1 Running 0 14h
kube-system calico-node-g7srh 1/1 Running 4 (14h ago) 6d18h
kube-system calico-node-sczkt 1/1 Running 0 14h
[root@kubesphere1 ~]# kubectl get pod -A -o wide| grep calico
kube-system calico-kube-controllers-7f5795c4bc-c8bhp 1/1 Running 4 (14h ago) 6d18h 10.233.107.128 kubesphere1
kube-system calico-node-4p9ls 1/1 Running 0 14h 10.210.10.233 kubesphere3
kube-system calico-node-g7srh 1/1 Running 4 (14h ago) 6d18h 10.210.10.231 kubesphere1
kube-system calico-node-sczkt 1/1 Running 0 14h 10.210.10.232 kubesphere2
# Recreating the calico pod on node kubesphere3 did not restore connectivity
[root@kubesphere1 ~]# kubectl delete pod -n kube-system calico-node-4p9ls
pod "calico-node-4p9ls" deleted
[root@kubesphere1 ~]# kubectl exec -itn kubesphere-system ks-console-844747bfd6-72xns sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/opt/kubesphere/console $ ping redis.kubesphere-system.svc
ping: bad address 'redis.kubesphere-system.svc'