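When creating a deployment issue, please follow the template below. The more information you provide, the more likely you are to get a timely answer. Issues not created from the template may be closed by the moderators.
Make sure the post is clearly formatted and easy to read, and format code with markdown code block syntax.
If you spend only one minute creating an issue, you can't expect others to spend half an hour answering it.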
Operating system information
For example: Anolis 7.9
Kubernetes version information
Paste the output of the kubectl version command below
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.10", GitCommit:"7e54d50d3012cf3389e43b096ba35300f36e0817", GitTreeState:"clean", BuildDate:"2022-08-17T18:32:54Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.10", GitCommit:"7e54d50d3012cf3389e43b096ba35300f36e0817", GitTreeState:"clean", BuildDate:"2022-08-17T18:26:59Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}
Container runtime
Paste the output of docker version / crictl version / nerdctl version below
KubeSphere version information
3.3.0, installed with kk (KubeKey)
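What is the problem
Everything had been running normally, but suddenly I could no longer log in, and the page showed an authentication error: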
request to http://ks-apiserver/oauth/token failed, reason: connect ECONNREFUSED 10.233.6.101:80
10.233.6.101:80 is the IP and port of the ks-apiserver Service. I found that the ks-apiserver pod was in CrashLoopBackOff state; its log is below:
E0110 09:42:56.155661 1 reflector.go:138] pkg/client/informers/externalversions/factory.go:128: Failed to watch *v2beta1.Config: failed to list *v2beta1.Config: conversion webhook for notification.kubesphere.io/v2beta2, Kind=Config failed: Post "https://notification-manager-webhook.kubesphere-monitoring-system.svc:443/convert?timeout=30s": net/http: TLS handshake timeout
I0110 09:43:13.624513 1 trace.go:201] Trace[1830770737]: "Reflector ListAndWatch" name:pkg/client/informers/externalversions/factory.go:128 (10-Jan-2024 09:43:00.617) (total time: 10007ms):
Trace[1830770737]: [10.007430408s] [10.007430408s] END
E0110 09:43:13.624548 1 reflector.go:138] pkg/client/informers/externalversions/factory.go:128: Failed to watch *v2beta1.Receiver: failed to list *v2beta1.Receiver: conversion webhook for notification.kubesphere.io/v2beta2, Kind=Receiver failed: Post "https://notification-manager-webhook.kubesphere-monitoring-system.svc:443/convert?timeout=30s": net/http: TLS handshake timeout
I0110 09:43:15.311173 1 trace.go:201] Trace[2098285869]: "Reflector ListAndWatch" name:pkg/client/informers/externalversions/factory.go:128 (10-Jan-2024 09:43:00.304) (total time: 10006ms):
Trace[2098285869]: [10.006458857s] [10.006458857s] END
E0110 09:43:15.311283 1 reflector.go:138] pkg/client/informers/externalversions/factory.go:128: Failed to watch *v2beta1.Config: failed to list *v2beta1.Config: conversion webhook for notification.kubesphere.io/v2beta2, Kind=Config failed: Post "https://notification-manager-webhook.kubesphere-monitoring-system.svc:443/convert?timeout=30s": net/http: TLS handshake timeout
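For reference, a minimal sketch of how the observation above can be checked from the command line. It assumes ks-apiserver runs as the Deployment ks-apiserver in namespace kubesphere-system with pod label app=ks-apiserver (the usual KubeSphere 3.x layout, not confirmed by this post). An ECONNREFUSED on the Service IP is consistent with the Service having no ready endpoints, which the first command shows directly.

```shell
#!/usr/bin/env bash
# Sketch: inspect ks-apiserver when its Service refuses connections.
# Assumptions (not from the post): namespace kubesphere-system,
# Deployment ks-apiserver, pod label app=ks-apiserver.
check_ks_apiserver() {
  local ns=kubesphere-system
  # An empty ENDPOINTS column means no ready pod currently backs the Service.
  kubectl get endpoints ks-apiserver -n "$ns"
  # Pod status and restart count (CrashLoopBackOff shows up here).
  kubectl get pods -n "$ns" -l app=ks-apiserver -o wide
  # Log of the previous (crashed) container instance, last 50 lines.
  kubectl logs deployment/ks-apiserver -n "$ns" --previous --tail=50
}
```

Source the file and run check_ks_apiserver on a host with kubectl access.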
According to the log, the failing request goes to https://notification-manager-webhook.kubesphere-monitoring-system.svc:443, which is the Service named notification-manager-webhook.
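The ks-apiserver errors come from a CRD conversion webhook: when a client lists an object in a version other than the stored one (here v2beta1 versus v2beta2), kube-apiserver calls the Service configured in the CRD to convert it. A sketch for confirming which Service a CRD points at; the CRD names configs.notification.kubesphere.io and receivers.notification.kubesphere.io are inferred from the plural.group naming convention, not taken from the post.

```shell
#!/usr/bin/env bash
# Sketch: print "<strategy>\t<namespace>/<service>" for a CRD's conversion config.
# CRD names passed in are assumed from the plural.group naming convention.
show_crd_conversion() {
  local crd=$1
  kubectl get crd "$crd" -o jsonpath='{.spec.conversion.strategy}{"\t"}{.spec.conversion.webhook.clientConfig.service.namespace}/{.spec.conversion.webhook.clientConfig.service.name}{"\n"}'
}
```

For example, show_crd_conversion configs.notification.kubesphere.io should print Webhook followed by the webhook Service if the CRD is set up as the log suggests.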
Find the pod behind the notification-manager-webhook Service. First, check its selector labels:
kubectl describe svc notification-manager-webhook -n kubesphere-monitoring-system
Then find the pod name by that label:
kubectl get pods --show-labels -n kubesphere-monitoring-system | grep 'control-plane=controller-manager'
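The two steps above (read the Service selector, then filter pods by it) can be collapsed into one helper. A minimal sketch; the function name svc_pods is hypothetical, and it relies on kubectl's go-template output to render the selector map.

```shell
#!/usr/bin/env bash
# Sketch: list the pods selected by a Service.
# Usage: svc_pods <namespace> <service>   (hypothetical helper)
svc_pods() {
  local ns=$1 svc=$2 selector
  # Render .spec.selector as "k1=v1,k2=v2," (trailing comma removed below).
  selector=$(kubectl get svc "$svc" -n "$ns" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}')
  selector=${selector%,}
  if [[ -z $selector ]]; then
    echo "service $ns/$svc has no selector" >&2
    return 1
  fi
  kubectl get pods -n "$ns" -l "$selector" -o wide
}
```

For example: svc_pods kubesphere-monitoring-system notification-manager-webhook.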
Check the pod log:
[root@master ~]# kubectl logs notification-manager-operator-7f7c564948-pfl66 -c notification-manager-operator -n kubesphere-monitoring-system
2024-01-09T07:05:53.420+0800 INFO controller-runtime.metrics metrics server is starting to listen {"addr": "127.0.0.1:8080"}
2024-01-09T07:05:53.421+0800 INFO controller-runtime.builder skip registering a mutating webhook, admission.Defaulter interface is not implemented {"GVK": "notification.kubesphere.io/v2beta2, Kind=Config"}
2024-01-09T07:05:53.421+0800 INFO controller-runtime.builder Registering a validating webhook {"GVK": "notification.kubesphere.io/v2beta2, Kind=Config", "path": "/validate-notification-kubesphere-io-v2beta2-config"}
2024-01-09T07:05:53.421+0800 INFO controller-runtime.webhook registering webhook {"path": "/validate-notification-kubesphere-io-v2beta2-config"}
2024-01-09T07:05:53.421+0800 INFO controller-runtime.webhook registering webhook {"path": "/convert"}
2024-01-09T07:05:53.421+0800 INFO controller-runtime.builder conversion webhook enabled {"object": {"metadata":{"creationTimestamp":null},"spec":{},"status":{}}}
2024-01-09T07:05:53.422+0800 INFO controller-runtime.builder skip registering a mutating webhook, admission.Defaulter interface is not implemented {"GVK": "notification.kubesphere.io/v2beta2, Kind=Receiver"}
2024-01-09T07:05:53.422+0800 INFO controller-runtime.builder Registering a validating webhook {"GVK": "notification.kubesphere.io/v2beta2, Kind=Receiver", "path": "/validate-notification-kubesphere-io-v2beta2-receiver"}
2024-01-09T07:05:53.422+0800 INFO controller-runtime.webhook registering webhook {"path": "/validate-notification-kubesphere-io-v2beta2-receiver"}
2024-01-09T07:05:53.422+0800 INFO controller-runtime.builder conversion webhook enabled {"object": {"metadata":{"creationTimestamp":null},"spec":{},"status":{}}}
2024-01-09T07:05:53.423+0800 INFO setup starting manager
I0109 07:05:53.519098 1 leaderelection.go:242] attempting to acquire leader lease kubesphere-monitoring-system/7b8d27e6.kubesphere.io...
2024-01-09T07:05:53.519+0800 INFO controller-runtime.manager starting metrics server {"path": "/metrics"}
2024-01-09T07:05:54.518+0800 INFO controller-runtime.webhook.webhooks starting webhook server
2024-01-09T07:05:54.519+0800 INFO controller-runtime.certwatcher Updated current TLS certificate
2024-01-09T07:05:54.520+0800 INFO controller-runtime.webhook serving webhook server {"host": "", "port": 9443}
2024-01-09T07:05:54.520+0800 INFO controller-runtime.certwatcher Starting certificate watcher
2024/01/09 07:05:55 http: TLS handshake error from 10.233.70.0:48936: EOF
I0109 07:06:24.922926 1 leaderelection.go:252] successfully acquired lease kubesphere-monitoring-system/7b8d27e6.kubesphere.io
2024-01-09T07:06:24.923+0800 DEBUG controller-runtime.manager.events Normal {"object": {"kind":"ConfigMap","namespace":"kubesphere-monitoring-system","name":"7b8d27e6.kubesphere.io","uid":"32029b0e-b927-454b-8274-a27a9fd3c65c","apiVersion":"v1","resourceVersion":"55096069"}, "reason": "LeaderElection", "message": "notification-manager-operator-7f7c564948-pfl66_9e2c0d51-4260-4e4a-9d6c-0046c09a23b2 became leader"}
2024-01-09T07:06:24.923+0800 INFO controller-runtime.controller Starting EventSource {"controller": "notificationmanager", "source": "kind source: /, Kind="}
2024-01-09T07:06:25.425+0800 INFO controller-runtime.controller Starting EventSource {"controller": "notificationmanager", "source": "kind source: /, Kind="}
2024-01-09T07:06:25.425+0800 INFO controller-runtime.controller Starting Controller {"controller": "notificationmanager"}
2024-01-09T07:06:25.425+0800 INFO controller-runtime.controller Starting workers {"controller": "notificationmanager", "worker count": 1}
2024-01-09T07:06:28.621+0800 DEBUG controller-runtime.controller Successfully Reconciled {"controller": "notificationmanager", "request": "/notification-manager"}
I0109 07:06:45.617001 1 leaderelection.go:288] failed to renew lease kubesphere-monitoring-system/7b8d27e6.kubesphere.io: failed to tryAcquireOrRenew context deadline exceeded
2024-01-09T07:06:45.617+0800 ERROR setup problem running manager {"error": "leader election lost"}
github.com/go-logr/zapr.(*zapLogger).Error
/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
main.main
/workspace/main.go:101
runtime.main
/usr/local/go/src/runtime/proc.go:203
2024-01-09T07:06:45.617+0800 DEBUG controller-runtime.manager.events Normal {"object": {"kind":"ConfigMap","namespace":"kubesphere-monitoring-system","name":"7b8d27e6.kubesphere.io","uid":"32029b0e-b927-454b-8274-a27a9fd3c65c","apiVersion":"v1","resourceVersion":"55096083"}, "reason": "LeaderElection", "message": "notification-manager-operator-7f7c564948-pfl66_9e2c0d51-4260-4e4a-9d6c-0046c09a23b2 stopped leading"}
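The operator log ends with a failed lease renewal (context deadline exceeded) followed by "leader election lost", after which a controller-runtime manager normally exits and its container restarts, taking the webhook server on port 9443 down with it. That may be related to the TLS handshake timeouts seen by ks-apiserver, but the post does not confirm it. A sketch of follow-up checks: the label and the ConfigMap name 7b8d27e6.kubesphere.io come from the post, while the column layout and the /readyz query are illustrative additions.

```shell
#!/usr/bin/env bash
# Sketch: look for operator restarts and signs of a slow API server.
# Label and leader-election ConfigMap name are taken from the post.
check_operator() {
  local ns=kubesphere-monitoring-system
  # Restart counts and why the previous container instance terminated.
  kubectl get pods -n "$ns" -l control-plane=controller-manager \
    -o custom-columns='NAME:.metadata.name,RESTARTS:.status.containerStatuses[*].restartCount,LAST_EXIT:.status.containerStatuses[*].lastState.terminated.reason'
  # Leader-election lock; the current holder and renew time are in its annotations.
  kubectl get configmap 7b8d27e6.kubesphere.io -n "$ns" -o yaml
  # Lease renewals that time out can point at a slow kube-apiserver or etcd.
  kubectl get --raw='/readyz?verbose'
}
```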
I suspected an expired certificate, but the certificate expiration dates look normal:
[root@master ~]# kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0110 10:56:13.766871 28153 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.233.0.10]; the provided value is: [169.254.25.10]
CERTIFICATE EXPIRES RESIDUAL TIME CERTIFICATE AUTHORITY EXTERNALLY MANAGED
admin.conf Jul 12, 2024 08:52 UTC 184d ca no
apiserver Jul 12, 2024 08:52 UTC 184d ca no
apiserver-kubelet-client Jul 12, 2024 08:52 UTC 184d ca no
controller-manager.conf Jul 12, 2024 08:52 UTC 184d ca no
front-proxy-client Jul 12, 2024 08:52 UTC 184d front-proxy-ca no
scheduler.conf Jul 12, 2024 08:52 UTC 184d ca no
CERTIFICATE AUTHORITY EXPIRES RESIDUAL TIME EXTERNALLY MANAGED
ca Jul 10, 2033 08:52 UTC 9y no
front-proxy-ca Jul 10, 2033 08:52 UTC 9y no
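kubeadm certs check-expiration only covers the certificates kubeadm manages for the control plane. The notification-manager webhook serves its own certificate (usually mounted from a Secret), and each CRD carries a caBundle used to verify it. A sketch for checking both; the Secret name notification-manager-webhook-server-cert and its tls.crt key are assumptions, so look up the actual Secret in the operator Deployment's volumes first.

```shell
#!/usr/bin/env bash
# Sketch: print subject and expiry of a PEM certificate stored in a Secret
# or in a CRD's conversion caBundle. Requires kubectl, base64 and openssl.
cert_from_secret() {
  local ns=$1 secret=$2
  kubectl get secret "$secret" -n "$ns" -o jsonpath='{.data.tls\.crt}' \
    | base64 -d | openssl x509 -noout -subject -enddate
}

cert_from_crd_cabundle() {
  local crd=$1
  kubectl get crd "$crd" \
    -o jsonpath='{.spec.conversion.webhook.clientConfig.caBundle}' \
    | base64 -d | openssl x509 -noout -subject -enddate
}
```

For example: cert_from_secret kubesphere-monitoring-system notification-manager-webhook-server-cert (hypothetical Secret name) and cert_from_crd_cabundle configs.notification.kubesphere.io. If the caBundle did not match the CA that signed the serving certificate, kube-apiserver would normally report an x509 verification error rather than a timeout.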
So I resorted to the good old restart and restarted the problematic pod's Deployment:
kubectl rollout restart deployment notification-manager-operator -n kubesphere-monitoring-system
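Sure enough, once everything had finished restarting, I could log in through the web console again.
The problem is solved for now, but the root cause is still unknown.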
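When applying the same workaround, a small sketch that restarts the Deployment and then blocks until the new pods are rolled out, so it is clear when to retry the login. The function name and the 300s timeout are arbitrary choices, not from the post.

```shell
#!/usr/bin/env bash
# Sketch: restart a Deployment and wait for the rollout to finish.
# Usage: restart_and_wait <namespace> <deployment>   (hypothetical helper)
restart_and_wait() {
  local ns=$1 deploy=$2
  kubectl rollout restart deployment "$deploy" -n "$ns" || return
  # Blocks until the new ReplicaSet is fully available or the timeout expires.
  kubectl rollout status deployment "$deploy" -n "$ns" --timeout=300s
}
```

For example: restart_and_wait kubesphere-monitoring-system notification-manager-operator.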