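When creating a deployment issue, please follow the template below. The more information you provide, the more likely you are to get a timely answer. Administrators may close issues that do not follow the template.
Make sure the post is clearly formatted and easy to read, and use markdown code block syntax to format code.
If you spend only one minute creating an issue, you cannot expect others to spend half an hour answering it.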

Operating system information
For example: Anolis 7.9

Kubernetes version information
Paste the output of the kubectl version command below

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.10", GitCommit:"7e54d50d3012cf3389e43b096ba35300f36e0817", GitTreeState:"clean", BuildDate:"2022-08-17T18:32:54Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}

Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.10", GitCommit:"7e54d50d3012cf3389e43b096ba35300f36e0817", GitTreeState:"clean", BuildDate:"2022-08-17T18:26:59Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}

Container runtime
Paste the output of docker version / crictl version / nerdctl version below

KubeSphere version information

3.3.0, installed with kk (KubeKey)

What is the problem

Everything had been running normally, but suddenly I could no longer log in. The page showed an authentication error:

request to http://ks-apiserver/oauth/token failed, reason: connect ECONNREFUSED 10.233.6.101:80

10.233.6.101:80 is the IP and port of the ks-apiserver Service. I found that the ks-apiserver pod was in CrashLoopBackOff; its logs are below:

E0110 09:42:56.155661       1 reflector.go:138] pkg/client/informers/externalversions/factory.go:128: Failed to watch *v2beta1.Config: failed to list *v2beta1.Config: conversion webhook for notification.kubesphere.io/v2beta2, Kind=Config failed: Post "https://notification-manager-webhook.kubesphere-monitoring-system.svc:443/convert?timeout=30s": net/http: TLS handshake timeout
I0110 09:43:13.624513       1 trace.go:201] Trace[1830770737]: "Reflector ListAndWatch" name:pkg/client/informers/externalversions/factory.go:128 (10-Jan-2024 09:43:00.617) (total time: 10007ms):
Trace[1830770737]: [10.007430408s] [10.007430408s] END
E0110 09:43:13.624548       1 reflector.go:138] pkg/client/informers/externalversions/factory.go:128: Failed to watch *v2beta1.Receiver: failed to list *v2beta1.Receiver: conversion webhook for notification.kubesphere.io/v2beta2, Kind=Receiver failed: Post "https://notification-manager-webhook.kubesphere-monitoring-system.svc:443/convert?timeout=30s": net/http: TLS handshake timeout
I0110 09:43:15.311173       1 trace.go:201] Trace[2098285869]: "Reflector ListAndWatch" name:pkg/client/informers/externalversions/factory.go:128 (10-Jan-2024 09:43:00.304) (total time: 10006ms):
Trace[2098285869]: [10.006458857s] [10.006458857s] END
E0110 09:43:15.311283       1 reflector.go:138] pkg/client/informers/externalversions/factory.go:128: Failed to watch *v2beta1.Config: failed to list *v2beta1.Config: conversion webhook for notification.kubesphere.io/v2beta2, Kind=Config failed: Post "https://notification-manager-webhook.kubesphere-monitoring-system.svc:443/convert?timeout=30s": net/http: TLS handshake timeout
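The ECONNREFUSED on the Service IP and the CrashLoopBackOff state can be confirmed from the command line. A minimal sketch, assuming the ks-apiserver pods carry the label `app=ks-apiserver` (an assumption; check with `kubectl get pods --show-labels`); `check_ks_apiserver` is a hypothetical helper name. An empty ENDPOINTS column would be consistent with connections to the ClusterIP being refused.

```shell
# Hypothetical helper: inspect the ks-apiserver Service and its crash-looping pod.
# Assumption: the pods carry the label app=ks-apiserver.
check_ks_apiserver() {
  # ENDPOINTS lists the ready pods behind the Service; an empty list is
  # consistent with ECONNREFUSED on the ClusterIP 10.233.6.101:80
  kubectl -n kubesphere-system get svc,endpoints ks-apiserver
  # Pod status and restart count
  kubectl -n kubesphere-system get pods -l app=ks-apiserver -o wide
  # Output of the last crashed container instance
  kubectl -n kubesphere-system logs deployment/ks-apiserver --previous --tail=50
}

# Usage, on a machine with cluster access:
#   check_ks_apiserver
```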

According to the logs, the failing request goes to https://notification-manager-webhook.kubesphere-monitoring-system.svc:443, i.e. the Service
named notification-manager-webhook.

Find the pod behind it.
Check the Service's selector labels:

kubectl describe svc notification-manager-webhook -n kubesphere-monitoring-system

Find the pod name:

kubectl get pods --show-labels -n kubesphere-monitoring-system | grep 'control-plane=controller-manager'
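Instead of grepping the `--show-labels` output, the selector can be passed directly with `-l`, and the Service's Endpoints show which pod IPs actually receive the webhook traffic. A sketch; `find_webhook_backend` is a hypothetical name, and the selector `control-plane=controller-manager` is the one found above. An endpoint only means the pod reports Ready; it does not prove that the webhook server (port 9443 in the operator log) completes a TLS handshake.

```shell
# Hypothetical helper: find what backs the notification-manager-webhook Service
# without grepping the full pod list.
find_webhook_backend() {
  # Endpoints show the IP:port of every ready pod the Service selects
  kubectl -n kubesphere-monitoring-system get endpoints notification-manager-webhook
  # Select pods directly with the Service's selector (from `kubectl describe svc`)
  kubectl -n kubesphere-monitoring-system get pods -l control-plane=controller-manager -o wide
}
```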

Check the pod's logs:

[root@master ~]# kubectl logs  notification-manager-operator-7f7c564948-pfl66 -c notification-manager-operator -n kubesphere-monitoring-system
2024-01-09T07:05:53.420+0800	INFO	controller-runtime.metrics	metrics server is starting to listen	{"addr": "127.0.0.1:8080"}
2024-01-09T07:05:53.421+0800	INFO	controller-runtime.builder	skip registering a mutating webhook, admission.Defaulter interface is not implemented	{"GVK": "notification.kubesphere.io/v2beta2, Kind=Config"}
2024-01-09T07:05:53.421+0800	INFO	controller-runtime.builder	Registering a validating webhook	{"GVK": "notification.kubesphere.io/v2beta2, Kind=Config", "path": "/validate-notification-kubesphere-io-v2beta2-config"}
2024-01-09T07:05:53.421+0800	INFO	controller-runtime.webhook	registering webhook	{"path": "/validate-notification-kubesphere-io-v2beta2-config"}
2024-01-09T07:05:53.421+0800	INFO	controller-runtime.webhook	registering webhook	{"path": "/convert"}
2024-01-09T07:05:53.421+0800	INFO	controller-runtime.builder	conversion webhook enabled	{"object": {"metadata":{"creationTimestamp":null},"spec":{},"status":{}}}
2024-01-09T07:05:53.422+0800	INFO	controller-runtime.builder	skip registering a mutating webhook, admission.Defaulter interface is not implemented	{"GVK": "notification.kubesphere.io/v2beta2, Kind=Receiver"}
2024-01-09T07:05:53.422+0800	INFO	controller-runtime.builder	Registering a validating webhook	{"GVK": "notification.kubesphere.io/v2beta2, Kind=Receiver", "path": "/validate-notification-kubesphere-io-v2beta2-receiver"}
2024-01-09T07:05:53.422+0800	INFO	controller-runtime.webhook	registering webhook	{"path": "/validate-notification-kubesphere-io-v2beta2-receiver"}
2024-01-09T07:05:53.422+0800	INFO	controller-runtime.builder	conversion webhook enabled	{"object": {"metadata":{"creationTimestamp":null},"spec":{},"status":{}}}
2024-01-09T07:05:53.423+0800	INFO	setup	starting manager
I0109 07:05:53.519098       1 leaderelection.go:242] attempting to acquire leader lease  kubesphere-monitoring-system/7b8d27e6.kubesphere.io...
2024-01-09T07:05:53.519+0800	INFO	controller-runtime.manager	starting metrics server	{"path": "/metrics"}
2024-01-09T07:05:54.518+0800	INFO	controller-runtime.webhook.webhooks	starting webhook server
2024-01-09T07:05:54.519+0800	INFO	controller-runtime.certwatcher	Updated current TLS certificate
2024-01-09T07:05:54.520+0800	INFO	controller-runtime.webhook	serving webhook server	{"host": "", "port": 9443}
2024-01-09T07:05:54.520+0800	INFO	controller-runtime.certwatcher	Starting certificate watcher
2024/01/09 07:05:55 http: TLS handshake error from 10.233.70.0:48936: EOF
I0109 07:06:24.922926       1 leaderelection.go:252] successfully acquired lease kubesphere-monitoring-system/7b8d27e6.kubesphere.io
2024-01-09T07:06:24.923+0800	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"ConfigMap","namespace":"kubesphere-monitoring-system","name":"7b8d27e6.kubesphere.io","uid":"32029b0e-b927-454b-8274-a27a9fd3c65c","apiVersion":"v1","resourceVersion":"55096069"}, "reason": "LeaderElection", "message": "notification-manager-operator-7f7c564948-pfl66_9e2c0d51-4260-4e4a-9d6c-0046c09a23b2 became leader"}
2024-01-09T07:06:24.923+0800	INFO	controller-runtime.controller	Starting EventSource	{"controller": "notificationmanager", "source": "kind source: /, Kind="}
2024-01-09T07:06:25.425+0800	INFO	controller-runtime.controller	Starting EventSource	{"controller": "notificationmanager", "source": "kind source: /, Kind="}
2024-01-09T07:06:25.425+0800	INFO	controller-runtime.controller	Starting Controller	{"controller": "notificationmanager"}
2024-01-09T07:06:25.425+0800	INFO	controller-runtime.controller	Starting workers	{"controller": "notificationmanager", "worker count": 1}
2024-01-09T07:06:28.621+0800	DEBUG	controller-runtime.controller	Successfully Reconciled	{"controller": "notificationmanager", "request": "/notification-manager"}
I0109 07:06:45.617001       1 leaderelection.go:288] failed to renew lease kubesphere-monitoring-system/7b8d27e6.kubesphere.io: failed to tryAcquireOrRenew context deadline exceeded
2024-01-09T07:06:45.617+0800	ERROR	setup	problem running manager	{"error": "leader election lost"}
github.com/go-logr/zapr.(*zapLogger).Error
	/go/pkg/mod/github.com/go-logr/zapr@v0.1.0/zapr.go:128
main.main
	/workspace/main.go:101
runtime.main
	/usr/local/go/src/runtime/proc.go:203
2024-01-09T07:06:45.617+0800	DEBUG	controller-runtime.manager.events	Normal	{"object": {"kind":"ConfigMap","namespace":"kubesphere-monitoring-system","name":"7b8d27e6.kubesphere.io","uid":"32029b0e-b927-454b-8274-a27a9fd3c65c","apiVersion":"v1","resourceVersion":"55096083"}, "reason": "LeaderElection", "message": "notification-manager-operator-7f7c564948-pfl66_9e2c0d51-4260-4e4a-9d6c-0046c09a23b2 stopped leading"}
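The operator log ends with `leader election lost` after failing to renew its lease (`context deadline exceeded`), and its timestamps are from 2024-01-09 while the ks-apiserver errors are from 2024-01-10. When a controller-runtime manager loses leadership the process normally exits and the kubelet restarts the container, so the restart count and the previous container's log can show whether that actually happened. A sketch; the container name and label come from the commands above, and `operator_restart_info` is a hypothetical name.

```shell
# Hypothetical helper: did the operator container restart after losing leadership?
operator_restart_info() {
  # Pod name, restart count and last termination reason of the operator container
  kubectl -n kubesphere-monitoring-system get pods -l control-plane=controller-manager \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[?(@.name=="notification-manager-operator")].restartCount}{"\t"}{.status.containerStatuses[?(@.name=="notification-manager-operator")].lastState.terminated.reason}{"\n"}{end}'
  # Log of the previous container instance (fails if it never restarted)
  kubectl -n kubesphere-monitoring-system logs deployment/notification-manager-operator \
    -c notification-manager-operator --previous --tail=50
}
```

If the restart count did not go up around 2024-01-09 07:06, the process may have kept running after losing leadership; that would be worth noting in a bug report.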

I suspected an expired certificate, but the certificate dates look normal:

[root@master ~]# kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0110 10:56:13.766871   28153 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.233.0.10]; the provided value is: [169.254.25.10]

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Jul 12, 2024 08:52 UTC   184d            ca                      no
apiserver                  Jul 12, 2024 08:52 UTC   184d            ca                      no
apiserver-kubelet-client   Jul 12, 2024 08:52 UTC   184d            ca                      no
controller-manager.conf    Jul 12, 2024 08:52 UTC   184d            ca                      no
front-proxy-client         Jul 12, 2024 08:52 UTC   184d            front-proxy-ca          no
scheduler.conf             Jul 12, 2024 08:52 UTC   184d            ca                      no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Jul 10, 2033 08:52 UTC   9y              no
front-proxy-ca          Jul 10, 2033 08:52 UTC   9y              no
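`kubeadm certs check-expiration` only covers the control-plane certificates managed by kubeadm; it does not look at the serving certificate of the notification-manager webhook or at the `caBundle` the kube-apiserver uses to verify it. Also, an expired certificate would normally surface as an `x509` error rather than the `TLS handshake timeout` seen above. A sketch for checking both, assuming the CRD is named `configs.notification.kubesphere.io` and that it runs on a node that can reach ClusterIPs; the function names are hypothetical.

```shell
# Hypothetical helpers for the certificates kubeadm does not track.

# Subject and validity dates of the certificate the webhook Service actually serves.
# Run from a node that can reach ClusterIPs.
webhook_cert_dates() {
  ip="$(kubectl -n kubesphere-monitoring-system get svc notification-manager-webhook \
        -o jsonpath='{.spec.clusterIP}')"
  # timeout bounds a handshake that never completes; openssl x509 then
  # reports that it could not load a certificate
  echo | timeout 10 openssl s_client -connect "${ip}:443" \
      -servername notification-manager-webhook.kubesphere-monitoring-system.svc 2>/dev/null \
    | openssl x509 -noout -subject -dates
}

# Expiry of the caBundle the kube-apiserver uses to verify the conversion webhook.
# Assumption: the CRD is named configs.notification.kubesphere.io.
crd_ca_expiry() {
  kubectl get crd configs.notification.kubesphere.io \
    -o jsonpath='{.spec.conversion.webhook.clientConfig.caBundle}' \
    | base64 -d | openssl x509 -noout -enddate
}
```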

So I fell back on the trusty restart and restarted the problematic pod:

kubectl rollout restart deployment notification-manager-operator -n kubesphere-monitoring-system
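To know when the restart has finished and whether the conversion path actually works again, the rollout can be awaited and the v2beta1 resources that ks-apiserver watches can be listed, which should make the kube-apiserver call the conversion webhook. A sketch; the plural resource names `configs`/`receivers`, the label `app=ks-apiserver` and the name `verify_recovery` are assumptions.

```shell
# Hypothetical helper: confirm the restart actually fixed the conversion path.
verify_recovery() {
  # Wait until the restarted operator Deployment is fully rolled out
  kubectl -n kubesphere-monitoring-system rollout status \
    deployment/notification-manager-operator --timeout=180s || return 1
  # Listing the v2beta1 versions (the ones ks-apiserver watches, per its log)
  # should go through the conversion webhook; if the webhook is still stuck
  # this fails with the same "conversion webhook ... failed" error.
  # Assumption: the plural resource names are configs and receivers.
  kubectl get configs.v2beta1.notification.kubesphere.io -A || return 1
  kubectl get receivers.v2beta1.notification.kubesphere.io -A || return 1
  # ks-apiserver retries on its own crash back-off (capped at 5 minutes);
  # check that it is Running again (assumed label app=ks-apiserver)
  kubectl -n kubesphere-system get pods -l app=ks-apiserver
}
```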

Sure enough, once everything had finished restarting, I could log in to the web console normally again...

The problem is solved for now, but the root cause is still unknown.