Operating system information
Bare metal, Ubuntu 22.04, 80 cores / 250 GB RAM
Kubernetes version information
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.15", GitCommit:"b84cb8ab29366daa1bba65bc67f54de2f6c34848", GitTreeState:"clean", BuildDate:"2022-12-08T10:49:13Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.15", GitCommit:"b84cb8ab29366daa1bba65bc67f54de2f6c34848", GitTreeState:"clean", BuildDate:"2022-12-08T10:42:57Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}
Container runtime
24.0.6
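KubeSphere version information
e.g. v3.0.13/v3.3.0. Offline installation. Installed with kk.
What is the problem
What the error log says, preferably with screenshots.
1. The following warnings appear. I do not know what causes them, but so far everything still works.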
W1113 19:28:21.489564 1 clusterroles.go:117] invalid aggregation role found: cluster-admin, role-template-view-configmaps
W1113 19:28:21.489716 1 clusterroles.go:117] invalid aggregation role found: cluster-admin, role-template-manage-configmaps
W1113 19:28:21.489758 1 clusterroles.go:117] invalid aggregation role found: cluster-admin, role-template-view-secrets
W1113 19:28:21.489814 1 clusterroles.go:117] invalid aggregation role found: cluster-admin, role-template-manage-secrets
W1113 19:28:21.489855 1 clusterroles.go:117] invalid aggregation role found: cluster-admin, role-template-view-service-accounts
W1113 19:28:21.489908 1 clusterroles.go:117] invalid aggregation role found: cluster-admin, role-template-manage-service-accounts
The following errors also appear. Here 10.86.229.171 is a node of the current cluster and 10.86.13.126 is a node of the host cluster.
E1113 12:30:39.390647 1 upgradeaware.go:436] Error proxying data from client to backend: readfrom tcp 10.86.229.171:42752->10.234.97.6:9090: read tcp 10.86.229.171:6443->10.86.13.126:20529: read: connection reset by peer
E1113 12:30:39.390725 1 upgradeaware.go:436] Error proxying data from client to backend: readfrom tcp 10.86.229.171:58372->10.234.97.6:9090: read tcp 10.86.229.171:6443->10.86.13.126:60550: read: connection reset by peer
E1113 12:30:39.390727 1 upgradeaware.go:450] Error proxying data from backend to client: write tcp 10.86.229.171:6443->10.86.13.126:64595: write: connection reset by peer
E1113 12:32:18.452018 1 upgradeaware.go:436] Error proxying data from client to backend: readfrom tcp 10.234.97.0:37074->10.234.101.7:9090: read tcp 10.86.229.171:6443->10.86.130.12:6667: read: connection reset by peer
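One way to see which side is resetting these connections is to tally the remote address of the connection that actually failed. The sketch below assumes the usual Go network error layout `op tcp local->remote: ...`, so the last `tcp A->B` pair before `connection reset by peer` names the peer that reset. The function name `reset_peers` and the regular expression are illustrative, not part of any Kubernetes tooling.

```python
import re
from collections import Counter

# Matches "tcp <local ip>:<port>-><remote ip>:<port>" as printed in Go network errors.
CONN_RE = re.compile(r"tcp (\d+\.\d+\.\d+\.\d+):(\d+)->(\d+\.\d+\.\d+\.\d+):(\d+)")

# The four upgradeaware.go lines quoted in this post.
LOG = """\
E1113 12:30:39.390647 1 upgradeaware.go:436] Error proxying data from client to backend: readfrom tcp 10.86.229.171:42752->10.234.97.6:9090: read tcp 10.86.229.171:6443->10.86.13.126:20529: read: connection reset by peer
E1113 12:30:39.390725 1 upgradeaware.go:436] Error proxying data from client to backend: readfrom tcp 10.86.229.171:58372->10.234.97.6:9090: read tcp 10.86.229.171:6443->10.86.13.126:60550: read: connection reset by peer
E1113 12:30:39.390727 1 upgradeaware.go:450] Error proxying data from backend to client: write tcp 10.86.229.171:6443->10.86.13.126:64595: write: connection reset by peer
E1113 12:32:18.452018 1 upgradeaware.go:436] Error proxying data from client to backend: readfrom tcp 10.234.97.0:37074->10.234.101.7:9090: read tcp 10.86.229.171:6443->10.86.130.12:6667: read: connection reset by peer
"""


def reset_peers(log_text):
    """Count, per remote IP, how often that peer reset a proxied connection."""
    peers = Counter()
    for line in log_text.splitlines():
        if "upgradeaware.go" not in line or "connection reset by peer" not in line:
            continue
        conns = CONN_RE.findall(line)
        if conns:
            # The last connection in the line is the one whose read/write failed.
            _local_ip, _local_port, remote_ip, _remote_port = conns[-1]
            peers[remote_ip] += 1
    return peers


for ip, count in reset_peers(LOG).most_common():
    print(ip, count)
```

For the lines above, every reset is on the port 6443 side, that is, from the client that connected to the api-server, not from the backend on port 9090.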
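- Cluster monitoring shows api-server POST and GET latencies reaching 30000 ms, while PUT latency is only a dozen or so milliseconds. Pinging the master directly from a node, however, shows very low latency. There is a similar post in the community.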


- Pods on one node fail their health check every few minutes and are normal again after a restart.
Readiness probe failed: Get "http://10.234.69.8:9200/_cluster/health?local=true": dial tcp 10.234.69.8:9200: connect: invalid argument
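The pod itself logs no errors, and pinging the master from that server works at that moment. I suspect the probe failures are caused by the high latency in issue 2 (the api-server latency above).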
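- kube-scheduler shows that about 50% of scheduling attempts fail.
Does anyone have ideas on how to troubleshoot this? I am not sure how to investigate the api-server's high latency or the failures.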
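As a possible starting point for the latency question, it may help to break the api-server's own request-duration metrics down by verb, resource, and subresource, to see whether the 30000 ms figures come from ordinary requests or from one particular kind of call. The sketch below parses Prometheus text-format output for the `apiserver_request_duration_seconds_sum` and `_count` series and computes a mean per group. The function name `mean_latency` is hypothetical and the `SAMPLE` values are made up for illustration; they are not measurements from this cluster.

```python
import re
from collections import defaultdict

# Matches the _sum and _count series of the request duration histogram.
LINE_RE = re.compile(
    r"^apiserver_request_duration_seconds_(sum|count)\{(.*)\}\s+(\S+)$"
)
# Matches one label pair such as verb="GET".
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

# Made-up sample in Prometheus text format, for illustration only.
SAMPLE = """\
apiserver_request_duration_seconds_sum{component="apiserver",resource="pods",subresource="",verb="GET",version="v1"} 2.0
apiserver_request_duration_seconds_count{component="apiserver",resource="pods",subresource="",verb="GET",version="v1"} 100
apiserver_request_duration_seconds_sum{component="apiserver",resource="services",subresource="proxy",verb="GET",version="v1"} 300.0
apiserver_request_duration_seconds_count{component="apiserver",resource="services",subresource="proxy",verb="GET",version="v1"} 10
"""


def mean_latency(metrics_text):
    """Return {(verb, resource, subresource): mean request duration in seconds}."""
    sums = defaultdict(float)
    counts = defaultdict(float)
    for line in metrics_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        kind, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels))
        key = (
            labels.get("verb", ""),
            labels.get("resource", ""),
            labels.get("subresource", ""),
        )
        target = sums if kind == "sum" else counts
        target[key] += float(value)
    return {key: sums[key] / counts[key] for key in counts if counts[key] > 0}


# Print the slowest groups first.
for key, seconds in sorted(mean_latency(SAMPLE).items(), key=lambda kv: -kv[1]):
    print(key, round(seconds, 3))
```

On a real cluster the input text could come from `kubectl get --raw /metrics`. If the slow groups turn out to be proxy or other long-lived calls, the dashboard averages may not reflect the latency of ordinary API requests.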