• KSV
  • ks-apiserver fails to start, so I cannot log in to KubeSphere

Operating system information

  • Virtual machine: Oracle Linux Server 8.4, 4C/16G
  • Install command: ./kk create cluster --with-kubernetes v1.21.5 --with-kubesphere v3.2.0

Kubernetes version information

# kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.5", GitCommit:"aea7bbadd2fc0cd689de94a54e5b7b758869d691", GitTreeState:"clean", BuildDate:"2021-09-15T21:10:45Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.5", GitCommit:"aea7bbadd2fc0cd689de94a54e5b7b758869d691", GitTreeState:"clean", BuildDate:"2021-09-15T21:04:16Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}

Container runtime

# docker version
Client:
 Version:           20.10.8
 API version:       1.41
 Go version:        go1.16.6
 Git commit:        3967b7d
 Built:             Fri Jul 30 19:50:40 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.8
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.6
  Git commit:       75249d8
  Built:            Fri Jul 30 19:55:09 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.4.9
  GitCommit:        e25210fe30a0a703442421b0f60afac609f950a3
 runc:
  Version:          1.0.1
  GitCommit:        v1.0.1-0-g4144b638
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

KubeSphere version information

  • kubernetes v1.21.5
  • kubesphere v3.2.0
  • Installed with kk, online installation

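After installation I used the cluster for a while. Today I accidentally changed replicas: 1 to replicas: 0 on ks-apiserver from the web console, which made it impossible to log in. I then logged in to the VM terminal and ran kubectl edit deployment ks-apiserver -n kubesphere-system to set replicas back from 0 to 1, but ks-apiserver now keeps restarting, so I still cannot log in to KubeSphere.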
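
For reference, the replica count can also be restored without opening an editor by using kubectl scale. This is only a sketch of an equivalent step; the helper name set_ks_apiserver_replicas is hypothetical, and the deployment and namespace names are the ones from this post.

```shell
# Hypothetical helper: set the ks-apiserver replica count non-interactively.
set_ks_apiserver_replicas() {
  # $1 = desired replica count (1 restores the original setting)
  kubectl scale deployment ks-apiserver -n kubesphere-system --replicas="$1"
}

# Usage:
#   set_ks_apiserver_replicas 1
#   kubectl get deployment ks-apiserver -n kubesphere-system
```

This changes only the replica count; it does not address why the recreated pod fails to start.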

Some details:

# kubectl  get service -n kubesphere-system
NAME                    TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
ks-apiserver            NodePort       10.233.2.181   <none>        80:30154/TCP     474d
ks-console              NodePort       10.233.63.17   <none>        80:30880/TCP     474d
ks-controller-manager   ClusterIP      10.233.10.5    <none>        443/TCP          474d
minio                   ClusterIP      10.233.5.70    <none>        9000/TCP         470d
tower                   LoadBalancer   10.233.8.244   <pending>     8080:31970/TCP   474d

# kubectl  get deployments -n kubesphere-system
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
ks-apiserver            0/1     1            0           474d
ks-console              1/1     1            1           474d
ks-controller-manager   1/1     1            1           474d
ks-installer            0/1     1            0           159m
minio                   1/1     1            1           470d
tower                   1/1     1            1           474d

# kubectl  get pods -n kubesphere-system
NAME                                     READY   STATUS             RESTARTS   AGE
ks-apiserver-64568c4c-twwb6              0/1     CrashLoopBackOff   36         151m
ks-console-565df7f477-4nxw4              1/1     Running            3          474d
ks-controller-manager-7c5d587b7b-2zkt8   1/1     Running            296        470d
ks-installer-6484f6c4cf-7kr85            0/1     CrashLoopBackOff   32         159m
minio-859cb4d777-f5f44                   1/1     Running            9          470d
openpitrix-import-job-lvr9q              0/1     Completed          0          470d
tower-786bb99f5d-2fp97                   1/1     Running            144        474d

While the container is restarting, the login page still loads, and it shows this error:

request to http://ks-apiserver/oauth/token failed, reason: connect ECONNREFUSED 10.233.2.181:80

Errors in the container log:

# kubectl logs ks-apiserver-64568c4c-twwb6 -n  kubesphere-system
W0308 17:20:11.978847       1 client_config.go:615] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
W0308 17:20:11.980650       1 client_config.go:615] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
E0308 17:20:41.981744       1 metricsserver.go:231] Get "https://10.233.0.1:443/api?timeout=32s": dial tcp 10.233.0.1:443: i/o timeout
W0308 17:20:41.981791       1 options.go:183] ks-apiserver starts without redis provided, it will use in memory cache. This may cause inconsistencies when running ks-apiserver with multiple replicas.
F0308 17:21:11.984241       1 options.go:235] unable to create controller runtime cache: could not create RESTMapper from config
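
The first letter of each log line above is the klog severity (W for warning, E for error, F for fatal), so only the E and F lines are actual failures. A minimal sketch for filtering them out of a longer log; the function name klog_errors is hypothetical.

```shell
# Hypothetical helper: keep only klog lines with severity E (error) or F (fatal).
# A klog line starts with the severity letter followed by month and day (MMDD).
klog_errors() {
  grep -E '^[EF][0-9]{4} '
}

# Usage (pod name taken from the output above):
#   kubectl logs ks-apiserver-64568c4c-twwb6 -n kubesphere-system | klog_errors
```

Applied to the log above, this leaves the metricsserver.go timeout line and the fatal RESTMapper line.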

The error looks like a connection timeout, but from ks-controller-manager-7c5d587b7b-2zkt8 in the same namespace the IP and port are reachable:

# docker exec -ti beb9270b13c8 /bin/sh
/ # nc -zv 10.233.0.1 443
10.233.0.1 (10.233.0.1:443) open

Running this on the VM:

# curl -k https://10.233.0.1:443
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {
    
  },
  "status": "Failure",
  "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
  "reason": "Forbidden",
  "details": {
    
  },
  "code": 403
}

I am currently inclined to think the problem is caused by this error:

F0308 17:21:11.984241       1 options.go:235] unable to create controller runtime cache: could not create RESTMapper from config

However, when I run this on the VM:

# kubectl  get config -n kubesphere-system
No resources found

Please help take a look.

    xuanyuanaosheng

    Get "https://10.233.0.1:443/api?timeout=32s": dial tcp 10.233.0.1:443: i/o timeout

    That is probably still the cause. I see ks-installer is also 0/1; is it timing out on the connection to the apiserver as well?

      Cauchy

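      Yes. I deleted the container, and after it was recreated ks-apiserver still failed to start. I then thought an upgrade might fix it and tried upgrading KubeSphere, but the upgrade apparently also needs to connect to ks-apiserver, so the attempt to fix the problem by upgrading failed as well.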

      So for now I just want to restore KubeSphere first and deal with the upgrade later.

      The ks-apiserver log contains few errors, and I did not find anything else useful in it.

      As you said, this is still the cause:

      Get "https://10.233.0.1:443/api?timeout=32s": dial tcp 10.233.0.1:443: i/o timeout

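      The container keeps restarting, so I cannot get inside it to test whether this address is reachable, but from the ks-controller-manager-7c5d587b7b-2zkt8 container in the same namespace the port is reachable.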

      10.233.0.1:443 is also reachable from the VM.

        Cauchy OK, one moment

        The tests show no problem:

        1. Start an alpine container:
          docker run -it --rm alpine /bin/sh
          
          # curl -k https://10.233.0.1:443 
          {
            "kind": "Status",
            "apiVersion": "v1",
            "metadata": {
              
            },
            "status": "Failure",
            "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
            "reason": "Forbidden",
            "details": {
              
            },
            "code": 403
          }

        2. Manually start a container from the kubesphere/ks-apiserver image:

        # docker run -ti be3514bdedbc /bin/sh
        / # ping 10.233.0.1
        PING 10.233.0.1 (10.233.0.1): 56 data bytes
        64 bytes from 10.233.0.1: seq=0 ttl=64 time=0.137 ms
        64 bytes from 10.233.0.1: seq=1 ttl=64 time=0.114 ms
        ^C
        --- 10.233.0.1 ping statistics ---
        2 packets transmitted, 2 packets received, 0% packet loss
        round-trip min/avg/max = 0.114/0.125/0.137 ms
        / # nc -zv 10.233.0.1 443
        10.233.0.1 (10.233.0.1:443) open

        But when KubeSphere restarted ks-apiserver, I entered the container while it was in the Running state and captured this:

        # docker ps | grep ks-apiserver-58cf668bfd-5k2cw
        58e5f37e69d1   kubesphere/ks-apiserver                                             "ks-apiserver --logt…"   31 seconds ago   Up 30 seconds              k8s_ks-apiserver_ks-apiserver-58cf668bfd-5k2cw_kubesphere-system_ebd265a9-0de3-40b2-8a26-43ceee86f049_0
        7c2fb63a016f   registry.cn-beijing.aliyuncs.com/kubesphereio/pause:3.4.1           "/pause"                 34 seconds ago   Up 33 seconds              k8s_POD_ks-apiserver-58cf668bfd-5k2cw_kubesphere-system_ebd265a9-0de3-40b2-8a26-43ceee86f049_0
        # docker exec -ti 58e5f37e69d1 /bin/sh
        
        / # ping 10.233.0.1
        PING 10.233.0.1 (10.233.0.1): 56 data bytes
        ^C
        --- 10.233.0.1 ping statistics ---
        3 packets transmitted, 0 packets received, 100% packet loss

        The network is unreachable.

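        To summarize: the host, the other containers in the same namespace, and the container I started manually can all reach this address; only the container started by the program cannot ping it. How can I test further in this situation?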
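
        One possible further check, sketched under assumptions: by default a container started with docker run joins Docker's bridge network rather than the pod network set up by the cluster's network plugin, so it may not reproduce what a pod sees. A throwaway pod created through kubectl would go through the same pod networking as ks-apiserver. The helper name pod_net_check, the pod name net-check, and the busybox image are all hypothetical choices here; any image that ships nc would do.

```shell
# Hypothetical helper: start a throwaway pod on the cluster's pod network and
# test a TCP connection from inside it. The pod is removed when the command exits.
pod_net_check() {
  # $1 = namespace, $2 = target IP, $3 = target port
  kubectl run net-check --rm -i --restart=Never --image=busybox \
    -n "$1" -- nc -zv -w 5 "$2" "$3"
}

# Usage, with the address from the ks-apiserver log:
#   pod_net_check kubesphere-system 10.233.0.1 443
```

        If this pod also times out, the problem would seem to affect newly created pods in general rather than ks-apiserver specifically.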


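        I continued testing: I deleted the ks-* containers in the kubesphere-system namespace. All the other containers were recreated successfully and run normally, and the ks-controller-manager container can connect to port 443 on 10.233.0.1.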
