The node information is as follows:

[centos@k8s-node1 k8s]$ kubectl get nodes
NAME        STATUS   ROLES    AGE     VERSION
k8s-node1   Ready    master   4d10h   v1.17.3
k8s-node2   Ready    <none>   4d9h    v1.17.3
k8s-node3   Ready    <none>   4d9h    v1.17.3

    Everything shows as normal; where is the problem?

    Oh, the second one has a problem. Check it against the hints:
    "1. check the storage configuration and storage server",
    "2. make sure the DNS address in /etc/resolv.conf is available.",
    "3. execute 'helm del --purge ks-minio && kubectl delete job -n kubesphere-system ks-minio-make-bucket-job'",
    "4. Restart the installer pod in kubesphere-system namespace"

      magese
      You can start by checking the logs of the minio-make-bucket-job-s5n7h pod. This is usually an environment DNS or storage problem; it can also be that the cluster nodes' clocks are out of sync.
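      For the clock-sync point, a quick check is to compare the nodes directly (a sketch; it assumes SSH access to the nodes and chrony, the CentOS default):

      # compare wall clocks across the three nodes from the listing above
      for n in k8s-node1 k8s-node2 k8s-node3; do ssh $n date; done
      # or inspect chrony's sync status on each node
      chronyc tracking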

        Cauchy
        The job logs are as follows:

        [centos@k8s-node1 ~]$ kubectl logs minio-make-bucket-job-s5n7h -n kubesphere-system --tail=100
        Connecting to Minio server: http://minio:9000
        mc: <ERROR> Unable to initialize new config from the provided credentials. Get http://minio:9000/probe-bucket-sign-nhxof1bbipkq/?location=: dial tcp: i/o timeout.
        "Failed attempts: 1"
        mc: <ERROR> Unable to initialize new config from the provided credentials. Get http://minio:9000/probe-bucket-sign-mo0x33zvocb6/?location=: dial tcp: i/o timeout.
        "Failed attempts: 2"
        mc: <ERROR> Unable to initialize new config from the provided credentials. Get http://minio:9000/probe-bucket-sign-wr0i4qwpswv5/?location=: dial tcp: i/o timeout.
        "Failed attempts: 3"

        I've confirmed the cluster nodes' clocks are all consistent.

        How can I confirm whether this is a DNS problem?
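        (One common check, as a sketch assuming the busybox image can be pulled: try to resolve the minio service name from a throwaway pod.)

        # run a one-off pod and try to resolve the minio service
        kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never \
          -- nslookup minio.kubesphere-system.svc.cluster.local
        # a timeout or NXDOMAIN here points at cluster DNS (coredns)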

        Cauchy
        The /etc/resolv.conf file is configured as follows:

        ; generated by /usr/sbin/dhclient-script
        search ap-east-1.compute.internal
        nameserver 172.31.0.2
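
        To check whether that resolver itself answers, one can query it directly from the node (a sketch; 172.31.0.2 is the resolver shown above):

        # query the node's configured nameserver directly
        nslookup kubernetes.io 172.31.0.2
        # coredns typically forwards non-cluster names to this resolver,
        # so if this query hangs, pods cannot resolve external names either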

          rayzhou2017
          1. Storage is openebs, installed per the docs; the pods are all Running, as posted above.

          [centos@k8s-node1 ~]$ kubectl get sc
          NAME                         PROVISIONER                                                RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
          openebs-device               openebs.io/local                                           Delete          WaitForFirstConsumer   false                  4d19h
          openebs-hostpath (default)   openebs.io/local                                           Delete          WaitForFirstConsumer   false                  4d19h
          openebs-jiva-default         openebs.io/provisioner-iscsi                               Delete          Immediate              false                  4d19h
          openebs-snapshot-promoter    volumesnapshot.external-storage.k8s.io/snapshot-promoter   Delete          Immediate              false                  4d19h

          2. DNS has never been modified; the config is posted above.

          3. I've also switched helm versions several times and run helm del --purge ks-minio and reinstalled many times; nothing works. (The full cleanup sequence is sketched below.)
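
          For reference, the complete cleanup the installer hint suggests looks like this (a sketch; the installer pod name is taken from the kubectl get pod output below):

          # remove the failed minio release and its bucket job, per the hint
          helm del --purge ks-minio
          kubectl delete job -n kubesphere-system ks-minio-make-bucket-job
          # restart the installer pod so it retries the minio step
          kubectl delete pod -n kubesphere-system ks-installer-75b8d89dff-28jz5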

          I changed the /etc/resolv.conf configuration to

          nameserver 8.8.8.8

          and ran the minimal installation again; now ks-apigateway won't come up at all...

          [centos@k8s-node1 k8s]$ kubectl get pod -n kubesphere-system
          NAME                                     READY   STATUS             RESTARTS   AGE
          ks-account-596657f8c6-t6lwh              1/1     Running            2          6m7s
          ks-apigateway-78bcdc8ffc-hlrdg           0/1     CrashLoopBackOff   5          6m8s
          ks-apiserver-5b548d7c5c-p7bpv            1/1     Running            0          6m7s
          ks-console-78bcf96dbf-xvrnk              1/1     Running            0          6m3s
          ks-controller-manager-696986f8d9-4qjkx   1/1     Running            0          6m6s
          ks-installer-75b8d89dff-28jz5            1/1     Running            0          7m28s
          openldap-0                               1/1     Running            0          6m28s
          redis-6fd6c6d6f9-vk6k6                   1/1     Running            0          6m37s

          The ks-apigateway logs are as follows:

          [centos@k8s-node1 k8s]$ kubectl logs ks-apigateway-78bcdc8ffc-hlrdg -n kubesphere-system
          2020/06/30 08:17:01 [INFO][cache:0xc00078c050] Started certificate maintenance routine
          [DEV NOTICE] Registered directive 'authenticate' before 'jwt'
          [DEV NOTICE] Registered directive 'authentication' before 'jwt'
          [DEV NOTICE] Registered directive 'swagger' before 'jwt'
          Activating privacy features... done.
          E0630 08:17:06.752403       1 redis.go:51] unable to reach redis hostdial tcp: i/o timeout
          2020/06/30 08:17:06 dial tcp: i/o timeout

          My head is about to explode. Please save me, folks: Forest-L @Cauchy @rayzhou2017

            magese ks-account is already up; give it a while and ks-apigateway should recover too. If you don't want to wait, you can delete that pod directly so it gets pulled up again.

            The DNS on all of the nodes needs to be valid.
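
            (Deleting the pod is a one-liner; its Deployment re-creates it. Pod name taken from the listing above:)

            # force a fresh pod; the ks-apigateway Deployment spawns a replacement
            kubectl delete pod -n kubesphere-system ks-apigateway-78bcdc8ffc-hlrdg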

              Oh no, I'm about to cry. I changed /etc/resolv.conf back to the original configuration and restarted coredns. However, ks-account and ks-apigateway now fail and restart endlessly.

              kubesphere-system              ks-account-596657f8c6-pklvp                    1/1     Running            4          7m55s
              kubesphere-system              ks-apigateway-78bcdc8ffc-z49d6                 0/1     CrashLoopBackOff   6          7m57s
              kubesphere-system              ks-apiserver-5b548d7c5c-nv2wp                  1/1     Running            0          7m56s
              kubesphere-system              ks-console-78bcf96dbf-l9rz9                    1/1     Running            0          7m52s
              kubesphere-system              ks-controller-manager-696986f8d9-98xp5         1/1     Running            0          7m55s
              kubesphere-system              ks-installer-75b8d89dff-cd4kk                  1/1     Running            0          9m18s
              kubesphere-system              openldap-0                                     1/1     Running            0          8m16s
              kubesphere-system              redis-6fd6c6d6f9-g6q5b                         1/1     Running            0          8m26s

              Cauchy
              Now ks-account and ks-apigateway keep cycling between Error, CrashLoopBackOff, and Running.
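
              (Note: coredns reads the node's /etc/resolv.conf only when its pods start, so after editing that file the coredns pods need a restart. A sketch, assuming the standard coredns Deployment in kube-system:)

              # roll the coredns pods so they pick up the node's new resolv.conf
              kubectl -n kube-system rollout restart deployment coredns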

                hongming
                I tried kubectl -n kube-system edit configmap coredns; there is no proxy or upstream in the config. The installation still fails…

                # Please edit the object below. Lines beginning with a '#' will be ignored,
                # and an empty file will abort the edit. If an error occurs while saving this file will be
                # reopened with the relevant failures.
                #
                apiVersion: v1
                data:
                  Corefile: |
                    .:53 {
                        errors
                        health {
                           lameduck 5s
                        }
                        ready
                        kubernetes cluster.local in-addr.arpa ip6.arpa {
                           pods insecure
                           fallthrough in-addr.arpa ip6.arpa
                           ttl 30
                        }
                        prometheus :9153
                        forward . /etc/resolv.conf
                        cache 30
                        loop
                        reload
                        loadbalance
                    }
                kind: ConfigMap
                metadata:
                  creationTimestamp: "2020-06-25T06:31:18Z"
                  name: coredns
                  namespace: kube-system
                  resourceVersion: "175"
                  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
                  uid: 8fd545c4-9718-4537-a2fc-ebd6139547a
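
                The "forward . /etc/resolv.conf" line above means coredns hands every non-cluster query to the node's resolv.conf entries, which is why the node resolver has to work. To see whether coredns itself is logging errors (a sketch; kubeadm labels the coredns pods k8s-app=kube-dns):

                # tail the coredns pods for loop/timeout errors
                kubectl -n kube-system logs -l k8s-app=kube-dns --tail=50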

                  Cauchy
                  I've sent the email again; please check it when you have a moment. Many thanks!