Log
```
[node5] Downloading image: calico/cni:v3.15.1
[node5] Downloading image: calico/node:v3.15.1
[node5] Downloading image: calico/pod2daemon-flexvol:v3.15.1
INFO[15:19:43 CST] Generating etcd certs
INFO[15:19:45 CST] Synchronizing etcd certs
INFO[15:19:45 CST] Creating etcd service
INFO[15:19:53 CST] Starting etcd cluster
[master 192.168.9.162] MSG:
Configuration file already exists
Waiting for etcd to start
(line repeated 20 times)
WARN[15:21:31 CST] Task failed …
WARN[15:21:31 CST] error: Failed to start etcd cluster: Failed to exec command: sudo -E /bin/sh -c "export ETCDCTL_API=2;export ETCDCTL_CERT_FILE='/etc/ssl/etcd/ssl/admin-master.pem';export ETCDCTL_KEY_FILE='/etc/ssl/etcd/ssl/admin-master-key.pem';export ETCDCTL_CA_FILE='/etc/ssl/etcd/ssl/ca.pem';/usr/local/bin/etcdctl --endpoints=https://192.168.9.162:2379 cluster-health | grep -q 'cluster is healthy'"
Error: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 192.168.9.162:2379: connect: connection refused

error #0: dial tcp 192.168.9.162:2379: connect: connection refused: Process exited with status 1
Error: Failed to start etcd cluster: Failed to start etcd cluster: Failed to exec command: sudo -E /bin/sh -c "export ETCDCTL_API=2;export ETCDCTL_CERT_FILE='/etc/ssl/etcd/ssl/admin-master.pem';export ETCDCTL_KEY_FILE='/etc/ssl/etcd/ssl/admin-master-key.pem';export ETCDCTL_CA_FILE='/etc/ssl/etcd/ssl/ca.pem';/usr/local/bin/etcdctl --endpoints=https://192.168.9.162:2379 cluster-health | grep -q 'cluster is healthy'"
Error: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 192.168.9.162:2379: connect: connection refused

error #0: dial tcp 192.168.9.162:2379: connect: connection refused: Process exited with status 1
Usage:
kk create cluster [flags]

Flags:
-f, --filename string Path to a configuration file
-h, --help help for cluster
--skip-pull-images Skip pre pull images
--with-kubernetes string Specify a supported version of kubernetes
--with-kubesphere Deploy a specific version of kubesphere (default v3.0.0)
-y, --yes Skip pre-check of the installation

Global Flags:
--debug Print detailed information (default true)

Failed to start etcd cluster: Failed to start etcd cluster: Failed to exec command: sudo -E /bin/sh -c "export ETCDCTL_API=2;export ETCDCTL_CERT_FILE='/etc/ssl/etcd/ssl/admin-master.pem';export ETCDCTL_KEY_FILE='/etc/ssl/etcd/ssl/admin-master-key.pem';export ETCDCTL_CA_FILE='/etc/ssl/etcd/ssl/ca.pem';/usr/local/bin/etcdctl --endpoints=https://192.168.9.162:2379 cluster-health | grep -q 'cluster is healthy'"
Error: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 192.168.9.162:2379: connect: connection refused

error #0: dial tcp 192.168.9.162:2379: connect: connection refused: Process exited with status 1
```

    1 month later
    2 years later

    I'm hitting the same error. How do I fix it?

      5 months later

      lxj
      The etcd service port is unreachable. Check etcd's status and logs; if those look fine, check whether a firewall or a security group on the machine is blocking it.


        Cauchy

        I checked the system log (command: journalctl -xe | grep etcd),

        the service status (command: systemctl status etcd), and the firewall (command: systemctl status firewalld).

        I really can't find the problem. Please help me! I can provide remote access if needed.

          lxj
          Is it a cloud host? Could a security group be blocking it?


            Cauchy

            Let me add some details.

            When I run the cluster-creation command ./kk create cluster -f config-sample.yaml -a kubesphere.tar.gz --with-packages,

            etcd fails to start with the default generated etcd.env, reporting that port 2379 is already in use. The error is as follows:

            The etcd.env file is in this location:

            Cauchy

            Probably, though I'm not sure myself. This server sits on a LAN, and its ports are mapped into a network I can reach. To access a port I have to ask someone to open it, but ports 2379 and 2380 have already been opened.
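A quick way to check from another machine whether those mapped ports actually answer is a plain TCP probe. A minimal sketch, assuming bash with /dev/tcp support and the coreutils `timeout` command; the IP used is the mapped address from this thread:

```shell
# Probe a TCP port; prints "reachable" only if the connect succeeds.
check_port() {
  if timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo "$1:$2 reachable"
  else
    echo "$1:$2 unreachable"
  fi
}

check_port 192.168.17.75 2379   # etcd client port
check_port 192.168.17.75 2380   # etcd peer port
```

Note that a successful connect only proves the mapped port answers; etcd itself may still be listening on a different address than the one in its configuration.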

              lxj
              That 192.168.17.75 isn't the address on the machine's NIC, is it? An EIP? You need to use the address on the machine's NIC.


                Cauchy

                Right, it's not the NIC address. The NIC address is 10.10.1.183.

                But my other node servers can't reach the NIC address 10.10.1.183.

                Cauchy

                Should I change the master entry in config-sample.yaml to the machine's NIC address?

                  lxj

                  If you run the install from one node, fill in the NIC IP for both machines, and the two machines must be able to reach each other on the network.
                  If the node running the installer can't SSH to the machines' NIC IPs, address can be set to the EIP for SSH, but internalAddress must be the machine's NIC IP.
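lxj's rule can be sketched against the hosts section of config-sample.yaml. The field names follow the KubeKey sample config; the IPs are the ones from this thread, and the user, password, and second-host values are placeholders:

```yaml
spec:
  hosts:
  # address: the IP the installer uses to SSH in (an EIP / mapped address is fine here).
  # internalAddress: must be the IP bound to the machine's own NIC.
  - {name: master, address: 192.168.17.75, internalAddress: 10.10.1.183, user: root, password: "***"}
  - {name: node5, address: <node5-eip>, internalAddress: <node5-nic-ip>, user: root, password: "***"}
```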


                    Cauchy

                    Boss, I changed the IP to the machine's NIC address, and the etcd check seems to pass now, but I've run into a new problem.

                    It says port 10250 is already in use...

                      lxj
                      Uninstall first with ./kk delete cluster -f xxx.yaml, then check with netstat -nplt whether anything is listening on 10250; if not, try reinstalling.
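The port check lxj suggests can be wrapped in a tiny helper. A minimal sketch, assuming either `ss` (iproute2) or `netstat` is available on the node; `port_busy` is a hypothetical helper name, not part of kk:

```shell
# Return success if some process is listening on the given TCP port.
port_busy() {
  { command -v ss >/dev/null 2>&1 && ss -nlt || netstat -nlt 2>/dev/null; } \
    | awk '{print $4}' | grep -q ":$1\$"
}

if port_busy 10250; then
  echo "10250 busy: find the owner (e.g. ss -nltp) before reinstalling"
else
  echo "10250 free: safe to reinstall"
fi
```

Port 10250 is the kubelet's default port, so a leftover kubelet from a previous install is the usual suspect when it shows up as busy.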


                        Cauchy

                        I uninstalled and reinstalled. The port-in-use error is gone.

                        ① But the remaining error is still similar to the earlier one.

                        ② It seems some image pull failed, but it doesn't say which image.

                        ③ The problem seems to occur while the worker nodes are being brought up.


                          lxj: Is this the error from when the node joins the cluster?

                          Cauchy

                          Boss, I checked with a colleague today: this server does have a security group configured.

                          Given the security group, does switching to the local NIC IP, as you suggested yesterday, solve the problem or at least reduce its impact?
                          That said, after switching to the local IP yesterday, the etcd part did indeed stop erroring.

                          Cauchy

                          The address in the blue box is the local NIC address (10.10.1.183), but my worker nodes can't ping the local NIC address; they can only ping the mapped address (192.168.17.75).
                          In my current hosts configuration, lb.kubesphere.local points to 10.10.1.183.

                          So when a worker node accesses https://lb.kubesphere.local:6443, it can't get through and reports a timeout.

                          Figure 1 shows the hosts configuration:

                          Figure 2 shows where the error occurs (blue box):

                          My plan for fixing this:

                          ① Point lb.kubesphere.local at 192.168.17.75, so the worker nodes can reach https://lb.kubesphere.local:6443; that may solve the problem.

                          ② How to point lb.kubesphere.local at 192.168.17.75:

                          In config-sample.yaml, change address to 192.168.17.75.
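The change in ② would land in the controlPlaneEndpoint section of config-sample.yaml. A sketch, assuming the KubeKey sample layout; the IP is the mapped address from this thread:

```yaml
controlPlaneEndpoint:
  domain: lb.kubesphere.local
  # Use an address every node can actually reach; here that is the mapped
  # address, since the workers cannot ping the NIC IP 10.10.1.183.
  address: "192.168.17.75"
  port: 6443
```

The trade-off: traffic to the API server then goes through the port mapping, so port 6443 must also be opened in the security group, just like 2379 and 2380 were.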