Log
```
[node5] Downloading image: calico/cni:v3.15.1
[node5] Downloading image: calico/node:v3.15.1
[node5] Downloading image: calico/pod2daemon-flexvol:v3.15.1
INFO[15:19:43 CST] Generating etcd certs
INFO[15:19:45 CST] Synchronizing etcd certs
INFO[15:19:45 CST] Creating etcd service
INFO[15:19:53 CST] Starting etcd cluster
[master 192.168.9.162] MSG:
Configuration file already exists
Waiting for etcd to start
(line repeated 20 times)
WARN[15:21:31 CST] Task failed …
WARN[15:21:31 CST] error: Failed to start etcd cluster: Failed to exec command: sudo -E /bin/sh -c "export ETCDCTL_API=2;export ETCDCTL_CERT_FILE='/etc/ssl/etcd/ssl/admin-master.pem';export ETCDCTL_KEY_FILE='/etc/ssl/etcd/ssl/admin-master-key.pem';export ETCDCTL_CA_FILE='/etc/ssl/etcd/ssl/ca.pem';/usr/local/bin/etcdctl --endpoints=https://192.168.9.162:2379 cluster-health | grep -q 'cluster is healthy'"
Error: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 192.168.9.162:2379: connect: connection refused

error #0: dial tcp 192.168.9.162:2379: connect: connection refused: Process exited with status 1
Error: Failed to start etcd cluster: Failed to start etcd cluster: Failed to exec command: sudo -E /bin/sh -c "export ETCDCTL_API=2;export ETCDCTL_CERT_FILE='/etc/ssl/etcd/ssl/admin-master.pem';export ETCDCTL_KEY_FILE='/etc/ssl/etcd/ssl/admin-master-key.pem';export ETCDCTL_CA_FILE='/etc/ssl/etcd/ssl/ca.pem';/usr/local/bin/etcdctl --endpoints=https://192.168.9.162:2379 cluster-health | grep -q 'cluster is healthy'"
Error: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 192.168.9.162:2379: connect: connection refused

error #0: dial tcp 192.168.9.162:2379: connect: connection refused: Process exited with status 1
Usage:
kk create cluster [flags]

Flags:
-f, --filename string Path to a configuration file
-h, --help help for cluster
--skip-pull-images Skip pre pull images
--with-kubernetes string Specify a supported version of kubernetes
--with-kubesphere Deploy a specific version of kubesphere (default v3.0.0)
-y, --yes Skip pre-check of the installation

Global Flags:
--debug Print detailed information (default true)

Failed to start etcd cluster: Failed to start etcd cluster: Failed to exec command: sudo -E /bin/sh -c "export ETCDCTL_API=2;export ETCDCTL_CERT_FILE='/etc/ssl/etcd/ssl/admin-master.pem';export ETCDCTL_KEY_FILE='/etc/ssl/etcd/ssl/admin-master-key.pem';export ETCDCTL_CA_FILE='/etc/ssl/etcd/ssl/ca.pem';/usr/local/bin/etcdctl --endpoints=https://192.168.9.162:2379 cluster-health | grep -q 'cluster is healthy'"
Error: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 192.168.9.162:2379: connect: connection refused

error #0: dial tcp 192.168.9.162:2379: connect: connection refused: Process exited with status 1
```

    1 month later
    2 years later

    I'm hitting the same error. How do I fix it?

      5 months later

      lxj
      The etcd service port is unreachable. Check etcd's status and logs; if those look fine, check whether a firewall or a security group on the machine is blocking it.


        Cauchy

        I checked the system log (command: journalctl -xe | grep etcd),

        the service status (command: systemctl status etcd), and the firewall (command: systemctl status firewalld).

        I really can't find the problem. Please help me! I can provide remote access if needed.

          lxj
          Is it a cloud host? Could a security group be blocking it?


            Cauchy

            Let me add some details.

            When I run the cluster-creation command ./kk create cluster -f config-sample.yaml -a kubesphere.tar.gz --with-packages,

            etcd fails to start with the default generated etcd.env, reporting that port 2379 is already in use. The error is as follows:

            The etcd.env file is in this location:

            Cauchy

            Probably, though I'm not sure myself. This server sits on a LAN, and its ports are mapped into a network I can reach. To access a port I have to ask someone to open it, but ports 2379 and 2380 have already been opened.
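A quick way to check from another machine whether those mapped ports actually answer is a plain TCP probe. A minimal sketch, assuming bash with /dev/tcp support and the coreutils `timeout` command; the IP used is the mapped address from this thread:

```shell
# Probe a TCP port; prints "reachable" only if the connect succeeds.
check_port() {
  if timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo "$1:$2 reachable"
  else
    echo "$1:$2 unreachable"
  fi
}

check_port 192.168.17.75 2379   # etcd client port
check_port 192.168.17.75 2380   # etcd peer port
```

Note that a successful connect only proves the mapped port answers; etcd itself may still be listening on a different address than the one in its configuration.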

              lxj
              That 192.168.17.75 isn't the address on the machine's NIC, is it? An EIP? You need to use the address on the machine's NIC.


                Cauchy

                Right, it's not the NIC address. The NIC address is 10.10.1.183.

                But my other node servers can't reach the NIC address 10.10.1.183.

                Cauchy

                Should I change the master entry in config-sample.yaml to the machine's NIC address?

                  lxj

                  If you run the install from one node, fill in the NIC IP for both machines, and the two machines must be able to reach each other on the network.
                  If the node running the installer can't SSH to the machines' NIC IPs, address can be set to the EIP for SSH, but internalAddress must be the machine's NIC IP.
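lxj's rule can be sketched against the hosts section of config-sample.yaml. The field names follow the KubeKey sample config; the IPs are the ones from this thread, and the user, password, and second-host values are placeholders:

```yaml
spec:
  hosts:
  # address: the IP the installer uses to SSH in (an EIP / mapped address is fine here).
  # internalAddress: must be the IP bound to the machine's own NIC.
  - {name: master, address: 192.168.17.75, internalAddress: 10.10.1.183, user: root, password: "***"}
  - {name: node5, address: <node5-eip>, internalAddress: <node5-nic-ip>, user: root, password: "***"}
```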


                    Cauchy

                    Boss, I changed the IP to the machine's NIC address, and the etcd check seems to pass now, but I've run into a new problem.

                    It says port 10250 is already in use...

                      lxj
                      Uninstall first with ./kk delete cluster -f xxx.yaml, then check with netstat -nplt whether anything is listening on 10250; if not, try reinstalling.
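The port check lxj suggests can be wrapped in a tiny helper. A minimal sketch, assuming either `ss` (iproute2) or `netstat` is available on the node; `port_busy` is a hypothetical helper name, not part of kk:

```shell
# Return success if some process is listening on the given TCP port.
port_busy() {
  { command -v ss >/dev/null 2>&1 && ss -nlt || netstat -nlt 2>/dev/null; } \
    | awk '{print $4}' | grep -q ":$1\$"
}

if port_busy 10250; then
  echo "10250 busy: find the owner (e.g. ss -nltp) before reinstalling"
else
  echo "10250 free: safe to reinstall"
fi
```

Port 10250 is the kubelet's default port, so a leftover kubelet from a previous install is the usual suspect when it shows up as busy.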


                        Cauchy

                        I uninstalled and reinstalled. The port-in-use error is gone.

                        ① But the remaining error is still similar to the earlier one.

                        ② It seems some image pull failed, but it doesn't say which image.

                        ③ The problem seems to occur while the worker nodes are being brought up.


                          lxj: Is this the error from when the node joins the cluster?

                          Cauchy

                          Boss, I checked with a colleague today: this server does have a security group configured.

                          Given the security group, does switching to the local NIC IP, as you suggested yesterday, solve the problem or at least reduce its impact?
                          That said, after switching to the local IP yesterday, the etcd part did indeed stop erroring.

                          Cauchy

                          The address in the blue box is the local NIC address (10.10.1.183), but my worker nodes can't ping the local NIC address; they can only ping the mapped address (192.168.17.75).
                          In my current hosts configuration, lb.kubesphere.local points to 10.10.1.183.

                          So when a worker node accesses https://lb.kubesphere.local:6443, it can't get through and reports a timeout.

                          Figure 1 shows the hosts configuration:

                          Figure 2 shows where the error occurs (blue box):

                          My plan for fixing this:

                          ① Point lb.kubesphere.local at 192.168.17.75, so the worker nodes can reach https://lb.kubesphere.local:6443; that may solve the problem.

                          ② How to point lb.kubesphere.local at 192.168.17.75:

                          In config-sample.yaml, change address to 192.168.17.75.
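The change in ② would land in the controlPlaneEndpoint section of config-sample.yaml. A sketch, assuming the KubeKey sample layout; the IP is the mapped address from this thread:

```yaml
controlPlaneEndpoint:
  domain: lb.kubesphere.local
  # Use an address every node can actually reach; here that is the mapped
  # address, since the workers cannot ping the NIC IP 10.10.1.183.
  address: "192.168.17.75"
  port: 6443
```

The trade-off: traffic to the API server then goes through the port mapping, so port 6443 must also be opened in the security group, just like 2379 and 2380 were.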