When filing a deployment issue, please follow the template below. The more information you provide, the easier it is to get a timely answer. Moderators may close issues that do not follow the template.
Keep the post readable: format code and logs with markdown code blocks.
If you only spend one minute writing the question, you can't expect someone else to spend half an hour answering it.

Operating system information
e.g.: VM/bare metal, CentOS 7.5/Ubuntu 18.04, 4C/8G

Kubernetes version information
Paste the output of `kubectl version` below.

Container runtime
Paste the output of `docker version` / `crictl version` / `nerdctl version` below.

KubeSphere version information
v1.28.15 / v4.1.2. Online installation, installed with kk.

I added new nodes to config.yaml, but `kk add nodes -f config.yaml` fails as soon as it reaches the ETCD health check. Several retries give the same result.

```
error: Pipeline[AddNodesPipeline] execute failed: Module[ETCDConfigureModule] exec failed:

failed: [node2] [ExistETCDHealthCheck] exec failed after 20 retries: etcd health check failed: Failed to exec command: sudo -E /bin/bash -c "export ETCDCTL_API=2;export ETCDCTL_CERT_FILE='/etc/ssl/etcd/ssl/admin-node2.pem';export ETCDCTL_KEY_FILE='/etc/ssl/etcd/ssl/admin-node2-key.pem';export ETCDCTL_CA_FILE='/etc/ssl/etcd/ssl/ca.pem';/usr/local/bin/etcdctl --endpoints=https://172.30.13.248:2379,https://172.30.13.249:2379,https://172.30.13.250:2379,https://172.30.14.10:2379,https://172.30.13.85:2379,https://172.30.14.84:2379,https://172.30.14.86:2379 cluster-health | grep -q 'cluster is healthy'"

: Process exited with status 1
```
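The failing check can be re-run by hand on node2 to see the full etcdctl output instead of just the exit status. The cert paths and endpoint list below are copied from the error above; adjust them for your cluster:

```shell
# Re-run the health check kk performs, without the grep, so the full
# etcdctl output is visible. Paths and IPs come from the error log above.
export ETCDCTL_API=2
export ETCDCTL_CERT_FILE='/etc/ssl/etcd/ssl/admin-node2.pem'
export ETCDCTL_KEY_FILE='/etc/ssl/etcd/ssl/admin-node2-key.pem'
export ETCDCTL_CA_FILE='/etc/ssl/etcd/ssl/ca.pem'
/usr/local/bin/etcdctl \
  --endpoints=https://172.30.13.248:2379,https://172.30.13.249:2379,https://172.30.13.250:2379,https://172.30.14.10:2379,https://172.30.13.85:2379,https://172.30.14.84:2379,https://172.30.14.86:2379 \
  cluster-health
```

`cluster-health` prints a per-member line, so this usually shows which member is unreachable rather than a bare failure.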

Feels like nobody is around here any more.

  • CauchyK零SK壹S

Did you add a lot of etcd nodes at once? Check the running state of etcd on the newly added etcd nodes, or try restarting the etcd service on those nodes.
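On a newly added node, the service state and recent logs can be checked like this (this assumes etcd runs as a systemd service, which is how kk sets it up in binary-etcd mode):

```shell
# Check etcd service state and the last log lines on a newly added node.
systemctl status etcd --no-pager
journalctl -u etcd --no-pager -n 50

# If the service is wedged, restart it:
sudo systemctl restart etcd
```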

NDzuki What are the actual panic logs on lines 8 and 9?

      Cauchy On the nodes where etcd is listening, every etcdctl operation fails, e.g. `ETCDCTL_API=3 etcdctl --endpoints=127.0.0.1:2379 put testkey testvalue`:

```
{"level":"warn","ts":"2025-05-19T11:33:15.404281+0800","logger":"etcd-client","caller":"v3@v3.5.13/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc00001e000/127.0.0.1:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest balancer error: last connection error: connection error: desc = \"error reading server preface: read tcp 127.0.0.1:43724->127.0.0.1:2379: read: connection reset by peer\""}
Error: context deadline exceeded
```

      redscholar

```
May 19 11:42:38 test-node1 etcd[28343]: {"level":"info","ts":"2025-05-19T11:42:38.138128+0800","logger":"raft","caller":"etcdserver/zap_raft.go:77","msg":"58bd702cacc1a321 became follower at term 2"}
May 19 11:42:38 test-node1 etcd[28343]: {"level":"panic","ts":"2025-05-19T11:42:38.138137+0800","logger":"raft","caller":"etcdserver/zap_raft.go:101","msg":"tocommit(6216540) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost?","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.(*zapRaftLogger).Panicf\n\tgo.etcd.io/etcd/server/v3/etcdserver/zap_raft.go:101\ngo.etcd.io/etcd/raft/v3.(*raftLog).commitTo\n\tgo.etcd.io/etcd/raft/v3@v3.5.13/log.go:237\ngo.etcd.io/etcd/raft/v3.(*raft).handleHeartbeat\n\tgo.etcd.io/etcd/raft/v3@v3.5.13/raft.go:1508\ngo.etcd.io/etcd/raft/v3.stepFollower\n\tgo.etcd.io/etcd/raft/v3@v3.5.13/raft.go:1434\ngo.etcd.io/etcd/raft/v3.(*raft).Step\n\tgo.etcd.io/etcd/raft/v3@v3.5.13/raft.go:975\ngo.etcd.io/etcd/raft/v3.(*node).run\n\tgo.etcd.io/etcd/raft/v3@v3.5.13/node.go:356"}
May 19 11:42:38 test-node1 etcd[28343]: panic: tocommit(6216540) is out of range [lastIndex(0)]. Was the raft log corrupted, truncated, or lost?
```
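That panic means the follower's local raft log is empty (`lastIndex(0)`) while the leader's heartbeat asks it to commit entry 6216540: the member's on-disk state no longer matches the identity the cluster remembers for it, typically because the data directory was lost or recreated. Before wiping anything, the data dir (default kk path, as used later in this thread) can be inspected:

```shell
# Inspect the suspect member's data dir. An empty or freshly created
# wal/ directory on a member the cluster already knows about produces
# exactly this "tocommit ... out of range" panic on startup.
sudo ls -l /var/lib/etcd/member/wal /var/lib/etcd/member/snap
```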

      Solved:

      1. Remove the problem member from the cluster via etcdctl: `etcdctl member remove <member_id>`

      2. On the problem node, clear out the /var/lib/etcd/member directory

      3. Re-add the problem node as a member: `etcdctl member add test-node1 --peer-urls=<peer_url>`

      4. Restart etcd on the problem node: `systemctl start etcd`

      5. Ran the add through the config again, this time adding only worker nodes and no new etcd members, and it succeeded.
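The recovery steps above can be sketched as one sequence. `<member_id>` and `<peer_url>` are placeholders; the member name and data dir follow the thread, and stopping etcd before clearing its data dir is an assumption worth double-checking before running this:

```shell
# Sketch of the recovery sequence from the steps above (etcd v3 API).
# Run the member commands from a healthy member of the cluster.
export ETCDCTL_API=3

# 1. Drop the broken member from the cluster.
etcdctl --endpoints=https://127.0.0.1:2379 member remove <member_id>

# 2. On the broken node: stop etcd and clear its stale data dir.
sudo systemctl stop etcd
sudo rm -rf /var/lib/etcd/member

# 3. Re-register the node as a new member.
etcdctl --endpoints=https://127.0.0.1:2379 member add test-node1 --peer-urls=<peer_url>

# 4. Start etcd on the node again; it resyncs its state from the cluster.
sudo systemctl start etcd
```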

      One thing still puzzles me: the etcd docs suggest 8 or 9 members are workable, but my cluster only has 7 etcd nodes, and adding one more through kk triggers this problem. For now I'll just skip adding more etcd members.
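For what it's worth, adding etcd members past 7 buys little: quorum is n/2+1, so an even-sized cluster tolerates no more failures than the next smaller odd size, which is why the etcd docs recommend odd member counts (typically 3, 5, or 7). The arithmetic:

```shell
# Quorum and failure tolerance for common etcd cluster sizes.
for n in 3 5 7 8 9; do
  echo "$n members: quorum=$(( n/2 + 1 )), tolerates $(( (n-1)/2 )) failures"
done
```

With 7 members quorum is 4 and 3 failures are tolerated; an 8th member raises quorum to 5 without raising the failure tolerance.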