• Installation and Deployment
  • etcd port 2379 keeps refusing connections during All-in-One installation

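Previously the installation always failed at the etcd health check. On one attempt that check passed, and the etcd restart step passed as well, but at the final etcd reload step the connection to 2379 was refused again.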

`TASK [etcd : Configure | Check if etcd cluster is healthy] ****************************************************************************
Monday 16 December 2019 14:43:42 +0800 (0:00:00.197) 0:04:11.017 *******
FAILED - RETRYING: Configure | Check if etcd cluster is healthy (4 retries left).
FAILED - RETRYING: Configure | Check if etcd cluster is healthy (3 retries left).
FAILED - RETRYING: Configure | Check if etcd cluster is healthy (2 retries left).
FAILED - RETRYING: Configure | Check if etcd cluster is healthy (1 retries left).
fatal: [ks-allinone -> ks-allinone]: FAILED! => {
"attempts": 4,
"changed": false,
"cmd": "/usr/local/bin/etcdctl --no-sync --endpoints=https://172.18.248.238:2379 cluster-health | grep -q 'cluster is healthy'",
"delta": "0:00:00.240749",
"end": "2019-12-16 14:43:57.047583",
"rc": 1,
"start": "2019-12-16 14:43:56.806834"
}

STDERR:

Error: client: etcd cluster is unavailable or misconfigured; error #0: EOF

error #0: EOF

MSG:

non-zero return code`

[root@ks-allinone etcd]# etcdctl cluster-health
member 8e9e05c52164694d is healthy: got healthy result from http://localhost:2379
cluster is healthy
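One thing that may be worth comparing here: the manual check above reports a member at http://localhost:2379, while the installer task probes https://172.18.248.238:2379, so the two may not be testing the same listener. A minimal sketch that prints an etcdctl command aimed at the installer's endpoint follows. The `ETCDCTL_*` variables are the ones that appear in an installer log later in this thread; the certificate directory and the `admin-<node>` file naming are assumptions borrowed from that log and may differ on this machine. `etcd_health_cmd` is a hypothetical helper name.

```shell
# Print (do not run) an etcdctl v2 health check against a specific endpoint.
# $1 = endpoint URL, $2 = certificate directory (assumed), $3 = node name used
# in the certificate file names (assumed naming: admin-<node>.pem).
etcd_health_cmd() {
  printf "ETCDCTL_API=2 ETCDCTL_CA_FILE='%s/ca.pem' ETCDCTL_CERT_FILE='%s/admin-%s.pem' ETCDCTL_KEY_FILE='%s/admin-%s-key.pem' etcdctl --no-sync --endpoints=%s cluster-health\n" \
    "$2" "$2" "$3" "$2" "$3" "$1"
}

# Example: print the command for the endpoint shown in the failed task.
etcd_health_cmd https://172.18.248.238:2379 /etc/ssl/etcd/ssl ks-allinone
```

If the printed command fails on the node while the plain `etcdctl cluster-health` succeeds, that would suggest etcd is only serving the localhost address, though other causes are possible.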

    Forest-L
    In a newly opened terminal window, make sure the firewall is turned off. What kind of machine are you installing on?

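    Forest-L, I have uninstalled and reinstalled several times. The current problem is that etcdctl cluster-health reports "cluster is healthy", yet the installer's own etcd health check fails. The error output is the same as above and contains nothing but EOF.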

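      Forest-L, the image from this morning still would not download, so I imported it from another machine, but the installer is still stuck at that step and does not continue. Do I need to reinstall?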
      [root@ks-allinone logs]# docker images | grep cni
      calico/cni v3.7.3 1a6ade52d471 6 months ago 135MB

      TASK [download : download_container | Download image if required ( calico/cni:v3.7.3 )] ***********************************************
      Tuesday 17 December 2019 11:46:10 +0800 (0:00:00.219) 0:22:46.205 ******
      FAILED - RETRYING: download_container | Download image if required ( calico/cni:v3.7.3 ) (4 retries left).
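Since `grep cni` matches any line containing that substring, a stricter check may help confirm that the local image has exactly the repository and tag the task names. A minimal sketch, assuming the default `docker images` column layout (repository first, tag second); `has_image` is a hypothetical helper name, and this does not tell you whether the installer will accept the local copy.

```shell
# has_image REPO TAG
# Reads `docker images` output on stdin and succeeds only if a line has
# exactly REPO in the first column and TAG in the second.
has_image() {
  awk -v repo="$1" -v tag="$2" \
    '$1 == repo && $2 == tag { found = 1 } END { exit !found }'
}

# Usage on the node:
#   docker images | has_image calico/cni v3.7.3 && echo "image present"
```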

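        Forest-L, thanks for your help this morning. After reinstalling, the connection to 2379 is still refused. This machine may simply be too messy, so I have asked our server team to provision a fresh one, and I will try again once it is ready. Does the platform's single-node mode have any special machine requirements, such as the size of a separately mounted volume? The current machine has 8 cores, 32 GB of memory, and an 80 GB disk.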

          Forest-L [root@ks-allinone ~]# df -hT
          Filesystem               Type      Size  Used  Avail  Use%  Mounted on
          /dev/mapper/centos-root  ext4      127G  5.4G  116G   5%    /
          devtmpfs                 devtmpfs  16G   0     16G    0%    /dev
          tmpfs                    tmpfs     16G   264K  16G    1%    /dev/shm
          tmpfs                    tmpfs     16G   8.9M  16G    1%    /run
          tmpfs                    tmpfs     16G   0     16G    0%    /sys/fs/cgroup
          /dev/sda1                ext4      190M  147M  30M    84%   /boot
          /dev/mapper/centos-home  ext4      5.6G  24M   5.3G   1%    /home
          tmpfs                    tmpfs     3.2G  12K   3.2G   1%    /run/user/0

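          The disk is set up now and I have enlarged the root filesystem. The installation is running again and is still stuck on the Docker images, but disk usage grew by about 2 MB over half an hour, so the download may just be very slow. Thanks for the guidance.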

          1 year later

          Forest-L, I ran the step below, and now etcd will not start no matter what I try. What should I do?

          systemctl stop etcd.service && systemctl disable etcd.service && rm /var/lib/etcd -rf
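For anyone repeating this step, a more cautious variant might move the data directory aside instead of deleting it, so the old state can still be inspected or restored. This is only a sketch of an alternative, not what was run above; `backup_etcd_data` is a hypothetical helper name, and the timestamped `.bak` suffix is an arbitrary choice.

```shell
# backup_etcd_data DIR
# Renames DIR to DIR.bak.<timestamp> and prints the new path.
# Returns non-zero if DIR does not exist.
backup_etcd_data() {
  [ -d "$1" ] || { echo "no such directory: $1" >&2; return 1; }
  dest="$1.bak.$(date +%Y%m%d%H%M%S)"
  mv -- "$1" "$dest" && printf '%s\n' "$dest"
}

# Usage on the node (etcd should be stopped first):
#   systemctl stop etcd.service && backup_etcd_data /var/lib/etcd
```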

          Forest-L, error output:

          ......##### omitted
          [master3] Downloading image: harbor.dockerregistry.local/calico/node:v3.15.1
          [master1] Downloading image: harbor.dockerregistry.local/calico/node:v3.15.1
          [master3] Downloading image: harbor.dockerregistry.local/calico/pod2daemon-flexvol:v3.15.1
          [master2] Downloading image: harbor.dockerregistry.local/calico/pod2daemon-flexvol:v3.15.1
          [master1] Downloading image: harbor.dockerregistry.local/calico/pod2daemon-flexvol:v3.15.1
          INFO[22:33:43 CST] Generating etcd certs
          INFO[22:33:49 CST] Synchronizing etcd certs
          INFO[22:33:49 CST] Creating etcd service
          [master1 10.3.1.16] MSG:
          Created symlink from /etc/systemd/system/multi-user.target.wants/etcd.service to /etc/systemd/system/etcd.service.
          [master2 10.3.1.17] MSG:
          Created symlink from /etc/systemd/system/multi-user.target.wants/etcd.service to /etc/systemd/system/etcd.service.
          [master3 10.3.1.18] MSG:
          Created symlink from /etc/systemd/system/multi-user.target.wants/etcd.service to /etc/systemd/system/etcd.service.
          INFO[22:33:54 CST] Starting etcd cluster
          [master1 10.3.1.16] MSG:
          Configuration file already exists
          Waiting for etcd to start
          Waiting for etcd to start
          Waiting for etcd to start
          2 months later

          Starting all-in-one on a single machine
          `[centos8] Downloading image: calico/pod2daemon-flexvol:v3.16.3
          INFO[15:08:55 HKT] Generating etcd certs
          INFO[15:08:57 HKT] Synchronizing etcd certs
          INFO[15:08:57 HKT] Creating etcd service
          [centos8 192.168.31.79] MSG:
          etcd already exists
          INFO[15:09:19 HKT] Starting etcd cluster
          [centos8 192.168.31.79] MSG:
          Configuration file already exists
          [centos8 192.168.31.79] MSG:
          v3.4.13
          Waiting for etcd to start
          Waiting for etcd to start
          Waiting for etcd to start
          Waiting for etcd to start
          Waiting for etcd to start
          Waiting for etcd to start
          Waiting for etcd to start
          Waiting for etcd to start
          Waiting for etcd to start
          Waiting for etcd to start
          Waiting for etcd to start
          Waiting for etcd to start
          Waiting for etcd to start
          Waiting for etcd to start
          Waiting for etcd to start
          Waiting for etcd to start
          Waiting for etcd to start
          Waiting for etcd to start
          Waiting for etcd to start
          Waiting for etcd to start
          WARN[15:10:57 HKT] Task failed …
          WARN[15:10:57 HKT] error: Failed to start etcd cluster: Failed to exec command: sudo -E /bin/sh -c "export ETCDCTL_API=2;export ETCDCTL_CERT_FILE='/etc/ssl/etcd/ssl/admin-centos8.pem';export ETCDCTL_KEY_FILE='/etc/ssl/etcd/ssl/admin-centos8-key.pem';export ETCDCTL_CA_FILE='/etc/ssl/etcd/ssl/ca.pem';/usr/local/bin/etcdctl --endpoints=https://192.168.31.79:2379 cluster-health | grep -q 'cluster is healthy'"
          Error: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 192.168.31.79:2379: connect: connection refused

          error #0: dial tcp 192.168.31.79:2379: connect: connection refused: Process exited with status 1
          Error: Failed to start etcd cluster: Failed to start etcd cluster: Failed to exec command: sudo -E /bin/sh -c "export ETCDCTL_API=2;export ETCDCTL_CERT_FILE='/etc/ssl/etcd/ssl/admin-centos8.pem';export ETCDCTL_KEY_FILE='/etc/ssl/etcd/ssl/admin-centos8-key.pem';export ETCDCTL_CA_FILE='/etc/ssl/etcd/ssl/ca.pem';/usr/local/bin/etcdctl --endpoints=https://192.168.31.79:2379 cluster-health | grep -q 'cluster is healthy'"
          Error: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 192.168.31.79:2379: connect: connection refused

          error #0: dial tcp 192.168.31.79:2379: connect: connection refused: Process exited with status 1
          Usage:
          kk create cluster [flags]

          Flags:
          --download-cmd string The user defined command to download the necessary binary files. The first param '%s' is output path, the second param '%s', is the URL (default "curl -L -o %s %s")
          -f, --filename string Path to a configuration file
          -h, --help help for cluster
          --skip-pull-images Skip pre pull images
          --with-kubernetes string Specify a supported version of kubernetes (default "v1.19.8")
          --with-kubesphere Deploy a specific version of kubesphere (default v3.1.0)
          --with-local-storage Deploy a local PV provisioner
          -y, --yes Skip pre-check of the installation

          Global Flags:
          --debug Print detailed information (default true)
          --in-cluster Running inside the cluster

          Failed to start etcd cluster: Failed to start etcd cluster: Failed to exec command: sudo -E /bin/sh -c "export ETCDCTL_API=2;export ETCDCTL_CERT_FILE='/etc/ssl/etcd/ssl/admin-centos8.pem';export ETCDCTL_KEY_FILE='/etc/ssl/etcd/ssl/admin-centos8-key.pem';export ETCDCTL_CA_FILE='/etc/ssl/etcd/ssl/ca.pem';/usr/local/bin/etcdctl --endpoints=https://192.168.31.79:2379 cluster-health | grep -q 'cluster is healthy'"
          Error: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 192.168.31.79:2379: connect: connection refused

          error #0: dial tcp 192.168.31.79:2379: connect: connection refused: Process exited with status 1`
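Two different failures show up in this thread: "EOF" in the first post and "connection refused" here. They usually point in different directions, so a rough triage sketch may help decide where to look first. The hints below are general TCP/TLS reasoning, not installer output, and `etcd_error_hint` is a hypothetical helper name.

```shell
# etcd_error_hint MESSAGE
# Maps an etcdctl error line to a rough first place to look.
etcd_error_hint() {
  case "$1" in
    *"connection refused"*)
      # Nothing accepted the TCP connection on that address and port.
      echo "refused: check that etcd is running and which address it listens on" ;;
    *EOF*)
      # The connection opened but was closed before a reply arrived.
      echo "eof: compare http vs https on the endpoint and check the client certificates" ;;
    *)
      echo "unknown: read the etcd service log" ;;
  esac
}

etcd_error_hint "error #0: dial tcp 192.168.31.79:2379: connect: connection refused"
```

For the "refused" case, `systemctl status etcd`, `journalctl -u etcd --no-pager | tail -n 50`, and `ss -tlnp | grep 2379` on the node are the usual way to see whether the service is up and what it is bound to.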

            13 days later

            warn, you could try running ./kk delete cluster to clean the environment up, and then reinstall the cluster.
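The earlier log lines "etcd already exists" and "Configuration file already exists" hint that leftovers from a previous attempt may be involved, which fits this suggestion. A minimal sketch of the clean-then-reinstall sequence, defaulting to a dry run that only prints the commands. The flags and version numbers are taken from the usage text in the log above; `run` and `clean_reinstall` are hypothetical helper names, and a cluster created from a config file would need `-f` added.

```shell
# run CMD...: print the command when DRY_RUN is 1 (the default), else execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# clean_reinstall K8S_VERSION KUBESPHERE_VERSION
# Deletes the existing cluster state, then creates the cluster again.
clean_reinstall() {
  run ./kk delete cluster &&
    run ./kk create cluster --with-kubernetes "$1" --with-kubesphere "$2"
}

# Dry run by default; set DRY_RUN=0 to actually execute.
clean_reinstall v1.19.8 v3.1.0
```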