Operating system information
VMware virtual machines, Debian 10.13, 2 vCPUs / 4 GB RAM each
Kubernetes version information
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.10", GitCommit:"7e54d50d3012cf3389e43b096ba35300f36e0817", GitTreeState:"clean", BuildDate:"2022-08-17T18:32:54Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.10", GitCommit:"7e54d50d3012cf3389e43b096ba35300f36e0817", GitTreeState:"clean", BuildDate:"2022-08-17T18:26:59Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}
Container runtime
Client: Docker Engine - Community
 Version:           20.10.21
 API version:       1.41
 Go version:        go1.18.7
 Git commit:        baeda1f
 Built:             Tue Oct 25 18:02:28 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.21
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.18.7
  Git commit:       3056208
  Built:            Tue Oct 25 18:00:18 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.9
  GitCommit:        1c90a442489720eec95342e1789ee8a5e1b9536f
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
KubeSphere version information
v3.3.1, online installation, installed with kk (KubeKey v3.0.0).
What is the problem
Following the guide "Set up an HA Cluster Using KubeKey's Built-in HAProxy", I deployed a cluster with 3 control-plane nodes and 3 worker nodes. The kk command failed partway through with the errors shown below.
Deployment command:
./kk create --debug cluster -f config-sample.yaml
kk environment check result:
+----------+------+------+---------+----------+-------+-------+---------+-----------+--------+----------+------------+------------+-------------+------------------+--------------+
| name | sudo | curl | openssl | ebtables | socat | ipset | ipvsadm | conntrack | chrony | docker | containerd | nfs client | ceph client | glusterfs client | time |
+----------+------+------+---------+----------+-------+-------+---------+-----------+--------+----------+------------+------------+-------------+------------------+--------------+
| debian31 | y | y | y | y | y | | | y | | 20.10.21 | 1.6.9 | | | | CST 17:10:13 |
| debian32 | y | y | y | y | y | | | y | | 20.10.21 | 1.6.9 | | | | CST 17:10:13 |
| debian33 | y | y | y | y | y | | | y | | 20.10.21 | 1.6.9 | | | | CST 17:10:13 |
| debian34 | y | y | y | y | y | | | y | | 20.10.21 | 1.6.9 | | | | CST 17:10:13 |
| debian35 | y | y | y | y | y | | | y | | 20.10.21 | 1.6.9 | | | | CST 17:10:13 |
| debian36 | y | y | y | y | y | | | y | | 20.10.21 | 1.6.9 | | | | CST 17:10:13 |
+----------+------+------+---------+----------+-------+-------+---------+-----------+--------+----------+------------+------------+-------------+------------------+--------------+
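The ipset, ipvsadm and chrony columns are empty, i.e. those packages were not found on the nodes; the pre-check did not block on them, but they can be installed up front. A minimal sketch, assuming the default Debian 10 repositories (run on every node):
# install the optional dependencies flagged as missing by the kk pre-check
apt-get update && apt-get install -y ipset ipvsadm chrony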
Contents of the configuration file config-sample.yaml:
apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: sample
spec:
  hosts:
  - {name: debian31, address: 172.16.240.31, internalAddress: 172.16.240.31, user: root}
  - {name: debian32, address: 172.16.240.32, internalAddress: 172.16.240.32, user: root}
  - {name: debian33, address: 172.16.240.33, internalAddress: 172.16.240.33, user: root}
  - {name: debian34, address: 172.16.240.34, internalAddress: 172.16.240.34, user: root}
  - {name: debian35, address: 172.16.240.35, internalAddress: 172.16.240.35, user: root}
  - {name: debian36, address: 172.16.240.36, internalAddress: 172.16.240.36, user: root}
  roleGroups:
    etcd:
    - debian31
    - debian32
    - debian33
    control-plane:
    - debian31
    - debian32
    - debian33
    worker:
    - debian34
    - debian35
    - debian36
  controlPlaneEndpoint:
    ## Internal loadbalancer for apiservers
    internalLoadbalancer: haproxy
    domain: lb.kubesphere.local
    address: ""
    port: 6443
  kubernetes:
    version: v1.23.10
    clusterName: cluster.local
    autoRenewCerts: true
    containerManager: docker
  etcd:
    type: kubekey
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
    ## multus support. https://github.com/k8snetworkplumbingwg/multus-cni
    multusCNI:
      enabled: false
  registry:
    privateRegistry: ""
    namespaceOverride: ""
    registryMirrors: []
    insecureRegistries: []
  addons: []
---
apiVersion: installer.kubesphere.io/v1alpha1
kind: ClusterConfiguration
metadata:
  name: ks-installer
  namespace: kubesphere-system
  labels:
    version: v3.3.1
spec:
  persistence:
    storageClass: ""
  authentication:
    jwtSecret: ""
  zone: ""
  local_registry: ""
  namespace_override: ""
  # dev_tag: ""
  etcd:
    monitoring: false
    endpointIps: localhost
    port: 2379
    tlsEnable: true
  common:
    core:
      console:
        enableMultiLogin: true
        port: 30880
        type: NodePort
    # apiserver:
    #   resources: {}
    # controllerManager:
    #   resources: {}
    redis:
      enabled: false
      volumeSize: 2Gi
    openldap:
      enabled: false
      volumeSize: 2Gi
    minio:
      volumeSize: 20Gi
    monitoring:
      # type: external
      endpoint: http://prometheus-operated.kubesphere-monitoring-system.svc:9090
      GPUMonitoring:
        enabled: false
    gpu:
      kinds:
      - resourceName: "nvidia.com/gpu"
        resourceType: "GPU"
        default: true
    es:
      # master:
      #   volumeSize: 4Gi
      #   replicas: 1
      #   resources: {}
      # data:
      #   volumeSize: 20Gi
      #   replicas: 1
      #   resources: {}
      logMaxAge: 7
      elkPrefix: logstash
      basicAuth:
        enabled: false
        username: ""
        password: ""
      externalElasticsearchHost: ""
      externalElasticsearchPort: ""
  alerting:
    enabled: false
    # thanosruler:
    #   replicas: 1
    #   resources: {}
  auditing:
    enabled: false
    # operator:
    #   resources: {}
    # webhook:
    #   resources: {}
  devops:
    enabled: false
    # resources: {}
    jenkinsMemoryLim: 8Gi
    jenkinsMemoryReq: 4Gi
    jenkinsVolumeSize: 8Gi
  events:
    enabled: false
    # operator:
    #   resources: {}
    # exporter:
    #   resources: {}
    # ruler:
    #   enabled: true
    #   replicas: 2
    #   resources: {}
  logging:
    enabled: false
    logsidecar:
      enabled: true
      replicas: 2
      # resources: {}
  metrics_server:
    enabled: false
  monitoring:
    storageClass: ""
    node_exporter:
      port: 9100
      # resources: {}
    # kube_rbac_proxy:
    #   resources: {}
    # kube_state_metrics:
    #   resources: {}
    # prometheus:
    #   replicas: 1
    #   volumeSize: 20Gi
    #   resources: {}
    # operator:
    #   resources: {}
    # alertmanager:
    #   replicas: 1
    #   resources: {}
    # notification_manager:
    #   resources: {}
    #   operator:
    #     resources: {}
    #   proxy:
    #     resources: {}
    gpu:
      nvidia_dcgm_exporter:
        enabled: false
        # resources: {}
  multicluster:
    clusterRole: none
  network:
    networkpolicy:
      enabled: false
    ippool:
      type: none
    topology:
      type: none
  openpitrix:
    store:
      enabled: false
  servicemesh:
    enabled: false
    istio:
      components:
        ingressGateways:
        - name: istio-ingressgateway
          enabled: false
        cni:
          enabled: false
  edgeruntime:
    enabled: false
    kubeedge:
      enabled: false
      cloudCore:
        cloudHub:
          advertiseAddress:
          - ""
        service:
          cloudhubNodePort: "30000"
          cloudhubQuicNodePort: "30001"
          cloudhubHttpsNodePort: "30002"
          cloudstreamNodePort: "30003"
          tunnelNodePort: "30004"
        # resources: {}
        # hostNetWork: false
      iptables-manager:
        enabled: true
        mode: "external"
        # resources: {}
      # edgeService:
      #   resources: {}
  terminal:
    timeout: 600
kk error output:
## The two errors below alternate: the ks-installer container keeps restarting, so while it is down kubectl exec fails with "container not found", and during the brief moments it is up the marker file /kubesphere/playbooks/kubesphere_running does not exist yet.
16:55:59 CST stderr: [debian31]
Failed to exec command: sudo -E /bin/bash -c "/usr/local/bin/kubectl exec -n kubesphere-system $(kubectl get pod -n kubesphere-system -l app=ks-installer -o jsonpath='{.items[0].metadata.name}') -- ls /kubesphere/playbooks/kubesphere_running"
Please wait for the installation to complete: >>--->
16:56:00 CST command: [debian31]
sudo -E /bin/bash -c "/usr/local/bin/kubectl exec -n kubesphere-system $(kubectl get pod -n kubesphere-system -l app=ks-installer -o jsonpath='{.items[0].metadata.name}') -- ls /kubesphere/playbooks/kubesphere_running"
16:56:00 CST stdout: [debian31]
error: unable to upgrade connection: container not found ("installer")
...
...
...
16:56:41 CST stderr: [debian31]
Failed to exec command: sudo -E /bin/bash -c "/usr/local/bin/kubectl exec -n kubesphere-system $(kubectl get pod -n kubesphere-system -l app=ks-installer -o jsonpath='{.items[0].metadata.name}') -- ls /kubesphere/playbooks/kubesphere_running"
ls: /kubesphere/playbooks/kubesphere_running: No such file or directory
Please wait for the installation to complete: >>--->
16:56:42 CST command: [debian31]
sudo -E /bin/bash -c "/usr/local/bin/kubectl exec -n kubesphere-system $(kubectl get pod -n kubesphere-system -l app=ks-installer -o jsonpath='{.items[0].metadata.name}') -- ls /kubesphere/playbooks/kubesphere_running"
16:56:42 CST stdout: [debian31]
ls: /kubesphere/playbooks/kubesphere_running: No such file or directory
command terminated with exit code 1
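While kk keeps polling for that marker file, the installer's progress can also be followed directly with the usual KubeSphere log command (shown here only as a diagnostic aid; it tails the same ks-installer pod kk is querying):
kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l app=ks-installer -o jsonpath='{.items[0].metadata.name}') -f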
Investigation showed that the openebs-localpv-provisioner and ks-installer pods were stuck in CrashLoopBackOff:
# kubectl get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-676c86494f-pcctx 1/1 Running 1 (34m ago) 35m
kube-system calico-node-4crhm 1/1 Running 0 35m
kube-system calico-node-5tbvl 1/1 Running 0 35m
kube-system calico-node-h4sz6 1/1 Running 0 35m
kube-system calico-node-k7nvz 1/1 Running 0 35m
kube-system calico-node-q757r 1/1 Running 0 35m
kube-system calico-node-sls65 1/1 Running 0 35m
kube-system coredns-757cd945b-qnrqm 1/1 Running 0 35m
kube-system coredns-757cd945b-vfbcw 1/1 Running 0 35m
kube-system haproxy-debian34 1/1 Running 0 35m
kube-system haproxy-debian35 1/1 Running 0 35m
kube-system haproxy-debian36 1/1 Running 0 35m
kube-system kube-apiserver-debian31 1/1 Running 0 35m
kube-system kube-apiserver-debian32 1/1 Running 0 35m
kube-system kube-apiserver-debian33 1/1 Running 0 35m
kube-system kube-controller-manager-debian31 1/1 Running 0 35m
kube-system kube-controller-manager-debian32 1/1 Running 0 35m
kube-system kube-controller-manager-debian33 1/1 Running 0 35m
kube-system kube-proxy-bbnll 1/1 Running 0 35m
kube-system kube-proxy-cx9wb 1/1 Running 0 35m
kube-system kube-proxy-pncvj 1/1 Running 0 35m
kube-system kube-proxy-qp8ln 1/1 Running 0 35m
kube-system kube-proxy-tlg2k 1/1 Running 0 35m
kube-system kube-proxy-x45nm 1/1 Running 0 35m
kube-system kube-scheduler-debian31 1/1 Running 0 35m
kube-system kube-scheduler-debian32 1/1 Running 0 35m
kube-system kube-scheduler-debian33 1/1 Running 0 35m
kube-system nodelocaldns-4bqbb 1/1 Running 0 35m
kube-system nodelocaldns-6fdhm 1/1 Running 0 35m
kube-system nodelocaldns-6lr79 1/1 Running 0 35m
kube-system nodelocaldns-dcqvk 1/1 Running 0 35m
kube-system nodelocaldns-jxbhn 1/1 Running 0 35m
kube-system nodelocaldns-qzttp 1/1 Running 0 35m
kube-system openebs-localpv-provisioner-7974b86588-pz2wt 0/1 CrashLoopBackOff 10 (2m4s ago) 35m
kubesphere-system ks-installer-87bbff65c-zslvl 0/1 CrashLoopBackOff 10 (2m16s ago) 35m
# kubectl describe -n kube-system po openebs-localpv-provisioner-7974b86588-pz2wt
...
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 40m default-scheduler 0/6 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 3 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
Normal Scheduled 40m default-scheduler Successfully assigned kube-system/openebs-localpv-provisioner-7974b86588-pz2wt to debian35
Normal Pulled 39m kubelet Successfully pulled image "registry.cn-beijing.aliyuncs.com/kubesphereio/provisioner-localpv:3.3.0" in 56.016483655s
Normal Pulled 39m kubelet Successfully pulled image "registry.cn-beijing.aliyuncs.com/kubesphereio/provisioner-localpv:3.3.0" in 418.848719ms
Normal Pulled 38m kubelet Successfully pulled image "registry.cn-beijing.aliyuncs.com/kubesphereio/provisioner-localpv:3.3.0" in 397.543538ms
Normal Created 37m (x4 over 39m) kubelet Created container openebs-provisioner-hostpath
Normal Started 37m (x4 over 39m) kubelet Started container openebs-provisioner-hostpath
Normal Pulled 37m kubelet Successfully pulled image "registry.cn-beijing.aliyuncs.com/kubesphereio/provisioner-localpv:3.3.0" in 444.881408ms
Normal Pulling 36m (x5 over 40m) kubelet Pulling image "registry.cn-beijing.aliyuncs.com/kubesphereio/provisioner-localpv:3.3.0"
Warning BackOff 27s (x157 over 38m) kubelet Back-off restarting failed container
# kubectl logs -n kube-system openebs-localpv-provisioner-7974b86588-pz2wt
I1116 09:07:48.252819 1 start.go:66] Starting Provisioner...
failure in preupgrade tasks: failed to list localpv based pv(s): Get "https://10.233.0.1:443/api/v1/persistentvolumes?labelSelector=openebs.io%2Fcas-type%3Dlocal-device": dial tcp 10.233.0.1:443: i/o timeout
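10.233.0.1 is the ClusterIP of the default kubernetes Service, so the provisioner cannot reach the API server over the Service network at all. A minimal connectivity check from a throwaway pod (the pod name and busybox image below are my own choices, not anything kk creates):
# schedule a test pod, probe the kubernetes Service ClusterIP on port 443, then clean up
kubectl run net-test --image=busybox:1.34 --restart=Never --command -- sleep 3600
kubectl exec net-test -- nc -zv -w 5 10.233.0.1 443
kubectl delete pod net-test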
# kubectl describe -n kubesphere-system po ks-installer-87bbff65c-zslvl
...
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 41m default-scheduler 0/6 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 3 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
Normal Scheduled 41m default-scheduler Successfully assigned kubesphere-system/ks-installer-87bbff65c-zslvl to debian35
Normal Pulled 41m kubelet Successfully pulled image "registry.cn-beijing.aliyuncs.com/kubesphereio/ks-installer:v3.3.1" in 37.590308053s
Normal Pulled 40m kubelet Successfully pulled image "registry.cn-beijing.aliyuncs.com/kubesphereio/ks-installer:v3.3.1" in 488.505838ms
Normal Pulled 39m kubelet Successfully pulled image "registry.cn-beijing.aliyuncs.com/kubesphereio/ks-installer:v3.3.1" in 477.61114ms
Normal Pulled 38m kubelet Successfully pulled image "registry.cn-beijing.aliyuncs.com/kubesphereio/ks-installer:v3.3.1" in 404.311555ms
Normal Created 38m (x4 over 41m) kubelet Created container installer
Normal Started 38m (x4 over 41m) kubelet Started container installer
Normal Pulled 37m kubelet Successfully pulled image "registry.cn-beijing.aliyuncs.com/kubesphereio/ks-installer:v3.3.1" in 410.703986ms
Normal Pulling 26m (x8 over 41m) kubelet Pulling image "registry.cn-beijing.aliyuncs.com/kubesphereio/ks-installer:v3.3.1"
Warning BackOff 107s (x151 over 40m) kubelet Back-off restarting failed container
# kubectl logs -n kubesphere-system ks-installer-87bbff65c-zslvl
2022-11-16T17:02:02+08:00 INFO : shell-operator latest
2022-11-16T17:02:02+08:00 INFO : HTTP SERVER Listening on 0.0.0.0:9115
2022-11-16T17:02:02+08:00 INFO : Use temporary dir: /tmp/shell-operator
2022-11-16T17:02:02+08:00 INFO : Initialize hooks manager ...
2022-11-16T17:02:02+08:00 INFO : Search and load hooks ...
2022-11-16T17:02:02+08:00 INFO : Load hook config from '/hooks/kubesphere/installRunner.py'
2022-11-16T17:02:03+08:00 INFO : Load hook config from '/hooks/kubesphere/schedule.sh'
2022-11-16T17:02:03+08:00 INFO : Initializing schedule manager ...
2022-11-16T17:02:03+08:00 INFO : KUBE Init Kubernetes client
2022-11-16T17:02:03+08:00 INFO : KUBE-INIT Kubernetes client is configured successfully
2022-11-16T17:02:03+08:00 INFO : MAIN: run main loop
2022-11-16T17:02:03+08:00 INFO : MAIN: add onStartup tasks
2022-11-16T17:02:03+08:00 INFO : QUEUE add all HookRun@OnStartup
2022-11-16T17:02:03+08:00 INFO : Running schedule manager ...
2022-11-16T17:02:03+08:00 INFO : MSTOR Create new metric shell_operator_live_ticks
2022-11-16T17:02:03+08:00 INFO : MSTOR Create new metric shell_operator_tasks_queue_length
2022-11-16T17:02:33+08:00 ERROR : error getting GVR for kind 'ClusterConfiguration': Get "https://10.233.0.1:443/api?timeout=32s": dial tcp 10.233.0.1:443: i/o timeout
2022-11-16T17:02:33+08:00 ERROR : Enable kube events for hooks error: Get "https://10.233.0.1:443/api?timeout=32s": dial tcp 10.233.0.1:443: i/o timeout
2022-11-16T17:02:33+08:00 INFO : TASK_RUN Exit: program halts.
I retried the deployment several times (restoring all VMs to a clean snapshot each time), and every attempt failed with the same error at this step.
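Since both failing pods run on a worker node (debian35) and both time out against 10.233.0.1:443, the Service-to-apiserver path on the workers looks broken, so kube-proxy and calico-node on that node are what I would inspect next. The label selectors below are the defaults used by kubeadm and the Calico manifest and are an assumption for this cluster:
# confirm the apiserver endpoints behind the kubernetes Service
kubectl get endpoints kubernetes
# locate the kube-proxy and calico-node pods on debian35, then read their logs
kubectl get pod -n kube-system -o wide -l k8s-app=kube-proxy --field-selector spec.nodeName=debian35
kubectl get pod -n kube-system -o wide -l k8s-app=calico-node --field-selector spec.nodeName=debian35
kubectl logs -n kube-system <pod-name-from-the-commands-above>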