Operating system information
VMware virtual machines, Debian 10.13, 2 vCPUs / 4 GB RAM each
Kubernetes version information
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.10", GitCommit:"7e54d50d3012cf3389e43b096ba35300f36e0817", GitTreeState:"clean", BuildDate:"2022-08-17T18:32:54Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.10", GitCommit:"7e54d50d3012cf3389e43b096ba35300f36e0817", GitTreeState:"clean", BuildDate:"2022-08-17T18:26:59Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}
Container runtime
Client: Docker Engine - Community
 Version:           20.10.21
 API version:       1.41
 Go version:        go1.18.7
 Git commit:        baeda1f
 Built:             Tue Oct 25 18:02:28 2022
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.21
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.18.7
  Git commit:       3056208
  Built:            Tue Oct 25 18:00:18 2022
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.9
  GitCommit:        1c90a442489720eec95342e1789ee8a5e1b9536f
 runc:
  Version:          1.1.4
  GitCommit:        v1.1.4-0-g5fd4c4d
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
KubeSphere version information
v3.3.1, online installation, installed with kk (KubeKey v3.0.0).
What is the problem
Following the guide "Set up an HA Cluster Using KubeKey's Built-in HAProxy", I deployed a cluster with 3 control-plane nodes and 3 worker nodes. The kk command failed partway through with the errors shown below.
Deployment command:
./kk create --debug cluster -f config-sample.yaml
kk environment check result:
+----------+------+------+---------+----------+-------+-------+---------+-----------+--------+----------+------------+------------+-------------+------------------+--------------+
| name | sudo | curl | openssl | ebtables | socat | ipset | ipvsadm | conntrack | chrony | docker | containerd | nfs client | ceph client | glusterfs client | time |
+----------+------+------+---------+----------+-------+-------+---------+-----------+--------+----------+------------+------------+-------------+------------------+--------------+
| debian31 | y | y | y | y | y | | | y | | 20.10.21 | 1.6.9 | | | | CST 17:10:13 |
| debian32 | y | y | y | y | y | | | y | | 20.10.21 | 1.6.9 | | | | CST 17:10:13 |
| debian33 | y | y | y | y | y | | | y | | 20.10.21 | 1.6.9 | | | | CST 17:10:13 |
| debian34 | y | y | y | y | y | | | y | | 20.10.21 | 1.6.9 | | | | CST 17:10:13 |
| debian35 | y | y | y | y | y | | | y | | 20.10.21 | 1.6.9 | | | | CST 17:10:13 |
| debian36 | y | y | y | y | y | | | y | | 20.10.21 | 1.6.9 | | | | CST 17:10:13 |
+----------+------+------+---------+----------+-------+-------+---------+-----------+--------+----------+------------+------------+-------------+------------------+--------------+
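The ipset, ipvsadm and chrony columns are empty, i.e. those packages were not found on the nodes; the pre-check did not block on them, but they can be installed up front. A minimal sketch, assuming the default Debian 10 repositories (run on every node):
# install the optional dependencies flagged as missing by the kk pre-check
apt-get update && apt-get install -y ipset ipvsadm chrony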
Contents of the configuration file config-sample.yaml:
apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: sample
spec:
  hosts:
  - {name: debian31, address: 172.16.240.31, internalAddress: 172.16.240.31, user: root}
  - {name: debian32, address: 172.16.240.32, internalAddress: 172.16.240.32, user: root}
  - {name: debian33, address: 172.16.240.33, internalAddress: 172.16.240.33, user: root}
  - {name: debian34, address: 172.16.240.34, internalAddress: 172.16.240.34, user: root}
  - {name: debian35, address: 172.16.240.35, internalAddress: 172.16.240.35, user: root}
  - {name: debian36, address: 172.16.240.36, internalAddress: 172.16.240.36, user: root}
  roleGroups:
    etcd:
    - debian31
    - debian32
    - debian33
    control-plane:
    - debian31
    - debian32
    - debian33
    worker:
    - debian34
    - debian35
    - debian36
  controlPlaneEndpoint:
    ## Internal loadbalancer for apiservers
    internalLoadbalancer: haproxy
    domain: lb.kubesphere.local
    address: ""
    port: 6443
  kubernetes:
    version: v1.23.10
    clusterName: cluster.local
    autoRenewCerts: true
    containerManager: docker
  etcd:
    type: kubekey
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
    ## multus support. https://github.com/k8snetworkplumbingwg/multus-cni
    multusCNI:
      enabled: false
  registry:
    privateRegistry: ""
    namespaceOverride: ""
    registryMirrors: []
    insecureRegistries: []
  addons: []
---
apiVersion: installer.kubesphere.io/v1alpha1
kind: ClusterConfiguration
metadata:
  name: ks-installer
  namespace: kubesphere-system
  labels:
    version: v3.3.1
spec:
  persistence:
    storageClass: ""
  authentication:
    jwtSecret: ""
  zone: ""
  local_registry: ""
  namespace_override: ""
  # dev_tag: ""
  etcd:
    monitoring: false
    endpointIps: localhost
    port: 2379
    tlsEnable: true
  common:
    core:
      console:
        enableMultiLogin: true
        port: 30880
        type: NodePort
    # apiserver:
    #   resources: {}
    # controllerManager:
    #   resources: {}
    redis:
      enabled: false
      volumeSize: 2Gi
    openldap:
      enabled: false
      volumeSize: 2Gi
    minio:
      volumeSize: 20Gi
    monitoring:
      # type: external
      endpoint: http://prometheus-operated.kubesphere-monitoring-system.svc:9090
      GPUMonitoring:
        enabled: false
    gpu:
      kinds:
      - resourceName: "nvidia.com/gpu"
        resourceType: "GPU"
        default: true
    es:
      # master:
      #   volumeSize: 4Gi
      #   replicas: 1
      #   resources: {}
      # data:
      #   volumeSize: 20Gi
      #   replicas: 1
      #   resources: {}
      logMaxAge: 7
      elkPrefix: logstash
      basicAuth:
        enabled: false
        username: ""
        password: ""
      externalElasticsearchHost: ""
      externalElasticsearchPort: ""
  alerting:
    enabled: false
    # thanosruler:
    #   replicas: 1
    #   resources: {}
  auditing:
    enabled: false
    # operator:
    #   resources: {}
    # webhook:
    #   resources: {}
  devops:
    enabled: false
    # resources: {}
    jenkinsMemoryLim: 8Gi
    jenkinsMemoryReq: 4Gi
    jenkinsVolumeSize: 8Gi
  events:
    enabled: false
    # operator:
    #   resources: {}
    # exporter:
    #   resources: {}
    # ruler:
    #   enabled: true
    #   replicas: 2
    #   resources: {}
  logging:
    enabled: false
    logsidecar:
      enabled: true
      replicas: 2
      # resources: {}
  metrics_server:
    enabled: false
  monitoring:
    storageClass: ""
    node_exporter:
      port: 9100
      # resources: {}
    # kube_rbac_proxy:
    #   resources: {}
    # kube_state_metrics:
    #   resources: {}
    # prometheus:
    #   replicas: 1
    #   volumeSize: 20Gi
    #   resources: {}
    # operator:
    #   resources: {}
    # alertmanager:
    #   replicas: 1
    #   resources: {}
    # notification_manager:
    #   resources: {}
    #   operator:
    #     resources: {}
    #   proxy:
    #     resources: {}
    gpu:
      nvidia_dcgm_exporter:
        enabled: false
        # resources: {}
  multicluster:
    clusterRole: none
  network:
    networkpolicy:
      enabled: false
    ippool:
      type: none
    topology:
      type: none
  openpitrix:
    store:
      enabled: false
  servicemesh:
    enabled: false
    istio:
      components:
        ingressGateways:
        - name: istio-ingressgateway
          enabled: false
        cni:
          enabled: false
  edgeruntime:
    enabled: false
    kubeedge:
      enabled: false
      cloudCore:
        cloudHub:
          advertiseAddress:
          - ""
        service:
          cloudhubNodePort: "30000"
          cloudhubQuicNodePort: "30001"
          cloudhubHttpsNodePort: "30002"
          cloudstreamNodePort: "30003"
          tunnelNodePort: "30004"
        # resources: {}
        # hostNetWork: false
      iptables-manager:
        enabled: true
        mode: "external"
        # resources: {}
      # edgeService:
      #   resources: {}
  terminal:
    timeout: 600
kk error output:
## The two errors below alternate: the ks-installer container keeps restarting, so while it is down kubectl exec fails with "container not found", and during the brief moments it is up the marker file /kubesphere/playbooks/kubesphere_running does not exist yet.
16:55:59 CST stderr: [debian31]
Failed to exec command: sudo -E /bin/bash -c "/usr/local/bin/kubectl exec -n kubesphere-system $(kubectl get pod -n kubesphere-system -l app=ks-installer -o jsonpath='{.items[0].metadata.name}') -- ls /kubesphere/playbooks/kubesphere_running"
Please wait for the installation to complete: >>--->
16:56:00 CST command: [debian31]
sudo -E /bin/bash -c "/usr/local/bin/kubectl exec -n kubesphere-system $(kubectl get pod -n kubesphere-system -l app=ks-installer -o jsonpath='{.items[0].metadata.name}') -- ls /kubesphere/playbooks/kubesphere_running"
16:56:00 CST stdout: [debian31]
error: unable to upgrade connection: container not found ("installer")
...
...
...
16:56:41 CST stderr: [debian31]
Failed to exec command: sudo -E /bin/bash -c "/usr/local/bin/kubectl exec -n kubesphere-system $(kubectl get pod -n kubesphere-system -l app=ks-installer -o jsonpath='{.items[0].metadata.name}') -- ls /kubesphere/playbooks/kubesphere_running"
ls: /kubesphere/playbooks/kubesphere_running: No such file or directory
Please wait for the installation to complete: >>--->
16:56:42 CST command: [debian31]
sudo -E /bin/bash -c "/usr/local/bin/kubectl exec -n kubesphere-system $(kubectl get pod -n kubesphere-system -l app=ks-installer -o jsonpath='{.items[0].metadata.name}') -- ls /kubesphere/playbooks/kubesphere_running"
16:56:42 CST stdout: [debian31]
ls: /kubesphere/playbooks/kubesphere_running: No such file or directory
command terminated with exit code 1
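While kk keeps polling for that marker file, the installer's progress can also be followed directly with the usual KubeSphere log command (shown here only as a diagnostic aid; it tails the same ks-installer pod kk is querying):
kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l app=ks-installer -o jsonpath='{.items[0].metadata.name}') -f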
Investigation showed that the openebs-localpv-provisioner and ks-installer pods were stuck in CrashLoopBackOff:
# kubectl get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-676c86494f-pcctx 1/1 Running 1 (34m ago) 35m
kube-system calico-node-4crhm 1/1 Running 0 35m
kube-system calico-node-5tbvl 1/1 Running 0 35m
kube-system calico-node-h4sz6 1/1 Running 0 35m
kube-system calico-node-k7nvz 1/1 Running 0 35m
kube-system calico-node-q757r 1/1 Running 0 35m
kube-system calico-node-sls65 1/1 Running 0 35m
kube-system coredns-757cd945b-qnrqm 1/1 Running 0 35m
kube-system coredns-757cd945b-vfbcw 1/1 Running 0 35m
kube-system haproxy-debian34 1/1 Running 0 35m
kube-system haproxy-debian35 1/1 Running 0 35m
kube-system haproxy-debian36 1/1 Running 0 35m
kube-system kube-apiserver-debian31 1/1 Running 0 35m
kube-system kube-apiserver-debian32 1/1 Running 0 35m
kube-system kube-apiserver-debian33 1/1 Running 0 35m
kube-system kube-controller-manager-debian31 1/1 Running 0 35m
kube-system kube-controller-manager-debian32 1/1 Running 0 35m
kube-system kube-controller-manager-debian33 1/1 Running 0 35m
kube-system kube-proxy-bbnll 1/1 Running 0 35m
kube-system kube-proxy-cx9wb 1/1 Running 0 35m
kube-system kube-proxy-pncvj 1/1 Running 0 35m
kube-system kube-proxy-qp8ln 1/1 Running 0 35m
kube-system kube-proxy-tlg2k 1/1 Running 0 35m
kube-system kube-proxy-x45nm 1/1 Running 0 35m
kube-system kube-scheduler-debian31 1/1 Running 0 35m
kube-system kube-scheduler-debian32 1/1 Running 0 35m
kube-system kube-scheduler-debian33 1/1 Running 0 35m
kube-system nodelocaldns-4bqbb 1/1 Running 0 35m
kube-system nodelocaldns-6fdhm 1/1 Running 0 35m
kube-system nodelocaldns-6lr79 1/1 Running 0 35m
kube-system nodelocaldns-dcqvk 1/1 Running 0 35m
kube-system nodelocaldns-jxbhn 1/1 Running 0 35m
kube-system nodelocaldns-qzttp 1/1 Running 0 35m
kube-system openebs-localpv-provisioner-7974b86588-pz2wt 0/1 CrashLoopBackOff 10 (2m4s ago) 35m
kubesphere-system ks-installer-87bbff65c-zslvl 0/1 CrashLoopBackOff 10 (2m16s ago) 35m
# kubectl describe -n kube-system po openebs-localpv-provisioner-7974b86588-pz2wt
...
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 40m default-scheduler 0/6 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 3 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
Normal Scheduled 40m default-scheduler Successfully assigned kube-system/openebs-localpv-provisioner-7974b86588-pz2wt to debian35
Normal Pulled 39m kubelet Successfully pulled image "registry.cn-beijing.aliyuncs.com/kubesphereio/provisioner-localpv:3.3.0" in 56.016483655s
Normal Pulled 39m kubelet Successfully pulled image "registry.cn-beijing.aliyuncs.com/kubesphereio/provisioner-localpv:3.3.0" in 418.848719ms
Normal Pulled 38m kubelet Successfully pulled image "registry.cn-beijing.aliyuncs.com/kubesphereio/provisioner-localpv:3.3.0" in 397.543538ms
Normal Created 37m (x4 over 39m) kubelet Created container openebs-provisioner-hostpath
Normal Started 37m (x4 over 39m) kubelet Started container openebs-provisioner-hostpath
Normal Pulled 37m kubelet Successfully pulled image "registry.cn-beijing.aliyuncs.com/kubesphereio/provisioner-localpv:3.3.0" in 444.881408ms
Normal Pulling 36m (x5 over 40m) kubelet Pulling image "registry.cn-beijing.aliyuncs.com/kubesphereio/provisioner-localpv:3.3.0"
Warning BackOff 27s (x157 over 38m) kubelet Back-off restarting failed container
# kubectl logs -n kube-system openebs-localpv-provisioner-7974b86588-pz2wt
I1116 09:07:48.252819 1 start.go:66] Starting Provisioner...
failure in preupgrade tasks: failed to list localpv based pv(s): Get "https://10.233.0.1:443/api/v1/persistentvolumes?labelSelector=openebs.io%2Fcas-type%3Dlocal-device": dial tcp 10.233.0.1:443: i/o timeout
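10.233.0.1 is the ClusterIP of the default kubernetes Service, so the provisioner cannot reach the API server over the Service network at all. A minimal connectivity check from a throwaway pod (the pod name and busybox image below are my own choices, not anything kk creates):
# schedule a test pod, probe the kubernetes Service ClusterIP on port 443, then clean up
kubectl run net-test --image=busybox:1.34 --restart=Never --command -- sleep 3600
kubectl exec net-test -- nc -zv -w 5 10.233.0.1 443
kubectl delete pod net-test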
# kubectl describe -n kubesphere-system po ks-installer-87bbff65c-zslvl
...
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 41m default-scheduler 0/6 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 3 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
Normal Scheduled 41m default-scheduler Successfully assigned kubesphere-system/ks-installer-87bbff65c-zslvl to debian35
Normal Pulled 41m kubelet Successfully pulled image "registry.cn-beijing.aliyuncs.com/kubesphereio/ks-installer:v3.3.1" in 37.590308053s
Normal Pulled 40m kubelet Successfully pulled image "registry.cn-beijing.aliyuncs.com/kubesphereio/ks-installer:v3.3.1" in 488.505838ms
Normal Pulled 39m kubelet Successfully pulled image "registry.cn-beijing.aliyuncs.com/kubesphereio/ks-installer:v3.3.1" in 477.61114ms
Normal Pulled 38m kubelet Successfully pulled image "registry.cn-beijing.aliyuncs.com/kubesphereio/ks-installer:v3.3.1" in 404.311555ms
Normal Created 38m (x4 over 41m) kubelet Created container installer
Normal Started 38m (x4 over 41m) kubelet Started container installer
Normal Pulled 37m kubelet Successfully pulled image "registry.cn-beijing.aliyuncs.com/kubesphereio/ks-installer:v3.3.1" in 410.703986ms
Normal Pulling 26m (x8 over 41m) kubelet Pulling image "registry.cn-beijing.aliyuncs.com/kubesphereio/ks-installer:v3.3.1"
Warning BackOff 107s (x151 over 40m) kubelet Back-off restarting failed container
# kubectl logs -n kubesphere-system ks-installer-87bbff65c-zslvl
2022-11-16T17:02:02+08:00 INFO : shell-operator latest
2022-11-16T17:02:02+08:00 INFO : HTTP SERVER Listening on 0.0.0.0:9115
2022-11-16T17:02:02+08:00 INFO : Use temporary dir: /tmp/shell-operator
2022-11-16T17:02:02+08:00 INFO : Initialize hooks manager ...
2022-11-16T17:02:02+08:00 INFO : Search and load hooks ...
2022-11-16T17:02:02+08:00 INFO : Load hook config from '/hooks/kubesphere/installRunner.py'
2022-11-16T17:02:03+08:00 INFO : Load hook config from '/hooks/kubesphere/schedule.sh'
2022-11-16T17:02:03+08:00 INFO : Initializing schedule manager ...
2022-11-16T17:02:03+08:00 INFO : KUBE Init Kubernetes client
2022-11-16T17:02:03+08:00 INFO : KUBE-INIT Kubernetes client is configured successfully
2022-11-16T17:02:03+08:00 INFO : MAIN: run main loop
2022-11-16T17:02:03+08:00 INFO : MAIN: add onStartup tasks
2022-11-16T17:02:03+08:00 INFO : QUEUE add all HookRun@OnStartup
2022-11-16T17:02:03+08:00 INFO : Running schedule manager ...
2022-11-16T17:02:03+08:00 INFO : MSTOR Create new metric shell_operator_live_ticks
2022-11-16T17:02:03+08:00 INFO : MSTOR Create new metric shell_operator_tasks_queue_length
2022-11-16T17:02:33+08:00 ERROR : error getting GVR for kind 'ClusterConfiguration': Get "https://10.233.0.1:443/api?timeout=32s": dial tcp 10.233.0.1:443: i/o timeout
2022-11-16T17:02:33+08:00 ERROR : Enable kube events for hooks error: Get "https://10.233.0.1:443/api?timeout=32s": dial tcp 10.233.0.1:443: i/o timeout
2022-11-16T17:02:33+08:00 INFO : TASK_RUN Exit: program halts.
I retried the deployment several times (restoring all VMs to a clean snapshot each time), and every attempt failed with the same error at this step.
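Since both failing pods run on a worker node (debian35) and both time out against 10.233.0.1:443, the Service-to-apiserver path on the workers looks broken, so kube-proxy and calico-node on that node are what I would inspect next. The label selectors below are the defaults used by kubeadm and the Calico manifest and are an assumption for this cluster:
# confirm the apiserver endpoints behind the kubernetes Service
kubectl get endpoints kubernetes
# locate the kube-proxy and calico-node pods on debian35, then read their logs
kubectl get pod -n kube-system -o wide -l k8s-app=kube-proxy --field-selector spec.nodeName=debian35
kubectl get pod -n kube-system -o wide -l k8s-app=calico-node --field-selector spec.nodeName=debian35
kubectl logs -n kube-system <pod-name-from-the-commands-above>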