Operating system
VM, CentOS 7.6, 4C/16G, clean system
Kubernetes version
v1.21.5, multiple master nodes
Container runtime
KubeKey default installation
KubeSphere version
v3.2.1, installed online, full installation
Problem description
Installing a KubeSphere high-availability cluster with KubeKey; the load balancer is an external keepalived + nginx (VIP: 192.168.142.210:8443).
The config-sample.yaml is configured as follows:
config-sample.yaml
apiVersion: kubekey.kubesphere.io/v1alpha1
kind: Cluster
metadata:
  name: sample
spec:
  hosts:
  - {name: ks-master1, address: 192.168.142.91, internalAddress: 192.168.142.91, user: root, password: 1234567}
  - {name: ks-master2, address: 192.168.142.93, internalAddress: 192.168.142.93, user: root, password: 1234567}
  - {name: ks-master3, address: 192.168.142.96, internalAddress: 192.168.142.96, user: root, password: 1234567}
  - {name: ks-node1, address: 192.168.142.138, internalAddress: 192.168.142.138, user: root, password: 1234567}
  - {name: ks-node2, address: 192.168.142.139, internalAddress: 192.168.142.139, user: root, password: 1234567}
  roleGroups:
    etcd:
    - ks-master1
    - ks-master2
    - ks-master3
    master:
    - ks-master1
    - ks-master2
    - ks-master3
    worker:
    - ks-node1
    - ks-node2
  controlPlaneEndpoint:
    ## Internal loadbalancer for apiservers
    # internalLoadbalancer: haproxy
    domain: lb.kubesphere.com
    address: "192.168.142.210"
    port: 8443
  kubernetes:
    version: v1.21.5
    clusterName: dev.kubesphere.com
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
  registry:
    registryMirrors: []
    insecureRegistries: []
  addons: []
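With this config every node reaches the API server through lb.kubesphere.com:8443; KubeKey should map that domain to the VIP in /etc/hosts on each node. A quick sanity check along these lines (just a sketch, using the names and addresses from the config above) confirms the endpoint is reachable from a node:

# On each node: confirm the control-plane endpoint resolves and the LB answers on 8443
grep lb.kubesphere.com /etc/hosts                      # expect a mapping to 192.168.142.210
timeout 3 bash -c '</dev/tcp/192.168.142.210/8443' \
  && echo "8443 open" || echo "8443 refused or timed out"
curl -k https://lb.kubesphere.com:8443/healthz         # only meaningful once an apiserver is actually running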
Key configuration of the external load balancer:
/etc/nginx/nginx.conf
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;

include /usr/share/nginx/modules/*.conf;

events {
    worker_connections 1024;
}

stream {
    log_format main '$remote_addr $upstream_addr - [$time_local] $status $upstream_bytes_sent';
    access_log /var/log/nginx/k8s-access.log main;

    # KubeSphere cluster #################################
    upstream ks-k8s-apiserver {
        server 192.168.142.91:6443;   # Master1 APISERVER IP:PORT
        server 192.168.142.93:6443;   # Master2 APISERVER IP:PORT
        server 192.168.142.96:6443;   # Master3 APISERVER IP:PORT
    }
    server {
        listen 8443;
        proxy_pass ks-k8s-apiserver;
    }

    # Two other k8s clusters for other workloads ###########################
    upstream k8s-apiserver {
        server 192.168.142.213:6443;  # Master1 APISERVER IP:PORT
        server 192.168.142.214:6443;  # Master2 APISERVER IP:PORT
    }
    server {
        listen 6443;
        proxy_pass k8s-apiserver;
    }

    upstream hc-k8s-apiserver {
        server 192.168.142.226:6443;  # Master1 APISERVER IP:PORT
        server 192.168.142.227:6443;  # Master2 APISERVER IP:PORT
        server 192.168.142.228:6443;  # Master3 APISERVER IP:PORT
    }
    server {
        listen 7443;
        proxy_pass hc-k8s-apiserver;
    }
}
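On the load balancer host itself, the stream configuration and the outbound path to the masters can be checked roughly like this (a sketch based on the config above):

# On the LB host: validate the nginx config and confirm the stream listeners
nginx -t
ss -lntp | grep -E ':(6443|7443|8443)'                 # nginx should be listening on all three ports
# Can the LB actually open a TCP connection to a master's apiserver port?
timeout 3 bash -c '</dev/tcp/192.168.142.91/6443' \
  && echo "master1:6443 reachable" || echo "master1:6443 refused or timed out"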
Run the following command to start the installation:
./kk create cluster -f config-sample.yaml
Everything went smoothly up to that point, but the problem appeared when initializing the Kubernetes cluster:
INFO[21:06:35 CST] Backup etcd data regularly
INFO[21:06:42 CST] Installing kube binaries
Push /root/kubekey/v1.21.5/amd64/kubeadm to 192.168.142.139:/tmp/kubekey/kubeadm Done
Push /root/kubekey/v1.21.5/amd64/kubeadm to 192.168.142.91:/tmp/kubekey/kubeadm Done
Push /root/kubekey/v1.21.5/amd64/kubeadm to 192.168.142.93:/tmp/kubekey/kubeadm Done
Push /root/kubekey/v1.21.5/amd64/kubeadm to 192.168.142.96:/tmp/kubekey/kubeadm Done
Push /root/kubekey/v1.21.5/amd64/kubelet to 192.168.142.91:/tmp/kubekey/kubelet Done
Push /root/kubekey/v1.21.5/amd64/kubectl to 192.168.142.91:/tmp/kubekey/kubectl Done
Push /root/kubekey/v1.21.5/amd64/helm to 192.168.142.91:/tmp/kubekey/helm Done
Push /root/kubekey/v1.21.5/amd64/kubelet to 192.168.142.93:/tmp/kubekey/kubelet Done
Push /root/kubekey/v1.21.5/amd64/cni-plugins-linux-amd64-v0.9.1.tgz to 192.168.142.91:/tmp/kubekey/cni-plugins-linux-amd64-v0.9.1.tgz Done
Push /root/kubekey/v1.21.5/amd64/kubelet to 192.168.142.96:/tmp/kubekey/kubelet Done
Push /root/kubekey/v1.21.5/amd64/kubectl to 192.168.142.93:/tmp/kubekey/kubectl Done
Push /root/kubekey/v1.21.5/amd64/kubelet to 192.168.142.139:/tmp/kubekey/kubelet Done
Push /root/kubekey/v1.21.5/amd64/kubectl to 192.168.142.96:/tmp/kubekey/kubectl Done
Push /root/kubekey/v1.21.5/amd64/kubeadm to 192.168.142.138:/tmp/kubekey/kubeadm Done
Push /root/kubekey/v1.21.5/amd64/helm to 192.168.142.93:/tmp/kubekey/helm Done
Push /root/kubekey/v1.21.5/amd64/kubectl to 192.168.142.139:/tmp/kubekey/kubectl Done
Push /root/kubekey/v1.21.5/amd64/helm to 192.168.142.96:/tmp/kubekey/helm Done
Push /root/kubekey/v1.21.5/amd64/cni-plugins-linux-amd64-v0.9.1.tgz to 192.168.142.93:/tmp/kubekey/cni-plugins-linux-amd64-v0.9.1.tgz Done
Push /root/kubekey/v1.21.5/amd64/cni-plugins-linux-amd64-v0.9.1.tgz to 192.168.142.96:/tmp/kubekey/cni-plugins-linux-amd64-v0.9.1.tgz Done
Push /root/kubekey/v1.21.5/amd64/helm to 192.168.142.139:/tmp/kubekey/helm Done
Push /root/kubekey/v1.21.5/amd64/cni-plugins-linux-amd64-v0.9.1.tgz to 192.168.142.139:/tmp/kubekey/cni-plugins-linux-amd64-v0.9.1.tgz Done
Push /root/kubekey/v1.21.5/amd64/kubelet to 192.168.142.138:/tmp/kubekey/kubelet Done
Push /root/kubekey/v1.21.5/amd64/kubectl to 192.168.142.138:/tmp/kubekey/kubectl Done
Push /root/kubekey/v1.21.5/amd64/helm to 192.168.142.138:/tmp/kubekey/helm Done
Push /root/kubekey/v1.21.5/amd64/cni-plugins-linux-amd64-v0.9.1.tgz to 192.168.142.138:/tmp/kubekey/cni-plugins-linux-amd64-v0.9.1.tgz Done
INFO[21:06:56 CST] Initializing kubernetes cluster
[ks-master1 192.168.142.91] MSG:
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0319 21:12:05.092844 27393 reset.go:99] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get config map: context deadline exceeded
[preflight] Running pre-flight checks
W0319 21:12:05.093314 27393 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]
The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
[ks-master1 192.168.142.91] MSG:
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0319 21:17:15.982564 30405 reset.go:99] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get config map: context deadline exceeded
[preflight] Running pre-flight checks
W0319 21:17:15.983070 30405 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]
The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
ERRO[21:21:57 CST] Failed to init kubernetes cluster: Failed to exec command: sudo env PATH=$PATH:/sbin:/usr/sbin /bin/sh -c "/usr/local/bin/kubeadm init --config=/etc/kubernetes/kubeadm-config.yaml --ignore-preflight-errors=FileExisting-crictl"
W0319 21:17:28.113654 30844 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.233.0.10]; the provided value is: [169.254.25.10]
[init] Using Kubernetes version: v1.21.5
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [ks-master1 ks-master1.ks.xunsiya.com ks-master2 ks-master2.ks.xunsiya.com ks-master3 ks-master3.ks.xunsiya.com ks-node1 ks-node1.ks.xunsiya.com ks-node2 ks-node2.ks.xunsiya.com kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.ks.xunsiya.com lb.xunsiya.com localhost] and IPs [10.233.0.1 192.168.142.91 127.0.0.1 192.168.142.210 192.168.142.93 192.168.142.96 192.168.142.138 192.168.142.139]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] External etcd mode: Skipping etcd/ca certificate authority generation
[certs] External etcd mode: Skipping etcd/server certificate generation
[certs] External etcd mode: Skipping etcd/peer certificate generation
[certs] External etcd mode: Skipping etcd/healthcheck-client certificate generation
[certs] External etcd mode: Skipping apiserver-etcd-client certificate generation
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher: Process exited with status 1 node=192.168.142.91
WARN[21:21:57 CST] Task failed ...
WARN[21:21:57 CST] error: interrupted by error
Error: Failed to init kubernetes cluster: interrupted by error
Usage:
kk create cluster [flags]
Flags:
--container-manager string Container runtime: docker, crio, containerd and isula. (default "docker")
--download-cmd string The user defined command to download the necessary binary files. The first param '%s' is output path, the second param '%s', is the URL (default "curl -L -o %s %s")
-f, --filename string Path to a configuration file
-h, --help help for cluster
--skip-pull-images Skip pre pull images
--with-kubernetes string Specify a supported version of kubernetes (default "v1.21.5")
--with-kubesphere Deploy a specific version of kubesphere (default v3.2.0)
--with-local-storage Deploy a local PV provisioner
-y, --yes Skip pre-check of the installation
Global Flags:
--debug Print detailed information (default true)
--in-cluster Running inside the cluster
Failed to init kubernetes cluster: interrupted by error
Judging from the errors, it looks like communication between the nodes is failing, yet the firewall is disabled on the internal network and nothing should be blocking traffic.
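For reference, this is roughly how I ruled out local filtering on the nodes and on the LB host (a sketch; adapt as needed):

# Run on every node and on the LB host: rule out local filtering
systemctl is-active firewalld          # expect "inactive" or "unknown"
iptables -S | grep -i drop             # expect no custom DROP rules
getenforce                             # SELinux mode; an SELinux denial on the LB would appear in
                                       # nginx's error.log as "Permission denied (13)", not "Connection refused (111)"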
The load balancer's error log shows the connections being refused.
error.log
2022/03/19 19:06:42 [notice] 16942#0: signal process started
2022/03/19 21:03:40 [error] 16944#0: *571465 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.142.91, server: 0.0.0.0:8443, upstream: "192.168.142.91:6443", bytes from/to client:0/0, bytes from/to upstream:0/0
2022/03/19 21:03:40 [error] 16944#0: *571465 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.142.91, server: 0.0.0.0:8443, upstream: "192.168.142.93:6443", bytes from/to client:0/0, bytes from/to upstream:0/0
2022/03/19 21:03:40 [error] 16944#0: *571465 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.142.91, server: 0.0.0.0:8443, upstream: "192.168.142.96:6443", bytes from/to client:0/0, bytes from/to upstream:0/0
2022/03/19 21:03:41 [error] 16944#0: *571469 no live upstreams while connecting to upstream, client: 192.168.142.91, server: 0.0.0.0:8443, upstream: "ks-k8s-apiserver", bytes from/to client:0/0, bytes from/to upstream:0/0
k8s-access.log
Every request returns status 502:
192.168.142.91 192.168.142.91:6443, 192.168.142.93:6443, 192.168.142.96:6443 - [19/Mar/2022:21:03:40 +0800] 502 0, 0, 0
192.168.142.91 ks-k8s-apiserver - [19/Mar/2022:21:03:41 +0800] 502 0
192.168.142.91 192.168.142.91:6443, 192.168.142.93:6443, 192.168.142.96:6443 - [19/Mar/2022:21:03:42 +0800] 502 0, 0, 0
192.168.142.91 ks-k8s-apiserver - [19/Mar/2022:21:03:42 +0800] 502 0
192.168.142.91 ks-k8s-apiserver - [19/Mar/2022:21:03:42 +0800] 502 0
192.168.142.91 ks-k8s-apiserver - [19/Mar/2022:21:03:42 +0800] 502 0
192.168.142.91 ks-k8s-apiserver - [19/Mar/2022:21:03:43 +0800] 502 0
192.168.142.91 ks-k8s-apiserver - [19/Mar/2022:21:03:43 +0800] 502 0
192.168.142.91 ks-k8s-apiserver - [19/Mar/2022:21:03:43 +0800] 502 0
192.168.142.91 ks-k8s-apiserver - [19/Mar/2022:21:03:44 +0800] 502 0
192.168.142.91 ks-k8s-apiserver - [19/Mar/2022:21:03:44 +0800] 502 0
192.168.142.91 ks-k8s-apiserver - [19/Mar/2022:21:03:44 +0800] 502 0
192.168.142.91 ks-k8s-apiserver - [19/Mar/2022:21:03:44 +0800] 502 0
192.168.142.91 ks-k8s-apiserver - [19/Mar/2022:21:03:45 +0800] 502 0
192.168.142.91 ks-k8s-apiserver - [19/Mar/2022:21:03:45 +0800] 502 0
192.168.142.91 ks-k8s-apiserver - [19/Mar/2022:21:03:45 +0800] 502 0
192.168.142.91 ks-k8s-apiserver - [19/Mar/2022:21:03:46 +0800] 502 0
192.168.142.91 ks-k8s-apiserver - [19/Mar/2022:21:03:46 +0800] 502 0
192.168.142.91 ks-k8s-apiserver - [19/Mar/2022:21:03:46 +0800] 502 0
192.168.142.91 ks-k8s-apiserver - [19/Mar/2022:21:03:47 +0800] 502 0
192.168.142.91 ks-k8s-apiserver - [19/Mar/2022:21:03:47 +0800] 502 0
192.168.142.91 ks-k8s-apiserver - [19/Mar/2022:21:03:47 +0800] 502 0
192.168.142.91 ks-k8s-apiserver - [19/Mar/2022:21:03:48 +0800] 502 0
192.168.142.91 ks-k8s-apiserver - [19/Mar/2022:21:03:48 +0800] 502 0
192.168.142.91 ks-k8s-apiserver - [19/Mar/2022:21:03:48 +0800] 502 0
192.168.142.91 ks-k8s-apiserver - [19/Mar/2022:21:03:48 +0800] 502 0
192.168.142.91 ks-k8s-apiserver - [19/Mar/2022:21:03:48 +0800] 502 0
My current suspicion is that the nodes of the newly created cluster cannot reach each other through the load balancer VIP. However, the other two Kubernetes clusters behind the same load balancer (on ports 6443 and 7443 respectively) work perfectly, and their configuration is identical apart from the listen port.
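Since nginx gets Connection refused when dialing 192.168.142.91:6443 directly, the next thing I plan to check is whether kube-apiserver ever actually started on ks-master1 (a sketch of the checks; docker commands are assumed because the runtime is the KubeKey default, and the container ID below is a placeholder):

# On ks-master1: is anything listening on 6443, and did the apiserver container come up?
ss -lntp | grep 6443
docker ps -a | grep kube-apiserver                 # an Exited container means the apiserver is crash-looping
docker logs <apiserver-container-id>               # placeholder: substitute the ID from the previous command
journalctl -u kubelet --no-pager | tail -n 50      # kubelet-side errors (cgroup driver, manifest problems, etc.)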
I'm stuck at this point. I'd really appreciate it if anyone could help me analyze this problem!