OS information
Physical machines, Ubuntu 20.04, 4 cores / 8 GB RAM
Tutorial followed
One-click K8s cluster installation on Linux - Tencent Cloud Developer Community (tencent.com)
Cluster layout
One master node and one worker node; their private networks are not interconnected, so both machines use their public IPs.
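For reference, the hosts section of config-sample.yaml was filled in roughly as sketched below (the IPs and password are placeholders; the key names follow the KubeKey 2.x sample, and judging from the logs further down, the etcd role went to node-1). Since the two machines share no private network, internalAddress is also a public IP:

spec:
  hosts:
  - {name: master-1, address: <master public IP>, internalAddress: <master public IP>, user: root, password: "<redacted>"}
  - {name: node-1, address: <node public IP>, internalAddress: <node public IP>, user: root, password: "<redacted>"}
  roleGroups:
    etcd:
    - node-1
    control-plane:
    - master-1
    worker:
    - node-1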
Error description
The etcd service fails to start. (Every attempt runs into a different problem at this step; here is this run as an example.)
Installation process
root@master-1:~# ./kk create cluster -f config-sample.yaml
 _   __      _          _   __
| | / /     | |        | | / /
| |/ / _   _| |__   ___| |/ /  ___ _   _
|    \| | | | '_ \ / _ \    \ / _ \ | | |
| |\  \ |_| | |_) |  __/ |\  \  __/ |_| |
\_| \_/\__,_|_.__/ \___\_| \_/\___|\__, |
                                    __/ |
                                   |___/
16:39:13 CST [GreetingsModule] Greetings
16:39:14 CST message: [node-1]
Greetings, KubeKey!
16:39:14 CST message: [master-1]
Greetings, KubeKey!
16:39:14 CST success: [node-1]
16:39:14 CST success: [master-1]
16:39:14 CST [NodePreCheckModule] A pre-check on nodes
16:39:15 CST success: [master-1]
16:39:15 CST success: [node-1]
16:39:15 CST [ConfirmModule] Display confirmation form
+----------+------+------+---------+----------+-------+-------+---------+-----------+--------+--------+------------+------------+-------------+------------------+--------------+
| name | sudo | curl | openssl | ebtables | socat | ipset | ipvsadm | conntrack | chrony | docker | containerd | nfs client | ceph client | glusterfs client | time |
+----------+------+------+---------+----------+-------+-------+---------+-----------+--------+--------+------------+------------+-------------+------------------+--------------+
| master-1 | y | y | y | y | y | y | | y | y | | | | | | CST 16:39:14 |
| node-1 | y | y | y | y | y | y | | y | y | | | | | | CST 16:39:15 |
+----------+------+------+---------+----------+-------+-------+---------+-----------+--------+--------+------------+------------+-------------+------------------+--------------+
This is a simple check of your environment.
Before installation, ensure that your machines meet all requirements specified at
https://github.com/kubesphere/kubekey#requirements-and-recommendations
Continue this installation? [yes/no]: yes
16:39:17 CST success: [LocalHost]
16:39:17 CST [NodeBinariesModule] Download installation binaries
16:39:17 CST message: [localhost]
downloading amd64 kubeadm v1.22.10 ...
16:39:17 CST message: [localhost]
kubeadm is existed
16:39:17 CST message: [localhost]
downloading amd64 kubelet v1.22.10 ...
16:39:18 CST message: [localhost]
kubelet is existed
16:39:18 CST message: [localhost]
downloading amd64 kubectl v1.22.10 ...
16:39:18 CST message: [localhost]
kubectl is existed
16:39:18 CST message: [localhost]
downloading amd64 helm v3.6.3 ...
16:39:18 CST message: [localhost]
helm is existed
16:39:18 CST message: [localhost]
downloading amd64 kubecni v0.9.1 ...
16:39:19 CST message: [localhost]
kubecni is existed
16:39:19 CST message: [localhost]
downloading amd64 crictl v1.24.0 ...
16:39:19 CST message: [localhost]
crictl is existed
16:39:19 CST message: [localhost]
downloading amd64 etcd v3.4.13 ...
16:39:19 CST message: [localhost]
etcd is existed
16:39:19 CST message: [localhost]
downloading amd64 docker 20.10.8 ...
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 58.1M 100 58.1M 0 0 245k 0 0:04:02 0:04:02 --:--:-- 243k
16:43:22 CST success: [LocalHost]
16:43:22 CST [ConfigureOSModule] Prepare to init OS
16:43:24 CST success: [master-1]
16:43:24 CST success: [node-1]
16:43:24 CST [ConfigureOSModule] Generate init os script
16:43:24 CST success: [master-1]
16:43:24 CST success: [node-1]
16:43:24 CST [ConfigureOSModule] Exec init os script
16:43:25 CST stdout: [master-1]
net.ipv4.ip_forward = 1
vm.swappiness = 1
net.core.somaxconn = 1024
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_max_syn_backlog = 1024
net.bridge.bridge-nf-call-arptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_local_reserved_ports = 30000-32767
vm.max_map_count = 262144
fs.inotify.max_user_instances = 524288
kernel.pid_max = 65535
16:43:26 CST stdout: [node-1]
net.ipv4.ip_forward = 1
vm.swappiness = 1
net.core.somaxconn = 1024
net.ipv4.tcp_max_tw_buckets = 5000
net.ipv4.tcp_max_syn_backlog = 1024
kernel.unknown_nmi_panic = 1
net.bridge.bridge-nf-call-arptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_local_reserved_ports = 30000-32767
vm.max_map_count = 262144
fs.inotify.max_user_instances = 524288
kernel.pid_max = 65535
16:43:26 CST success: [master-1]
16:43:26 CST success: [node-1]
16:43:26 CST [ConfigureOSModule] configure the ntp server for each node
16:43:26 CST skipped: [node-1]
16:43:26 CST skipped: [master-1]
16:43:26 CST [KubernetesStatusModule] Get kubernetes cluster status
16:43:26 CST success: [master-1]
16:43:26 CST [InstallContainerModule] Sync docker binaries
16:58:15 CST success: [master-1]
16:58:15 CST success: [node-1]
16:58:15 CST [InstallContainerModule] Generate docker service
16:58:16 CST success: [master-1]
16:58:16 CST success: [node-1]
16:58:16 CST [InstallContainerModule] Generate docker config
16:58:17 CST success: [master-1]
16:58:17 CST success: [node-1]
16:58:17 CST [InstallContainerModule] Enable docker
16:58:19 CST success: [master-1]
16:58:19 CST success: [node-1]
16:58:19 CST [InstallContainerModule] Add auths to container runtime
16:58:19 CST skipped: [master-1]
16:58:19 CST skipped: [node-1]
16:58:19 CST [PullModule] Start to pull images on all nodes
16:58:19 CST message: [master-1]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/pause:3.5
16:58:19 CST message: [node-1]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/pause:3.5
16:58:20 CST message: [node-1]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/kube-proxy:v1.22.10
16:58:20 CST message: [master-1]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/kube-apiserver:v1.22.10
16:58:36 CST message: [node-1]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/coredns:1.8.0
16:58:43 CST message: [master-1]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/kube-controller-manager:v1.22.10
16:58:49 CST message: [node-1]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/k8s-dns-node-cache:1.15.12
16:59:01 CST message: [master-1]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/kube-scheduler:v1.22.10
16:59:06 CST message: [node-1]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/kube-controllers:v3.20.0
16:59:11 CST message: [master-1]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/kube-proxy:v1.22.10
16:59:23 CST message: [node-1]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/cni:v3.20.0
16:59:36 CST message: [master-1]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/coredns:1.8.0
16:59:47 CST message: [master-1]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/k8s-dns-node-cache:1.15.12
16:59:59 CST message: [node-1]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/node:v3.20.0
17:00:06 CST message: [master-1]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/kube-controllers:v3.20.0
17:00:26 CST message: [master-1]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/cni:v3.20.0
17:00:45 CST message: [node-1]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/pod2daemon-flexvol:v3.20.0
17:01:03 CST message: [master-1]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/node:v3.20.0
17:01:50 CST message: [master-1]
downloading image: registry.cn-beijing.aliyuncs.com/kubesphereio/pod2daemon-flexvol:v3.20.0
17:01:54 CST success: [node-1]
17:01:54 CST success: [master-1]
17:01:54 CST [ETCDPreCheckModule] Get etcd status
17:01:54 CST success: [node-1]
17:01:54 CST [CertsModule] Fetch etcd certs
17:01:54 CST success: [node-1]
17:01:54 CST [CertsModule] Generate etcd Certs
[certs] Generating "ca" certificate and key
[certs] node-master-1 serving cert is signed for DNS names [etcd etcd.kube-system etcd.kube-system.svc etcd.kube-system.svc.cluster.local lb.kubesphere.local localhost master-1 node-1] and IPs [127.0.0.1 ::1 <IP redacted> <IP redacted>]
[certs] admin-node-1 serving cert is signed for DNS names [etcd etcd.kube-system etcd.kube-system.svc etcd.kube-system.svc.cluster.local lb.kubesphere.local localhost master-1 node-1] and IPs [127.0.0.1 ::1 <IP redacted> <IP redacted>]
[certs] member-node-1 serving cert is signed for DNS names [etcd etcd.kube-system etcd.kube-system.svc etcd.kube-system.svc.cluster.local lb.kubesphere.local localhost master-1 node-1] and IPs [127.0.0.1 ::1 <IP redacted> <IP redacted>]
17:01:55 CST success: [LocalHost]
17:01:55 CST [CertsModule] Synchronize certs file
17:02:00 CST success: [node-1]
17:02:00 CST [CertsModule] Synchronize certs file to master
17:02:01 CST success: [master-1]
17:02:01 CST [InstallETCDBinaryModule] Install etcd using binary
17:04:12 CST success: [node-1]
17:04:12 CST [InstallETCDBinaryModule] Generate etcd service
17:04:12 CST success: [node-1]
17:04:12 CST [InstallETCDBinaryModule] Generate access address
17:04:12 CST success: [node-1]
17:04:12 CST [ETCDConfigureModule] Health check on exist etcd
17:04:12 CST skipped: [node-1]
17:04:12 CST [ETCDConfigureModule] Generate etcd.env config on new etcd
17:04:13 CST success: [node-1]
17:04:13 CST [ETCDConfigureModule] Refresh etcd.env config on all etcd
17:04:14 CST success: [node-1]
17:04:14 CST [ETCDConfigureModule] Restart etcd
17:04:14 CST stdout: [node-1]
Job for etcd.service failed because the control process exited with error code.
See "systemctl status etcd.service" and "journalctl -xeu etcd.service" for details.
17:04:14 CST message: [node-1]
start etcd failed: Failed to exec command: sudo -E /bin/bash -c "systemctl daemon-reload && systemctl restart etcd && systemctl enable etcd"
Job for etcd.service failed because the control process exited with error code.
See "systemctl status etcd.service" and "journalctl -xeu etcd.service" for details.: Process exited with status 1
17:04:14 CST retry: [node-1]
17:04:19 CST stdout: [node-1]
Job for etcd.service failed because the control process exited with error code.
See "systemctl status etcd.service" and "journalctl -xeu etcd.service" for details.
17:04:19 CST message: [node-1]
start etcd failed: Failed to exec command: sudo -E /bin/bash -c "systemctl daemon-reload && systemctl restart etcd && systemctl enable etcd"
Job for etcd.service failed because the control process exited with error code.
See "systemctl status etcd.service" and "journalctl -xeu etcd.service" for details.: Process exited with status 1
17:04:19 CST retry: [node-1]
17:04:25 CST stdout: [node-1]
Job for etcd.service failed because the control process exited with error code.
See "systemctl status etcd.service" and "journalctl -xeu etcd.service" for details.
17:04:25 CST message: [node-1]
start etcd failed: Failed to exec command: sudo -E /bin/bash -c "systemctl daemon-reload && systemctl restart etcd && systemctl enable etcd"
Job for etcd.service failed because the control process exited with error code.
See "systemctl status etcd.service" and "journalctl -xeu etcd.service" for details.: Process exited with status 1
17:04:25 CST failed: [node-1]
error: Pipeline[CreateClusterPipeline] execute failed: Module[ETCDConfigureModule] exec failed:
failed: [node-1] [RestartETCD] exec failed after 3 retires: start etcd failed: Failed to exec command: sudo -E /bin/bash -c "systemctl daemon-reload && systemctl restart etcd && systemctl enable etcd"
Job for etcd.service failed because the control process exited with error code.
See "systemctl status etcd.service" and "journalctl -xeu etcd.service" for details.: Process exited with status 1
etcd logs (node-1):
root@node-1:~# systemctl status etcd.service
● etcd.service - etcd
Loaded: loaded (/etc/systemd/system/etcd.service; disabled; vendor preset: enabled)
Active: activating (auto-restart) (Result: exit-code) since Mon 2023-01-16 17:21:19 CST; 9s ago
Process: 12845 ExecStart=/usr/local/bin/etcd (code=exited, status=1/FAILURE)
Main PID: 12845 (code=exited, status=1/FAILURE)
CPU: 12ms
Jan 16 17:21:19 node-1 systemd[1]: etcd.service: Failed with result 'exit-code'.
Jan 16 17:21:19 node-1 systemd[1]: Failed to start etcd.
root@node-1:~# journalctl -xeu etcd.service
Jan 16 17:21:40 node-1 etcd[12881]: recognized and used environment variable ETCD_AUTO_COMPACTION_RETENTION=8
Jan 16 17:21:40 node-1 etcd[12881]: [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
Jan 16 17:21:40 node-1 etcd[12881]: [WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
Jan 16 17:21:40 node-1 etcd[12881]: recognized and used environment variable ETCD_CERT_FILE=/etc/ssl/etcd/ssl/member-node-1.pem
Jan 16 17:21:40 node-1 etcd[12881]: recognized and used environment variable ETCD_CLIENT_CERT_AUTH=true
Jan 16 17:21:40 node-1 etcd[12881]: recognized and used environment variable ETCD_DATA_DIR=/var/lib/etcd
Jan 16 17:21:40 node-1 etcd[12881]: recognized and used environment variable ETCD_ELECTION_TIMEOUT=5000
Jan 16 17:21:40 node-1 etcd[12881]: recognized and used environment variable ETCD_ENABLE_V2=true
Jan 16 17:21:40 node-1 etcd[12881]: recognized and used environment variable ETCD_HEARTBEAT_INTERVAL=250
Jan 16 17:21:40 node-1 etcd[12881]: recognized and used environment variable ETCD_INITIAL_ADVERTISE_PEER_URLS=https://<IP redacted>:2380
Jan 16 17:21:40 node-1 etcd[12881]: recognized and used environment variable ETCD_INITIAL_CLUSTER=etcd-node-1=https://<IP redacted>:2380
Jan 16 17:21:40 node-1 etcd[12881]: recognized and used environment variable ETCD_INITIAL_CLUSTER_STATE=new
Jan 16 17:21:40 node-1 etcd[12881]: recognized and used environment variable ETCD_INITIAL_CLUSTER_TOKEN=k8s_etcd
Jan 16 17:21:40 node-1 etcd[12881]: recognized and used environment variable ETCD_KEY_FILE=/etc/ssl/etcd/ssl/member-node-1-key.pem
Jan 16 17:21:40 node-1 etcd[12881]: recognized and used environment variable ETCD_LISTEN_CLIENT_URLS=https://<IP redacted>:2379,https://127.0>
Jan 16 17:21:40 node-1 etcd[12881]: recognized and used environment variable ETCD_LISTEN_PEER_URLS=https://<IP redacted>:2380
Jan 16 17:21:40 node-1 etcd[12881]: recognized and used environment variable ETCD_METRICS=basic
Jan 16 17:21:40 node-1 etcd[12881]: recognized and used environment variable ETCD_NAME=etcd-node-1
Jan 16 17:21:40 node-1 etcd[12881]: recognized and used environment variable ETCD_PEER_CERT_FILE=/etc/ssl/etcd/ssl/member-node-1.pem
Jan 16 17:21:40 node-1 etcd[12881]: recognized and used environment variable ETCD_PEER_CLIENT_CERT_AUTH=True
Jan 16 17:21:40 node-1 etcd[12881]: recognized and used environment variable ETCD_PEER_KEY_FILE=/etc/ssl/etcd/ssl/member-node-1-key.pem
Jan 16 17:21:40 node-1 etcd[12881]: recognized and used environment variable ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
Jan 16 17:21:40 node-1 etcd[12881]: recognized and used environment variable ETCD_PROXY=off
Jan 16 17:21:40 node-1 etcd[12881]: recognized and used environment variable ETCD_SNAPSHOT_COUNT=10000
Jan 16 17:21:40 node-1 etcd[12881]: recognized and used environment variable ETCD_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
Jan 16 17:21:40 node-1 etcd[12881]: etcd Version: 3.4.13
Jan 16 17:21:40 node-1 etcd[12881]: Git SHA: ae9734ed2
Jan 16 17:21:40 node-1 etcd[12881]: Go Version: go1.12.17
Jan 16 17:21:40 node-1 etcd[12881]: Go OS/Arch: linux/amd64
Jan 16 17:21:40 node-1 etcd[12881]: setting maximum number of CPUs to 4, total number of available CPUs is 4
Jan 16 17:21:40 node-1 etcd[12881]: peerTLS: cert = /etc/ssl/etcd/ssl/member-node-1.pem, key = /etc/ssl/etcd/ssl/member-node-1-key.pem, trus>
Jan 16 17:21:40 node-1 etcd[12881]: listen tcp <IP redacted>:2380: bind: cannot assign requested address
Jan 16 17:21:40 node-1 systemd[1]: etcd.service: Main process exited, code=exited, status=1/FAILURE
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ An ExecStart= process belonging to unit etcd.service has exited.
░░
░░ The process' exit code is 'exited' and its exit status is 1.
Jan 16 17:21:40 node-1 systemd[1]: etcd.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ The unit etcd.service has entered the 'failed' state with result 'exit-code'.
Jan 16 17:21:40 node-1 systemd[1]: Failed to start etcd.
░░ Subject: A start job for unit etcd.service has failed
░░ Defined-By: systemd
░░ Support: http://www.ubuntu.com/support
░░
░░ A start job for unit etcd.service has finished with a failure.
░░
░░ The job identifier is 11627 and the job result is failed.
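The decisive line in the journal above is the bind failure: etcd is told via ETCD_LISTEN_PEER_URLS to listen on the public IP, but on most cloud platforms a public IP is NAT-mapped to the instance and never actually assigned to its network interface, so the kernel refuses the bind. A quick diagnostic sketch for node-1:

# Addresses actually assigned to the interfaces; if the public IP from
# ETCD_LISTEN_PEER_URLS does not appear here, etcd cannot bind to it.
ip -4 addr show

# The listen addresses KubeKey wrote into the etcd environment file:
grep LISTEN /etc/etcd.env

If the public IP is indeed absent from ip addr, the usual workaround (untested here) is to set each host's internalAddress in config-sample.yaml to an address that is really on the NIC, then run ./kk delete cluster -f config-sample.yaml and re-create. With machines whose private networks cannot reach each other, though, NIC-bound addresses also cannot talk to one another, which may be the deeper problem with this topology.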
etcd logs (master-1):
root@master-1:~# systemctl status etcd.service
Unit etcd.service could not be found.
root@master-1:~# journalctl -xeu etcd.service
-- Logs begin at Fri 2022-02-25 13:01:40 CST, end at Mon 2023-01-16 17:19:56 CST. --
-- No entries --
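As for master-1 reporting "Unit etcd.service could not be found": that looks expected rather than a second failure. All the [InstallETCDBinaryModule] and [ETCDConfigureModule] steps above ran only on node-1, so etcd was evidently assigned to node-1 alone in roleGroups, and KubeKey never installed the unit or binary on master-1. A quick confirmation, using the paths visible in the node-1 output:

# On master-1: both paths should be absent, hence no output
ls /usr/local/bin/etcd /etc/systemd/system/etcd.service 2>/dev/null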
I'm really struggling here ≡(▔﹏▔)≡