• Installation & Deployment
  • v4.1.2 offline installation reports error: etcd health check failed

h00283@coverity-ms:~/kubesphere$ sudo journalctl -f -u etcd
[sudo] password for h00283: 
Feb 08 16:55:01 coverity-ms etcd[34961]: {"level":"info","ts":"2025-02-08T16:55:01.66247+0800","caller":"embed/etcd.go:135","msg":"configuring client listeners","listen-client-urls":["https://127.0.0.1:2379","https://172.1.30.21:2379"]}
Feb 08 16:55:01 coverity-ms etcd[34961]: {"level":"info","ts":"2025-02-08T16:55:01.662596+0800","caller":"embed/etcd.go:308","msg":"starting an etcd server","etcd-version":"3.5.13","git-sha":"c9063a0dc","go-version":"go1.21.8","go-os":"linux","go-arch":"amd64","max-cpu-set":16,"max-cpu-available":16,"member-initialized":false,"name":"etcd-coverity-ms","data-dir":"/var/lib/etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/lib/etcd/member","force-new-cluster":false,"heartbeat-interval":"250ms","election-timeout":"5s","initial-election-tick-advance":true,"snapshot-count":10000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["https://172.1.30.21:2380"],"listen-peer-urls":["https://172.1.30.21:2380"],"advertise-client-urls":["https://172.1.30.21:2379"],"listen-client-urls":["https://127.0.0.1:2379","https://172.1.30.21:2379"],"listen-metrics-urls":[],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"etcd-coverity-ms=https://172.1.30.21:2380","initial-cluster-state":"existing","initial-cluster-token":"k8s_etcd","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-enabled":false,"compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"8h0m0s","auto-compaction-interval":"8h0m0s","discovery-url":"","discovery-proxy":"","downgrade-check-interval":"5s"}
Feb 08 16:55:01 coverity-ms etcd[34961]: {"level":"warn","ts":"2025-02-08T16:55:01.662674+0800","caller":"fileutil/fileutil.go:53","msg":"check file permission","error":"directory \"/var/lib/etcd\" exist, but the permission is \"drwxr-xr-x\". The recommended permission is \"-rwx------\" to prevent possible unprivileged access to the data"}
Feb 08 16:55:01 coverity-ms etcd[34961]: {"level":"info","ts":"2025-02-08T16:55:01.663982+0800","caller":"etcdserver/backend.go:81","msg":"opened backend db","path":"/var/lib/etcd/member/snap/db","took":"1.178268ms"}
Feb 08 16:55:01 coverity-ms etcd[34961]: {"level":"info","ts":"2025-02-08T16:55:01.665027+0800","caller":"embed/etcd.go:375","msg":"closing etcd server","name":"etcd-coverity-ms","data-dir":"/var/lib/etcd","advertise-peer-urls":["https://172.1.30.21:2380"],"advertise-client-urls":["https://172.1.30.21:2379"]}
Feb 08 16:55:01 coverity-ms etcd[34961]: {"level":"info","ts":"2025-02-08T16:55:01.665092+0800","caller":"embed/etcd.go:377","msg":"closed etcd server","name":"etcd-coverity-ms","data-dir":"/var/lib/etcd","advertise-peer-urls":["https://172.1.30.21:2380"],"advertise-client-urls":["https://172.1.30.21:2379"]}
Feb 08 16:55:01 coverity-ms etcd[34961]: {"level":"fatal","ts":"2025-02-08T16:55:01.66511+0800","caller":"etcdmain/etcd.go:204","msg":"discovery failed","error":"cannot fetch cluster info from peer urls: could not retrieve cluster information from the given URLs","stacktrace":"go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:204\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/v3/main.go:31\nruntime.main\n\truntime/proc.go:267"}
Feb 08 16:55:01 coverity-ms systemd[1]: etcd.service: Main process exited, code=exited, status=1/FAILURE
Feb 08 16:55:01 coverity-ms systemd[1]: etcd.service: Failed with result 'exit-code'.
Feb 08 16:55:01 coverity-ms systemd[1]: Failed to start etcd.

Can anyone help with this?

  • CauchyK零SK壹S

You can try uninstalling first and then reinstalling; if that still doesn't work, take a look at the etcd logs.
To uninstall, you can run ./kk delete cluster -f xxx.yaml
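
For reference, a minimal uninstall-and-reinstall sketch; the config and artifact file names below are the ones used later in this thread, so substitute your own:

sudo ./kk delete cluster -f config-sample.yaml                                               # tear down the partial install
sudo ./kk create cluster -f config-sample.yaml -a kubesphere.tar.gz --with-local-storage     # offline reinstall from the artifact
sudo journalctl -u etcd --no-pager -n 50                                                     # if etcd still fails, review its recent logs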

  • cici replied to this post

    Cauchy I've already uninstalled it and it still doesn't work.

    Right now it looks like etcd can't be installed.

    Not sure why, but that error suddenly went away.

    • Edited

    Now it keeps getting stuck at "Init cluster using kubeadm". What should I do?

    13:51:02 CST [InitKubernetesModule] Init cluster using kubeadm
    14:13:32 CST stdout: [coverity-ms]
    W0210 13:51:02.303972    9282 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.233.0.10]; the provided value is: [169.254.25.10]
    [init] Using Kubernetes version: v1.28.0
    [preflight] Running pre-flight checks
    [preflight] Pulling images required for setting up a Kubernetes cluster
    [preflight] This might take a minute or two, depending on the speed of your internet connection
    [preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
    W0210 14:02:01.427098    9282 checks.go:835] detected that the sandbox image "registry.k8s.io/pause:3.8" of the container runtime is inconsistent with that used by kubeadm. It is recommended that using "kubesphere/pause:3.9" as the CRI sandbox image.
            [WARNING ImagePull]: failed to pull image kubesphere/kube-apiserver:v1.28.0: output: E0210 13:54:01.691356    9407 remote_image.go:180] "PullImage from image service failed" err="rpc error: code = DeadlineExceeded desc = failed to pull and unpack image \"docker.io/kubesphere/kube-apiserver:v1.28.0\": failed to resolve reference \"docker.io/kubesphere/kube-apiserver:v1.28.0\": failed to authorize: failed to fetch anonymous token: Get \"https://auth.docker.io/token?scope=repository%!A(MISSING)kubesphere%!F(MISSING)kube-apiserver%!A(MISSING)pull&service=registry.docker.io\": dial tcp 3.94.224.37:443: i/o timeout" image="kubesphere/kube-apiserver:v1.28.0"
    time="2025-02-10T13:54:01+08:00" level=fatal msg="pulling image: rpc error: code = DeadlineExceeded desc = failed to pull and unpack image \"docker.io/kubesphere/kube-apiserver:v1.28.0\": failed to resolve reference \"docker.io/kubesphere/kube-apiserver:v1.28.0\": failed to authorize: failed to fetch anonymous token: Get \"https://auth.docker.io/token?scope=repository%!A(MISSING)kubesphere%!F(MISSING)kube-apiserver%!A(MISSING)pull&service=registry.docker.io\": dial tcp 3.94.224.37:443: i/o timeout"
    , error: exit status 1
            [WARNING ImagePull]: failed to pull image kubesphere/kube-controller-manager:v1.28.0: output: E0210 13:56:41.015272    9487 remote_image.go:180] "PullImage from image service failed" err="rpc error: code = Unknown desc = failed to pull and unpack image \"docker.io/kubesphere/kube-controller-manager:v1.28.0\": failed to copy: httpReadSeeker: failed open: failed to do request: Get \"https://docker-images-prod.6aa30f8b08e16409b46e0173d6de2f56.r2.cloudflarestorage.com/registry-v2/docker/registry/v2/blobs/sha256/4b/4be79c38a4bab6e1252a35697500e8a0d9c5c7c771d9fcc1935c9a7f6cdf4c62/data?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=f1baa2dd9b876aeb89efebbfc9e5d5f4%2F20250210%2Fauto%2Fs3%2Faws4_request&X-Amz-Date=20250210T055630Z&X-Amz-Expires=1200&X-Amz-SignedHeaders=host&X-Amz-Signature=da96e5ce4ceb76b34be79fd7b2a2803430266da58aa7709d86fd3dac8e7def9d\": net/http: TLS handshake timeout" image="kubesphere/kube-controller-manager:v1.28.0"
    time="2025-02-10T13:56:41+08:00" level=fatal msg="pulling image: failed to pull and unpack image \"docker.io/kubesphere/kube-controller-manager:v1.28.0\": failed to copy: httpReadSeeker: failed open: failed to do request: Get \"https://docker-images-prod.6aa30f8b08e16409b46e0173d6de2f56.r2.cloudflarestorage.com/registry-v2/docker/registry/v2/blobs/sha256/4b/4be79c38a4bab6e1252a35697500e8a0d9c5c7c771d9fcc1935c9a7f6cdf4c62/data?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=f1baa2dd9b876aeb89efebbfc9e5d5f4%2F20250210%2Fauto%2Fs3%2Faws4_request&X-Amz-Date=20250210T055630Z&X-Amz-Expires=1200&X-Amz-SignedHeaders=host&X-Amz-Signature=da96e5ce4ceb76b34be79fd7b2a2803430266da58aa7709d86fd3dac8e7def9d\": net/http: TLS handshake timeout"
    , error: exit status 1
            [WARNING ImagePull]: failed to pull image kubesphere/kube-scheduler:v1.28.0: output: E0210 13:59:11.180038    9566 remote_image.go:180] "PullImage from image service failed" err="rpc error: code = Unknown desc = failed to pull and unpack image \"docker.io/kubesphere/kube-scheduler:v1.28.0\": failed to copy: httpReadSeeker: failed open: failed to do request: Get \"https://docker-images-prod.6aa30f8b08e16409b46e0173d6de2f56.r2.cloudflarestorage.com/registry-v2/docker/registry/v2/blobs/sha256/f6/f6f496300a2ae7a6727ccf3080d66d2fd22b6cfc271df5351c976c23a28bb157/data?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=f1baa2dd9b876aeb89efebbfc9e5d5f4%2F20250210%2Fauto%2Fs3%2Faws4_request&X-Amz-Date=20250210T055901Z&X-Amz-Expires=1200&X-Amz-SignedHeaders=host&X-Amz-Signature=b056026a752e2085934e266a8b0522a05983a7497a1504da32f2ff393515f164\": net/http: TLS handshake timeout" image="kubesphere/kube-scheduler:v1.28.0"
    time="2025-02-10T13:59:11+08:00" level=fatal msg="pulling image: failed to pull and unpack image \"docker.io/kubesphere/kube-scheduler:v1.28.0\": failed to copy: httpReadSeeker: failed open: failed to do request: Get \"https://docker-images-prod.6aa30f8b08e16409b46e0173d6de2f56.r2.cloudflarestorage.com/registry-v2/docker/registry/v2/blobs/sha256/f6/f6f496300a2ae7a6727ccf3080d66d2fd22b6cfc271df5351c976c23a28bb157/data?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=f1baa2dd9b876aeb89efebbfc9e5d5f4%2F20250210%2Fauto%2Fs3%2Faws4_request&X-Amz-Date=20250210T055901Z&X-Amz-Expires=1200&X-Amz-SignedHeaders=host&X-Amz-Signature=b056026a752e2085934e266a8b0522a05983a7497a1504da32f2ff393515f164\": net/http: TLS handshake timeout"
    , error: exit status 1
            [WARNING ImagePull]: failed to pull image kubesphere/kube-proxy:v1.28.0: output: E0210 14:02:01.404848    9644 remote_image.go:180] "PullImage from image service failed" err="rpc error: code = Unknown desc = failed to pull and unpack image \"docker.io/kubesphere/kube-proxy:v1.28.0\": failed to copy: httpReadSeeker: failed open: failed to do request: Get \"https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/ea/ea1030da44aa18666a7bf15fddd2a38c3143c3277159cb8bdd95f45c8ce62d7a/data?expires=1739170311&signature=xP7oC3XNxUDmlYXy71tSLQBqlJc%3D&version=2\": net/http: TLS handshake timeout" image="kubesphere/kube-proxy:v1.28.0"
    time="2025-02-10T14:02:01+08:00" level=fatal msg="pulling image: failed to pull and unpack image \"docker.io/kubesphere/kube-proxy:v1.28.0\": failed to copy: httpReadSeeker: failed open: failed to do request: Get \"https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/ea/ea1030da44aa18666a7bf15fddd2a38c3143c3277159cb8bdd95f45c8ce62d7a/data?expires=1739170311&signature=xP7oC3XNxUDmlYXy71tSLQBqlJc%3D&version=2\": net/http: TLS handshake timeout"
    , error: exit status 1
            [WARNING ImagePull]: failed to pull image kubesphere/pause:3.9: output: E0210 14:05:41.084287    9737 remote_image.go:180] "PullImage from image service failed" err="rpc error: code = Unknown desc = failed to pull and unpack image \"docker.io/kubesphere/pause:3.9\": failed to copy: httpReadSeeker: failed open: failed to do request: Get \"https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/e6/e6f1816883972d4be47bd48879a08919b96afcd344132622e4d444987919323c/data?expires=1739170530&signature=AkrKij8WFu9ksi4JaGQGAj%2BxIoI%3D&version=2\": net/http: TLS handshake timeout" image="kubesphere/pause:3.9"
    time="2025-02-10T14:05:41+08:00" level=fatal msg="pulling image: failed to pull and unpack image \"docker.io/kubesphere/pause:3.9\": failed to copy: httpReadSeeker: failed open: failed to do request: Get \"https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/e6/e6f1816883972d4be47bd48879a08919b96afcd344132622e4d444987919323c/data?expires=1739170530&signature=AkrKij8WFu9ksi4JaGQGAj%2BxIoI%3D&version=2\": net/http: TLS handshake timeout"
    , error: exit status 1
            [WARNING ImagePull]: failed to pull image coredns/coredns:1.9.3: output: E0210 14:09:29.816684    9809 remote_image.go:180] "PullImage from image service failed" err="rpc error: code = Unknown desc = failed to pull and unpack image \"docker.io/coredns/coredns:1.9.3\": failed to copy: httpReadSeeker: failed open: failed to do request: Get \"https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/51/5185b96f0becf59032b8e3646e99f84d9655dff3ac9e2605e0dc77f9c441ae4a/data?expires=1739170753&signature=SPv%2Fsv93eXwYzLF7U0LXu%2BsEAAY%3D&version=2\": net/http: TLS handshake timeout" image="coredns/coredns:1.9.3"
    time="2025-02-10T14:09:29+08:00" level=fatal msg="pulling image: failed to pull and unpack image \"docker.io/coredns/coredns:1.9.3\": failed to copy: httpReadSeeker: failed open: failed to do request: Get \"https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/51/5185b96f0becf59032b8e3646e99f84d9655dff3ac9e2605e0dc77f9c441ae4a/data?expires=1739170753&signature=SPv%2Fsv93eXwYzLF7U0LXu%2BsEAAY%3D&version=2\": net/http: TLS handshake timeout"
    , error: exit status 1
    [certs] Using certificateDir folder "/etc/kubernetes/pki"
    [certs] Generating "ca" certificate and key
    [certs] Generating "apiserver" certificate and key
    [certs] apiserver serving cert is signed for DNS names [coverity-ms coverity-ms.cluster.local kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local lb.kubesphere.local localhost] and IPs [10.233.0.1 172.1.30.21 127.0.0.1]
    [certs] Generating "apiserver-kubelet-client" certificate and key
    [certs] Generating "front-proxy-ca" certificate and key
    [certs] Generating "front-proxy-client" certificate and key
    [certs] External etcd mode: Skipping etcd/ca certificate authority generation
    [certs] External etcd mode: Skipping etcd/server certificate generation
    [certs] External etcd mode: Skipping etcd/peer certificate generation
    [certs] External etcd mode: Skipping etcd/healthcheck-client certificate generation
    [certs] External etcd mode: Skipping apiserver-etcd-client certificate generation
    [certs] Generating "sa" key and public key
    [kubeconfig] Using kubeconfig folder "/etc/kubernetes"
    [kubeconfig] Writing "admin.conf" kubeconfig file
    [kubeconfig] Writing "kubelet.conf" kubeconfig file
    [kubeconfig] Writing "controller-manager.conf" kubeconfig file
    [kubeconfig] Writing "scheduler.conf" kubeconfig file
    [control-plane] Using manifest folder "/etc/kubernetes/manifests"
    [control-plane] Creating static Pod manifest for "kube-apiserver"
    [control-plane] Creating static Pod manifest for "kube-controller-manager"
    [control-plane] Creating static Pod manifest for "kube-scheduler"
    [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
    [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
    [kubelet-start] Starting the kubelet
    [wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
    [kubelet-check] Initial timeout of 40s passed.
    
    Unfortunately, an error has occurred:
            timed out waiting for the condition
    
    This error is likely caused by:
            - The kubelet is not running
            - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
    
    If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
            - 'systemctl status kubelet'
            - 'journalctl -xeu kubelet'
    
    Additionally, a control plane component may have crashed or exited when started by the container runtime.
    To troubleshoot, list all containers using your preferred container runtimes CLI.
    Here is one example how you may list all running Kubernetes containers by using crictl:
            - 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
            Once you have found the failing container, you can inspect its logs with:
            - 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs CONTAINERID'
    error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
    To see the stack trace of this error execute with --v=5 or higher
    14:13:33 CST stdout: [coverity-ms]
    [reset] Reading configuration from the cluster...
    [reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
    W0210 14:13:33.119954    9920 reset.go:120] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get config map: Get "https://lb.kubesphere.local:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s": dial tcp 172.1.30.21:6443: connect: connection refused
    [preflight] Running pre-flight checks
    W0210 14:13:33.120038    9920 removeetcdmember.go:106] [reset] No kubeadm config, using etcd pod spec to get data directory
    [reset] Deleted contents of the etcd data directory: /var/lib/etcd
    [reset] Stopping the kubelet service
    [reset] Unmounting mounted directories in "/var/lib/kubelet"
    [reset] Deleting contents of directories: [/etc/kubernetes/manifests /var/lib/kubelet /etc/kubernetes/pki]
    [reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
    
    The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
    
    The reset process does not reset or clean up iptables rules or IPVS tables.
    If you wish to reset iptables, you must do so manually by using the "iptables" command.
    
    If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
    to reset your system's IPVS tables.
    
    The reset process does not clean your kubeconfig files and you must remove them manually.
    Please, check the contents of the $HOME/.kube/config file.
    14:13:33 CST message: [coverity-ms]
    init kubernetes cluster failed: Failed to exec command: sudo -E /bin/bash -c "/usr/local/bin/kubeadm init --config=/etc/kubernetes/kubeadm-config.yaml --ignore-preflight-errors=FileExisting-crictl,ImagePull" 
    W0210 13:51:02.303972    9282 utils.go:69] The recommended value for "clusterDNS" in "KubeletConfiguration" is: [10.233.0.10]; the provided value is: [169.254.25.10]
    [init] Using Kubernetes version: v1.28.0
    [preflight] Running pre-flight checks
    [preflight] Pulling images required for setting up a Kubernetes cluster
    [preflight] This might take a minute or two, depending on the speed of your internet connection
    [preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
    W0210 14:02:01.427098    9282 checks.go:835] detected that the sandbox image "registry.k8s.io/pause:3.8" of the container runtime is inconsistent with that used by kubeadm. It is recommended that using "kubesphere/pause:3.9" as the CRI sandbox image.
            [WARNING ImagePull]: failed to pull image kubesphere/kube-apiserver:v1.28.0: output: E0210 13:54:01.691356    9407 remote_image.go:180] "PullImage from image service failed" err="rpc error: code = DeadlineExceeded desc = failed to pull and unpack image \"docker.io/kubesphere/kube-apiserver:v1.28.0\": failed to resolve reference \"docker.io/kubesphere/kube-apiserver:v1.28.0\": failed to authorize: failed to fetch anonymous token: Get \"https://auth.docker.io/token?scope=repository%!!(MISSING)A(MISSING)kubesphere%!!(MISSING)F(MISSING)kube-apiserver%!!(MISSING)A(MISSING)pull&service=registry.docker.io\": dial tcp 3.94.224.37:443: i/o timeout" image="kubesphere/kube-apiserver:v1.28.0"
    time="2025-02-10T13:54:01+08:00" level=fatal msg="pulling image: rpc error: code = DeadlineExceeded desc = failed to pull and unpack image \"docker.io/kubesphere/kube-apiserver:v1.28.0\": failed to resolve reference \"docker.io/kubesphere/kube-apiserver:v1.28.0\": failed to authorize: failed to fetch anonymous token: Get \"https://auth.docker.io/token?scope=repository%!!(MISSING)A(MISSING)kubesphere%!!(MISSING)F(MISSING)kube-apiserver%!!(MISSING)A(MISSING)pull&service=registry.docker.io\": dial tcp 3.94.224.37:443: i/o timeout"
    , error: exit status 1
            [WARNING ImagePull]: failed to pull image kubesphere/kube-controller-manager:v1.28.0: output: E0210 13:56:41.015272    9487 remote_image.go:180] "PullImage from image service failed" err="rpc error: code = Unknown desc = failed to pull and unpack image \"docker.io/kubesphere/kube-controller-manager:v1.28.0\": failed to copy: httpReadSeeker: failed open: failed to do request: Get \"https://docker-images-prod.6aa30f8b08e16409b46e0173d6de2f56.r2.cloudflarestorage.com/registry-v2/docker/registry/v2/blobs/sha256/4b/4be79c38a4bab6e1252a35697500e8a0d9c5c7c771d9fcc1935c9a7f6cdf4c62/data?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=f1baa2dd9b876aeb89efebbfc9e5d5f4%!F(MISSING)20250210%!F(MISSING)auto%!F(MISSING)s3%!F(MISSING)aws4_request&X-Amz-Date=20250210T055630Z&X-Amz-Expires=1200&X-Amz-SignedHeaders=host&X-Amz-Signature=da96e5ce4ceb76b34be79fd7b2a2803430266da58aa7709d86fd3dac8e7def9d\": net/http: TLS handshake timeout" image="kubesphere/kube-controller-manager:v1.28.0"
    time="2025-02-10T13:56:41+08:00" level=fatal msg="pulling image: failed to pull and unpack image \"docker.io/kubesphere/kube-controller-manager:v1.28.0\": failed to copy: httpReadSeeker: failed open: failed to do request: Get \"https://docker-images-prod.6aa30f8b08e16409b46e0173d6de2f56.r2.cloudflarestorage.com/registry-v2/docker/registry/v2/blobs/sha256/4b/4be79c38a4bab6e1252a35697500e8a0d9c5c7c771d9fcc1935c9a7f6cdf4c62/data?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=f1baa2dd9b876aeb89efebbfc9e5d5f4%!F(MISSING)20250210%!F(MISSING)auto%!F(MISSING)s3%!F(MISSING)aws4_request&X-Amz-Date=20250210T055630Z&X-Amz-Expires=1200&X-Amz-SignedHeaders=host&X-Amz-Signature=da96e5ce4ceb76b34be79fd7b2a2803430266da58aa7709d86fd3dac8e7def9d\": net/http: TLS handshake timeout"
    , error: exit status 1
            [WARNING ImagePull]: failed to pull image kubesphere/kube-scheduler:v1.28.0: output: E0210 13:59:11.180038    9566 remote_image.go:180] "PullImage from image service failed" err="rpc error: code = Unknown desc = failed to pull and unpack image \"docker.io/kubesphere/kube-scheduler:v1.28.0\": failed to copy: httpReadSeeker: failed open: failed to do request: Get \"https://docker-images-prod.6aa30f8b08e16409b46e0173d6de2f56.r2.cloudflarestorage.com/registry-v2/docker/registry/v2/blobs/sha256/f6/f6f496300a2ae7a6727ccf3080d66d2fd22b6cfc271df5351c976c23a28bb157/data?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=f1baa2dd9b876aeb89efebbfc9e5d5f4%!F(MISSING)20250210%!F(MISSING)auto%!F(MISSING)s3%!F(MISSING)aws4_request&X-Amz-Date=20250210T055901Z&X-Amz-Expires=1200&X-Amz-SignedHeaders=host&X-Amz-Signature=b056026a752e2085934e266a8b0522a05983a7497a1504da32f2ff393515f164\": net/http: TLS handshake timeout" image="kubesphere/kube-scheduler:v1.28.0"
    time="2025-02-10T13:59:11+08:00" level=fatal msg="pulling image: failed to pull and unpack image \"docker.io/kubesphere/kube-scheduler:v1.28.0\": failed to copy: httpReadSeeker: failed open: failed to do request: Get \"https://docker-images-prod.6aa30f8b08e16409b46e0173d6de2f56.r2.cloudflarestorage.com/registry-v2/docker/registry/v2/blobs/sha256/f6/f6f496300a2ae7a6727ccf3080d66d2fd22b6cfc271df5351c976c23a28bb157/data?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=f1baa2dd9b876aeb89efebbfc9e5d5f4%!F(MISSING)20250210%!F(MISSING)auto%!F(MISSING)s3%!F(MISSING)aws4_request&X-Amz-Date=20250210T055901Z&X-Amz-Expires=1200&X-Amz-SignedHeaders=host&X-Amz-Signature=b056026a752e2085934e266a8b0522a05983a7497a1504da32f2ff393515f164\": net/http: TLS handshake timeout"
    , error: exit status 1
            [WARNING ImagePull]: failed to pull image kubesphere/kube-proxy:v1.28.0: output: E0210 14:02:01.404848    9644 remote_image.go:180] "PullImage from image service failed" err="rpc error: code = Unknown desc = failed to pull and unpack image \"docker.io/kubesphere/kube-proxy:v1.28.0\": failed to copy: httpReadSeeker: failed open: failed to do request: Get \"https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/ea/ea1030da44aa18666a7bf15fddd2a38c3143c3277159cb8bdd95f45c8ce62d7a/data?expires=1739170311&signature=xP7oC3XNxUDmlYXy71tSLQBqlJc%!D(MISSING)&version=2\": net/http: TLS handshake timeout" image="kubesphere/kube-proxy:v1.28.0"
    time="2025-02-10T14:02:01+08:00" level=fatal msg="pulling image: failed to pull and unpack image \"docker.io/kubesphere/kube-proxy:v1.28.0\": failed to copy: httpReadSeeker: failed open: failed to do request: Get \"https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/ea/ea1030da44aa18666a7bf15fddd2a38c3143c3277159cb8bdd95f45c8ce62d7a/data?expires=1739170311&signature=xP7oC3XNxUDmlYXy71tSLQBqlJc%!D(MISSING)&version=2\": net/http: TLS handshake timeout"
    , error: exit status 1
            [WARNING ImagePull]: failed to pull image kubesphere/pause:3.9: output: E0210 14:05:41.084287    9737 remote_image.go:180] "PullImage from image service failed" err="rpc error: code = Unknown desc = failed to pull and unpack image \"docker.io/kubesphere/pause:3.9\": failed to copy: httpReadSeeker: failed open: failed to do request: Get \"https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/e6/e6f1816883972d4be47bd48879a08919b96afcd344132622e4d444987919323c/data?expires=1739170530&signature=AkrKij8WFu9ksi4JaGQGAj%!B(MISSING)xIoI%!D(MISSING)&version=2\": net/http: TLS handshake timeout" image="kubesphere/pause:3.9"
    time="2025-02-10T14:05:41+08:00" level=fatal msg="pulling image: failed to pull and unpack image \"docker.io/kubesphere/pause:3.9\": failed to copy: httpReadSeeker: failed open: failed to do request: Get \"https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/e6/e6f1816883972d4be47bd48879a08919b96afcd344132622e4d444987919323c/data?expires=1739170530&signature=AkrKij8WFu9ksi4JaGQGAj%!B(MISSING)xIoI%!D(MISSING)&version=2\": net/http: TLS handshake timeout"
    , error: exit status 1
            [WARNING ImagePull]: failed to pull image coredns/coredns:1.9.3: output: E0210 14:09:29.816684    9809 remote_image.go:180] "PullImage from image service failed" err="rpc error: code = Unknown desc = failed to pull and unpack image \"docker.io/coredns/coredns:1.9.3\": failed to copy: httpReadSeeker: failed open: failed to do request: Get \"https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/51/5185b96f0becf59032b8e3646e99f84d9655dff3ac9e2605e0dc77f9c441ae4a/data?expires=1739170753&signature=SPv%!F(MISSING)sv93eXwYzLF7U0LXu%!B(MISSING)sEAAY%!D(MISSING)&version=2\": net/http: TLS handshake timeout" image="coredns/coredns:1.9.3"
    time="2025-02-10T14:09:29+08:00" level=fatal msg="pulling image: failed to pull and unpack image \"docker.io/coredns/coredns:1.9.3\": failed to copy: httpReadSeeker: failed open: failed to do request: Get \"https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/51/5185b96f0becf59032b8e3646e99f84d9655dff3ac9e2605e0dc77f9c441ae4a/data?expires=1739170753&signature=SPv%!F(MISSING)sv93eXwYzLF7U0LXu%!B(MISSING)sEAAY%!D(MISSING)&version=2\": net/http: TLS handshake timeout"
    , error: exit status 1
    [certs] Using certificateDir folder "/etc/kubernetes/pki"
    [certs] Generating "ca" certificate and key
    [certs] Generating "apiserver" certificate and key
    [certs] apiserver serving cert is signed for DNS names [coverity-ms coverity-ms.cluster.local kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local lb.kubesphere.local localhost] and IPs [10.233.0.1 172.1.30.21 127.0.0.1]
    [certs] Generating "apiserver-kubelet-client" certificate and key
    [certs] Generating "front-proxy-ca" certificate and key
    [certs] Generating "front-proxy-client" certificate and key
    [certs] External etcd mode: Skipping etcd/ca certificate authority generation
    [certs] External etcd mode: Skipping etcd/server certificate generation
    [certs] External etcd mode: Skipping etcd/peer certificate generation
    [certs] External etcd mode: Skipping etcd/healthcheck-client certificate generation
    [certs] External etcd mode: Skipping apiserver-etcd-client certificate generation
    [certs] Generating "sa" key and public key
    [kubeconfig] Using kubeconfig folder "/etc/kubernetes"
    [kubeconfig] Writing "admin.conf" kubeconfig file
    [kubeconfig] Writing "kubelet.conf" kubeconfig file
    [kubeconfig] Writing "controller-manager.conf" kubeconfig file
    [kubeconfig] Writing "scheduler.conf" kubeconfig file
    [control-plane] Using manifest folder "/etc/kubernetes/manifests"
    [control-plane] Creating static Pod manifest for "kube-apiserver"
    [control-plane] Creating static Pod manifest for "kube-controller-manager"
    [control-plane] Creating static Pod manifest for "kube-scheduler"
    [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
    [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
    [kubelet-start] Starting the kubelet
    [wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
    [kubelet-check] Initial timeout of 40s passed.
    
    Unfortunately, an error has occurred:
            timed out waiting for the condition
    
    This error is likely caused by:
            - The kubelet is not running
            - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
    
    If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
            - 'systemctl status kubelet'
            - 'journalctl -xeu kubelet'
    
    Additionally, a control plane component may have crashed or exited when started by the container runtime.
    To troubleshoot, list all containers using your preferred container runtimes CLI.
    Here is one example how you may list all running Kubernetes containers by using crictl:
            - 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
            Once you have found the failing container, you can inspect its logs with:
            - 'crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs CONTAINERID'
    error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
    To see the stack trace of this error execute with --v=5 or higher: Process exited with status 1
      • CauchyK零SK壹S

      cici

      1. Check with docker ps whether kube-apiserver has been created. If it has, run curl -k https://<master01 ip>:6443; if it hasn't, check the kubelet logs.
      2. If a load balancer is in use, run curl -k https://<vip>:6443 to verify that the load balancer is reachable (both checks are sketched below).
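
      As a concrete sketch of those two checks (the IP and hostname below are the ones that appear in the logs in this thread; adjust for your environment):

      sudo docker ps | grep kube-apiserver        # was the kube-apiserver container created?
      curl -k https://172.1.30.21:6443            # is the master node's API port reachable directly?
      curl -k https://lb.kubesphere.local:6443    # if a load balancer is used, is it reachable through the VIP?
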
      • cici replied to this post

        Cauchy
        kubelet isn't coming up; etcd is running normally now.

        h00283@coverity-ms:~$ sudo systemctl status kubelet
        Warning: The unit file, source configuration file or drop-ins of kubelet.service changed on disk. Run 'systemctl daemon-reload' to reload units.
        ○ kubelet.service - kubelet: The Kubernetes Node Agent
             Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
             Active: inactive (dead)
               Docs: http://kubernetes.io/docs/
        
        Feb 10 14:13:29 coverity-ms kubelet[9858]: E0210 14:13:29.319649    9858 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://lb.kubesp>
        Feb 10 14:13:31 coverity-ms kubelet[9858]: E0210 14:13:31.987813    9858 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, Obje>
        Feb 10 14:13:32 coverity-ms kubelet[9858]: E0210 14:13:32.614103    9858 eviction_manager.go:258] "Eviction manager: failed to get summary stats" err="failed to get node i>
        Feb 10 14:13:33 coverity-ms kubelet[9858]: W0210 14:13:33.068103    9858 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Ge>
        Feb 10 14:13:33 coverity-ms kubelet[9858]: E0210 14:13:33.068169    9858 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: f>
        Feb 10 14:13:33 coverity-ms systemd[1]: Stopping kubelet: The Kubernetes Node Agent...
        Feb 10 14:13:33 coverity-ms kubelet[9858]: I0210 14:13:33.123346    9858 dynamic_cafile_content.go:171] "Shutting down controller" name="client-ca-bundle::/etc/kubernetes/>
        Feb 10 14:13:33 coverity-ms systemd[1]: kubelet.service: Deactivated successfully.
        Feb 10 14:13:33 coverity-ms systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
        Feb 10 14:13:33 coverity-ms systemd[1]: kubelet.service: Consumed 1.825s CPU time.
        h00283@coverity-ms:~/ks$ sudo journalctl -u kubelet -f
        [sudo] password for h00283: 
        Feb 10 15:07:16 coverity-ms kubelet[16512]: E0210 15:07:16.938712   16512 run.go:74] "command failed" err="failed to load kubelet config file, path: /var/lib/kubelet/config.yaml, error: failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file \"/var/lib/kubelet/config.yaml\", error: open /var/lib/kubelet/config.yaml: no such file or directory"
        Feb 10 15:07:16 coverity-ms systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
        Feb 10 15:07:16 coverity-ms systemd[1]: kubelet.service: Failed with result 'exit-code'.
        Feb 10 15:07:27 coverity-ms systemd[1]: kubelet.service: Scheduled restart job, restart counter is at 90.
        Feb 10 15:07:27 coverity-ms systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
        Feb 10 15:07:27 coverity-ms systemd[1]: Started kubelet: The Kubernetes Node Agent.
        Feb 10 15:07:27 coverity-ms kubelet[16541]: E0210 15:07:27.189671   16541 run.go:74] "command failed" err="failed to load kubelet config file, path: /var/lib/kubelet/config.yaml, error: failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file \"/var/lib/kubelet/config.yaml\", error: open /var/lib/kubelet/config.yaml: no such file or directory"
        Feb 10 15:07:27 coverity-ms systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
        Feb 10 15:07:27 coverity-ms systemd[1]: kubelet.service: Failed with result 'exit-code'.
        Feb 10 15:07:33 coverity-ms systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
        h00283@coverity-ms:~/ks$ kubeadm config images list
        W0210 16:27:02.652703   20003 version.go:104] could not fetch a Kubernetes version from the internet: unable to get URL "https://dl.k8s.io/release/stable-1.txt": Get "https://dl.k8s.io/release/stable-1.txt": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
        W0210 16:27:02.652766   20003 version.go:105] falling back to the local client version: v1.28.0
        registry.k8s.io/kube-apiserver:v1.28.0
        registry.k8s.io/kube-controller-manager:v1.28.0
        registry.k8s.io/kube-scheduler:v1.28.0
        registry.k8s.io/kube-proxy:v1.28.0
          • CauchyK零SK壹S

          • Edited

          cici
          You can check the kubelet logs while init is in progress, or after it has failed completely and exited.

          Also, is the container runtime containerd? If so, check whether the sandbox image set in the containerd config can actually be pulled (a quick check is sketched below).
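
          A quick way to do that check, assuming the default containerd config location:

          grep sandbox_image /etc/containerd/config.toml    # which sandbox image is containerd configured with?
          sudo crictl pull registry.k8s.io/pause:3.8        # can that image actually be pulled? (use whatever image the grep shows)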

          • cici replied to this post

            Cauchy
            I used the defaults and didn't change anything else.

            h00283@coverity-ms:~/ks$ sudo journalctl -u kubelet -f
            Feb 10 16:33:37 coverity-ms kubelet[20129]: E0210 16:33:37.612226   20129 kubelet_node_status.go:92] "Unable to register node with API server" err="Post \"https://lb.kubesphere.local:6443/api/v1/nodes\": dial tcp 172.1.30.21:6443: connect: connection refused" node="coverity-ms"
            Feb 10 16:33:37 coverity-ms kubelet[20129]: E0210 16:33:37.974136   20129 remote_runtime.go:193] "RunPodSandbox from runtime service failed" err="rpc error: code = DeadlineExceeded desc = failed to get sandbox image \"registry.k8s.io/pause:3.8\": failed to pull image \"registry.k8s.io/pause:3.8\": failed to pull and unpack image \"registry.k8s.io/pause:3.8\": failed to resolve reference \"registry.k8s.io/pause:3.8\": failed to do request: Head \"https://registry.k8s.io/v2/pause/manifests/3.8\": dial tcp 34.96.108.209:443: i/o timeout"
            Feb 10 16:33:37 coverity-ms kubelet[20129]: E0210 16:33:37.974191   20129 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = DeadlineExceeded desc = failed to get sandbox image \"registry.k8s.io/pause:3.8\": failed to pull image \"registry.k8s.io/pause:3.8\": failed to pull and unpack image \"registry.k8s.io/pause:3.8\": failed to resolve reference \"registry.k8s.io/pause:3.8\": failed to do request: Head \"https://registry.k8s.io/v2/pause/manifests/3.8\": dial tcp 34.96.108.209:443: i/o timeout" pod="kube-system/kube-scheduler-coverity-ms"
            Feb 10 16:33:37 coverity-ms kubelet[20129]: E0210 16:33:37.974223   20129 kuberuntime_manager.go:1119] "CreatePodSandbox for pod failed" err="rpc error: code = DeadlineExceeded desc = failed to get sandbox image \"registry.k8s.io/pause:3.8\": failed to pull image \"registry.k8s.io/pause:3.8\": failed to pull and unpack image \"registry.k8s.io/pause:3.8\": failed to resolve reference \"registry.k8s.io/pause:3.8\": failed to do request: Head \"https://registry.k8s.io/v2/pause/manifests/3.8\": dial tcp 34.96.108.209:443: i/o timeout" pod="kube-system/kube-scheduler-coverity-ms"
            Feb 10 16:33:37 coverity-ms kubelet[20129]: E0210 16:33:37.974304   20129 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-scheduler-coverity-ms_kube-system(b68b9e35fcab51848c5f2ecaf37ba14d)\" with CreatePodSandboxError: \"Failed to create sandbox for pod \\\"kube-scheduler-coverity-ms_kube-system(b68b9e35fcab51848c5f2ecaf37ba14d)\\\": rpc error: code = DeadlineExceeded desc = failed to get sandbox image \\\"registry.k8s.io/pause:3.8\\\": failed to pull image \\\"registry.k8s.io/pause:3.8\\\": failed to pull and unpack image \\\"registry.k8s.io/pause:3.8\\\": failed to resolve reference \\\"registry.k8s.io/pause:3.8\\\": failed to do request: Head \\\"https://registry.k8s.io/v2/pause/manifests/3.8\\\": dial tcp 34.96.108.209:443: i/o timeout\"" pod="kube-system/kube-scheduler-coverity-ms" podUID="b68b9e35fcab51848c5f2ecaf37ba14d"
            Feb 10 16:33:38 coverity-ms kubelet[20129]: I0210 16:33:38.768345   20129 dynamic_cafile_content.go:171] "Shutting down controller" name="client-ca-bundle::/etc/kubernetes/pki/ca.crt"
            Feb 10 16:33:38 coverity-ms systemd[1]: Stopping kubelet: The Kubernetes Node Agent...
            Feb 10 16:33:38 coverity-ms systemd[1]: kubelet.service: Deactivated successfully.
            Feb 10 16:33:38 coverity-ms systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
            Feb 10 16:33:38 coverity-ms systemd[1]: kubelet.service: Consumed 1.265s CPU time.

            Logs captured during init

              • CauchyK零SK壹S

              cici
              failed to pull image "registry.k8s.io/pause:3.8"
              Copy /etc/containerd/config.toml from another node to this node, then run systemctl restart containerd, then uninstall and reinstall (sketched below).
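
              A minimal sketch of that fix; the source hostname is a placeholder:

              scp user@working-node:/etc/containerd/config.toml /tmp/config.toml    # copy from a node where pulls work (hypothetical host)
              sudo cp /tmp/config.toml /etc/containerd/config.toml
              sudo systemctl restart containerd                                     # pick up the new sandbox_image setting
              sudo ./kk delete cluster -f config-sample.yaml                        # then uninstall and reinstall as suggested
              sudo ./kk create cluster -f config-sample.yaml -a kubesphere.tar.gz --with-local-storage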

              • cici replied to this post
                • Edited

                Cauchy

                After following those steps:

                h00283@coverity-ms:~/ks$ sudo journalctl -u kubelet -f
                [sudo] password for h00283: 
                Feb 10 17:30:49 coverity-ms kubelet[23146]: W0210 17:30:49.347662   23146 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get "https://lb.kubesphere.local:6443/api/v1/nodes?fieldSelector=metadata.name%3Dcoverity-ms&limit=500&resourceVersion=0": dial tcp 172.1.30.21:6443: connect: connection refused
                Feb 10 17:30:49 coverity-ms kubelet[23146]: E0210 17:30:49.347716   23146 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://lb.kubesphere.local:6443/api/v1/nodes?fieldSelector=metadata.name%3Dcoverity-ms&limit=500&resourceVersion=0": dial tcp 172.1.30.21:6443: connect: connection refused
                Feb 10 17:30:49 coverity-ms kubelet[23146]: W0210 17:30:49.512521   23146 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: Get "https://lb.kubesphere.local:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp 172.1.30.21:6443: connect: connection refused
                Feb 10 17:30:49 coverity-ms kubelet[23146]: E0210 17:30:49.512577   23146 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://lb.kubesphere.local:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp 172.1.30.21:6443: connect: connection refused
                Feb 10 17:30:52 coverity-ms kubelet[23146]: E0210 17:30:52.186152   23146 eviction_manager.go:258] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"coverity-ms\" not found"
                Feb 10 17:30:52 coverity-ms kubelet[23146]: I0210 17:30:52.327813   23146 dynamic_cafile_content.go:171] "Shutting down controller" name="client-ca-bundle::/etc/kubernetes/pki/ca.crt"
                Feb 10 17:30:52 coverity-ms systemd[1]: Stopping kubelet: The Kubernetes Node Agent...
                Feb 10 17:30:52 coverity-ms systemd[1]: kubelet.service: Deactivated successfully.
                Feb 10 17:30:52 coverity-ms systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
                Feb 10 17:30:52 coverity-ms systemd[1]: kubelet.service: Consumed 2.138s CPU time.
                h00283@coverity-ms:~/ks$ sudo systemctl status kubelet
                Warning: The unit file, source configuration file or drop-ins of kubelet.service changed on disk. Run 'systemctl daemon-reload' to reload units.
                ○ kubelet.service - kubelet: The Kubernetes Node Agent
                     Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
                     Active: inactive (dead)
                       Docs: http://kubernetes.io/docs/
                
                Feb 10 17:30:49 coverity-ms kubelet[23146]: W0210 17:30:49.347662   23146 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Node: Get >
                Feb 10 17:30:49 coverity-ms kubelet[23146]: E0210 17:30:49.347716   23146 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Node: fai>
                Feb 10 17:30:49 coverity-ms kubelet[23146]: W0210 17:30:49.512521   23146 reflector.go:535] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: G>
                Feb 10 17:30:49 coverity-ms kubelet[23146]: E0210 17:30:49.512577   23146 reflector.go:147] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: >
                Feb 10 17:30:52 coverity-ms kubelet[23146]: E0210 17:30:52.186152   23146 eviction_manager.go:258] "Eviction manager: failed to get summary stats" err="failed to get node >
                Feb 10 17:30:52 coverity-ms kubelet[23146]: I0210 17:30:52.327813   23146 dynamic_cafile_content.go:171] "Shutting down controller" name="client-ca-bundle::/etc/kubernetes>
                Feb 10 17:30:52 coverity-ms systemd[1]: Stopping kubelet: The Kubernetes Node Agent...
                Feb 10 17:30:52 coverity-ms systemd[1]: kubelet.service: Deactivated successfully.
                Feb 10 17:30:52 coverity-ms systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
                Feb 10 17:30:52 coverity-ms systemd[1]: kubelet.service: Consumed 2.138s CPU time.
                lines 1-16/16 (END)
                  • CauchyK零SK壹S

                  cici
                  Run crictl ps and see whether kube-apiserver is there. If it isn't, you need to keep looking in the kubelet logs for why that container was never created. Ignore the 6443 connection refused messages for now; they will go away once kube-apiserver is up. (Example commands below.)
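
                  For example, reusing the crictl invocation from the kubeadm output above:

                  sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a | grep kube-apiserver    # was the container created?
                  sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock logs CONTAINERID               # if it exists, inspect its logs
                  sudo journalctl -u kubelet -f                                                                        # if it was never created, keep watching kubelet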

                  • cici replied to this post

                    Cauchy I found that a lot of the docker images had not been pulled successfully, which is why kubelet, kubeadm, containerd and so on were not running.

                    It works now!

                    Process (a condensed command sketch follows this list):
                    First remove all the related installation tools, along with their directories.

                    What I found:

                    1. etcd and containerd had already been installed on the system, which caused version problems => uninstall them

                    2. kubelet and kubectl had not finished installing and their directories were missing the relevant configuration => delete the directories completely

                    3. sudo ./kk delete cluster -f config-sample.yaml

                    4. Remove the kubekey directory

                    5. sudo ./kk create cluster -f config-sample.yaml -a kubesphere.tar.gz --with-local-storage

                    6. When it reaches [init cluster using kubeadm]:
                      (1) Check the etcd status with sudo systemctl status etcd => make sure there are no error messages

                      (2) Check the containerd status with sudo systemctl status containerd

                        <If you hit the image pull failure>
                        Check whether the file /etc/containerd/config.toml exists => if it doesn't, copy it over from a node that does have it and repeat the delete cluster steps above. If there are still errors, keep fixing them; once there are no errors left, create cluster should complete successfully.

                    7. After create cluster finishes, you can install KubeSphere
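
                    A condensed command sketch of the steps above; the package names, paths, and file names are assumptions taken from this thread, so adjust them to your system:

                    sudo apt remove etcd containerd                   # step 1: remove the pre-installed system packages (use your distro's package manager)
                    sudo rm -rf /var/lib/kubelet /etc/kubernetes      # step 2: clear the half-installed kubelet/kubectl state (example paths)
                    sudo ./kk delete cluster -f config-sample.yaml    # step 3
                    sudo rm -rf ./kubekey                             # step 4: remove the kubekey directory
                    sudo ./kk create cluster -f config-sample.yaml -a kubesphere.tar.gz --with-local-storage    # step 5
                    sudo systemctl status etcd                        # step 6 (1): should show no error messages
                    sudo systemctl status containerd                  # step 6 (2): if image pulls fail, check /etc/containerd/config.toml as noted above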

                      13 days later

                      cici Hi, I'd like to ask how you solved the problem of kubelet not starting. Which directories exactly did you mean when you said to delete the folders? Could you point me in the right direction?

                      • cici replied to this post

                        zxcccom
                        Hi, I followed this article: https://blog.csdn.net/qq_40184595/article/details/129439402

                        and cleaned things up as it describes.

                        Run which kubelet and delete that path as well (I did the same for kubectl and the others).

                        Then kk delete cluster, remove the kubekey directory, and run create cluster again; a minimal sketch is below.
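
                        A minimal sketch of that cleanup, assuming the leftover binaries live wherever which reports them:

                        which kubelet                                     # e.g. /usr/local/bin/kubelet
                        sudo rm -f $(which kubelet) $(which kubectl)      # remove the leftover binaries (repeat for kubeadm etc. if present)
                        sudo ./kk delete cluster -f config-sample.yaml
                        sudo rm -rf ./kubekey
                        sudo ./kk create cluster -f config-sample.yaml -a kubesphere.tar.gz --with-local-storage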

                        -

                        Also note that while kubeadm init is running, you should check whether etcd, containerd, kubelet, and kubelet.service are reporting any errors!