OS information
VM, openEuler, 6.6.0-72.0.0.76.oe2403sp1.x86_64
Kubernetes version
v1.30.0
Container runtime
containerd v1.7.13
KubeSphere version
Online install. No existing K8s cluster.
Problem
While uninstalling k8s v1.30.0 with kk 3.1.10, I ran kk delete -f sample.conf,
then manually cleaned up leftover files on the master and node hosts.
One of the steps, rm -rf /var/run/calico, failed with a permission error, even though it was run as root.
An LLM suggested running umount /var/run/calico first and then deleting /var/run/calico.
After following those steps I rebooted the OS and reinstalled k8s v1.30.0 with kk 3.1.10. The install completed, but afterwards only the master node was Ready; all worker nodes stayed NotReady. The pods on each node were stuck in Init or ContainerCreating, and describe showed they failed to start because of a cgroup driver problem. Following the LLM's hints I found that two worker nodes had no /var/run/calico/cgroup directory at all, so I recreated the directory, mounted it, and rebooted those workers, but the error was the same; resetting the cluster and reinstalling gave the same result.
mkdir -p /var/run/calico/cgroup
mount -t cgroup2 none /var/run/calico/cgroup -o rw,nosuid,nodev,noexec,relatime
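Mounting a second cgroup2 instance under /var/run/calico/cgroup does not change which hierarchy the kubelet and runc actually use; that is decided by what is mounted at /sys/fs/cgroup. A minimal sketch (not from the original report) to check which cgroup hierarchy a node is running, based on the filesystem type that `stat` reports for /sys/fs/cgroup:

```shell
# Classify the filesystem type of /sys/fs/cgroup:
# "cgroup2fs" means the unified (v2) hierarchy, "tmpfs" means legacy v1.
cgroup_mode() {
  case "$1" in
    cgroup2fs) echo "v2 (unified)" ;;
    tmpfs)     echo "v1 (legacy)"  ;;
    *)         echo "unknown"      ;;
  esac
}

# On a node:
cgroup_mode "$(stat -fc %T /sys/fs/cgroup)"
```

If master and worker nodes report different modes here, kubelet/containerd with a single cgroup driver setting cannot work on all of them.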
Events:
Type Reason Age From Message
----     ------     ----  ----     -------
Normal Scheduled 63s default-scheduler Successfully assigned kube-system/calico-node-l4m8t to sv-cz-alg-test-worker-01
Warning FailedCreatePodSandBox 62s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create containerd task: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: openat2 /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod8637b80c_5b75_48ac_a967_1a4b644d4229.slice/cri-containerd-a1de6738fb2131182dc93a86f59ada12c60245627456266bffcae3be4e6d7602.scope/cpu.weight: no such file or directory: unknown
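The "openat2 ... cpu.weight: no such file or directory" error means runc tried to write a cgroup v2 CPU file that the kernel never created, i.e. the cpu controller is not enabled on the unified hierarchy. A hedged sketch of a check for this (the helper name is my own, not from the report):

```shell
# Return yes/no depending on whether "cpu" appears in a space-separated
# controller list, as read from /sys/fs/cgroup/cgroup.controllers.
has_cpu_controller() {
  case " $1 " in
    *" cpu "*) echo yes ;;
    *)         echo no  ;;
  esac
}

# On a node:
has_cpu_controller "$(cat /sys/fs/cgroup/cgroup.controllers 2>/dev/null)"
```

On an affected worker this would print no, which matches the cpu.weight failure above.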
Cluster deletion:
kk delete cluster -f config-k8s-v1300-ipipmode.yml -y
Cleanup of leftover files, executed on every node:
# Stop kubelet/containerd
systemctl stop kubelet containerd
systemctl disable kubelet containerd
# Stop the calico network (if present)
pkill -9 calico || true
pkill -9 bird || true
rm -rf /etc/kubernetes /var/lib/kubelet /var/lib/etcd /var/run/kubernetes
# Fully remove all calico leftovers (including the cgroup files that could not be deleted earlier)
rm -rf /etc/cni /opt/cni /var/lib/calico
rm -rf /var/run/calico
mount | grep calico/cgroup
umount /var/run/calico/cgroup
ls -ld /var/run/calico /var/lib/calico
# Fully remove all calico leftovers (including the cgroup files that could not be deleted earlier)
rm -rf /etc/cni /opt/cni /var/lib/calico
rm -rf /var/lib/containerd
mv /etc/containerd/config.toml /etc/containerd/config.toml.bak
rm -rf /var/lib/containerd /etc/containerd/config.toml
# Flush iptables/ipvs rules (reset networking)
iptables -F && iptables -X && iptables -F -t nat && iptables -X -t nat
ipvsadm --clear || true
# Flush routes (avoid routing conflicts)
ip route flush proto bird || true
ip route
systemctl restart containerd
service kubelet restart
reboot
#mkdir -p /var/run/calico/cgroup
#mount -t cgroup2 none /var/run/calico/cgroup -o rw,nosuid,nodev,noexec,relatime
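Before reinstalling, it is worth confirming that no calico-related mounts survived the cleanup; a stale bind or cgroup mount is exactly what made the earlier rm -rf fail. A small sketch (helper name is mine):

```shell
# Count mount table entries matching a pattern; prints 0 when none remain.
# grep -c exits non-zero on no match, so || true keeps the pipeline clean.
leftover_mounts() {
  mount | grep -c -- "$1" || true
}

# On a node, expect 0 before reinstalling:
leftover_mounts calico
```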
kubelet is configured with cgroupDriver: systemd, and containerd with SystemdCgroup = true.
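For reference, on a kk-installed node these two settings normally live in the following files (paths assumed from kubeadm/containerd defaults, not stated in the original report):

```
# /var/lib/kubelet/config.yaml (kubelet)
cgroupDriver: systemd

# /etc/containerd/config.toml (containerd CRI plugin)
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true
```

Both must agree, and both assume systemd is managing a working cgroup hierarchy on the node.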
Our kernel boot parameters were:
GRUB_CMDLINE_LINUX="resume=/dev/mapper/openeuler-swap rd.lvm.lv=openeuler/root rd.lvm.lv=openeuler/swap kernel.cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1 apparmor=0 crashkernel=512M"
The eventual fix was to change the line to GRUB_CMDLINE_LINUX="resume=/dev/mapper/openeuler-swap rd.lvm.lv=openeuler/root rd.lvm.lv=openeuler/swap cgroup_disable=files apparmor=0 crashkernel=512M" and then reinstall, after which everything worked.
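In other words, the fix removed kernel.cgroup_no_v1=all and systemd.unified_cgroup_hierarchy=1 from the kernel command line (and added cgroup_disable=files). A hedged sketch of that edit as a string transformation; the helper name is mine, and on openEuler the change still has to be applied to /etc/default/grub and regenerated:

```shell
# Strip the two cgroup-related parameters from a GRUB cmdline string.
strip_cgroup_params() {
  echo "$1" | sed -e 's/ *kernel\.cgroup_no_v1=all//' \
                  -e 's/ *systemd\.unified_cgroup_hierarchy=1//'
}

strip_cgroup_params "resume=/dev/mapper/openeuler-swap kernel.cgroup_no_v1=all systemd.unified_cgroup_hierarchy=1 apparmor=0 crashkernel=512M"

# After editing GRUB_CMDLINE_LINUX in /etc/default/grub accordingly:
#   grub2-mkconfig -o /boot/grub2/grub.cfg && reboot
```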