• Kubernetes
  • One of the k8s nodes is stuck in NotReady

Ran into a case where the server's resources are fine, yet the node stays NotReady and complains about insufficient resources.
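
A quick first check to confirm which node is failing:

# the broken node shows NotReady in the STATUS column
kubectl get nodes -o wide

Describing the affected node gives the full picture: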

[root@node3 ~]# kubectl describe node node3
Name:               node3
Roles:              worker
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=node3
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/worker=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    projectcalico.org/IPv4Address: 192.168.0.86/24
                    projectcalico.org/IPv4IPIPTunnelAddr: 10.233.92.0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Mon, 30 Nov 2020 18:01:50 +0800
Taints:             node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  node3
  AcquireTime:     <unset>
  RenewTime:       Mon, 07 Dec 2020 11:16:27 +0800
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Tue, 01 Dec 2020 12:04:33 +0800   Tue, 01 Dec 2020 12:04:33 +0800   CalicoIsUp                   Calico is running on this node
  MemoryPressure       False   Mon, 07 Dec 2020 11:16:28 +0800   Mon, 07 Dec 2020 10:53:24 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Mon, 07 Dec 2020 11:16:28 +0800   Mon, 07 Dec 2020 10:53:24 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Mon, 07 Dec 2020 11:16:28 +0800   Mon, 07 Dec 2020 10:53:24 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                False   Mon, 07 Dec 2020 11:16:28 +0800   Mon, 07 Dec 2020 10:53:24 +0800   KubeletNotReady              container runtime status check may not have completed yet
Addresses:
  InternalIP:  192.168.0.86
  Hostname:    node3
Capacity:
  cpu:                2
  ephemeral-storage:  41611416Ki
  hugepages-2Mi:      0
  memory:             3880504Ki
  pods:               110
Allocatable:
  cpu:                1600m
  ephemeral-storage:  41611416Ki
  hugepages-2Mi:      0
  memory:             3250666289
  pods:               110
System Info:
  Machine ID:                   315c4eb47188422094c1fdc39dab4e5f
  System UUID:                  BD214D56-09AB-CDF1-A49C-B561AE2B5816
  Boot ID:                      118b69e0-39cd-43cb-a140-404134f4e28c
  Kernel Version:               3.10.0-1160.6.1.el7.x86_64
  OS Image:                     CentOS Linux 7 (Core)
  Operating System:             linux
  Architecture:                 amd64
  Container Runtime Version:    docker://19.3.13
  Kubelet Version:              v1.18.6
  Kube-Proxy Version:           v1.18.6
PodCIDR:                        10.233.65.0/24
PodCIDRs:                       10.233.65.0/24
Non-terminated Pods:            (5 in total)
  Namespace                     Name                   CPU Requests  CPU Limits   Memory Requests  Memory Limits  AGE
  ---------                     ----                   ------------  ----------   ---------------  -------------  ---
  kube-system                   calico-node-4gqdc      250m (15%)    0 (0%)       0 (0%)           0 (0%)         6d17h
  kube-system                   kube-proxy-x6nkl       0 (0%)        0 (0%)       0 (0%)           0 (0%)         6d17h
  kube-system                   nodelocaldns-8ptbf     100m (6%)     0 (0%)       70Mi (2%)        170Mi (5%)     6d17h
  kube-system                   openebs-ndm-wd8lp      0 (0%)        0 (0%)       0 (0%)           0 (0%)         6d17h
  kubesphere-monitoring-system  node-exporter-gw497    112m (7%)     1300m (81%)  200Mi (6%)       500Mi (16%)    6d16h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                462m (28%)  1300m (81%)
  memory             270Mi (8%)  670Mi (21%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:
  Type     Reason                   Age                From            Message
  ----     ------                   ----               ----            -------
  Normal   NodeHasSufficientMemory  60m                kubelet, node3  Node node3 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    60m                kubelet, node3  Node node3 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     60m                kubelet, node3  Node node3 status is now: NodeHasSufficientPID
  Normal   Starting                 60m                kubelet, node3  Starting kubelet.
  Normal   Starting                 60m                kubelet, node3  Starting kubelet.
  Normal   NodeHasNoDiskPressure    60m                kubelet, node3  Node node3 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientMemory  60m                kubelet, node3  Node node3 status is now: NodeHasSufficientMemory
  Normal   NodeHasSufficientPID     60m                kubelet, node3  Node node3 status is now: NodeHasSufficientPID

The key condition is Ready:

Ready False Mon, 07 Dec 2020 11:16:28 +0800 Mon, 07 Dec 2020 10:53:24 +0800 KubeletNotReady container runtime status check may not have completed yet
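
To pull just this condition out of the node object, a small jsonpath query works (a sketch; any host with kubectl access will do):

# print only the Ready condition's reason and message for node3
kubectl get node node3 -o jsonpath='{range .status.conditions[?(@.type=="Ready")]}{.reason}: {.message}{"\n"}{end}'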

Check the kubelet and docker logs:

journalctl -xef -u kubelet
journalctl -xef -u docker
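
Since kubelet is crash-looping, the fatal entry is the interesting one; klog fatal lines start with F, so a rough filter (an assumption about the log format, not an exact recipe) is:

# show only fatal kubelet log lines, e.g. "F1207 15:16:23 ..."
journalctl -u kubelet --no-pager | grep -E ': F[0-9]{4} '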

    andrew_li

    12月 07 15:16:15 node3 systemd[1]: kubelet.service holdoff time over, scheduling restart.
    12月 07 15:16:15 node3 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
    -- Subject: Unit kubelet.service has finished shutting down
    -- Defined-By: systemd
    -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
    -- 
    -- Unit kubelet.service has finished shutting down.
    12月 07 15:16:15 node3 systemd[1]: Started kubelet: The Kubernetes Node Agent.
    -- Subject: Unit kubelet.service has finished start-up
    -- Defined-By: systemd
    -- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
    -- 
    -- Unit kubelet.service has finished starting up.
    -- 
    -- The start-up result is done.
    12月 07 15:16:15 node3 kubelet[14756]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
    12月 07 15:16:15 node3 kubelet[14756]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
    12月 07 15:16:15 node3 kubelet[14756]: W1207 15:16:15.702970   14756 feature_gate.go:235] Setting GA feature gate CSINodeInfo=true. It will be removed in a future release.
    12月 07 15:16:15 node3 kubelet[14756]: W1207 15:16:15.703218   14756 feature_gate.go:235] Setting GA feature gate CSINodeInfo=true. It will be removed in a future release.
    12月 07 15:16:15 node3 kubelet[14756]: I1207 15:16:15.718067   14756 server.go:417] Version: v1.18.6
    12月 07 15:16:15 node3 kubelet[14756]: W1207 15:16:15.718228   14756 feature_gate.go:235] Setting GA feature gate CSINodeInfo=true. It will be removed in a future release.
    12月 07 15:16:15 node3 kubelet[14756]: W1207 15:16:15.718383   14756 feature_gate.go:235] Setting GA feature gate CSINodeInfo=true. It will be removed in a future release.
    12月 07 15:16:15 node3 kubelet[14756]: I1207 15:16:15.720297   14756 plugins.go:100] No cloud provider specified.
    12月 07 15:16:15 node3 kubelet[14756]: I1207 15:16:15.720347   14756 server.go:838] Client rotation is on, will bootstrap in background
    12月 07 15:16:15 node3 kubelet[14756]: I1207 15:16:15.759385   14756 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
    12月 07 15:16:15 node3 kubelet[14756]: I1207 15:16:15.762521   14756 dynamic_cafile_content.go:167] Starting client-ca-bundle::/etc/kubernetes/pki/ca.crt
    12月 07 15:16:15 node3 kubelet[14756]: I1207 15:16:15.933967   14756 server.go:647] --cgroups-per-qos enabled, but --cgroup-root was not specified.  defaulting to /
    12月 07 15:16:15 node3 kubelet[14756]: I1207 15:16:15.934805   14756 container_manager_linux.go:266] container manager verified user specified cgroup-root exists: []
    12月 07 15:16:15 node3 kubelet[14756]: I1207 15:16:15.934841   14756 container_manager_linux.go:271] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:docker CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:systemd KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[cpu:{i:{value:200 scale:-3} d:{Dec:<nil>} s:200m Format:DecimalSI} memory:{i:{value:262144000 scale:0} d:{Dec:<nil>} s:250Mi Format:BinarySI}] SystemReserved:map[cpu:{i:{value:200 scale:-3} d:{Dec:<nil>} s:200m Format:DecimalSI} memory:{i:{value:262144000 scale:0} d:{Dec:<nil>} s:250Mi Format:BinarySI}] HardEvictionThresholds:[{Signal:memory.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManagerReconcilePeriod:10s ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms ExperimentalTopologyManagerPolicy:none}
    12月 07 15:16:15 node3 kubelet[14756]: I1207 15:16:15.935104   14756 topology_manager.go:126] [topologymanager] Creating topology manager with none policy
    12月 07 15:16:15 node3 kubelet[14756]: I1207 15:16:15.935133   14756 container_manager_linux.go:301] [topologymanager] Initializing Topology Manager with none policy
    12月 07 15:16:15 node3 kubelet[14756]: I1207 15:16:15.935150   14756 container_manager_linux.go:306] Creating device plugin manager: true
    12月 07 15:16:15 node3 kubelet[14756]: I1207 15:16:15.935335   14756 client.go:75] Connecting to docker on unix:///var/run/docker.sock
    12月 07 15:16:15 node3 kubelet[14756]: I1207 15:16:15.935369   14756 client.go:92] Start docker client with request timeout=2m0s
    12月 07 15:16:15 node3 kubelet[14756]: W1207 15:16:15.959626   14756 docker_service.go:561] Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back to "hairpin-veth"
    12月 07 15:16:15 node3 kubelet[14756]: I1207 15:16:15.959705   14756 docker_service.go:238] Hairpin mode set to "hairpin-veth"
    12月 07 15:16:16 node3 kubelet[14756]: I1207 15:16:16.065361   14756 docker_service.go:253] Docker cri networking managed by cni
    12月 07 15:16:16 node3 kubelet[14756]: I1207 15:16:16.099278   14756 docker_service.go:258] Docker Info: &{ID:ELYA:RGDA:NT3O:IYA7:QLCN:IBXB:QEKW:C2MQ:ITDJ:7VSA:NHI3:HXEB Containers:86 ContainersRunning:0 ContainersPaused:0 ContainersStopped:86 Images:176 Driver:overlay2 DriverStatus:[[Backing Filesystem xfs] [Supports d_type true] [Native Overlay Diff true]] SystemStatus:[] Plugins:{Volume:[local] Network:[bridge host ipvlan macvlan null overlay] Authorization:[] Log:[awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog]} MemoryLimit:true SwapLimit:true KernelMemory:true KernelMemoryTCP:true CPUCfsPeriod:true CPUCfsQuota:true CPUShares:true CPUSet:true PidsLimit:true IPv4Forwarding:true BridgeNfIptables:true BridgeNfIP6tables:true Debug:false NFd:22 OomKillDisable:true NGoroutines:35 SystemTime:2020-12-07T15:16:16.067098547+08:00 LoggingDriver:json-file CgroupDriver:systemd NEventsListener:0 KernelVersion:3.10.0-1160.6.1.el7.x86_64 OperatingSystem:CentOS Linux 7 (Core) OSType:linux Architecture:x86_64 IndexServerAddress:https://index.docker.io/v1/ RegistryConfig:0xc000796540 NCPU:2 MemTotal:3973636096 GenericResources:[] DockerRootDir:/var/lib/docker HTTPProxy: HTTPSProxy: NoProxy: Name:node3 Labels:[] ExperimentalBuild:false ServerVersion:19.03.13 ClusterStore: ClusterAdvertise: Runtimes:map[runc:{Path:runc Args:[]}] DefaultRuntime:runc Swarm:{NodeID: NodeAddr: LocalNodeState:inactive ControlAvailable:false Error: RemoteManagers:[] Nodes:0 Managers:0 Cluster:<nil> Warnings:[]} LiveRestoreEnabled:false Isolation: InitBinary:docker-init ContainerdCommit:{ID:8fba4e9a7d01810a393d5d25a3621dc101981175 Expected:8fba4e9a7d01810a393d5d25a3621dc101981175} RuncCommit:{ID:dc9208a3303feef5b3839f4323d9beb36df0a9dd Expected:dc9208a3303feef5b3839f4323d9beb36df0a9dd} InitCommit:{ID:fec3683 Expected:fec3683} SecurityOptions:[name=seccomp,profile=default] ProductLicense: Warnings:[]}
    12月 07 15:16:16 node3 kubelet[14756]: I1207 15:16:16.099493   14756 docker_service.go:271] Setting cgroupDriver to systemd
    12月 07 15:16:16 node3 kubelet[14756]: I1207 15:16:16.162953   14756 remote_runtime.go:59] parsed scheme: ""
    12月 07 15:16:16 node3 kubelet[14756]: I1207 15:16:16.163000   14756 remote_runtime.go:59] scheme "" not registered, fallback to default scheme
    12月 07 15:16:16 node3 kubelet[14756]: I1207 15:16:16.163077   14756 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/run/dockershim.sock  <nil> 0 <nil>}] <nil> <nil>}
    12月 07 15:16:16 node3 kubelet[14756]: I1207 15:16:16.163123   14756 clientconn.go:933] ClientConn switching balancer to "pick_first"
    12月 07 15:16:16 node3 kubelet[14756]: I1207 15:16:16.163336   14756 remote_image.go:50] parsed scheme: ""
    12月 07 15:16:16 node3 kubelet[14756]: I1207 15:16:16.163362   14756 remote_image.go:50] scheme "" not registered, fallback to default scheme
    12月 07 15:16:16 node3 kubelet[14756]: I1207 15:16:16.163394   14756 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/run/dockershim.sock  <nil> 0 <nil>}] <nil> <nil>}
    12月 07 15:16:16 node3 kubelet[14756]: I1207 15:16:16.163413   14756 clientconn.go:933] ClientConn switching balancer to "pick_first"
    12月 07 15:16:16 node3 kubelet[14756]: I1207 15:16:16.163500   14756 kubelet.go:292] Adding pod path: /etc/kubernetes/manifests
    12月 07 15:16:16 node3 kubelet[14756]: I1207 15:16:16.163600   14756 kubelet.go:317] Watching apiserver
    12月 07 15:16:22 node3 kubelet[14756]: E1207 15:16:22.402433   14756 aws_credentials.go:77] while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.
    12月 07 15:16:22 node3 kubelet[14756]: For verbose messaging see aws.Config.CredentialsChainVerboseErrors
    12月 07 15:16:22 node3 kubelet[14756]: I1207 15:16:22.434283   14756 kuberuntime_manager.go:211] Container runtime docker initialized, version: 19.03.13, apiVersion: 1.40.0
    12月 07 15:16:22 node3 kubelet[14756]: I1207 15:16:22.435440   14756 server.go:1126] Started kubelet
    12月 07 15:16:22 node3 kubelet[14756]: E1207 15:16:22.435937   14756 kubelet.go:1306] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to find data in memory cache
    12月 07 15:16:22 node3 kubelet[14756]: I1207 15:16:22.437904   14756 server.go:145] Starting to listen on 0.0.0.0:10250
    12月 07 15:16:22 node3 kubelet[14756]: I1207 15:16:22.440497   14756 server.go:393] Adding debug handlers to kubelet server.
    12月 07 15:16:22 node3 kubelet[14756]: I1207 15:16:22.443933   14756 fs_resource_analyzer.go:64] Starting FS ResourceAnalyzer
    12月 07 15:16:22 node3 kubelet[14756]: I1207 15:16:22.453899   14756 volume_manager.go:265] Starting Kubelet Volume Manager
    12月 07 15:16:22 node3 kubelet[14756]: I1207 15:16:22.461072   14756 desired_state_of_world_populator.go:139] Desired state populator starts to run
    12月 07 15:16:22 node3 kubelet[14756]: I1207 15:16:22.529893   14756 status_manager.go:158] Starting to sync pod status with apiserver
    12月 07 15:16:22 node3 kubelet[14756]: I1207 15:16:22.529963   14756 kubelet.go:1822] Starting kubelet main sync loop.
    12月 07 15:16:22 node3 kubelet[14756]: E1207 15:16:22.530073   14756 kubelet.go:1846] skipping pod synchronization - [container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]
    12月 07 15:16:22 node3 kubelet[14756]: I1207 15:16:22.555167   14756 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
    12月 07 15:16:22 node3 kubelet[14756]: I1207 15:16:22.555913   14756 kuberuntime_manager.go:978] updating runtime config through cri with podcidr 10.233.65.0/24
    12月 07 15:16:22 node3 kubelet[14756]: I1207 15:16:22.556311   14756 docker_service.go:353] docker cri received runtime config &RuntimeConfig{NetworkConfig:&NetworkConfig{PodCidr:10.233.65.0/24,},}
    12月 07 15:16:22 node3 kubelet[14756]: I1207 15:16:22.556599   14756 kubelet_network.go:77] Setting Pod CIDR:  -> 10.233.65.0/24
    12月 07 15:16:22 node3 kubelet[14756]: I1207 15:16:22.583167   14756 clientconn.go:106] parsed scheme: "unix"
    12月 07 15:16:22 node3 kubelet[14756]: I1207 15:16:22.583204   14756 clientconn.go:106] scheme "unix" not registered, fallback to default scheme
    12月 07 15:16:22 node3 kubelet[14756]: I1207 15:16:22.583386   14756 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{unix:///run/containerd/containerd.sock  <nil> 0 <nil>}] <nil> <nil>}
    12月 07 15:16:22 node3 kubelet[14756]: I1207 15:16:22.583413   14756 clientconn.go:933] ClientConn switching balancer to "pick_first"
    12月 07 15:16:22 node3 kubelet[14756]: W1207 15:16:22.623375   14756 cni.go:331] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "5434db553f2153a630b970b0bd9bbdf27ac9fe9cd352bfa360c6119ea31adcc5"
    12月 07 15:16:22 node3 kubelet[14756]: E1207 15:16:22.633120   14756 kubelet.go:1846] skipping pod synchronization - [container runtime status check may not have completed yet, PLEG is not healthy: pleg has yet to be successful]
    12月 07 15:16:22 node3 kubelet[14756]: W1207 15:16:22.775011   14756 docker_sandbox.go:400] failed to read pod IP from plugin/docker: networkPlugin cni failed on the status hook for pod "coredns-6b55b6764d-4p6g7_kube-system": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "377f6fc108c93124e4602275ada68e27570e5f4053b4a4aefe46981a5bc449fe"
    12月 07 15:16:22 node3 kubelet[14756]: E1207 15:16:22.837098   14756 kubelet.go:1846] skipping pod synchronization - container runtime status check may not have completed yet
    12月 07 15:16:22 node3 kubelet[14756]: I1207 15:16:22.837670   14756 kubelet_node_status.go:70] Attempting to register node node3
    12月 07 15:16:22 node3 kubelet[14756]: I1207 15:16:22.906649   14756 kubelet_node_status.go:112] Node node3 was previously registered
    12月 07 15:16:22 node3 kubelet[14756]: I1207 15:16:22.906840   14756 kubelet_node_status.go:73] Successfully registered node node3
    12月 07 15:16:22 node3 kubelet[14756]: W1207 15:16:22.957558   14756 docker_sandbox.go:400] failed to read pod IP from plugin/docker: networkPlugin cni failed on the status hook for pod "s2ioperator-0_kubesphere-devops-system": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "09cdf1601fc8d48f8cdfb91b7d5b6ada8784513b9b63761d3bf8c2421c077ef7"
    12月 07 15:16:22 node3 kubelet[14756]: W1207 15:16:22.980893   14756 docker_sandbox.go:400] failed to read pod IP from plugin/docker: networkPlugin cni failed on the status hook for pod "uc-jenkins-update-center-cd9464fff-pwrpg_kubesphere-devops-system": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "fef01c976d27c4b9ab60c3140025abb7275307ab14294d575702f611d53d4371"
    12月 07 15:16:23 node3 kubelet[14756]: W1207 15:16:23.050089   14756 docker_sandbox.go:400] failed to read pod IP from plugin/docker: networkPlugin cni failed on the status hook for pod "prometheus-k8s-0_kubesphere-monitoring-system": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "d84dc977732d06d56b67740b81b2b8d6a41338053cd7695653420d1dee15c13a"
    12月 07 15:16:23 node3 kubelet[14756]: F1207 15:16:23.071262   14756 kubelet.go:1384] Failed to start ContainerManager failed to build map of initial containers from runtime: no PodsandBox found with Id 'd0831a4288e261b5e3a5643f4a36bb2162282262fba671a682159494dfcd6f56'
    12月 07 15:16:23 node3 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
    12月 07 15:16:23 node3 systemd[1]: Unit kubelet.service entered failed state.
    12月 07 15:16:23 node3 systemd[1]: kubelet.service failed.
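
    The fatal line (F1207 15:16:23.071262 ... kubelet.go:1384) is what kills the process: on startup, kubelet builds its map of initial containers from the runtime, finds a container whose pod sandbox ID no longer resolves, and exits, so systemd restarts it in a loop. Since the sandbox ID is printed, you can look for whatever still references it (a sketch; assumes dockershim's io.kubernetes.sandbox.id label):

    # find containers whose label points at the missing sandbox
    docker ps -a --filter "label=io.kubernetes.sandbox.id=d0831a4288e261b5e3a5643f4a36bb2162282262fba671a682159494dfcd6f56"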

    andrew_li
    I've already restarted kubelet and it's still the same; kubelet just keeps restarting. I'm going to delete it and reinstall to see if that helps.

From the logs, it looks like the runtime can't find the information for some pods. Stop all the containers, then delete all of them, and restart kubelet (commands sketched below).
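
As commands, that suggestion looks roughly like this (a sketch; xargs -r skips the call when the list is empty):

# stop any running containers (a no-op here, since none are running)
docker ps -q | xargs -r docker stop
# remove all containers, including the exited ones
docker ps -aq | xargs -r docker rm
# restart kubelet so it rebuilds its container map from a clean runtime
systemctl restart kubelet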

      andrew_li
      The strange thing is that docker ps doesn't show a single container, not even kube-proxy, and restarting docker changes nothing.
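
      That matches the Docker Info line in the kubelet log above (Containers:86 ContainersRunning:0 ContainersStopped:86): plain docker ps only lists running containers, so the 86 exited ones stay invisible:

      # list the exited containers that plain `docker ps` hides
      docker ps -a --filter status=exited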

      Manually stop the kubelet service:

      systemctl stop kubelet

      Delete all the abnormally exited containers:

      docker rm $(docker ps -a -q)

      Then start kubelet, and kubelet runs normally.
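
      A minimal verification after bringing it back up:

      systemctl start kubelet
      # watch node3 until its STATUS flips back to Ready
      kubectl get node node3 -w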
