coredns pod始终是ContainerCreating

Ttaorz1 · 2021年1月25日

安装之后coredns pod状态一直是ContainerCreating

Warning FailedCreatePodSandBox 66s kubelet, test-03 Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container “ff9bb62199e4265edb42cf3eeaca47ab5aa7945d2c35089ed6907d91d1d6fa71” network for pod “coredns-74d59cc5c6-c9t2q”: networkPlugin cni failed to set up pod “coredns-74d59cc5c6-c9t2q_kube-system” network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/, failed to clean up sandbox container “ff9bb62199e4265edb42cf3eeaca47ab5aa7945d2c35089ed6907d91d1d6fa71” network for pod “coredns-74d59cc5c6-c9t2q”: networkPlugin cni failed to teardown pod “coredns-74d59cc5c6-c9t2q_kube-system” network: neither iptables nor ip6tables usable]
Normal SandboxChanged <invalid> (x7 over 66s) kubelet, test-03 Pod sandbox changed, it will be killed and re-created.

请问是coredns什么地方配置得不对吗

calico controller可以起来 calico-node起不了

Type Reason Age From Message

Warning Unhealthy 21m (x124 over 91m) kubelet, test-03 Liveness probe failed: calico/node is not ready: bird/confd is not live: exit status 1
Warning Unhealthy 6m51s (x7 over 81m) kubelet, test-03 Liveness probe failed: calico/node is not ready: Felix is not live: Get http://localhost:9099/liveness: dial tcp 127.0.0.1:9099: connect: connection refused
Warning BackOff 114s (x298 over 87m) kubelet, test-03 Back-off restarting failed container

yuswift · 2021年1月26日

看看calico日志为什么起不来

Ttaorz1 · 2021年1月27日

yuswift

kubectl -n kube-system describe pod calico-node-l6pb8 输出

kubectl -n kube-system describe pod calico-node-l6pb8
Name: calico-node-l6pb8
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: test-02/xxxx
Start Time: Mon, 25 Jan 2021 14:57:38 +0800
Labels: controller-revision-hash=6b95d9c9d
k8s-app=calico-node
pod-template-generation=1
Annotations: <none>
Status: Running
IP: xxxx
IPs:
IP: xxxx
Controlled By: DaemonSet/calico-node
Init Containers:
upgrade-ipam:
Container ID: docker://8447493f8cd2c1cb5dd3a317fd72e33d53965b6545ce1202a8d38deaaf63c2b4
Image: calico/cni:v3.15.1
Image ID: docker-pullable://calico/cni@sha256:b86711626e68a5298542efc52e2bd3c64e212a635359b3a017ee0a8cd47b0c1e
Port: <none>
Host Port: <none>
Command:
/opt/cni/bin/calico-ipam
-upgrade
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 25 Jan 2021 14:57:41 +0800
Finished: Mon, 25 Jan 2021 14:57:41 +0800
Ready: True
Restart Count: 0
Environment:
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key ‘calico_backend’ of config map ‘calico-config’> Optional: false
Mounts:
/host/opt/cni/bin from cni-bin-dir (rw)
/var/lib/cni/networks from host-local-net-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-7qx56 (ro)
install-cni:
Container ID: docker://2a1c8fd8552a3fd2e5d54b373795a090c4576785547b04444b07b058e58f5e34
Image: calico/cni:v3.15.1
Image ID: docker-pullable://calico/cni@sha256:b86711626e68a5298542efc52e2bd3c64e212a635359b3a017ee0a8cd47b0c1e
Port: <none>
Host Port: <none>
Command:
/install-cni.sh
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 25 Jan 2021 14:57:42 +0800
Finished: Mon, 25 Jan 2021 14:57:42 +0800
Ready: True
Restart Count: 0
Environment:
CNI_CONF_NAME: 10-calico.conflist
CNI_NETWORK_CONFIG: <set to the key ‘cni_network_config’ of config map ‘calico-config’> Optional: false
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CNI_MTU: <set to the key ‘veth_mtu’ of config map ‘calico-config’> Optional: false
SLEEP: false
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/host/opt/cni/bin from cni-bin-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-7qx56 (ro)
flexvol-driver:
Container ID: docker://c31299decc8b34a9e185a54da70fe2c7a8d8d9111e90ed0e4d39c3dec49c6dcd
Image: calico/pod2daemon-flexvol:v3.15.1
Image ID: docker-pullable://calico/pod2daemon-flexvol@sha256:c2c6bbe3e10d24a01d6f3fd5b446cce6cf3e37f943960263bf6e5c458ecdeb52
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 25 Jan 2021 14:57:43 +0800
Finished: Mon, 25 Jan 2021 14:57:43 +0800
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/host/driver from flexvol-driver-host (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-7qx56 (ro)
Containers:
calico-node:
Container ID: docker://184431daef4cb50109efd4ed510909ec29b631b1078e3baeffc1dab8c5687653
Image: calico/node:v3.15.1
Image ID: docker-pullable://calico/node@sha256:b386769a293d180cb6ee208c8594030128a0810b286a93ae897a231ef247afa8
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Wed, 27 Jan 2021 10:34:35 +0800
Finished: Wed, 27 Jan 2021 10:35:45 +0800
Ready: False
Restart Count: 718
Requests:
cpu: 250m
Liveness: exec [/bin/calico-node -felix-live -bird-live] delay=10s timeout=1s period=10s #success=1 #failure=6
Readiness: exec [/bin/calico-node -felix-ready -bird-ready] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
DATASTORE_TYPE: kubernetes
WAIT_FOR_DATASTORE: true
NODENAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key ‘calico_backend’ of config map ‘calico-config’> Optional: false
CLUSTER_TYPE: k8s,bgp
IP: autodetect
CALICO_IPV4POOL_IPIP: Always
CALICO_IPV4POOL_VXLAN: Never
FELIX_IPINIPMTU: <set to the key ‘veth_mtu’ of config map ‘calico-config’> Optional: false
FELIX_VXLANMTU: <set to the key ‘veth_mtu’ of config map ‘calico-config’> Optional: false
FELIX_WIREGUARDMTU: <set to the key ‘veth_mtu’ of config map ‘calico-config’> Optional: false
CALICO_IPV4POOL_CIDR: 10.233.64.0/18
CALICO_IPV4POOL_BLOCK_SIZE: 24
CALICO_DISABLE_FILE_LOGGING: true
FELIX_DEFAULTENDPOINTTOHOSTACTION: ACCEPT
FELIX_IPV6SUPPORT: false
FELIX_LOGSEVERITYSCREEN: info
FELIX_HEALTHENABLED: true
Mounts:
/lib/modules from lib-modules (ro)
/run/xtables.lock from xtables-lock (rw)
/var/lib/calico from var-lib-calico (rw)
/var/run/calico from var-run-calico (rw)
/var/run/nodeagent from policysync (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-7qx56 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
var-run-calico:
Type: HostPath (bare host directory volume)
Path: /var/run/calico
HostPathType:
var-lib-calico:
Type: HostPath (bare host directory volume)
Path: /var/lib/calico
HostPathType:
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
cni-bin-dir:
Type: HostPath (bare host directory volume)
Path: /opt/cni/bin
HostPathType:
cni-net-dir:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
host-local-net-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/cni/networks
HostPathType:
policysync:
Type: HostPath (bare host directory volume)
Path: /var/run/nodeagent
HostPathType: DirectoryOrCreate
flexvol-driver-host:
Type: HostPath (bare host directory volume)
Path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent_uds
HostPathType: DirectoryOrCreate
calico-node-token-7qx56:
Type: Secret (a volume populated by a Secret)
SecretName: calico-node-token-7qx56
Optional: false
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: :NoSchedule
:NoExecute
CriticalAddonsOnly
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message

Warning Unhealthy 23m (x4188 over 42h) kubelet, test-02 Readiness probe failed: calico/node is not ready: BIRD is not ready: Failed to stat() nodename file: stat /var/lib/calico/nodename: no such file or directory
Warning Unhealthy 8m33s (x4015 over 42h) kubelet, test-02 Liveness probe failed: calico/node is not ready: bird/confd is not live: exit status 1
Warning BackOff 3m36s (x8578 over 42h) kubelet, test-02 Back-off restarting failed container

kubectl -n kube-system logs calico-node-l6pb8 输出

kubectl -n kube-system logs calico-node-l6pb8
2021-01-27 02:33:27.569 [INFO][9] startup/startup.go 299: Early log level set to info
2021-01-27 02:33:27.569 [INFO][9] startup/startup.go 315: Using NODENAME environment for node name
2021-01-27 02:33:27.569 [INFO][9] startup/startup.go 327: Determined node name: sh-pd-k8s-test-02
2021-01-27 02:33:27.570 [INFO][9] startup/startup.go 359: Checking datastore connection
2021-01-27 02:33:57.570 [INFO][9] startup/startup.go 374: Hit error connecting to datastore - retry error=Get https://xxx.xx.0.1:443/api/v1/nodes/foo: dial tcp xxx.xx.0.1:443: i/o timeout

3.kubectl -n kube-system get cm coredns -o yaml 输出

apiVersion: v1
data:
Corefile: |
.:53 {
errors
health {
lameduck 5s
}
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
ttl 30
}
prometheus :9153
forward . /etc/resolv.conf
cache 30
loop
reload
loadbalance
}
kind: ConfigMap
metadata:
creationTimestamp: “2021-01-25T06:56:23Z”
name: coredns
namespace: kube-system
resourceVersion: “177”
selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
uid: d05d0087-b837-465e-b20e-9a94f458f240

kubectl -n kube-system describe pod nodelocaldns-8kmnl输出

kubectl -n kube-system describe pod nodelocaldns-8kmnl
Name: nodelocaldns-8kmnl
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: test-02/xxxx
Start Time: Mon, 25 Jan 2021 16:21:08 +0800
Labels: controller-revision-hash=74c6d6495
k8s-app=nodelocaldns
pod-template-generation=1
Annotations: prometheus.io/port: 9253
prometheus.io/scrape: true
Status: Running
IP: 10.26.21.223
IPs:
IP: 10.26.21.223
Controlled By: DaemonSet/nodelocaldns
Containers:
node-cache:
Container ID: docker://370ef9b8802643c1574772a944833dc5b68e79cdb67ac657aa5b52bc2922ca16
Image: kubesphere/k8s-dns-node-cache:1.15.12
Image ID: docker-pullable://kubesphere/k8s-dns-node-cache@sha256:3b55377cd3b8098a79dc3f276cc542a681e3f2b71554addac9a603cc65e4829e
Ports: 53/UDP, 53/TCP, 9253/TCP
Host Ports: 53/UDP, 53/TCP, 9253/TCP
Args:
-localip
xxxxx
-conf
/etc/coredns/Corefile
-upstreamsvc
coredns
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 27 Jan 2021 10:52:05 +0800
Finished: Wed, 27 Jan 2021 10:52:05 +0800
Ready: False
Restart Count: 503
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://xxxxx:9254/health delay=0s timeout=5s period=10s #success=1 #failure=10
Readiness: http-get http://xxxxx:9254/health delay=0s timeout=5s period=10s #success=1 #failure=10
Environment: <none>
Mounts:
/etc/coredns from config-volume (rw)
/run/xtables.lock from xtables-lock (rw)
/var/run/secrets/kubernetes.io/serviceaccount from nodelocaldns-token-cc2vv (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: nodelocaldns
Optional: false
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
nodelocaldns-token-cc2vv:
Type: Secret (a volume populated by a Secret)
SecretName: nodelocaldns-token-cc2vv
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: :NoSchedule
:NoExecute
CriticalAddonsOnly
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/network-unavailable:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/pid-pressure:NoSchedule
node.kubernetes.io/unreachable:NoExecute
node.kubernetes.io/unschedulable:NoSchedule
Events:
Type Reason Age From Message

Normal Pulled 54m (x494 over 42h) kubelet, sh-pd-k8s-test-02 Container image “kubesphere/k8s-dns-node-cache:1.15.12” already present on machine
Warning BackOff 4m44s (x12075 over 42h) kubelet, sh-pd-k8s-test-02 Back-off restarting failed container

RolandMa1986 · 2021年1月27日

“Hit error connecting to datastore - retry error=Get https://xxx.xx.0.1:443/api/v1/nodes/foo: dial tcp xxx.xx.0.1:443: i/o timeout”
有没有关闭防火墙？

Ttaorz1 · 2021年1月27日

RolandMa1986 防火墙已经关了

RolandMa1986 · 2021年1月27日

taorz1 如果用的云环境，也要关闭网络防火墙。物理主机的话，检查一下是否有多块网卡。

RolandMa1986 · 2021年1月27日

RolandMa1986 你的主机的网段是10.26.21.223？这个掩码是多少位的，检查一下是否于Calico 的10.233.64.0/18有重叠。

Ttaorz1 · 2021年1月29日

RolandMa1986

PodsCIDR 10.233.64.0/18
ServiceCIDR 10.233.0.0/18
部署在虚拟机上，虚拟地址网段10.26.21.0/24 之间没有重叠

网络配置如下，这个有问题吗

cali70cede9b535: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1440
ether ee:ee:ee:ee:ee:ee txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

docker0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 172.17.0.1 netmask 255.255.0.0 broadcast 172.17.255.255
ether 02:42:ee:53:c8:18 txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.26.21.xxx netmask 255.255.255.0 broadcast 10.26.21.255
ether 52:54:00:ae:28:d4 txqueuelen 1000 (Ethernet)
RX packets 8399602 bytes 2465523254 (2.2 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 9063819 bytes 3462227865 (3.2 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 1000 (Local Loopback)
RX packets 124288873 bytes 78055595643 (72.6 GiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 124288873 bytes 78055595643 (72.6 GiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

tunl0: flags=193<UP,RUNNING,NOARP> mtu 1440
inet 10.233.105.0 netmask 255.255.255.255
tunnel txqueuelen 1000 (IPIP Tunnel)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

RolandMa1986 · 2021年1月29日

taorz1 配置看起来没有问题，但是你node 上calico 建立的 tunl0 没有任何数据传输。所以pod之间连接失败。你可以将环境登录信息发送到官网邮箱，帮助你排查问题。

Ttaorz1 · 2021年1月29日

RolandMa1986 这是内部网络还不太方便，可以帮忙提供思路，我来进行信息收集

附带一下路由信息

route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.26.21.1 0.0.0.0 UG 0 0 0 eth0
10.26.21.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
10.233.105.0 0.0.0.0 255.255.255.0 U 0 0 0 *
10.233.105.1 0.0.0.0 255.255.255.255 UH 0 0 0 cali70cede9b535
169.254.0.0 0.0.0.0 255.255.0.0 U 1002 0 0 eth0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0

Ttaorz1 · 2021年1月29日

RolandMa1986 另外请教个问题 tunl0是做什么的跨主机通信是吗？在网络拓扑上处于一个什么位置？

RolandMa1986 · 2021年1月29日

taorz1 可以百度一下啊-> calico IPIP 模式：https://blog.csdn.net/u010771890/article/details/103224004

Ttaorz1 · 2021年1月29日

RolandMa1986 ok 多谢

Ttaorz1 · 2021年1月29日

RolandMa1986 目前这个cluster是安装在3个vm上，一个master, 两个node.

目前从calico的架构拓扑图来看，两个node上好像是缺少Felix,bird，master上是有的. 安装过程是在sample yaml上改动集群信息，通过kk安装的，os 版本是CentOS Linux release 7.9.2009 (Core)。
这种状况正常吗？如果不正常，怎样才能在两个node上也起来felix和bird？

另外还有一个问题，我看有的ip地址是ipv6, 这个会有影响吗？

RolandMa1986 · 2021年1月29日

taorz1 排查步骤参考一下文章:
projectcalico/calico#3092
https://docs.projectcalico.org/getting-started/kubernetes/requirements
常见的问题一般就是：

防火墙未关闭
机器有多个网卡
网段冲突
缺少系统组件/内核原因
如果都不是，你可以尝试更为bgp模式，或者使用flannal

coredns pod始终是ContainerCreating

Ttaorz1

yuswiftK零S

Ttaorz1

RolandMa1986K零S

Ttaorz1

RolandMa1986K零S

RolandMa1986K零S

Ttaorz1

RolandMa1986K零S

Ttaorz1

Ttaorz1

RolandMa1986K零S

Ttaorz1

Ttaorz1

RolandMa1986K零S