How should I diagnose this problem with the k8s cluster I installed?
The master and both worker nodes report the same error.
Jjoey_chen
This looks like a network connectivity issue. Was the cluster installed with kubeadm?
I installed it with kk, so yes, it was installed via kubeadm.
Right now the error above doesn't seem to have much impact, but coredns does have problems:
kubectl logs -f coredns-6b55b6764d-4dkqz -n kube-system
kubectl logs -f nodelocaldns-4lzlx -n kube-system
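A quick way to confirm whether in-cluster DNS works at all is a throwaway busybox pod (the pod name and image tag below are only placeholders; busybox:1.28 is commonly used because newer busybox images have a quirky nslookup):
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- nslookup kubernetes.default
# an external name should also resolve if the upstream forwarder (10.10.10.2 here) is healthy
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.28 -- nslookup www.baidu.com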
yuswift
coredns is timing out. Is 10.10.10.2 the nameserver in /etc/resolv.conf on your host?
yuswift
On your VMs, try:
ping 10.10.10.2
telnet 10.10.10.2 53
If they don't connect, that DNS server is broken; removing it from the nameserver list is a temporary workaround.
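What that temporary workaround can look like (a sketch, assuming the host uses a plain static /etc/resolv.conf that nothing like NetworkManager rewrites; 114.114.114.114 is only an example public resolver):
# on each node, swap the suspect nameserver for a working one
sed -i 's/^nameserver 10.10.10.2$/nameserver 114.114.114.114/' /etc/resolv.conf
# coredns forwards to the node's resolv.conf by default, so restart it to pick up the change
kubectl -n kube-system rollout restart deployment coredns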
telnet works from the VM:
telnet 10.10.10.2 53
Trying 10.10.10.2...
Connected to 10.10.10.2.
Escape character is '^]'.
RolandMa1986
tscswcn Could you share the kube-proxy logs (kubectl -n kube-system logs kube-proxy-xxx)
as well as the ipvs rules (ipvsadm -Ln)?
yuswift All three of my VMs can connect:
[root@kubesphere ~]# telnet 10.10.10.2 53
Trying 10.10.10.2...
Connected to 10.10.10.2.
Escape character is '^]'.
Connection closed by foreign host.
[root@worker1 ~]# telnet 10.10.10.2 53
Trying 10.10.10.2...
Connected to 10.10.10.2.
Escape character is '^]'.
Connection closed by foreign host.
[root@worker1 ~]#
[root@worker2 ~]# telnet 10.10.10.2 53
Trying 10.10.10.2...
Connected to 10.10.10.2.
Escape character is '^]'.
[root@worker2 ~]# nslookup kubesphere
Server: 10.10.10.2
Address: 10.10.10.2#53
** server can't find kubesphere: NXDOMAIN
[root@worker2 ~]# nslookup 10.10.10.104
** server can't find 104.10.10.10.in-addr.arpa.: NXDOMAIN
[root@worker2 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.10.10.104 kubesphere
10.10.10.106 worker1 worker1.localdomain
10.10.10.108 worker2 woker2.localdomain
10.10.10.104 kubesphere.localdomain.cluster.local kubesphere.localdomain
10.10.10.104 lb.kubesphere.local
10.10.10.104 blockdeviceclaims.openebs.io
[root@worker2 ~]# ping www.baidu.com
PING www.a.shifen.com (220.181.38.149) 56(84) bytes of data.
64 bytes from 220.181.38.149 (220.181.38.149): icmp_seq=1 ttl=128 time=9.55 ms
64 bytes from 220.181.38.149 (220.181.38.149): icmp_seq=2 ttl=128 time=9.04 ms
64 bytes from 220.181.38.149 (220.181.38.149): icmp_seq=3 ttl=128 time=7.49 ms
64 bytes from 220.181.38.149 (220.181.38.149): icmp_seq=4 ttl=128 time=8.28 ms
64 bytes from 220.181.38.149 (220.181.38.149): icmp_seq=5 ttl=128 time=7.30 ms
^C
--- www.a.shifen.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4635ms
rtt min/avg/max/mdev = 7.303/8.334/9.555/0.871 ms
[root@worker2 ~]#
The hostnames are written in the /etc/hosts file.
Both of my coredns pods are on the master; the master node acts as both master and worker.
[root@kubesphere ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kubesphere.localdomain Ready master,worker 15d v1.18.6
worker1.localdomain Ready <none> 15d v1.18.6
worker2.localdomain Ready <none> 11d v1.18.6
[root@kubesphere ~]# kubectl get pods -A | grep coredbns
[root@kubesphere ~]# kubectl get pods -A | grep core
kube-system coredns-6b55b6764d-4dkqz 1/1 Running 2 15d
kube-system coredns-6b55b6764d-hwxj7 1/1 Running 2 15d
[root@kubesphere ~]#
Is that a problem?
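Adding -o wide shows the NODE column, which confirms where the coredns replicas actually landed:
kubectl -n kube-system get pods -o wide | grep coredns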
RolandMa1986 Here are the kube-proxy logs (kubectl -n kube-system logs):
[root@kubesphere ~]# kubectl get pods -A | grep proxy
kube-system kube-proxy-24b5l 1/1 Running 19 16d
kube-system kube-proxy-6v4x6 1/1 Running 11 12d
kube-system kube-proxy-sgr9t 1/1 Running 14 16d
[root@kubesphere ~]# kubectl -n kube-system logs kube-proxy-24b5l
I1029 06:18:30.480495 1 node.go:136] Successfully retrieved node IP: 10.10.10.106
I1029 06:18:30.480573 1 server_others.go:259] Using ipvs Proxier.
I1029 06:18:30.480813 1 proxier.go:357] missing br-netfilter module or unset sysctl br-nf-call-iptables; proxy may not work as intended
I1029 06:18:30.481295 1 server.go:583] Version: v1.18.6
I1029 06:18:30.481912 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I1029 06:18:30.483408 1 config.go:315] Starting service config controller
I1029 06:18:30.483436 1 shared_informer.go:223] Waiting for caches to sync for service config
I1029 06:18:30.483476 1 config.go:133] Starting endpoints config controller
I1029 06:18:30.483498 1 shared_informer.go:223] Waiting for caches to sync for endpoints config
I1029 06:18:30.583690 1 shared_informer.go:230] Caches are synced for endpoints config
I1029 06:18:30.583791 1 shared_informer.go:230] Caches are synced for service config
I1029 06:19:30.482637 1 graceful_termination.go:93] lw: remote out of the list: 10.233.0.3:53/TCP/10.233.64.39:53
I1029 06:19:30.482717 1 graceful_termination.go:93] lw: remote out of the list: 10.233.0.3:53/TCP/10.233.64.31:53
[root@kubesphere ~]# kubectl -n kube-system logs kube-proxy-6v4x6
I1029 06:18:33.271768 1 node.go:136] Successfully retrieved node IP: 10.10.10.108
I1029 06:18:33.271850 1 server_others.go:259] Using ipvs Proxier.
I1029 06:18:33.272574 1 server.go:583] Version: v1.18.6
I1029 06:18:33.273585 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I1029 06:18:33.280392 1 config.go:315] Starting service config controller
I1029 06:18:33.280634 1 shared_informer.go:223] Waiting for caches to sync for service config
I1029 06:18:33.280749 1 config.go:133] Starting endpoints config controller
I1029 06:18:33.280899 1 shared_informer.go:223] Waiting for caches to sync for endpoints config
I1029 06:18:33.380933 1 shared_informer.go:230] Caches are synced for service config
I1029 06:18:33.381115 1 shared_informer.go:230] Caches are synced for endpoints config
I1029 06:20:33.277535 1 graceful_termination.go:93] lw: remote out of the list: 10.233.0.3:53/TCP/10.233.64.39:53
I1029 06:20:33.279166 1 graceful_termination.go:93] lw: remote out of the list: 10.233.0.3:53/TCP/10.233.64.31:53
I1029 06:22:33.280526 1 graceful_termination.go:93] lw: remote out of the list: 10.233.60.1:80/TCP/10.233.64.78:9090
I1029 06:24:33.282055 1 graceful_termination.go:93] lw: remote out of the list: 10.233.60.1:80/TCP/10.233.64.76:9090
[root@kubesphere ~]# kubectl -n kube-system logs kube-proxy-sgr9t
I1029 07:36:58.762707 1 node.go:136] Successfully retrieved node IP: 10.10.10.104
I1029 07:36:58.763056 1 server_others.go:259] Using ipvs Proxier.
I1029 07:36:58.763872 1 server.go:583] Version: v1.18.6
I1029 07:36:58.764425 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I1029 07:36:58.769403 1 config.go:133] Starting endpoints config controller
I1029 07:36:58.769445 1 shared_informer.go:223] Waiting for caches to sync for endpoints config
I1029 07:36:58.769682 1 config.go:315] Starting service config controller
I1029 07:36:58.769689 1 shared_informer.go:223] Waiting for caches to sync for service config
I1029 07:36:58.871107 1 shared_informer.go:230] Caches are synced for service config
I1029 07:36:58.871188 1 shared_informer.go:230] Caches are synced for endpoints config
I1029 07:38:58.765417 1 graceful_termination.go:93] lw: remote out of the list: 10.233.0.3:53/TCP/10.233.64.39:53
I1029 07:38:58.765514 1 graceful_termination.go:93] lw: remote out of the list: 10.233.0.3:53/TCP/10.233.64.31:53
I1031 01:11:08.365047 1 trace.go:116] Trace[1804000238]: "iptables save" (started: 2020-10-31 01:11:01.197209563 +0000 UTC m=+81144.265636904) (total time: 2.966201505s):
Trace[1804000238]: [2.966201505s] [2.966201505s] END
[root@kubesphere ~]#
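Side note: the kube-proxy log from 10.10.10.106 warns "missing br-netfilter module or unset sysctl br-nf-call-iptables". If that is really the state of the node, a common fix is the following (run on the affected node; the file names are just examples):
modprobe br_netfilter
echo br_netfilter > /etc/modules-load.d/br_netfilter.conf   # load on boot as well
cat <<EOF > /etc/sysctl.d/99-k8s-bridge.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
sysctl --system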
[root@kubesphere ~]# ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 169.254.25.10:30880 rr
-> 10.233.66.36:8000 Masq 1 0 0
TCP 169.254.25.10:32567 rr
-> 10.233.66.41:80 Masq 1 0 0
TCP 172.17.0.1:30880 rr
-> 10.233.66.36:8000 Masq 1 0 0
TCP 10.10.10.104:30880 rr
-> 10.233.66.36:8000 Masq 1 0 0
TCP 10.10.10.104:32567 rr
-> 10.233.66.41:80 Masq 1 0 0
TCP 10.233.0.1:443 rr
-> 10.10.10.104:6443 Masq 1 62 0
TCP 10.233.0.3:53 rr
-> 10.233.64.58:53 Masq 1 0 4
-> 10.233.64.70:53 Masq 1 0 4
TCP 10.233.0.3:9153 rr
-> 10.233.64.58:9153 Masq 1 0 0
-> 10.233.64.70:9153 Masq 1 0 0
TCP 10.233.2.130:6379 rr
-> 10.233.64.71:6379 Masq 1 0 0
TCP 10.233.7.20:80 rr
-> 10.233.66.41:80 Masq 1 0 0
TCP 10.233.8.12:443 rr
-> 10.233.64.65:8443 Masq 1 0 0
TCP 10.233.13.76:9093 rr persistent 10800
-> 10.233.64.60:9093 Masq 1 0 0
TCP 10.233.15.225:8443 rr
-> 10.233.64.75:8443 Masq 1 0 0
TCP 10.233.24.157:443 rr
-> 10.10.10.108:4443 Masq 1 2 0
TCP 10.233.27.144:19093 rr
-> 10.233.64.61:19093 Masq 1 0 0
TCP 10.233.35.97:443 rr
-> 10.233.64.80:8443 Masq 1 0 0
TCP 10.233.35.122:80 rr
-> 10.233.66.36:8000 Masq 1 0 0
TCP 10.233.40.123:9090 rr persistent 10800
-> 10.233.64.55:9090 Masq 1 0 0
TCP 10.233.47.113:80 rr
-> 10.233.64.62:8080 Masq 1 0 0
TCP 10.233.49.67:5656 rr
-> 10.233.64.72:5656 Masq 1 0 0
TCP 10.233.60.1:80 rr
-> 10.233.64.79:9090 Masq 1 0 0
TCP 10.233.64.0:30880 rr
-> 10.233.66.36:8000 Masq 1 0 0
TCP 10.233.64.0:32567 rr
-> 10.233.66.41:80 Masq 1 0 0
TCP 10.233.64.1:30880 rr
-> 10.233.66.36:8000 Masq 1 0 0
TCP 10.233.64.1:32567 rr
-> 10.233.66.41:80 Masq 1 0 0
TCP 127.0.0.1:30880 rr
-> 10.233.66.36:8000 Masq 1 0 0
TCP 127.0.0.1:32567 rr
-> 10.233.66.41:80 Masq 1 0 0
TCP 172.17.0.1:32567 rr
-> 10.233.66.41:80 Masq 1 0 0
UDP 10.233.0.3:53 rr
-> 10.233.64.58:53 Masq 1 0 0
-> 10.233.64.70:53 Masq 1 0 0
[root@kubesphere ~]#
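The ipvs table sends 10.233.0.3:53 to 10.233.64.58 and 10.233.64.70, so it is worth cross-checking that these match the live coredns pod IPs and the kube-dns endpoints (the label and service names below follow the kubeadm defaults):
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
kubectl -n kube-system get endpoints kube-dns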
RolandMa1986 Could you take another look for me?
RolandMa1986
tscswcn The "ipvs rr udp 10.133.0.3 53 no destination available" message does not affect the environment; to stop it from being printed, run dmesg -n 1 on the machine and it will no longer show up.
Are DNS queries in your cluster still timing out?
RolandMa1986 Yes, they still time out:
[ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:58402->10.10.10.2:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:48861->10.10.10.2:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:33260->10.10.10.2:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:53321->10.10.10.2:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:56475->10.10.10.2:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:57477->10.10.10.2:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:45194->10.10.10.2:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:50980->10.10.10.2:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:41519->10.10.10.2:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:46032->10.10.10.2:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:53543->10.10.10.2:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:50189->10.10.10.2:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:49052->10.10.10.2:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:40430->10.10.10.2:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:40941->10.10.10.2:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:34961->10.10.10.2:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:33067->10.10.10.2:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:57059->10.10.10.2:53: i/o timeout
[ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:54908->10.10.10.2:53: i/o timeout
10.10.10.2 is the DNS server provided by VMware. Should I just remove it?
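Rather than deleting it outright, one option is to point CoreDNS at resolvers that actually answer, by editing the Corefile in the coredns ConfigMap (a sketch; the resolver IPs are only examples):
kubectl -n kube-system edit configmap coredns
# in the Corefile, replace the default upstream
#     forward . /etc/resolv.conf
# with explicit resolvers reachable from the pod network, e.g.
#     forward . 114.114.114.114 223.5.5.5
# then restart coredns so the change takes effect
kubectl -n kube-system rollout restart deployment coredns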
Latest progress on this issue:
I attached a busybox sidecar to the coredns pod to dig further; from inside busybox I found it cannot ping the VMs, and the coredns pod turns out to be missing a route:
/ # ip route
default via 10.233.66.1 dev eth0
10.233.66.0/24 dev eth0 scope link src 10.233.66.44
10.244.0.0/16 via 10.233.66.1 dev eth0
/ # ping 10.10.10.4
PING 10.10.10.4 (10.10.10.4): 56 data bytes
^C
--- 10.10.10.4 ping statistics ---
6 packets transmitted, 0 packets received, 100% packet loss
/ # ping 10.10.10.6
PING 10.10.10.6 (10.10.10.6): 56 data bytes
^C
--- 10.10.10.6 ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss
/ # ping 10.10.10.8
PING 10.10.10.8 (10.10.10.8): 56 data bytes
^C
--- 10.10.10.8 ping statistics ---
4 packets transmitted, 0 packets received, 100% packet loss
/ # ping 10.10.10.104
PING 10.10.10.104 (10.10.10.104): 56 data bytes
^C
--- 10.10.10.104 ping statistics ---
2 packets transmitted, 0 packets received, 100% packet loss
/ # ping 10.10.10.106
PING 10.10.10.106 (10.10.10.106): 56 data bytes
^C
--- 10.10.10.106 ping statistics ---
4 packets transmitted, 0 packets received, 100% packet loss
/ # ping 10.10.10.108
PING 10.10.10.108 (10.10.10.108): 56 data bytes
64 bytes from 10.10.10.108: seq=0 ttl=64 time=0.158 ms
64 bytes from 10.10.10.108: seq=1 ttl=64 time=0.431 ms
64 bytes from 10.10.10.108: seq=2 ttl=64 time=0.118 ms
^C
So the problem should be here; the pod is missing this route:
ip route add 10.10.10.0/24 proto kernel scope link src 10.10.10.104 metric 100
But running this command inside the container fails. Some posts I found online say kernel-level settings need to be changed, and I don't know how to do that.
/ # ip route add 10.10.10.0/24 proto kernel scope link src 10.10.10.104 metric 100
ip: RTNETLINK answers: Operation not permitted
/ #
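"RTNETLINK answers: Operation not permitted" here just means the container lacks the NET_ADMIN capability; an unprivileged pod cannot modify its routing table, so this is not a kernel problem as such. For testing, a throwaway pod with NET_ADMIN would let the command run (a sketch; the pod name and nodeName are placeholders, and the real fix still belongs on the node/CNI side):
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: netadmin-debug
  namespace: kube-system
spec:
  nodeName: kubesphere.localdomain   # pin to the node being debugged
  containers:
  - name: busybox
    image: busybox:1.28
    command: ["sleep", "3600"]
    securityContext:
      capabilities:
        add: ["NET_ADMIN"]
EOF
kubectl -n kube-system exec -it netadmin-debug -- sh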
This issue has now been solved by adding the following flags to the docker service:
--bip
--ip-masq
--mtu
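For reference, a sketch of what those flags can look like as a systemd drop-in for docker.service; the values below are illustrative only (--bip must not overlap the node or pod CIDRs, and --mtu should match the underlying/overlay network):
# /etc/systemd/system/docker.service.d/override.conf
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd --bip=172.18.0.1/16 --ip-masq=true --mtu=1450
# then reload and restart docker
systemctl daemon-reload && systemctl restart docker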