• Installation & Deployment
  • How do I diagnose problems in an installed k8s cluster?

From your VM, run:
ping 10.10.10.2
telnet 10.10.10.2 53
If they cannot connect, that DNS server is broken; removing it from the nameserver list is a temporary workaround.
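A minimal sketch of that workaround, assuming the entry lives in /etc/resolv.conf and is not immediately rewritten by NetworkManager or dhclient:

# check which resolvers the node is actually using
grep nameserver /etc/resolv.conf
# back the file up, then drop the unreachable entry (temporary measure only)
cp /etc/resolv.conf /etc/resolv.conf.bak
sed -i '/^nameserver 10.10.10.2$/d' /etc/resolv.conf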

Telnet does work from the VM:

telnet 10.10.10.2 53
Trying 10.10.10.2...
Connected to 10.10.10.2.
Escape character is '^]'.

    tscswcn Could you share the kube-proxy logs (kubectl -n kube-system logs kube-proxy-xxx)
    and the ipvs rules (ipvsadm -Ln)?

      yuswift All three of my VMs can telnet to it:
      [root@kubesphere ~]# telnet 10.10.10.2 53
      Trying 10.10.10.2...
      Connected to 10.10.10.2.
      Escape character is '^]'.
      Connection closed by foreign host.

      [root@worker1 ~]# telnet 10.10.10.2 53
      Trying 10.10.10.2...
      Connected to 10.10.10.2.
      Escape character is '^]'.
      Connection closed by foreign host.
      [root@worker1 ~]#

      [root@worker2 ~]# telnet 10.10.10.2 53
      Trying 10.10.10.2...
      Connected to 10.10.10.2.
      Escape character is '^]'.

      [root@worker2 ~]# nslookup kubesphere
      Server: 10.10.10.2
      Address: 10.10.10.2#53

      ** server can't find kubesphere: NXDOMAIN

      [root@worker2 ~]# nslookup 10.10.10.104
      ** server can't find 104.10.10.10.in-addr.arpa.: NXDOMAIN

      [root@worker2 ~]# cat /etc/hosts
      127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
      ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
      10.10.10.104 kubesphere
      10.10.10.106 worker1 worker1.localdomain
      10.10.10.108 worker2 woker2.localdomain
      10.10.10.104 kubesphere.localdomain.cluster.local kubesphere.localdomain
      10.10.10.104 lb.kubesphere.local
      10.10.10.104 blockdeviceclaims.openebs.io

      [root@worker2 ~]# ping www.baidu.com
      PING www.a.shifen.com (220.181.38.149) 56(84) bytes of data.
      64 bytes from 220.181.38.149 (220.181.38.149): icmp_seq=1 ttl=128 time=9.55 ms
      64 bytes from 220.181.38.149 (220.181.38.149): icmp_seq=2 ttl=128 time=9.04 ms
      64 bytes from 220.181.38.149 (220.181.38.149): icmp_seq=3 ttl=128 time=7.49 ms
      64 bytes from 220.181.38.149 (220.181.38.149): icmp_seq=4 ttl=128 time=8.28 ms
      64 bytes from 220.181.38.149 (220.181.38.149): icmp_seq=5 ttl=128 time=7.30 ms
      ^C
      --- www.a.shifen.com ping statistics ---
      5 packets transmitted, 5 received, 0% packet loss, time 4635ms
      rtt min/avg/max/mdev = 7.303/8.334/9.555/0.871 ms
      [root@worker2 ~]#

      The hostnames are defined in the /etc/hosts file.

      Both of my coredns pods are on the master; the master node is both a master and a worker.
      [root@kubesphere ~]# kubectl get nodes
      NAME STATUS ROLES AGE VERSION
      kubesphere.localdomain Ready master,worker 15d v1.18.6
      worker1.localdomain Ready <none> 15d v1.18.6
      worker2.localdomain Ready <none> 11d v1.18.6
      [root@kubesphere ~]# kubectl get pods -A | grep coredbns
      [root@kubesphere ~]# kubectl get pods -A | grep core
      kube-system coredns-6b55b6764d-4dkqz 1/1 Running 2 15d
      kube-system coredns-6b55b6764d-hwxj7 1/1 Running 2 15d
      [root@kubesphere ~]#

      Is that a problem?
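      Having both replicas on one node is not necessarily wrong, but it is worth confirming where they run and that both are ready. A quick check might look like this (k8s-app=kube-dns is the usual label on the coredns pods):

      kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide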

      RolandMa1986 kubectl -n kube-system logs

      [root@kubesphere ~]# kubectl get pods -A | grep proxy
      kube-system kube-proxy-24b5l 1/1 Running 19 16d
      kube-system kube-proxy-6v4x6 1/1 Running 11 12d
      kube-system kube-proxy-sgr9t 1/1 Running 14 16d
      [root@kubesphere ~]# kubectl -n kube-system logs kube-proxy-24b5l
      I1029 06:18:30.480495 1 node.go:136] Successfully retrieved node IP: 10.10.10.106
      I1029 06:18:30.480573 1 server_others.go:259] Using ipvs Proxier.
      I1029 06:18:30.480813 1 proxier.go:357] missing br-netfilter module or unset sysctl br-nf-call-iptables; proxy may not work as intended
      I1029 06:18:30.481295 1 server.go:583] Version: v1.18.6
      I1029 06:18:30.481912 1 conntrack.go:52] Setting nf_conntrack_max to 131072
      I1029 06:18:30.483408 1 config.go:315] Starting service config controller
      I1029 06:18:30.483436 1 shared_informer.go:223] Waiting for caches to sync for service config
      I1029 06:18:30.483476 1 config.go:133] Starting endpoints config controller
      I1029 06:18:30.483498 1 shared_informer.go:223] Waiting for caches to sync for endpoints config
      I1029 06:18:30.583690 1 shared_informer.go:230] Caches are synced for endpoints config
      I1029 06:18:30.583791 1 shared_informer.go:230] Caches are synced for service config
      I1029 06:19:30.482637 1 graceful_termination.go:93] lw: remote out of the list: 10.233.0.3:53/TCP/10.233.64.39:53
      I1029 06:19:30.482717 1 graceful_termination.go:93] lw: remote out of the list: 10.233.0.3:53/TCP/10.233.64.31:53
      [root@kubesphere ~]# kubectl -n kube-system logs kube-proxy-6v4x6
      I1029 06:18:33.271768 1 node.go:136] Successfully retrieved node IP: 10.10.10.108
      I1029 06:18:33.271850 1 server_others.go:259] Using ipvs Proxier.
      I1029 06:18:33.272574 1 server.go:583] Version: v1.18.6
      I1029 06:18:33.273585 1 conntrack.go:52] Setting nf_conntrack_max to 131072
      I1029 06:18:33.280392 1 config.go:315] Starting service config controller
      I1029 06:18:33.280634 1 shared_informer.go:223] Waiting for caches to sync for service config
      I1029 06:18:33.280749 1 config.go:133] Starting endpoints config controller
      I1029 06:18:33.280899 1 shared_informer.go:223] Waiting for caches to sync for endpoints config
      I1029 06:18:33.380933 1 shared_informer.go:230] Caches are synced for service config
      I1029 06:18:33.381115 1 shared_informer.go:230] Caches are synced for endpoints config
      I1029 06:20:33.277535 1 graceful_termination.go:93] lw: remote out of the list: 10.233.0.3:53/TCP/10.233.64.39:53
      I1029 06:20:33.279166 1 graceful_termination.go:93] lw: remote out of the list: 10.233.0.3:53/TCP/10.233.64.31:53
      I1029 06:22:33.280526 1 graceful_termination.go:93] lw: remote out of the list: 10.233.60.1:80/TCP/10.233.64.78:9090
      I1029 06:24:33.282055 1 graceful_termination.go:93] lw: remote out of the list: 10.233.60.1:80/TCP/10.233.64.76:9090
      [root@kubesphere ~]# kubectl -n kube-system logs kube-proxy-sgr9t
      I1029 07:36:58.762707 1 node.go:136] Successfully retrieved node IP: 10.10.10.104
      I1029 07:36:58.763056 1 server_others.go:259] Using ipvs Proxier.
      I1029 07:36:58.763872 1 server.go:583] Version: v1.18.6
      I1029 07:36:58.764425 1 conntrack.go:52] Setting nf_conntrack_max to 131072
      I1029 07:36:58.769403 1 config.go:133] Starting endpoints config controller
      I1029 07:36:58.769445 1 shared_informer.go:223] Waiting for caches to sync for endpoints config
      I1029 07:36:58.769682 1 config.go:315] Starting service config controller
      I1029 07:36:58.769689 1 shared_informer.go:223] Waiting for caches to sync for service config
      I1029 07:36:58.871107 1 shared_informer.go:230] Caches are synced for service config
      I1029 07:36:58.871188 1 shared_informer.go:230] Caches are synced for endpoints config
      I1029 07:38:58.765417 1 graceful_termination.go:93] lw: remote out of the list: 10.233.0.3:53/TCP/10.233.64.39:53
      I1029 07:38:58.765514 1 graceful_termination.go:93] lw: remote out of the list: 10.233.0.3:53/TCP/10.233.64.31:53
      I1031 01:11:08.365047 1 trace.go:116] Trace[1804000238]: "iptables save" (started: 2020-10-31 01:11:01.197209563 +0000 UTC m=+81144.265636904) (total time: 2.966201505s):
      Trace[1804000238]: [2.966201505s] [2.966201505s] END
      [root@kubesphere ~]#
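      As a side note, the first kube-proxy log above warns about a missing br_netfilter module / unset bridge-nf-call-iptables sysctl on that node. Independent of the DNS issue, a standard fix might be:

      # load the bridge netfilter module now
      modprobe br_netfilter
      # persist the required sysctls and apply them
      printf 'net.bridge.bridge-nf-call-iptables = 1\nnet.bridge.bridge-nf-call-ip6tables = 1\n' > /etc/sysctl.d/k8s.conf
      sysctl --system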

      [root@kubesphere ~]# ipvsadm -Ln
      IP Virtual Server version 1.2.1 (size=4096)
      Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port Forward Weight ActiveConn InActConn
      TCP 169.254.25.10:30880 rr
      -> 10.233.66.36:8000 Masq 1 0 0
      TCP 169.254.25.10:32567 rr
      -> 10.233.66.41:80 Masq 1 0 0
      TCP 172.17.0.1:30880 rr
      -> 10.233.66.36:8000 Masq 1 0 0
      TCP 10.10.10.104:30880 rr
      -> 10.233.66.36:8000 Masq 1 0 0
      TCP 10.10.10.104:32567 rr
      -> 10.233.66.41:80 Masq 1 0 0
      TCP 10.233.0.1:443 rr
      -> 10.10.10.104:6443 Masq 1 62 0
      TCP 10.233.0.3:53 rr
      -> 10.233.64.58:53 Masq 1 0 4
      -> 10.233.64.70:53 Masq 1 0 4
      TCP 10.233.0.3:9153 rr
      -> 10.233.64.58:9153 Masq 1 0 0
      -> 10.233.64.70:9153 Masq 1 0 0
      TCP 10.233.2.130:6379 rr
      -> 10.233.64.71:6379 Masq 1 0 0
      TCP 10.233.7.20:80 rr
      -> 10.233.66.41:80 Masq 1 0 0
      TCP 10.233.8.12:443 rr
      -> 10.233.64.65:8443 Masq 1 0 0
      TCP 10.233.13.76:9093 rr persistent 10800
      -> 10.233.64.60:9093 Masq 1 0 0
      TCP 10.233.15.225:8443 rr
      -> 10.233.64.75:8443 Masq 1 0 0
      TCP 10.233.24.157:443 rr
      -> 10.10.10.108:4443 Masq 1 2 0
      TCP 10.233.27.144:19093 rr
      -> 10.233.64.61:19093 Masq 1 0 0
      TCP 10.233.35.97:443 rr
      -> 10.233.64.80:8443 Masq 1 0 0
      TCP 10.233.35.122:80 rr
      -> 10.233.66.36:8000 Masq 1 0 0
      TCP 10.233.40.123:9090 rr persistent 10800
      -> 10.233.64.55:9090 Masq 1 0 0
      TCP 10.233.47.113:80 rr
      -> 10.233.64.62:8080 Masq 1 0 0
      TCP 10.233.49.67:5656 rr
      -> 10.233.64.72:5656 Masq 1 0 0
      TCP 10.233.60.1:80 rr
      -> 10.233.64.79:9090 Masq 1 0 0
      TCP 10.233.64.0:30880 rr
      -> 10.233.66.36:8000 Masq 1 0 0
      TCP 10.233.64.0:32567 rr
      -> 10.233.66.41:80 Masq 1 0 0
      TCP 10.233.64.1:30880 rr
      -> 10.233.66.36:8000 Masq 1 0 0
      TCP 10.233.64.1:32567 rr
      -> 10.233.66.41:80 Masq 1 0 0
      TCP 127.0.0.1:30880 rr
      -> 10.233.66.36:8000 Masq 1 0 0
      TCP 127.0.0.1:32567 rr
      -> 10.233.66.41:80 Masq 1 0 0
      TCP 172.17.0.1:32567 rr
      -> 10.233.66.41:80 Masq 1 0 0
      UDP 10.233.0.3:53 rr
      -> 10.233.64.58:53 Masq 1 0 0
      -> 10.233.64.70:53 Masq 1 0 0
      [root@kubesphere ~]#

      RolandMa1986 Please take another look for me.
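      To double-check that the DNS service path shown in the ipvsadm output above is actually answering, a quick query against the cluster DNS service IP (10.233.0.3 above) from any node could look like this (the queried name is just an example):

      nslookup kubernetes.default.svc.cluster.local 10.233.0.3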

      tscswcn The "ipvs rr udp 10.133.0.3 53 no destination available" message does not affect the environment; to stop it from being printed, run dmesg -n 1 on the machine.

      Are DNS queries in your cluster still timing out at the moment?

        RolandMa1986 Yes, there are still timeouts:

        [ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:58402->10.10.10.2:53: i/o timeout

        [ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:48861->10.10.10.2:53: i/o timeout

        [ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:33260->10.10.10.2:53: i/o timeout

        [ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:53321->10.10.10.2:53: i/o timeout

        [ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:56475->10.10.10.2:53: i/o timeout

        [ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:57477->10.10.10.2:53: i/o timeout

        [ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:45194->10.10.10.2:53: i/o timeout

        [ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:50980->10.10.10.2:53: i/o timeout

        [ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:41519->10.10.10.2:53: i/o timeout

        [ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:46032->10.10.10.2:53: i/o timeout

        [ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:53543->10.10.10.2:53: i/o timeout

        [ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:50189->10.10.10.2:53: i/o timeout

        [ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:49052->10.10.10.2:53: i/o timeout

        [ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:40430->10.10.10.2:53: i/o timeout

        [ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:40941->10.10.10.2:53: i/o timeout

        [ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:34961->10.10.10.2:53: i/o timeout

        [ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:33067->10.10.10.2:53: i/o timeout

        [ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:57059->10.10.10.2:53: i/o timeout

        [ERROR] plugin/errors: 2 . NS: read udp 10.233.64.70:54908->10.10.10.2:53: i/o timeout

        10.10.10.2 is provided by VMware. Should I just remove it?
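        The timeouts above show CoreDNS forwarding upstream queries toward 10.10.10.2. To see where that forward target comes from, inspecting the Corefile might look like this (coredns is the usual ConfigMap name in kube-system):

        kubectl -n kube-system get configmap coredns -o yaml | grep -B1 -A3 forward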


          tscswcn Remove every DNS server that the hosts cannot reach; otherwise it interferes with CoreDNS.
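          Once the unreachable entry is gone from every node's /etc/resolv.conf, coredns typically has to be restarted to pick up the change, since its upstream list is taken from the node's resolv.conf when the pod starts. A sketch:

          # restart coredns so it re-reads the upstream resolvers, then watch it come back
          kubectl -n kube-system rollout restart deployment coredns
          kubectl -n kube-system get pods -l k8s-app=kube-dns -w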

          Latest progress on this issue:

          I attached a busybox sidecar container to the coredns pod. From inside busybox the VMs cannot be pinged, and further diagnosis shows the coredns pod is missing a route:

          / # ip route
          default via 10.233.66.1 dev eth0
          10.233.66.0/24 dev eth0 scope link src 10.233.66.44
          10.244.0.0/16 via 10.233.66.1 dev eth0
          / # ping 10.10.10.4
          PING 10.10.10.4 (10.10.10.4): 56 data bytes
          ^C
          --- 10.10.10.4 ping statistics ---
          6 packets transmitted, 0 packets received, 100% packet loss
          / # ping 10.10.10.6
          PING 10.10.10.6 (10.10.10.6): 56 data bytes
          ^C
          --- 10.10.10.6 ping statistics ---
          2 packets transmitted, 0 packets received, 100% packet loss
          / # ping 10.10.10.8
          PING 10.10.10.8 (10.10.10.8): 56 data bytes
          ^C
          --- 10.10.10.8 ping statistics ---
          4 packets transmitted, 0 packets received, 100% packet loss
          / # ping 10.10.10.104
          PING 10.10.10.104 (10.10.10.104): 56 data bytes
          ^C
          --- 10.10.10.104 ping statistics ---
          2 packets transmitted, 0 packets received, 100% packet loss
          / # ping 10.10.10.106
          PING 10.10.10.106 (10.10.10.106): 56 data bytes
          ^C
          --- 10.10.10.106 ping statistics ---
          4 packets transmitted, 0 packets received, 100% packet loss
          / # ping 10.10.10.108
          PING 10.10.10.108 (10.10.10.108): 56 data bytes
          64 bytes from 10.10.10.108: seq=0 ttl=64 time=0.158 ms
          64 bytes from 10.10.10.108: seq=1 ttl=64 time=0.431 ms
          64 bytes from 10.10.10.108: seq=2 ttl=64 time=0.118 ms
          ^C

          So the problem should be here. The route that should be added is:

          ip route add 10.10.10.0/24 proto kernel scope link src 10.10.10.104 metric 100

          But running this command in the container fails with an error; some posts I found online say the kernel needs to be modified, and I don't know how to do that.
          / # ip route add 10.10.10.0/24 proto kernel scope link src 10.10.10.104 metric 100
          ip: RTNETLINK answers: Operation not permitted
          / #
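          For what it's worth, that "Operation not permitted" usually means the sidecar container lacks the NET_ADMIN capability rather than anything kernel-level. One way around it is to inspect or add the route from the host's view of the pod's network namespace instead (the container ID below is a placeholder):

          # on the node hosting the coredns pod, find the container's PID
          PID=$(docker inspect -f '{{.State.Pid}}' <coredns-container-id>)
          # enter only its network namespace from the host, where NET_ADMIN is available
          nsenter -t "$PID" -n ip route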

          This issue has now been resolved by adding the following flags to the docker service:
          --bip
          --ip-masq
          --mtu
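          For reference, one common way to set these dockerd flags is a systemd drop-in for the docker service. The values below are placeholders, not the exact ones used here, and any flags already present in the existing ExecStart should be carried over:

          # create a drop-in that overrides ExecStart (pick a --bip that does not overlap the node or pod networks)
          mkdir -p /etc/systemd/system/docker.service.d
          printf '[Service]\nExecStart=\nExecStart=/usr/bin/dockerd --bip=172.18.0.1/16 --ip-masq=true --mtu=1450\n' > /etc/systemd/system/docker.service.d/network.conf
          systemctl daemon-reload && systemctl restart docker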