Jeff Thanks for the reply. On my side, kubectl get apiservice | grep metrics shows the failed state:

v1beta1.metrics.k8s.io                 kube-system/metrics-server   False (FailedDiscoveryCheck)   22d
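
For anyone checking the same symptom, the reason behind FailedDiscoveryCheck can be read off the APIService object itself (standard kubectl, nothing specific to this setup):

kubectl describe apiservice v1beta1.metrics.k8s.io
# The Status conditions report why discovery fails, e.g. which backend
# address the aggregator could not reach.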

As my logs above show, a new metrics-server pod is already up, but kube-apiserver still tries to connect to the pod on the lost node, 10.244.235.186. Only after I kill kube-apiserver does it connect correctly to 10.244.180.55.
We saw this kind of error before with UDP, where clearing the connection-tracking cache with conntrack -D was enough to establish fresh connections. This time, however, even after conntrack -D the traffic still gets pointed back at the wrong pod IP 10.244.235.186, as if the old endpoint is remembered somewhere.
https://github.com/kubernetes/kubernetes/issues/59368?from=singlemessage

metrics-server-8b7689b66-xm6mf            1/1     Running    0          36s     10.244.180.55    master2   <none>           <none>
metrics-server-8b7689b66-z9hk9            1/1     Unknown    0          3m58s   10.244.235.186  worker1   <none>           <none>
conntrack -L | grep 10.101.186.48
tcp      6 278 ESTABLISHED src=10.101.186.48 dst=10.101.186.48 sport=45842 dport=443 src=10.244.235.186 dst=192.168.210.71 sport=443 dport=19158 [ASSURED] mark=0 use=1
tcp      6 298 ESTABLISHED src=10.101.186.48 dst=10.101.186.48 sport=45820 dport=443 src=10.244.235.186 dst=192.168.210.71 sport=443 dport=15276 [ASSURED] mark=0 use=2
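
For reference, the conntrack flush that worked for the earlier UDP bug looks like this, using the service IP and stale pod IP from the output above (a sketch based on the stock conntrack-tools flags):

# Drop all tracked flows whose original destination is the metrics-server
# service IP, forcing new packets to be re-evaluated against current rules.
conntrack -D -d 10.101.186.48
# Or target only entries whose reply source is the dead pod IP:
conntrack -D -r 10.244.235.186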

The following workaround does help: changing the kernel setting net.ipv4.tcp_retries2 from 15 down to 1. With the node powered off, the stale connection is now released after about 1 minute, and traffic is then directed to 10.244.180.55.

https://blog.csdn.net/gao1738/article/details/42839697
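
A minimal sketch of that change (persisting it via /etc/sysctl.conf is my assumption about the host setup; as Jeff notes below, it is aggressive for production):

# Default is 15, i.e. roughly 15 min of retransmissions before a TCP
# connection to a dead peer is aborted.
sysctl net.ipv4.tcp_retries2
# Give up after a single retransmission instead:
sysctl -w net.ipv4.tcp_retries2=1
echo 'net.ipv4.tcp_retries2 = 1' >> /etc/sysctl.conf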

  • Jeff replied to this post

    guoh1988 I wouldn't recommend that setting in a production environment. Instead, check whether the ipvs rules on the master node failed to get updated.

      Jeff I'm already running ipvs mode; I'm not sure whether by ipvs rules you mean kube-proxy's ipvs mode. As explained above, the UDP bug differs from this case: for UDP, clearing the conntrack cache with conntrack -D was enough to get fresh connections, but for kube-apiserver's connection to metrics-server, even a connection regenerated after conntrack -D is still wrong. The only things that resolve it are changing the kernel's net.ipv4.tcp_retries2 or restarting kube-apiserver.

      kubectl logs -f kube-proxy-7wkbr  -n kube-system
      I1123 01:40:45.112593       1 node.go:135] Successfully retrieved node IP: 192.168.210.71
      I1123 01:40:45.112639       1 server_others.go:177] Using ipvs Proxier.
      W1123 01:40:45.112929       1 proxier.go:415] IPVS scheduler not specified, use rr by default
      I1123 01:40:45.113153       1 server.go:529] Version: v1.16.10
      I1123 01:40:45.113560       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
      I1123 01:40:45.113585       1 conntrack.go:52] Setting nf_conntrack_max to 131072
      I1123 01:40:45.113628       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
      I1123 01:40:45.113650       1 conntrack.go:100] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
      I1123 01:40:45.115833       1 config.go:131] Starting endpoints config controller
      I1123 01:40:45.115878       1 config.go:313] Starting service config controller
      I1123 01:40:45.115894       1 shared_informer.go:197] Waiting for caches to sync for endpoints config
      I1123 01:40:45.115897       1 shared_informer.go:197] Waiting for caches to sync for service config
      I1123 01:40:45.216030       1 shared_informer.go:204] Caches are synced for endpoints config 
      I1123 01:40:45.216033       1 shared_informer.go:204] Caches are synced for service config 
      • Jeff replied to this post

        guoh1988 Yes. If the pod has restarted but the apiservice still isn't healthy, look at the ipvs rules on the master node: ipvsadm -Ln to check whether the metrics-server service IP maps to the correct pod IP, and ipvsadm -lnc to check whether there are many connections stuck in sync_wait.

        Thanks for the reply.
        Under normal conditions:

        kubectl get pods -n kube-system -o wide
        NAME                                      READY   STATUS    RESTARTS   AGE    IP               NODE      NOMINATED NODE   READINESS GATES
        metrics-server-8b7689b66-rvrrl            1/1     Running   0          8m7s   10.244.235.137   worker1   <none>           <none>
        
        ipvsadm -Ln
        TCP  10.101.186.48:443 rr
          -> 10.244.235.137:443           Masq    1      2          0   
        
        ipvsadm -lnc  | grep 10.101.186.48
        TCP 14:37  ESTABLISHED 10.101.186.48:56312 10.101.186.48:443  10.244.235.137:443
        TCP 14:55  ESTABLISHED 10.101.186.48:56328 10.101.186.48:443  10.244.235.137:443

        Under abnormal conditions:

        ipvsadm -Ln
        TCP  10.101.186.48:443 rr
          -> 10.244.180.39:443            Masq    1      0          0         
          -> 10.244.235.137:443           Masq    0      2          0 

        One connection drops into CLOSE_WAIT almost immediately, while the other stays in ESTABLISHED; its countdown timer matches the 15 min:

        ipvsadm -lnc  | grep 10.101.186.48
        TCP 14:58  ESTABLISHED 10.101.186.48:56312 10.101.186.48:443  10.244.235.137:443
        TCP 00:18  CLOSE_WAIT  10.101.186.48:56328 10.101.186.48:443  10.244.235.137:443

        And one entry just keeps waiting:

        ipvsadm -lnc  | grep 10.101.186.48
        TCP 14:44  ESTABLISHED 10.101.186.48:56312 10.101.186.48:443  10.244.235.137:443
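
        That 15-minute window is consistent with two defaults: the kernel's net.ipv4.tcp_retries2=15 retransmission budget (~15 min) and IPVS's own TCP session timeout (900 s). Both can be checked on the master (the values shown are the stock defaults, an assumption for this environment):

        # IPVS session timeouts: tcp / tcpfin / udp, in seconds.
        ipvsadm -L --timeout
        # Timeout (tcp tcpfin udp): 900 120 300
        sysctl net.ipv4.tcp_retries2
        # net.ipv4.tcp_retries2 = 15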
        • Jeff replied to this post

          guoh1988

          ipvsadm -Ln
          TCP  10.101.186.48:443 rr
            -> 10.244.180.39:443            Masq    1      0          0
            -> 10.244.235.137:443           Masq    0      2          0

          These rules are managed by kube-proxy. Check your environment: is kube-proxy failing to refresh them promptly?

          The weight there has already dropped to 0, so round-robin should no longer pick that backend. I also just retried while the problem was occurring: killing kube-apiserver always lets it connect to 10.244.180.39:443.
          Below is my kube-proxy configuration:


          apiVersion: kubeproxy.config.k8s.io/v1alpha1
          bindAddress: 0.0.0.0
          clientConnection:
            acceptContentTypes: ""
            burst: 10
            contentType: application/vnd.kubernetes.protobuf
            kubeconfig: /var/lib/kube-proxy/kubeconfig.conf
            qps: 5
          clusterCIDR: 10.244.0.0/16
          configSyncPeriod: 15m0s
          conntrack:
            maxPerCore: 32768
            min: 131072
            tcpCloseWaitTimeout: 1h0m0s
            tcpEstablishedTimeout: 24h0m0s
          enableProfiling: false
          healthzBindAddress: 0.0.0.0:10256
          hostnameOverride: ""
          iptables:
            masqueradeAll: false
            masqueradeBit: 14
            minSyncPeriod: 0s
            syncPeriod: 30s
          ipvs:
            excludeCIDRs: null
            minSyncPeriod: 0s
            scheduler: ""
            strictARP: true
            syncPeriod: 30s
          kind: KubeProxyConfiguration
          metricsBindAddress: 127.0.0.1:10249
          mode: ipvs
          nodePortAddresses: null
          oomScoreAdj: -999
          portRange: ""
          udpIdleTimeout: 250ms
          winkernel:
            enableDSR: false
            networkName: ""
            sourceVip: ""

          By the way, do you have a k8s + KubeSphere cluster at hand? This is easy to reproduce: find the node where metrics-server is running and cut power to that node directly. I'm running on VMware here.

          • Jeff replied to this post

            guoh1988 A hard power-off can indeed trigger this. We've hit similar problems before: when a node goes down, its pods need to be migrated, and there is a default 5-minute interval before that happens. For metrics-server you can tune that interval down.
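
            One per-workload way to shrink that interval is a toleration on the metrics-server Deployment; a sketch assuming the default taint-based evictions on v1.16, with 30 s matching the value guoh1988 mentions below:

            # Under the metrics-server pod spec: evict the pod 30 s after its
            # node is marked unreachable/not-ready, instead of the 300 s default.
            tolerations:
            - key: node.kubernetes.io/unreachable
              operator: Exists
              effect: NoExecute
              tolerationSeconds: 30
            - key: node.kubernetes.io/not-ready
              operator: Exists
              effect: NoExecute
              tolerationSeconds: 30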

              Jeff I've already tuned it down, to 30 s. The 5 min is the time before pods on a lost node are rebuilt, and my metrics-server pod has already been rebuilt, so a missing pod is not the cause. This wait takes 15 min.

              10 days later

              guoh1988 At this point it's a K8s issue; try searching the k8s issue tracker for anything related.