guoh1988
### 1. Kubernetes version
kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.10", GitCommit:"f3add640dbcd4f3c33a7749f38baaac0b3fe810d", GitTreeState:"clean", BuildDate:"2020-05-20T14:00:52Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.10", GitCommit:"f3add640dbcd4f3c33a7749f38baaac0b3fe810d", GitTreeState:"clean", BuildDate:"2020-05-20T13:51:56Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
One control-plane node and three worker nodes:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
master1 Ready master 22d v1.16.10
master2 Ready <none> 22d v1.16.10
master3 Ready <none> 22d v1.16.10
worker1 Ready <none> 21d v1.16.10
### 2. All KubeSphere 2.1.1 components are installed
helm list
NAME REVISION UPDATED STATUS CHART APP VERSION NAMESPACE
elasticsearch-logging 1 Wed Nov 11 18:24:06 2020 DEPLOYED elasticsearch-1.22.1 6.7.0-0217 kubesphere-logging-system
elasticsearch-logging-curator 1 Wed Nov 11 18:24:09 2020 DEPLOYED elasticsearch-curator-1.3.3 5.5.4-0217 kubesphere-logging-system
istio 1 Wed Nov 11 18:33:32 2020 DEPLOYED istio-1.3.3 1.3.3 istio-system
istio-init 1 Wed Nov 11 18:32:48 2020 DEPLOYED istio-init-1.3.2 1.3.2 istio-system
jaeger-operator 1 Wed Nov 11 18:34:13 2020 DEPLOYED jaeger-operator-2.9.0 1.13.1 istio-system
ks-jenkins 4 Mon Nov 23 09:50:01 2020 DEPLOYED jenkins-0.19.0 2.121.3-0217 kubesphere-devops-system
ks-minio 1 Wed Nov 11 18:22:25 2020 DEPLOYED minio-2.5.16 RELEASE.2019-08-07T01-59-21Z kubesphere-system
ks-openldap 1 Wed Nov 11 18:17:00 2020 DEPLOYED openldap-ha-0.1.0 1.0 kubesphere-system
ks-openpitrix 1 Wed Nov 11 18:23:59 2020 DEPLOYED openpitrix-0.1.0 v0.4.8 openpitrix-system
logging-fluentbit-operator 1 Wed Nov 11 18:24:03 2020 DEPLOYED fluentbit-operator-0.1.0 0.1.0-0217 kubesphere-logging-system
metrics-server 1 Wed Nov 11 18:21:46 2020 DEPLOYED metrics-server-2.5.0 0.3.1-0217 kube-system
uc 1 Wed Nov 11 18:31:00 2020 DEPLOYED jenkins-update-center-0.8.0 2.1.1 kubesphere-devops-system
### 3. Everything ran fine until one day a node suddenly went down. The metrics-server and ks-apiserver pods running on it were migrated, but ks-apiserver has been unable to start ever since.
The node is lost:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
master1 Ready master 22d v1.16.10
master2 Ready <none> 22d v1.16.10
master3 Ready <none> 22d v1.16.10
worker1 NotReady <none> 21d v1.16.10
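Spotting dropped-out nodes like this can be scripted. A minimal sketch that filters `kubectl get nodes` output for `NotReady` entries; here it parses the captured sample above instead of calling a live cluster:

```shell
# Find NotReady nodes from `kubectl get nodes --no-headers` style output.
# On a live cluster, replace the sample with:  kubectl get nodes --no-headers
nodes='master1 Ready master 22d v1.16.10
master2 Ready <none> 22d v1.16.10
master3 Ready <none> 22d v1.16.10
worker1 NotReady <none> 21d v1.16.10'

# Column 2 is STATUS; print the NAME of any node that is NotReady.
not_ready=$(echo "$nodes" | awk '$2 == "NotReady" {print $1}')
echo "$not_ready"
```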
ks-apiserver cannot start:
kubectl get pods -n kubesphere-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
etcd-5769d4997f-8vkxc 1/1 Running 0 3h16m 10.244.180.48 master2 <none> <none>
ks-account-78dd6486bf-s5w8x 1/1 Running 0 10d 10.244.137.69 master1 <none> <none>
ks-apigateway-764d86967d-7qrkh 1/1 Running 0 10d 10.244.137.86 master1 <none> <none>
ks-apiserver-6b75dfdf4-gjxpr 0/1 Error 1 10d 10.244.137.83 master1 <none> <none>
ks-console-7fd5b7d47-rkwmv 1/1 Running 0 10d 10.244.137.82 master1 <none> <none>
ks-controller-manager-5cd6ff58b7-5cg8p 1/1 Running 1 10d 10.244.137.68 master1 <none> <none>
ks-installer-7d9fb945c7-ld466 1/1 Running 1 20d 10.244.136.13 master3 <none> <none>
minio-845b7bd867-r762v 1/1 Running 1 20d 10.244.136.32 master3 <none> <none>
mysql-66df969d-b9dzk 1/1 Running 1 20d 10.244.136.11 master3 <none> <none>
openldap-0 1/1 Running 5 22d 10.244.137.78 master1 <none> <none>
redis-6fd6c6d6f9-jtxzs 1/1 Running 4 22d 10.244.137.75 master1 <none> <none>
### 4. The ks-apiserver logs confirm the error occurs while connecting to kube-apiserver
kubectl logs -f ks-apiserver-6b75dfdf4-gjxpr -n kubesphere-system
W1203 10:26:44.173078 1 client_config.go:549] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I1203 10:26:44.174200 1 server.go:179] Start cache objects
Error: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Usage:
ks-apiserver [flags]
Flags:
--add-dir-header If true, adds the file directory to the header
--alsologtostderr log to standard error as well as files
--bind-address string server bind address (default "0.0.0.0")
--elasticsearch-host string ElasticSearch logging service host. KubeSphere is using elastic as log store, if this filed left blank, KubeSphere will use kubernetes builtin log API instead, and the following elastic search options will be ignored.
--elasticsearch-version string ElasticSearch major version, e.g. 5/6/7, if left blank, will detect automatically.Currently, minimum supported version is 5.x
-h, --help help for ks-apiserver
--index-prefix string Index name prefix. KubeSphere will retrieve logs against indices matching the prefix. (default "fluentbit")
--insecure-port int insecure port number (default 9090)
--istio-pilot-host string istio pilot discovery service url
--jaeger-query-host string jaeger query service url
--jenkins-host string Jenkins service host address. If left blank, means Jenkins is unnecessary.
--jenkins-max-connections int Maximum allowed connections to Jenkins. (default 100)
--jenkins-password string Password for access to Jenkins service, used pair with username.
--jenkins-username string Username for access to Jenkins service. Leave it blank if there isn't any.
--kubeconfig string Path for kubernetes kubeconfig file, if left blank, will use in cluster way.
--log-backtrace-at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log-dir string If non-empty, write log files in this directory
--log-file string If non-empty, use this log file
--log-file-max-size uint Defines the maximum size a log file can grow to. Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
--logtostderr log to standard error instead of files (default true)
--master string Used to generate kubeconfig for downloading, if not specified, will use host in kubeconfig.
--mysql-host string MySQL service host address. If left blank, the following related mysql options will be ignored.
--mysql-max-connection-life-time duration Maximum connection life time allowed to connecto to mysql. (default 10s)
--mysql-max-idle-connections int Maximum idle connections allowed to connect to mysql. (default 100)
--mysql-max-open-connections int Maximum open connections allowed to connect to mysql. (default 100)
--mysql-password string Password for access to mysql, should be used pair with password.
--mysql-username string Username for access to mysql service.
--openpitrix-app-manager-endpoint string OpenPitrix app manager endpoint
--openpitrix-attachment-manager-endpoint string OpenPitrix attachment manager endpoint
--openpitrix-category-manager-endpoint string OpenPitrix category manager endpoint
--openpitrix-cluster-manager-endpoint string OpenPitrix cluster manager endpoint
--openpitrix-repo-indexer-endpoint string OpenPitrix repo indexer endpoint
--openpitrix-repo-manager-endpoint string OpenPitrix repo manager endpoint
--openpitrix-runtime-manager-endpoint string OpenPitrix runtime manager endpoint
--prometheus-endpoint string Prometheus service endpoint which stores KubeSphere monitoring data, if left blank, will use builtin metrics-server as data source.
--prometheus-secondary-endpoint string Prometheus secondary service endpoint, if left empty and endpoint is set, will use endpoint instead.
--s3-access-key-id string access key of s2i s3 (default "AKIAIOSFODNN7EXAMPLE")
--s3-bucket string bucket name of s2i s3 (default "s2i-binaries")
--s3-disable-SSL disable ssl (default true)
--s3-endpoint string Endpoint to access to s3 object storage service, if left blank, the following options will be ignored.
--s3-force-path-style force path style (default true)
--s3-region string Region of s3 that will access to, like us-east-1. (default "us-east-1")
--s3-secret-access-key string secret access key of s2i s3 (default "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
--s3-session-token string session token of s2i s3
--secure-port int secure port number
--servicemesh-prometheus-host string prometheus service for servicemesh
--skip-headers If true, avoid header prefixes in the log messages
--skip-log-headers If true, avoid headers when opening log files
--sonarqube-host string Sonarqube service address, if left empty, following sonarqube options will be ignored.
--sonarqube-token string Sonarqube service access token.
--stderrthreshold severity logs at or above this threshold go to stderr (default 2)
--tls-cert-file string tls cert file
--tls-private-key string tls private key
-v, --v Level number for the log level verbosity
--vmodule moduleSpec comma-separated list of pattern=N settings for file-filtered logging
2020/12/03 10:26:44 unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
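The discovery failure points at the aggregated `metrics.k8s.io` API rather than at ks-apiserver itself. A quick way to confirm is to inspect the corresponding APIService object with `kubectl get apiservice v1beta1.metrics.k8s.io`. The sketch below parses a sample status line of the shape that command prints; the exact failure reason shown is an assumption and will vary:

```shell
# On the live cluster:
#   kubectl get apiservice v1beta1.metrics.k8s.io
# A broken aggregated API typically shows "False (SomeReason)" in the
# AVAILABLE column. Sample line (hypothetical values):
line='v1beta1.metrics.k8s.io   kube-system/metrics-server   False (FailedDiscoveryCheck)   21d'

# Column 3 is the availability flag.
available=$(echo "$line" | awk '{print $3}')
echo "$available"
```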
### 5. The kube-apiserver logs show errors connecting to metrics.k8s.io/v1beta1
kubectl logs -f kube-apiserver-master1 -n kube-system
.....
E1203 10:22:44.410647 1 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: Get https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E1203 10:22:49.410972 1 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: Get https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E1203 10:22:54.411326 1 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: Get https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E1203 10:22:59.411622 1 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: Get https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E1203 10:23:04.411972 1 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: Get https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E1203 10:23:09.412390 1 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: Get https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E1203 10:23:14.412799 1 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: Get https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E1203 10:23:19.413131 1 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: Get https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E1203 10:23:21.758943 1 timeout.go:132] net/http: abort Handler
E1203 10:23:24.413409 1 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: Get https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E1203 10:23:29.413710 1 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: Get https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E1203 10:23:34.414094 1 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: Get https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E1203 10:23:39.534172 1 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: Get https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
I1203 10:23:40.410119 1 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io
W1203 10:23:40.410178 1 handler_proxy.go:99] no RequestInfo found in the context
E1203 10:23:40.410216 1 controller.go:114] loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
, Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
I1203 10:23:40.410223 1 controller.go:127] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
E1203 10:24:04.246085 1 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: Get https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E1203 10:24:09.246520 1 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: Get https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E1203 10:24:34.246193 1 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: Get https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E1203 10:24:39.246591 1 available_controller.go:416] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: Get https://10.101.186.48:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
### 6. Checking metrics-server confirms it has already been recreated on another node
kubectl get pods -o wide -n kube-system
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-bbdc58449-cgsbt 1/1 Running 5 22d 10.244.137.79 master1 <none> <none>
calico-node-2qk2b 1/1 NodeLost 5 21d 192.168.210.74 worker1 <none> <none>
calico-node-gjdrw 1/1 Running 5 22d 192.168.210.73 master3 <none> <none>
calico-node-jwbnz 1/1 Running 4 22d 192.168.210.72 master2 <none> <none>
calico-node-svmv7 1/1 Running 5 22d 192.168.210.71 master1 <none> <none>
coredns-85d448b787-8nks7 1/1 Running 5 22d 10.244.137.87 master1 <none> <none>
coredns-85d448b787-lpxtt 1/1 Running 5 22d 10.244.137.80 master1 <none> <none>
etcd-master1 1/1 Running 5 22d 192.168.210.71 master1 <none> <none>
kube-apiserver-master1 1/1 Running 2 7d8h 192.168.210.71 master1 <none> <none>
kube-controller-manager-master1 1/1 Running 9 22d 192.168.210.71 master1 <none> <none>
kube-proxy-7wkbr 1/1 Running 5 22d 192.168.210.71 master1 <none> <none>
kube-proxy-8d7dj 1/1 Running 4 22d 192.168.210.72 master2 <none> <none>
kube-proxy-nhdsn 1/1 NodeLost 4 21d 192.168.210.74 worker1 <none> <none>
kube-proxy-sfbjm 1/1 Running 4 22d 192.168.210.73 master3 <none> <none>
kube-scheduler-master1 1/1 Running 9 22d 192.168.210.71 master1 <none> <none>
metrics-server-8b7689b66-xm6mf 1/1 Running 0 36s 10.244.180.55 master2 <none> <none>
metrics-server-8b7689b66-z9hk9 1/1 Unknown 0 3m58s 10.244.235.186 worker1 <none> <none>
tiller-deploy-5fd994b8f-twpn2 1/1 Running 0 3h13m 10.244.180.23 master2 <none> <none>
### 7. kubectl api-resources fails the same way
kubectl api-resources
NAME SHORTNAMES APIGROUP NAMESPACED KIND
bindings true Binding
componentstatuses cs false ComponentStatus
configmaps cm true ConfigMap
endpoints ep true Endpoints
events ev true Event
limitranges limits true LimitRange
namespaces ns false Namespace
nodes no false Node
persistentvolumeclaims pvc true PersistentVolumeClaim
persistentvolumes pv false PersistentVolume
pods po true Pod
podtemplates true PodTemplate
replicationcontrollers rc true ReplicationController
resourcequotas quota true ResourceQuota
secrets true Secret
serviceaccounts sa true ServiceAccount
services svc true Service
mutatingwebhookconfigurations admissionregistration.k8s.io false MutatingWebhookConfiguration
validatingwebhookconfigurations admissionregistration.k8s.io false ValidatingWebhookConfiguration
customresourcedefinitions crd,crds apiextensions.k8s.io false CustomResourceDefinition
apiservices apiregistration.k8s.io false APIService
applications app.k8s.io true Application
controllerrevisions apps true ControllerRevision
daemonsets ds apps true DaemonSet
deployments deploy apps true Deployment
replicasets rs apps true ReplicaSet
statefulsets sts apps true StatefulSet
meshpolicies authentication.istio.io false MeshPolicy
policies authentication.istio.io true Policy
tokenreviews authentication.k8s.io false TokenReview
localsubjectaccessreviews authorization.k8s.io true LocalSubjectAccessReview
selfsubjectaccessreviews authorization.k8s.io false SelfSubjectAccessReview
selfsubjectrulesreviews authorization.k8s.io false SelfSubjectRulesReview
subjectaccessreviews authorization.k8s.io false SubjectAccessReview
horizontalpodautoscalers hpa autoscaling true HorizontalPodAutoscaler
cronjobs cj batch true CronJob
jobs batch true Job
certificatesigningrequests csr certificates.k8s.io false CertificateSigningRequest
adapters config.istio.io true adapter
attributemanifests config.istio.io true attributemanifest
handlers config.istio.io true handler
httpapispecbindings config.istio.io true HTTPAPISpecBinding
httpapispecs config.istio.io true HTTPAPISpec
instances config.istio.io true instance
quotaspecbindings config.istio.io true QuotaSpecBinding
quotaspecs config.istio.io true QuotaSpec
rules config.istio.io true rule
templates config.istio.io true template
leases coordination.k8s.io true Lease
bgpconfigurations crd.projectcalico.org false BGPConfiguration
bgppeers crd.projectcalico.org false BGPPeer
blockaffinities crd.projectcalico.org false BlockAffinity
clusterinformations crd.projectcalico.org false ClusterInformation
felixconfigurations crd.projectcalico.org false FelixConfiguration
globalnetworkpolicies gnp crd.projectcalico.org false GlobalNetworkPolicy
globalnetworksets crd.projectcalico.org false GlobalNetworkSet
hostendpoints crd.projectcalico.org false HostEndpoint
ipamblocks crd.projectcalico.org false IPAMBlock
ipamconfigs crd.projectcalico.org false IPAMConfig
ipamhandles crd.projectcalico.org false IPAMHandle
ippools crd.projectcalico.org false IPPool
kubecontrollersconfigurations crd.projectcalico.org false KubeControllersConfiguration
networkpolicies crd.projectcalico.org true NetworkPolicy
networksets crd.projectcalico.org true NetworkSet
s2ibinaries devops.kubesphere.io true S2iBinary
s2ibuilders s2ib devops.kubesphere.io true S2iBuilder
s2ibuildertemplates s2ibt devops.kubesphere.io false S2iBuilderTemplate
s2iruns s2ir devops.kubesphere.io true S2iRun
events ev events.k8s.io true Event
ingresses ing extensions true Ingress
jaegers jaegertracing.io true Jaeger
fluentbits logging.kubesphere.io true FluentBit
alertmanagers monitoring.coreos.com true Alertmanager
podmonitors monitoring.coreos.com true PodMonitor
prometheuses monitoring.coreos.com true Prometheus
prometheusrules monitoring.coreos.com true PrometheusRule
servicemonitors monitoring.coreos.com true ServiceMonitor
destinationrules dr networking.istio.io true DestinationRule
envoyfilters networking.istio.io true EnvoyFilter
gateways gw networking.istio.io true Gateway
serviceentries se networking.istio.io true ServiceEntry
sidecars networking.istio.io true Sidecar
virtualservices vs networking.istio.io true VirtualService
ingresses ing networking.k8s.io true Ingress
networkpolicies netpol networking.k8s.io true NetworkPolicy
runtimeclasses node.k8s.io false RuntimeClass
poddisruptionbudgets pdb policy true PodDisruptionBudget
podsecuritypolicies psp policy false PodSecurityPolicy
clusterrolebindings rbac.authorization.k8s.io false ClusterRoleBinding
clusterroles rbac.authorization.k8s.io false ClusterRole
rolebindings rbac.authorization.k8s.io true RoleBinding
roles rbac.authorization.k8s.io true Role
authorizationpolicies rbac.istio.io true AuthorizationPolicy
clusterrbacconfigs rbac.istio.io false ClusterRbacConfig
rbacconfigs rbac.istio.io true RbacConfig
servicerolebindings rbac.istio.io true ServiceRoleBinding
serviceroles rbac.istio.io true ServiceRole
priorityclasses pc scheduling.k8s.io false PriorityClass
servicepolicies servicemesh.kubesphere.io true ServicePolicy
strategies servicemesh.kubesphere.io true Strategy
csidrivers storage.k8s.io false CSIDriver
csinodes storage.k8s.io false CSINode
storageclasses sc storage.k8s.io false StorageClass
volumeattachments storage.k8s.io false VolumeAttachment
workspaces tenant.kubesphere.io false Workspace
error: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
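Any client that performs full API discovery hits the same error, which is why both ks-apiserver and plain kubectl fail. One interim cleanup is to find the pods stuck in `Unknown` on the lost node and force-delete them so their endpoints are removed; a sketch parsing the captured pod listing (the force-delete is a standard kubectl invocation, shown as a comment since it needs a live cluster):

```shell
# Locate pods stuck in Unknown/NodeLost from `kubectl get pods` style output.
# On a live cluster: kubectl get pods -n kube-system --no-headers
pods='metrics-server-8b7689b66-xm6mf 1/1 Running 0 36s
metrics-server-8b7689b66-z9hk9 1/1 Unknown 0 3m58s'

# Column 3 is STATUS; print the NAME of any pod in Unknown state.
stuck=$(echo "$pods" | awk '$3 == "Unknown" {print $1}')
echo "$stuck"

# On the live cluster one would then force-delete it so its endpoint is
# removed immediately instead of waiting for the eviction timeout:
#   kubectl -n kube-system delete pod "$stuck" --grace-period=0 --force
```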
### 8. conntrack shows kube-apiserver is still connected to the lost pod
The metrics-server Service:
kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
etcd ClusterIP None <none> 2379/TCP 21d
kube-controller-manager-headless ClusterIP None <none> 10252/TCP 22d
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 22d
kube-scheduler-headless ClusterIP None <none> 10251/TCP 22d
kubelet ClusterIP None <none> 10250/TCP 22d
metrics-server ClusterIP **10.101.186.48** <none> 443/TCP 21d
tiller-deploy ClusterIP 10.103.19.75 <none> 44134/TCP 22d
metrics-server pod status and pod IPs:
kubectl get pods -o wide -n kube-system
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-bbdc58449-cgsbt 1/1 Running 5 22d 10.244.137.79 master1 <none> <none>
calico-node-2qk2b 1/1 NodeLost 5 21d 192.168.210.74 worker1 <none> <none>
calico-node-gjdrw 1/1 Running 5 22d 192.168.210.73 master3 <none> <none>
calico-node-jwbnz 1/1 Running 4 22d 192.168.210.72 master2 <none> <none>
calico-node-svmv7 1/1 Running 5 22d 192.168.210.71 master1 <none> <none>
coredns-85d448b787-8nks7 1/1 Running 5 22d 10.244.137.87 master1 <none> <none>
coredns-85d448b787-lpxtt 1/1 Running 5 22d 10.244.137.80 master1 <none> <none>
etcd-master1 1/1 Running 5 22d 192.168.210.71 master1 <none> <none>
kube-apiserver-master1 1/1 Running 2 7d8h 192.168.210.71 master1 <none> <none>
kube-controller-manager-master1 1/1 Running 9 22d 192.168.210.71 master1 <none> <none>
kube-proxy-7wkbr 1/1 Running 5 22d 192.168.210.71 master1 <none> <none>
kube-proxy-8d7dj 1/1 Running 4 22d 192.168.210.72 master2 <none> <none>
kube-proxy-nhdsn 1/1 NodeLost 4 21d 192.168.210.74 worker1 <none> <none>
kube-proxy-sfbjm 1/1 Running 4 22d 192.168.210.73 master3 <none> <none>
kube-scheduler-master1 1/1 Running 9 22d 192.168.210.71 master1 <none> <none>
metrics-server-8b7689b66-xm6mf 1/1 Running 0 36s **10.244.180.55** master2 <none> <none>
metrics-server-8b7689b66-z9hk9 1/1 Unknown 0 3m58s **10.244.235.186** worker1 <none> <none>
tiller-deploy-5fd994b8f-twpn2 1/1 Running 0 3h13m 10.244.180.23 master2 <none> <none>
Observe the connections:
conntrack -L | grep 10.101.186.48
tcp 6 278 ESTABLISHED src=10.101.186.48 dst=**10.101.186.48** sport=45842 dport=443 src=**10.244.235.186** dst=192.168.210.71 sport=443 dport=19158 [ASSURED] mark=0 use=1
tcp 6 298 ESTABLISHED src=10.101.186.48 dst=**10.101.186.48** sport=45820 dport=443 src=**10.244.235.186** dst=**192.168.210.71** sport=443 dport=15276 [ASSURED] mark=0 use=2
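The second `src=` field of each ESTABLISHED entry is the reply direction, i.e. the backend pod IP the kernel still DNATs this flow to; here it is the dead pod on worker1. A sketch that extracts that field from one captured entry, followed by the usual mitigation of flushing the stale entries (`conntrack -D --orig-dst` is a standard conntrack-tools flag, shown as a comment since it needs root on the node):

```shell
# One conntrack entry copied from the output above.
entry='tcp 6 298 ESTABLISHED src=10.101.186.48 dst=10.101.186.48 sport=45820 dport=443 src=10.244.235.186 dst=192.168.210.71 sport=443 dport=15276 [ASSURED]'

# The first src= is the original direction (the Service VIP); the second
# src= is the reply direction, i.e. the real backend pod the kernel NATs to.
backend=$(echo "$entry" | grep -o 'src=[0-9.]*' | sed -n 2p | cut -d= -f2)
echo "$backend"

# Mitigation on the node hosting kube-apiserver (requires root): delete the
# stale entries so new connections get NATed to the healthy endpoint:
#   conntrack -D --orig-dst 10.101.186.48
```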
This bug only appears when a node crashes suddenly or its network cable is unplugged; it then takes about 10 minutes for traffic to fail over, unless kube-apiserver is restarted. Shutting a node down gracefully does not trigger it.