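KubeSphere 2.1.1, Kubernetes 1.17.3, Helm 2.16.3; network plugin: flannel; storage: NFS; one master and one worker node
OS: CentOS 7 on VMware, kernel 3.10.0-1127
Here's what happened: right after the minimal ks installation, all components were healthy. Later I wanted to grab some files from another VM, so I suspended the worker node for a while. After I resumed it, the UI was unreachable, so I manually restarted ks-console. The login page then came up, but logging in returned a backend error, and the ks-gateway and ks-console logs showed a POST response timeout, so I manually restarted ks-account as well. After that I could log in and the basic components worked. But the services I had deployed couldn't be reached via NodePort, and pinging in-cluster service domain names from inside a container also failed. I suspected a DNS problem and manually restarted coredns, which led to the current problem: ks-gateway never starts successfully. Could someone please take a look?
Current status of all pods: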

[root@k8s-node1-master ~]# kubectl get pod -A
NAMESPACE                      NAME                                       READY   STATUS             RESTARTS   AGE
default                        nfs-client-provisioner-56f7868bb5-8twjf    1/1     Running            7          15h
gulimall                       elasticsearch-yvonqq-0                     1/1     Running            1          14h
gulimall                       gulimall-gateway-4fcbzx-6897f4c7bf-hvlm8   1/1     Running            0          10h
gulimall                       gulimall-nginx-7qvwwh-744897c66-vdpnp      1/1     Running            0          9h
gulimall                       kibana-3he3gv-6d5868b57-xkhch              1/1     Running            1          14h
gulimall                       mysql-master-apqwn7-0                      1/1     Running            1          15h
gulimall                       mysql-slave-6jlyba-0                       1/1     Running            1          14h
gulimall                       nacos-2u2xp1-0                             1/1     Running            1          14h
gulimall                       rabbitmq-management-vktn8h-0               1/1     Running            1          14h
gulimall                       redis-v7pobl-0                             1/1     Running            1          14h
gulimall                       sentinel-xntz17-5775485dc8-82slm           1/1     Running            1          14h
gulimall                       test-nginx-13d8j1-7764b57d99-km6d9         1/1     Running            0          9h
gulimall                       zipkin-snwqj4-777b6fb55b-fdwcl             1/1     Running            1          14h
kube-system                    coredns-7f9c544f75-lfrsx                   1/1     Running            0          36m
kube-system                    coredns-7f9c544f75-wm8mn                   1/1     Running            0          36m
kube-system                    etcd-k8s-node1-master                      1/1     Running            0          36m
kube-system                    kube-apiserver-k8s-node1-master            1/1     Running            0          36m
kube-system                    kube-controller-manager-k8s-node1-master   1/1     Running            0          36m
kube-system                    kube-flannel-ds-amd64-mwhx5                1/1     Running            0          35m
kube-system                    kube-flannel-ds-amd64-pbwfx                1/1     Running            0          35m
kube-system                    kube-proxy-hs4sn                           1/1     Running            0          36m
kube-system                    kube-proxy-ttxdq                           1/1     Running            0          36m
kube-system                    kube-scheduler-k8s-node1-master            1/1     Running            0          36m
kube-system                    tiller-deploy-5fdc6844fb-hdl2q             1/1     Running            0          36m
kubesphere-controls-system     default-http-backend-5d464dd566-4lq2d      1/1     Running            0          10h
kubesphere-controls-system     kubectl-admin-6c664db975-8j27t             1/1     Running            1          15h
kubesphere-monitoring-system   kube-state-metrics-566cdbcb48-wq4rv        4/4     Running            4          15h
kubesphere-monitoring-system   node-exporter-27b74                        2/2     Running            0          15h
kubesphere-monitoring-system   node-exporter-zqqfw                        2/2     Running            2          15h
kubesphere-monitoring-system   prometheus-k8s-0                           3/3     Running            0          10h
kubesphere-monitoring-system   prometheus-k8s-system-0                    3/3     Running            0          10h
kubesphere-monitoring-system   prometheus-operator-6b97679cfd-hgsx6       1/1     Running            1          15h
kubesphere-system              ks-account-6c4df6c6cd-vlmrz                1/1     Running            0          10h
kubesphere-system              ks-apigateway-b5b997555-7cxgt              0/1     CrashLoopBackOff   11         35m
kubesphere-system              ks-apiserver-77974d87d-hjj42               1/1     Running            0          13h
kubesphere-system              ks-console-5f984b994b-gkklg                1/1     Running            0          10h
kubesphere-system              ks-controller-manager-65f796dcb8-z47kq     1/1     Running            1          13h
kubesphere-system              ks-installer-75b8d89dff-67cxm              1/1     Running            1          15h
kubesphere-system              openldap-0                                 1/1     Running            0          15h
kubesphere-system              redis-6fd6c6d6f9-xvlcb                     1/1     Running            0          15h
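In a listing this long the failing pod is easy to miss. A minimal sketch that filters `kubectl get pod -A` output down to pods that are not fully ready; `unhealthy_pods` is a hypothetical helper name, not part of kubectl, and it assumes the default column order shown above (READY in column 3, STATUS in column 4).

```shell
# Read `kubectl get pod -A` output on stdin and print only the pods
# whose STATUS is not Running or whose READY count is incomplete.
# unhealthy_pods is a hypothetical helper name.
unhealthy_pods() {
  awk 'NR > 1 {
    split($3, ready, "/")            # READY column, e.g. 0/1
    if ($4 != "Running" || ready[1] != ready[2])
      print $1 "/" $2, $4            # namespace/name and status
  }'
}

# Example usage on the master (not run here):
#   kubectl get pod -A | unhealthy_pods
```

Note that pods from finished jobs (STATUS `Completed`) would also be printed by this filter.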

gateway logs

coredns logs


    Reinstall. The master taint probably wasn't removed at install time.

    zcloudz123, 192.168.159.2 is the nameserver configured on your host, right? That one can't be used. Remove it and restart coredns.
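A small sketch for checking which nameservers a node is actually using, assuming the default kubeadm CoreDNS configuration, where the Corefile forwards non-cluster queries to the node's `/etc/resolv.conf`. `list_nameservers` is a hypothetical helper name; the commented commands at the bottom are ordinary kubectl calls that would need to be run against the cluster.

```shell
# Print the active (uncommented) nameserver addresses from a
# resolv.conf-style file. Defaults to /etc/resolv.conf.
# list_nameservers is a hypothetical helper name.
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Example usage on each node (not run here):
#   list_nameservers
#   # Inspect the Corefile to see where CoreDNS forwards queries:
#   kubectl -n kube-system get configmap coredns -o yaml
#   # Restart coredns by deleting its pods (kubeadm labels them k8s-app=kube-dns):
#   kubectl -n kube-system delete pod -l k8s-app=kube-dns
```

If 192.168.159.2 still shows up here after editing the file, something is probably regenerating `/etc/resolv.conf`.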

      Jeff, I commented out everything in /etc/resolv.conf on both the master and the worker, then restarted coredns, and it's still the same.

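      Reinstalling ks doesn't fix it. Every time, I have to redeploy the whole k8s cluster to get it working again, and once it's working, shutting down and rebooting the master and worker reproduces the problem.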

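      Uh, looks like I got it working. Here's what I did:
      1. Edit /etc/sysconfig/network-scripts/ifcfg-xxx (xxx being your own NIC name), remove the DNS address I had configured earlier (192.168.159.2), and replace it with a public DNS server (8.8.8.8).
      2. Run systemctl restart NetworkManager to restart the service.
      3. Manually delete the coredns and gateway pods so they restart.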

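      (Also: NetworkManager seems to conflict with the network service. After a reboot the NIC doesn't come up, and I have to stop NM, start network, then start NM again.)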
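The three steps in the reply above can be sketched roughly as follows. This is only an illustration under assumptions: the function names are hypothetical, the sketch assumes the DNS entries in the ifcfg file use the usual `DNS1=`/`DNS2=` keys, and it looks up the gateway pod by the `ks-apigateway` name seen in the pod listing. It would need to be run as root, with the ifcfg edit applied on each node.

```shell
# Step 1: replace all DNSn= entries in an ifcfg file with one DNS1=<new-dns>.
# Usage: set_ifcfg_dns <ifcfg-file> <new-dns>   (hypothetical helper name)
set_ifcfg_dns() {
  local file="$1" new_dns="$2"
  cp "$file" "$file.bak"               # keep a backup of the original
  sed -i '/^DNS[0-9]*=/d' "$file"      # drop the old DNS entries
  echo "DNS1=$new_dns" >> "$file"      # add the new one
}

# Steps 2 and 3: restart NetworkManager, then delete the coredns and
# gateway pods so their controllers recreate them.
# (hypothetical helper name; not called automatically)
restart_dns_and_gateway() {
  systemctl restart NetworkManager
  kubectl -n kube-system delete pod -l k8s-app=kube-dns
  kubectl -n kubesphere-system get pod -o name \
    | grep ks-apigateway \
    | xargs kubectl -n kubesphere-system delete
}

# Example usage (not run here; ifcfg-xxx stands for your own NIC's file):
#   set_ifcfg_dns /etc/sysconfig/network-scripts/ifcfg-xxx 8.8.8.8
#   restart_dns_and_gateway
```

The backup copy makes it easy to restore the previous settings if the NIC fails to come up after the change.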