High-availability multi-node installation.

The openebs and nfs-client pods fail to start.
Both pods report the same error:
provisioner.go:247] Error getting server version: Get "https://10.233.0.1:443/version?timeout=32s": dial tcp 10.233.0.1:443: i/o timeout

However, the version information can be retrieved from any node:
curl -k https://10.233.0.1:443/version?timeout=32s
{
"major": "1",
"minor": "20",
"gitVersion": "v1.20.6",
"gitCommit": "8a62859e515889f07e3e3be6a1080413f17cf2c3",
"gitTreeState": "clean",
"buildDate": "2021-04-15T03:19:55Z",
"goVersion": "go1.15.10",
"compiler": "gc",
"platform": "linux/amd64"
}
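
To narrow this down, it may help to repeat the same plain TCP dial that the provisioner performs, once on a node and once from inside a pod. Below is a minimal sketch in Python, assuming a Python interpreter is available in both places. The helper name tcp_dial is hypothetical and is not part of any Kubernetes tooling.

```python
import socket


def tcp_dial(host: str, port: int, timeout: float = 5.0) -> str:
    """Attempt a plain TCP connect and classify the outcome.

    Returns "connected", "timeout", "refused", or "error: <details>".
    """
    try:
        # create_connection performs the TCP handshake only (no TLS, no HTTP).
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        # No answer within the timeout: packets are likely dropped or misrouted.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # Anything else, e.g. "No route to host".
        return f"error: {exc}"
```

Calling tcp_dial("10.233.0.1", 443) in both places and comparing the results is the point of the exercise. If the node reports "connected" while the pod reports "timeout", that would suggest the problem lies in pod-to-service networking rather than in the API server itself.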

Details:

7 VirtualBox virtual machines
VM spec: 4C/8G
OS: Debian 10.10
VM networks:

  1. Bridged network: 172.31.31.0/24
  2. VM internal network: 192.168.168.0/24

172.31.31.70/192.168.168.70: Load balancer (Nginx TCP proxy), NFS server
172.31.31.71/192.168.168.71: Master 1
172.31.31.72/192.168.168.72: Master 2
172.31.31.73/192.168.168.73: Master 3
172.31.31.74/192.168.168.74: Node 1
172.31.31.75/192.168.168.75: Node 2
172.31.31.76/192.168.168.76: Node 3

Installation method:
High-availability multi-node installation
Online installation (KKZONE=cn)
Full K8S installation via kubekey

kubekey version: v1.1.1
kubesphere version: v3.1.1
K8S version: v1.20.6

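The nfs-client addon is enabled in config-sample.yaml, but nfs-client is not set as the default sc (defaultClass: false),
so openebs is also installed automatically during installation.

In my tests, both alternatives produce the error above: setting nfs-client to defaultClass: true (so openebs is not installed), and disabling nfs-client so that only the default openebs is installed.

The main configuration files are below: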

nfs-client.yaml

nfs:
  server: "192.168.168.70"  # This is the server IP address. Replace it with your own.
  path: "/data/nfs"  # Replace the exported directory with your own.
storageClass:
  defaultClass: false

config-sample.yaml

apiVersion: kubekey.kubesphere.io/v1alpha1
kind: Cluster
metadata:
  name: sample
spec:
  hosts:
  - {name: master1, address: 172.31.31.71, internalAddress: 192.168.168.71, user: root, password: xxx}
  - {name: master2, address: 172.31.31.72, internalAddress: 192.168.168.72, user: root, password: xxx}
  - {name: master3, address: 172.31.31.73, internalAddress: 192.168.168.73, user: root, password: xxx}
  - {name: node1, address: 172.31.31.74, internalAddress: 192.168.168.74, user: root, password: xxx}
  - {name: node2, address: 172.31.31.75, internalAddress: 192.168.168.75, user: root, password: xxx}
  - {name: node3, address: 172.31.31.76, internalAddress: 192.168.168.76, user: root, password: xxx}
  roleGroups:
    etcd:
    - master1
    - master2
    - master3
    master: 
    - master1
    - master2
    - master3
    worker:
    - node1
    - node2
    - node3
  controlPlaneEndpoint:
    domain: lb.kubesphere.local
    address: "172.31.31.70"
    port: 6443
  kubernetes:
    version: v1.20.6
    imageRepo: kubesphere
    clusterName: cluster.local
  network:
    plugin: calico
    kubePodsCIDR: 10.233.64.0/18
    kubeServiceCIDR: 10.233.0.0/18
  registry:
    registryMirrors: []
    insecureRegistries: []
  addons:
  - name: nfs-client
    namespace: kube-system
    sources:
      chart:
        name: nfs-client-provisioner
        repo: https://charts.kubesphere.io/main
        valuesFile: /usr/local/kubekey/nfs-client.yaml

Error output during the kubekey installation:
[master1 172.31.31.71] MSG:
namespace/kubesphere-system unchanged
serviceaccount/ks-installer unchanged
customresourcedefinition.apiextensions.k8s.io/clusterconfigurations.installer.kubesphere.io unchanged
clusterrole.rbac.authorization.k8s.io/ks-installer unchanged
clusterrolebinding.rbac.authorization.k8s.io/ks-installer unchanged
deployment.apps/ks-installer unchanged
clusterconfiguration.installer.kubesphere.io/ks-installer created
WARN[16:14:05 CST] Task failed ...
WARN[16:14:05 CST] error: KubeSphere startup timeout.
Error: Failed to deploy kubesphere: KubeSphere startup timeout.

kubectl get pod --all-namespaces
The following 2 pods stay in Error:
nfs-client-nfs-client-provisioner-794ffc57-qm6kc
openebs-localpv-provisioner-5cddd6cbfc-lbctr
Error message:
1 provisioner.go:247] Error getting server version: Get "https://10.233.0.1:443/version?timeout=32s": dial tcp 10.233.0.1:443: i/o timeout

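The problem is solved.

  • This is probably not KubeSphere's fault
  • The root cause is unknown

Solution
After the problem appears, simply reboot all the virtual machines.

After the reboot, I waited for all pods to start and found that nfs-client and openebs had started normally and no longer reported the error. I then ran kk create cluster again, and the remaining pods were created normally.

Root cause
Unknown... My guess is that VirtualBox's virtual network has some kind of cache or bug that leaves the container network unreachable after the CNI is set up. Rebooting the VMs clears the cache or bug, and the container network recovers. I don't have the ability to dig deeper into the cause, so this is only speculation.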

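1 year later

@leonanu I ran into the same problem as you. Restarting the virtual machines did not fix it for me. Is there any other solution? Please advise.

    13 days later

    24sama Regarding NFS: there are currently two client nodes and one server node. Files created or modified on any node are synchronized correctly, so NFS itself works. I'm not sure how to troubleshoot at the network layer, though. Could you suggest an approach or method? Thanks! This problem has been bothering me for several days.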