Running kubectl get pod -A to check pod status shows that OpenEBS is unhealthy:
NAMESPACE    NAME                                          READY   STATUS             RESTARTS   AGE
kube-system  openebs-localpv-provisioner-84956ddb89-cbrsz  0/1     CrashLoopBackOff   163        15h
kube-system  openebs-ndm-2pvrj                             1/1     Running            0          17h
kube-system  openebs-ndm-62glx                             1/1     Running            0          17h
kube-system  openebs-ndm-operator-6896cbf7b8-56d7j         0/1     CrashLoopBackOff   147        15h

1. Errors from openebs-ndm-operator:

kubectl describe pod openebs-ndm-operator-6896cbf7b8-56d7j -n kube-system
Events:
  Type     Reason     Age                    From            Message
  ----     ------     ----                   ----            -------
  Normal   Pulled     54m (x146 over 16h)    kubelet, node1  Successfully pulled image "kubesphere/node-disk-operator:0.5.0"
  Warning  Unhealthy  14m (x399 over 16h)    kubelet, node1  Readiness probe failed: stat: cannot stat '/tmp/operator-sdk-ready': No such file or directory
  Warning  BackOff    4m7s (x3626 over 16h)  kubelet, node1  Back-off restarting failed container

kubectl logs openebs-ndm-operator-6896cbf7b8-56d7j -n kube-system

{"level":"info","ts":1605059498.7081673,"logger":"ndm-operator","msg":"Go Version: go1.12.7"}
{"level":"info","ts":1605059498.7082253,"logger":"ndm-operator","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1605059498.7083576,"logger":"ndm-operator","msg":"operator-sdk Version: v0.5.0"}
{"level":"info","ts":1605059498.7083642,"logger":"ndm-operator","msg":"Version Tag: v0.5.0"}
{"level":"info","ts":1605059498.708368,"logger":"ndm-operator","msg":"Git Commit: 63ca87a283e5feea87821c4f96c8b38c038343e3"}
{"level":"info","ts":1605059498.7087047,"logger":"leader","msg":"Trying to become the leader."}
{"level":"error","ts":1605059558.7123368,"logger":"kubebuilder.manager","msg":"Failed to get API Group-Resources","error":"Get https://10.171.0.1:443/api?timeout=32s: dial tcp 10.171.0.1:443: i/o timeout","stacktrace":"github.com/openebs/node-disk-manager/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/home/travis/gopath/src/github.com/openebs/node-disk-manager/vendor/github.com/go-logr/zapr/zapr.go:128\ngithub.com/openebs/node-disk-manager/vendor/sigs.k8s.io/controller-runtime/pkg/manager.New\n\t/home/travis/gopath/src/github.com/openebs/node-disk-manager/vendor/sigs.k8s.io/controller-runtime/pkg/manager/manager.go:173\nmain.main\n\t/home/travis/gopath/src/github.com/openebs/node-disk-manager/cmd/manager/main.go:98\nruntime.main\n\t/home/travis/.gimme/versions/go1.12.7.linux.amd64/src/runtime/proc.go:200"}
{"level":"error","ts":1605059558.7127206,"logger":"ndm-operator","msg":"","error":"Get https://10.171.0.1:443/api?timeout=32s: dial tcp 10.171.0.1:443: i/o timeout","stacktrace":"github.com/openebs/node-disk-manager/vendor/github.com/go-logr/zapr.(*zapLogger).Error\n\t/home/travis/gopath/src/github.com/openebs/node-disk-manager/vendor/github.com/go-logr/zapr/zapr.go:128\nmain.main\n\t/home/travis/gopath/src/github.com/openebs/node-disk-manager/cmd/manager/main.go:100\nruntime.main\n\t/home/travis/.gimme/versions/go1.12.7.linux.amd64/src/runtime/proc.go:200"}

2. Errors from openebs-localpv-provisioner:

kubectl describe pod openebs-localpv-provisioner-84956ddb89-cbrsz -n kube-system
Events:
  Type     Reason   Age                   From            Message
  ----     ------   ----                  ----            -------
  Warning  Failed   21m (x10 over 15h)    kubelet, node2  Error: ImagePullBackOff
  Warning  BackOff  74s (x3997 over 16h)  kubelet, node2  Back-off restarting failed container


kubectl logs openebs-localpv-provisioner-84956ddb89-cbrsz -n kube-system
I1111 01:49:52.478537 1 start.go:65] Starting Provisioner…
Cannot start Provisioner: failed to get Kubernetes server version: Get https://10.171.0.1:443/version?timeout=32s: dial tcp 10.171.0.1:443: i/o timeout

    csz711: This looks like a network problem.

    Cannot start Provisioner: failed to get Kubernetes server version

    It can't even retrieve the k8s server version.

    Get https://10.171.0.1:443/api?timeout=32s: dial tcp 10.171.0.1:443: i/o timeout

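    This should be the connection to kube-apiserver (10.171.0.1:443 is presumably the ClusterIP of the default kubernetes Service), and it hit an I/O timeout.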



    Check whether your cluster network is healthy, and also check the PVC status.
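    One way to confirm the I/O timeout independently of OpenEBS is to try a plain TCP connection to the same address with a timeout. The sketch below is a minimal illustration under the assumption that Python is available where you run it, ideally inside a pod on the affected node, since that is the path the OpenEBS containers use. The function name can_reach is hypothetical.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    A Go "dial tcp ...: i/o timeout" corresponds roughly to socket.timeout here,
    which is a subclass of OSError, so it is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, "connection refused" and "no route to host".
        return False
```

    For example, if can_reach("10.171.0.1", 443) returns False from a pod on node1 or node2, that matches the logs above and points at pod networking rather than at OpenEBS itself.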

    It was a flannel network problem.

    Troubleshooting notes:
    I always install clusters from host A. Each cluster has its own config-sample.yaml file, but they all share a single kubekey folder:

    -rw-r--r-- 1 bglab bglab        2326 Oct 19 15:11 config-sample_231.yaml
    -rw-rw-r-- 1 bglab bglab        2324 Nov 10 15:26 config-sample_171.yaml
    -rw-rw-r-- 1 bglab bglab        2327 Nov 10 16:44 config-sample_235.yaml
    drwxrwxr-x 3 bglab bglab        4096 Nov 11 10:16 kubekey

    1. This time, however, I changed the kubePodsCIDR and kubeServiceCIDR ranges in the YAML file, which made openebs fail with network: failed to set bridge addr: "cni0" already has an IP address different from xxx. Following https://blog.csdn.net/ibless/article/details/107899009, I removed the cni0 bridge:

    sudo ifconfig cni0 down    
    sudo ip link delete cni0
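    The "cni0 already has an IP address different from ..." error means the bridge still carries a gateway address from the old pod range. Before deleting the bridge, you can check this with a small sketch like the one below. It assumes you have read the cni0 address with `ip -4 addr show cni0` and taken kubePodsCIDR from the new config file. The function name and the example addresses in the usage note are hypothetical. The check is a simplification: flannel actually compares against the gateway of the node's own subnet, not the whole pod CIDR.

```python
import ipaddress


def bridge_matches_pod_cidr(cni0_addr: str, pod_cidr: str) -> bool:
    """Return True if the cni0 bridge address (interface notation, e.g. "10.233.64.1/24")
    lies inside kubePodsCIDR (network notation, e.g. "10.233.64.0/18").

    After kubePodsCIDR is changed to a different range, a stale cni0 address
    makes this return False.
    """
    bridge_ip = ipaddress.ip_interface(cni0_addr).ip
    return bridge_ip in ipaddress.ip_network(pod_cidr)
```

    With hypothetical values, bridge_matches_pod_cidr("10.233.64.1/24", "10.233.64.0/18") is True, while the same bridge address checked against a new range such as "10.172.64.0/18" is False. The False case is the situation that leads to the error above.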

    After that change, I got the errors described above.

    2. Deleting the kubekey folder on host A and redeploying solved the problem.

    Perhaps the old network range was also stored somewhere other than the cni0 interface?

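    Conclusion: if you change anything in the YAML file other than the host information, it is best to delete the kubekey folder and let it be regenerated.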
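    The rule of thumb in the conclusion can be turned into a small pre-deployment check. This is a sketch, not part of KubeKey. It assumes both config files have already been parsed into dicts, for example with PyYAML's yaml.safe_load (not shown), and that you pass in their `spec` sections. It also assumes "host information" means the `hosts` and `roleGroups` keys. Both helper names and HOST_KEYS are hypothetical.

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Assumption: these `spec` keys count as "host information"; everything else
# (e.g. `network` with kubePodsCIDR/kubeServiceCIDR) is treated as cluster settings.
HOST_KEYS = {"hosts", "roleGroups"}


def changed_non_host_keys(old_spec: dict, new_spec: dict) -> list[str]:
    """Return the sorted top-level `spec` keys, other than host info, whose values differ."""
    keys = (old_spec.keys() | new_spec.keys()) - HOST_KEYS
    return sorted(k for k in keys if old_spec.get(k) != new_spec.get(k))


def reset_kubekey_dir_if_needed(old_spec: dict, new_spec: dict, workdir: str = ".") -> bool:
    """Delete <workdir>/kubekey if any non-host setting changed.

    Returns True only if the directory existed and was removed.
    """
    if not changed_non_host_keys(old_spec, new_spec):
        return False
    kubekey = Path(workdir) / "kubekey"
    if kubekey.is_dir():
        shutil.rmtree(kubekey)
        return True
    return False
```

    After the folder is removed, redeploy as usual, e.g. ./kk create cluster -f config-sample_171.yaml. KubeKey then regenerates the folder's contents and may need to download its binaries again.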