openLDAP service data is corrupted: how can the service be restored?

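One of the nodes running openldap-ha had a disk problem, and part of the data was deleted during the repair. As a result, one openLDAP instance is now failing: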

kubesphere-system              openldap-0                                                        0/1     CreateContainerError   11         130d
kubesphere-system              openldap-1                                                        1/1     Running                10         130d

The current errors are:

  Warning  Failed          7m52s                  kubelet, smyk8s-h3c-master-01  Error: Error response from daemon: error creating overlay mount to /data/docker/overlay2/cb7546231a100f02b414080eb065e88769876b79e72736d1e4ec941c69b9b3a4-init/merged: no such file or directory
  Warning  Failed          7m28s (x2 over 7m39s)  kubelet, smyk8s-h3c-master-01  (combined from similar events): Error: Error response from daemon: error creating overlay mount to /data/docker/overlay2/6272d607de946411c5ea81c1277345830ff49af7d94fc5ada83a683fe02e937a-init/merged: no such file or directory
  Normal   Pulled          90s (x39 over 9m16s)   kubelet, smyk8s-h3c-master-01  Container image "osixia/openldap:1.3.0" already present on machine

How can this be resolved?

  • hongming replied to this post
  • Summary of the fix, based on the moderator's guidance:

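    1. There are three master nodes. The data on master1 was deleted, so openldap cannot start there and has to be scheduled onto master2 and master3. To do that, add a custom label, node-role.kubernetes.io/openldap (with an empty value), to master2 and master3:

    kubectl label node master-02 node-role.kubernetes.io/openldap=
    kubectl label node master-03 node-role.kubernetes.io/openldap=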

    # Stop the instances

    kubectl -n kubesphere-system scale sts openldap --replicas=0

    Edit the StatefulSet:

    kubectl -n kubesphere-system edit sts openldap
    # Change the affinity section to:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: node-role.kubernetes.io/openldap
            operator: In
            values:
            - ""
        weight: 100
    
    # Save and exit.
    
    # Start the instances, wait for openldap to come up, and check which nodes the pods were scheduled on
    kubectl -n kubesphere-system scale sts openldap --replicas=2
    kubectl get po -n kubesphere-system -o wide |grep open
    
    # Once openldap is up, restart the account service
    kubectl rollout restart deployment ks-account -n kubesphere-system
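Because the affinity rule above is `preferredDuringSchedulingIgnoredDuringExecution`, it is only a soft preference, and the scheduler may still place a pod on an unlabeled node. A minimal sketch for applying the label and checking the result follows. The function names are hypothetical, the node names are the placeholders used in this thread, and `--overwrite` is added only so the command can be re-run safely.

```shell
# Sketch only: helper names are illustrative, not part of KubeSphere.
LABEL_KEY='node-role.kubernetes.io/openldap'
NS='kubesphere-system'

# Label every node passed as an argument with an empty value,
# which is what the affinity rule's values: [""] entry matches.
label_openldap_nodes() {
    for node in "$@"; do
        kubectl label node "$node" "${LABEL_KEY}=" --overwrite || return 1
    done
}

# Show which nodes carry the label and where the openldap pods landed.
show_openldap_placement() {
    kubectl get nodes -l "$LABEL_KEY"
    kubectl -n "$NS" get pods -o wide | grep openldap
}
```

Usage would look like `label_openldap_nodes master-02 master-03`, followed by `show_openldap_placement` after scaling back up. If a pod still lands on the broken node, a `requiredDuringSchedulingIgnoredDuringExecution` rule would be the stricter alternative.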

    2. Reset the administrator password

    # Find the pod name of the account service
    kubectl get po -n kubesphere-system  |grep ks-account
    
    # Open a shell in the container
    kubectl exec -it ks-account-5d8c49d4bc-4rz29 -n kubesphere-system -- sh
    
    # Send the request that initializes the admin account
    packet='PUT /kapis/iam.kubesphere.io/v1alpha2/users/admin HTTP/1.1\r\nHost: ks-account.kubesphere-system.svc:9090\r\nUser-Agent: curl/7.54.0\r\nAccept: */*\r\nContent-Type: application/json\r\nContent-Length: 105\r\n\r\n{"username": "admin","email":"admin@kubesphere.io","cluster_role": "cluster-admin","password":"P@88w0rd"}'; echo -ne $packet | nc ks-account.kubesphere-system.svc 80

    After that, the console can be logged in to again.
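The reset request above hard-codes `Content-Length: 105`, which matches that exact JSON body. If the password or email is changed, the length has to change too. The sketch below computes the length from the body instead. The function name `build_packet` is hypothetical, the headers are copied from the command above, and it assumes `printf`, `wc` and `tr` are available in the ks-account container.

```shell
# JSON body from the thread; replace the password before real use.
BODY='{"username": "admin","email":"admin@kubesphere.io","cluster_role": "cluster-admin","password":"P@88w0rd"}'

# Content-Length is a byte count, so count bytes with wc -c.
BODY_LEN=$(printf '%s' "$BODY" | wc -c | tr -d '[:space:]')

# Emit the raw HTTP request with CRLF line endings and the computed length.
build_packet() {
    printf 'PUT /kapis/iam.kubesphere.io/v1alpha2/users/admin HTTP/1.1\r\n'
    printf 'Host: ks-account.kubesphere-system.svc:9090\r\n'
    printf 'User-Agent: curl/7.54.0\r\n'
    printf 'Accept: */*\r\n'
    printf 'Content-Type: application/json\r\n'
    printf 'Content-Length: %s\r\n\r\n' "$BODY_LEN"
    printf '%s' "$BODY"
}
```

Inside the container it would be sent the same way as before: `build_packet | nc ks-account.kubesphere-system.svc 80`.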

    @rysinal

    1. Set the replica count to 0: kubectl -n kubesphere-system scale sts openldap --replicas=0
    2. Delete the PVC whose data is corrupted: kubectl -n kubesphere-system delete pvc <pvc-name>
    3. Set the replica count to 2: kubectl -n kubesphere-system scale sts openldap --replicas=2

    openldap runs as a dual-master replicated pair, so after that you only need to wait for the data to sync.
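The three steps above can be sketched as one function that also waits for the pods to be gone before deleting a PVC, since a PVC that is still in use by a pod is not removed immediately. The function name is hypothetical and the timeouts are arbitrary. The sketch also assumes the StatefulSet uses the RollingUpdate strategy, which `kubectl rollout status` needs.

```shell
NS='kubesphere-system'

# Usage: reset_openldap_pvcs <pvc-name> [<pvc-name> ...]
reset_openldap_pvcs() {
    if [ "$#" -eq 0 ]; then
        echo "usage: reset_openldap_pvcs <pvc-name>..." >&2
        return 1
    fi
    # 1. Stop both replicas.
    kubectl -n "$NS" scale sts openldap --replicas=0 || return 1
    # Best effort: wait for the pods to disappear before touching the PVCs.
    kubectl -n "$NS" wait --for=delete pod/openldap-0 pod/openldap-1 --timeout=120s
    # 2. Delete the named PVCs; this permanently discards their data.
    for pvc in "$@"; do
        kubectl -n "$NS" delete pvc "$pvc" || return 1
    done
    # 3. Start the replicas again and wait for them to become ready.
    kubectl -n "$NS" scale sts openldap --replicas=2 || return 1
    kubectl -n "$NS" rollout status sts/openldap --timeout=300s
}
```

For the case in this thread the call would be `reset_openldap_pvcs openldap-pvc-openldap-0`, keeping the healthy replica's PVC so its data can sync back.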

      hongming I followed those steps:

      kubectl -n kubesphere-system scale sts openldap --replicas=0
      statefulset.apps/openldap scaled
      
      kubectl delete pvc openldap-pvc-openldap-0 -n kubesphere-system
      persistentvolumeclaim "openldap-pvc-openldap-0" deleted
      
      kubectl get pv,pvc -n kubesphere-system |grep openl
      persistentvolume/pvc-2d0cd408-7456-4805-a52f-a9b28dd21df3   2Gi        RWO            Delete           Bound    kubesphere-system/openldap-pvc-openldap-1    nfs-client              130d
      persistentvolumeclaim/openldap-pvc-openldap-1   Bound    pvc-2d0cd408-7456-4805-a52f-a9b28dd21df3   2Gi        RWO            nfs-client     130d
      
      kubectl -n kubesphere-system scale sts openldap --replicas=2
      statefulset.apps/openldap scaled
      
      kubectl get po -n kubesphere-system  |grep open
      openldap-0                               0/1     CreateContainerError   0          23s
      
      kubectl describe pod openldap-0 -n kubesphere-system
      Events:
        Type     Reason            Age               From                           Message
        ----     ------            ----              ----                           -------
        Warning  FailedScheduling  <unknown>         default-scheduler              pod has unbound immediate PersistentVolumeClaims (repeated 9 times)
        Warning  FailedScheduling  <unknown>         default-scheduler              pod has unbound immediate PersistentVolumeClaims (repeated 9 times)
        Normal   Scheduled         <unknown>         default-scheduler              Successfully assigned kubesphere-system/openldap-0 to smyk8s-h3c-master-01
        Warning  Failed            59s               kubelet, smyk8s-h3c-master-01  Error: Error response from daemon: error creating overlay mount to /data/docker/overlay2/002040f68eab7d434351b105a64d1da5b75540517f1ecab0670ef83590cc228d-init/merged: no such file or directory
        Warning  Failed            59s               kubelet, smyk8s-h3c-master-01  Error: Error response from daemon: error creating overlay mount to /data/docker/overlay2/1f5c226b92f1df94c7d4865d72b8024abf76aa1a788c5956fa601f1d8f9ab96e-init/merged: no such file or directory
        Warning  Failed            58s               kubelet, smyk8s-h3c-master-01  Error: Error response from daemon: error creating overlay mount to /data/docker/overlay2/88e6e850df095b561f9793ac34d2cfbbcf8b6f42e9f290f0374ab7cb5691f119-init/merged: no such file or directory
        Warning  Failed            43s               kubelet, smyk8s-h3c-master-01  Error: Error response from daemon: error creating overlay mount to /data/docker/overlay2/349ed8db555722e08d1a1d1b451ac15051f07c941382377a312e219ce34f4e6a-init/merged: no such file or directory
        Warning  Failed            31s               kubelet, smyk8s-h3c-master-01  Error: Error response from daemon: error creating overlay mount to /data/docker/overlay2/c97b6aca5054759affe4682eaf5c45bbe819f35d2c93f7ce9d04f2573eaa6c9e-init/merged: no such file or directory
        Warning  Failed            20s               kubelet, smyk8s-h3c-master-01  Error: Error response from daemon: error creating overlay mount to /data/docker/overlay2/16a24d8b742a2a202d26f59cd2c59daf22b9001e24fc8eb6eab841d3cb710869-init/merged: no such file or directory
        Normal   Pulled            8s (x7 over 60s)  kubelet, smyk8s-h3c-master-01  Container image "osixia/openldap:1.3.0" already present on machine
        Warning  Failed            8s                kubelet, smyk8s-h3c-master-01  Error: Error response from daemon: error creating overlay mount to /data/docker/overlay2/27c3a1fc6b3cb264b9951d99c16bdb2a2b88d57f8c678005d9601e34e6d741a4-init/merged: no such file or directory

      After these steps the same error is still reported.

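      hongming It doesn't look like a problem on my side. Is there a way to reinitialize this data? I can't log in to the console right now and would like to fix that first.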

      You can delete all the PVCs and recreate them, to rule out accidentally deleted files as the cause. Once openldap is Running normally, you also need to restart ks-account.

        hongming The file no longer exists. It was probably deleted by mistake, which is why the recreate failed. Could you take a look remotely?

          rysinal ldap has two replicas and two PVCs, openldap-pvc-openldap-0 and openldap-pvc-openldap-1. Both need to be deleted.
          Set the replica count to 0: kubectl -n kubesphere-system scale sts openldap --replicas=0
          Delete the PVCs with corrupted data: kubectl -n kubesphere-system delete pvc <pvc-name>
          Set the replica count to 2: kubectl -n kubesphere-system scale sts openldap --replicas=2
          Finally, restart the ks-account module.

            Forest-L Yes. After deleting and recreating, the first instance, openldap-0, failed to start.

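            This is a problem with the cluster environment, not with the application. See https://github.com/docker/for-linux/issues/711 and check whether Docker on that node is at fault. You can set the replica count to 1 and use a nodeSelector to schedule the pod onto another master node. You can also check on the problem node whether docker run can start a container normally.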
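A minimal sketch of the `docker run` check suggested above, to be run on the problem node itself. The function name is hypothetical. It reuses the `osixia/openldap:1.3.0` image that the events report as already present on the machine, and it assumes that image contains `/bin/true`.

```shell
# Sketch only: run on the node where the pod fails to start.
check_docker_node() {
    # Print the storage driver; the errors in this thread mention overlay2.
    docker info --format '{{.Driver}}' || return 1
    # Create and start a throwaway container. If the overlay directories
    # are damaged, this is expected to fail the same way the kubelet does.
    docker run --rm --entrypoint /bin/true osixia/openldap:1.3.0
}
```

If `check_docker_node` fails with the same "error creating overlay mount" message, that points at Docker's data directory on the node rather than at openldap.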

              hongming
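              After scheduling onto other nodes, the pods started normally.

              The previous account can no longer log in. Following the method in https://kubesphere.com.cn/forum/d/570-kubesphere-faq to recover the password produces this error: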

              / # packet='PUT /kapis/iam.kubesphere.io/v1alpha2/users/admin HTTP/1.1\r\nHost: ks-account.kubesphere-system.svc:9090\r\nUser-Agent: curl/7.54.0\r\nAccept: */*\r\nContent-Type: application/json\r\nContent-Length: 105\r\n\r\n{"username": "admin","email":"admin@kubesphere.io","cluster_role": "cluster-admin","password":"P@88w0rd"}'; echo -ne $packet | nc ks-account.kubesphere-system.svc 80
              HTTP/1.1 500 Internal Server Error
              Content-Type: application/json
              Date: Tue, 19 May 2020 09:44:17 GMT
              Content-Length: 58
              
              {
               "message": "LDAP Result Code 32 \"No Such Object\": "
              }/ #

              The error log of the openLDAP pod contains this section:

              5ec3aa71 conn=1037 fd=17 ACCEPT from IP=10.233.66.57:35316 (IP=0.0.0.0:389)
              5ec3aa71 conn=1037 op=0 BIND dn="cn=admin,dc=kubesphere,dc=io" method=128
              5ec3aa71 conn=1037 op=0 BIND dn="cn=admin,dc=kubesphere,dc=io" mech=SIMPLE ssf=0
              5ec3aa71 conn=1037 op=0 RESULT tag=97 err=0 text=
              5ec3aa71 conn=1037 op=1 SRCH base="ou=Users,dc=kubesphere,dc=io" scope=2 deref=0 filter="(&(objectClass=inetOrgPerson)(mail=admin@kubesphere.io))"
              5ec3aa71 conn=1037 op=1 SRCH attr=uid mail
              5ec3aa71 conn=1037 op=1 SEARCH RESULT tag=101 err=32 nentries=0 text=
              5ec3aa75 conn=1038 fd=18 ACCEPT from IP=10.233.72.1:39958 (IP=0.0.0.0:389)
              5ec3aa75 conn=1038 fd=18 closed (connection lost)
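In the log above, the bind succeeds (`err=0`) but the search on `ou=Users,dc=kubesphere,dc=io` returns `err=32`, the same "No Such Object" code as in the HTTP response, which means the search base itself was not found. One way to confirm which entries exist is to list the level directly under the suffix. This is a sketch: the function name and the `LDAP_ADMIN_PASSWORD` variable are hypothetical, the admin DN is taken from the log, and it assumes `ldapsearch` is available in the openldap container.

```shell
# Sketch only: LDAP_ADMIN_PASSWORD must hold the password for
# cn=admin,dc=kubesphere,dc=io; it is not given in this thread.
list_ldap_top_level() {
    if [ -z "$LDAP_ADMIN_PASSWORD" ]; then
        echo "set LDAP_ADMIN_PASSWORD first" >&2
        return 1
    fi
    # -s one lists only the entries directly under the base DN.
    kubectl -n kubesphere-system exec openldap-0 -- \
        ldapsearch -x -H ldap://localhost:389 \
        -D 'cn=admin,dc=kubesphere,dc=io' -w "$LDAP_ADMIN_PASSWORD" \
        -b 'dc=kubesphere,dc=io' -s one '(objectClass=*)' dn
}
```

If `ou=Users` is missing from the output, that would be consistent with the earlier advice that ks-account has to be restarted once openldap is running on fresh volumes.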
