  • How to restore the openLDAP service after its data is corrupted

@rysinal

  1. Scale the replicas to 0: kubectl -n kubesphere-system scale sts openldap --replicas=0
  2. Delete the PVC whose data is corrupted: kubectl -n kubesphere-system delete pvc <pvc-name>
  3. Scale the replicas back to 2: kubectl -n kubesphere-system scale sts openldap --replicas=2

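openldap is deployed as a dual-master pair whose replicas back each other up, so after that you only need to wait for the data to sync.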
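The three steps above can be wrapped in a small helper so they always run in the same order. This is only a sketch: `reset_openldap_pvcs` is a hypothetical function name, and it assumes the StatefulSet is called `openldap` in the `kubesphere-system` namespace, as in the commands above.

```shell
#!/bin/sh
# Sketch only: reset_openldap_pvcs is a hypothetical helper, not a KubeSphere tool.
NS="kubesphere-system"

reset_openldap_pvcs() {
    # Arguments: the names of the PVCs whose data should be discarded.
    [ "$#" -gt 0 ] || { echo "usage: reset_openldap_pvcs <pvc-name>..." >&2; return 2; }

    # 1. Stop every openldap pod so the volumes are no longer in use.
    kubectl -n "$NS" scale sts openldap --replicas=0 || return 1

    # 2. Delete each damaged PVC; the StatefulSet recreates it on scale-up.
    for pvc in "$@"; do
        kubectl -n "$NS" delete pvc "$pvc" || return 1
    done

    # 3. Bring both replicas back.
    kubectl -n "$NS" scale sts openldap --replicas=2
}
```

For example, `reset_openldap_pvcs openldap-pvc-openldap-0` discards only the first replica's volume and leaves the other one as the source for the sync.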

    hongming I followed the steps:

    kubectl -n kubesphere-system scale sts openldap --replicas=0
    statefulset.apps/openldap scaled
    
    kubectl delete pvc openldap-pvc-openldap-0 -n kubesphere-system
    persistentvolumeclaim "openldap-pvc-openldap-0" deleted
    
    kubectl get pv,pvc -n kubesphere-system |grep openl
    persistentvolume/pvc-2d0cd408-7456-4805-a52f-a9b28dd21df3   2Gi        RWO            Delete           Bound    kubesphere-system/openldap-pvc-openldap-1    nfs-client              130d
    persistentvolumeclaim/openldap-pvc-openldap-1   Bound    pvc-2d0cd408-7456-4805-a52f-a9b28dd21df3   2Gi        RWO            nfs-client     130d
    
    kubectl -n kubesphere-system scale sts openldap --replicas=2
    statefulset.apps/openldap scaled
    
    kubectl get po -n kubesphere-system  |grep open
    openldap-0                               0/1     CreateContainerError   0          23s
    
    kubectl describe pod openldap-0 -n kubesphere-system
    Events:
      Type     Reason            Age               From                           Message
      ----     ------            ----              ----                           -------
      Warning  FailedScheduling  <unknown>         default-scheduler              pod has unbound immediate PersistentVolumeClaims (repeated 9 times)
      Warning  FailedScheduling  <unknown>         default-scheduler              pod has unbound immediate PersistentVolumeClaims (repeated 9 times)
      Normal   Scheduled         <unknown>         default-scheduler              Successfully assigned kubesphere-system/openldap-0 to smyk8s-h3c-master-01
      Warning  Failed            59s               kubelet, smyk8s-h3c-master-01  Error: Error response from daemon: error creating overlay mount to /data/docker/overlay2/002040f68eab7d434351b105a64d1da5b75540517f1ecab0670ef83590cc228d-init/merged: no such file or directory
      Warning  Failed            59s               kubelet, smyk8s-h3c-master-01  Error: Error response from daemon: error creating overlay mount to /data/docker/overlay2/1f5c226b92f1df94c7d4865d72b8024abf76aa1a788c5956fa601f1d8f9ab96e-init/merged: no such file or directory
      Warning  Failed            58s               kubelet, smyk8s-h3c-master-01  Error: Error response from daemon: error creating overlay mount to /data/docker/overlay2/88e6e850df095b561f9793ac34d2cfbbcf8b6f42e9f290f0374ab7cb5691f119-init/merged: no such file or directory
      Warning  Failed            43s               kubelet, smyk8s-h3c-master-01  Error: Error response from daemon: error creating overlay mount to /data/docker/overlay2/349ed8db555722e08d1a1d1b451ac15051f07c941382377a312e219ce34f4e6a-init/merged: no such file or directory
      Warning  Failed            31s               kubelet, smyk8s-h3c-master-01  Error: Error response from daemon: error creating overlay mount to /data/docker/overlay2/c97b6aca5054759affe4682eaf5c45bbe819f35d2c93f7ce9d04f2573eaa6c9e-init/merged: no such file or directory
      Warning  Failed            20s               kubelet, smyk8s-h3c-master-01  Error: Error response from daemon: error creating overlay mount to /data/docker/overlay2/16a24d8b742a2a202d26f59cd2c59daf22b9001e24fc8eb6eab841d3cb710869-init/merged: no such file or directory
      Normal   Pulled            8s (x7 over 60s)  kubelet, smyk8s-h3c-master-01  Container image "osixia/openldap:1.3.0" already present on machine
      Warning  Failed            8s                kubelet, smyk8s-h3c-master-01  Error: Error response from daemon: error creating overlay mount to /data/docker/overlay2/27c3a1fc6b3cb264b9951d99c16bdb2a2b88d57f8c678005d9601e34e6d741a4-init/merged: no such file or directory

    After running these steps, the same error is still reported.

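    hongming This doesn't look like a problem on my side. Is there a way to reinitialize this data? I can't log in to the admin console right now and would like to fix that first.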

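    You can delete all of the PVCs and let them be recreated, to rule out accidentally deleted files as the cause. Once openldap is Running normally, you also need to restart ks-account.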

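      hongming The files no longer exist. They were probably deleted by mistake, which is why the rebuild failed. Could you help take a look remotely?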

        rysinal ldap has two replicas and two PVCs, openldap-pvc-openldap-0 and openldap-pvc-openldap-1. Both need to be deleted.
        Scale the replicas to 0: kubectl -n kubesphere-system scale sts openldap --replicas=0
        Delete the PVCs whose data is corrupted: kubectl -n kubesphere-system delete pvc <pvc-name>
        Scale the replicas back to 2: kubectl -n kubesphere-system scale sts openldap --replicas=2
        Finally, restart the ks-account module.

          Forest-L Yes. After deleting them, the first instance, openldap-0, fails to start when it is recreated.

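          This is a problem with the cluster environment, not with the application. See https://github.com/docker/for-linux/issues/711 and check whether docker on that node is at fault. You can set the replica count to 1 and use a nodeSelector to schedule the pod onto another master node. On the node that shows the problem, you can also check whether docker run is able to start a container normally.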
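The two checks suggested above could be sketched roughly as follows. The function names are hypothetical, `busybox` is an assumed test image (any small image already on the node works), and pinning by the well-known `kubernetes.io/hostname` label is one possible way to express the nodeSelector, not something the thread prescribes.

```shell
#!/bin/sh
# Sketch only: all function names here are hypothetical helpers.
NS="kubesphere-system"

# Run this on the suspect node: succeeds only if docker can start a container.
# The busybox image is an assumption.
docker_smoke_test() {
    docker run --rm "${1:-busybox}" true
}

# Build a patch that pins the pod template to one node by hostname label.
node_selector_patch() {
    printf '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/hostname":"%s"}}}}}' "$1"
}

# Scale openldap to a single replica and pin it to the node named in $1.
pin_openldap_to_node() {
    kubectl -n "$NS" scale sts openldap --replicas=1 &&
    kubectl -n "$NS" patch sts openldap -p "$(node_selector_patch "$1")"
}
```

If `docker_smoke_test` fails on the node with the same overlay mount error, that points at docker's storage directory on that node rather than at openldap.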

            hongming
            After scheduling the pod onto another node, it started normally.

            The previous account can no longer log in. When I tried to recover the password using the method in https://kubesphere.com.cn/forum/d/570-kubesphere-faq, I got this error:

            / # packet='PUT /kapis/iam.kubesphere.io/v1alpha2/users/admin HTTP/1.1\r\nHost: ks-account.kubesphere-system.svc:9090\r\nUser-Agent: curl/7.54.0\r\nAccept: */*\r\nContent-Type: application/json\r\nContent-Length: 105\r\n\r\n{"username": "admin","email":"admin@kubesphere.io","cluster_role": "cluster-admin","password":"P@88w0rd"}'; echo -ne $packet | nc ks-account.kubesphere-system.svc 80
            HTTP/1.1 500 Internal Server Error
            Content-Type: application/json
            Date: Tue, 19 May 2020 09:44:17 GMT
            Content-Length: 58
            
            {
             "message": "LDAP Result Code 32 \"No Such Object\": "
            }/ #

            The error log of the openLDAP pod contains this section:

            5ec3aa71 conn=1037 fd=17 ACCEPT from IP=10.233.66.57:35316 (IP=0.0.0.0:389)
            5ec3aa71 conn=1037 op=0 BIND dn="cn=admin,dc=kubesphere,dc=io" method=128
            5ec3aa71 conn=1037 op=0 BIND dn="cn=admin,dc=kubesphere,dc=io" mech=SIMPLE ssf=0
            5ec3aa71 conn=1037 op=0 RESULT tag=97 err=0 text=
            5ec3aa71 conn=1037 op=1 SRCH base="ou=Users,dc=kubesphere,dc=io" scope=2 deref=0 filter="(&(objectClass=inetOrgPerson)(mail=admin@kubesphere.io))"
            5ec3aa71 conn=1037 op=1 SRCH attr=uid mail
            5ec3aa71 conn=1037 op=1 SEARCH RESULT tag=101 err=32 nentries=0 text=
            5ec3aa75 conn=1038 fd=18 ACCEPT from IP=10.233.72.1:39958 (IP=0.0.0.0:389)
            5ec3aa75 conn=1038 fd=18 closed (connection lost)
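In the log above the BIND succeeds (err=0) and only the search under ou=Users,dc=kubesphere,dc=io returns err=32. LDAP result code 32 is noSuchObject, which for a search usually means the base entry itself does not exist; that would be consistent with a freshly recreated, empty volume, although the thread does not confirm it. A small filter like the one below (a sketch; `nonzero_ldap_results` is a hypothetical name) makes such lines easier to spot in a long slapd log.

```shell
#!/bin/sh
# Sketch only: nonzero_ldap_results is a hypothetical helper.
# Reads slapd log lines on stdin and prints the RESULT lines whose
# err= code is something other than 0.
nonzero_ldap_results() {
    grep -E 'RESULT .*err=[0-9]+' | grep -Ev 'err=0( |$)'
}

# Possible usage against the pod from this thread:
#   kubectl -n kubesphere-system logs openldap-0 | nonzero_ldap_results
#
# To check whether the base entry exists, something like the following could
# be run inside the openldap container (it prompts for the admin password):
#   ldapsearch -x -H ldap://localhost:389 -D "cn=admin,dc=kubesphere,dc=io" -W \
#       -b "ou=Users,dc=kubesphere,dc=io" -s base
```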

              Here is a summary of the fix, based on the moderator's guidance:

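              1. There are three master nodes. The data on master1 was deleted, so pods cannot start there and openldap has to run on master2 and master3. To do that, add a custom label, node-role.kubernetes.io/openldap, to master2 and master3: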

              kubectl label node master-02 node-role.kubernetes.io/openldap=
              kubectl label node master-03 node-role.kubernetes.io/openldap=

              # Scale the instances down

              kubectl -n kubesphere-system scale sts openldap --replicas=0

              Edit the StatefulSet:

              kubectl -n kubesphere-system edit sts openldap
              # Change the affinity to:
              nodeAffinity:
                preferredDuringSchedulingIgnoredDuringExecution:
                - preference:
                    matchExpressions:
                    - key: node-role.kubernetes.io/openldap
                      operator: In
                      values:
                      - ""
                  weight: 100
              
              # Save and exit.
              
              # Start the instances, wait for the openldap service to come up, and verify that the pods landed on the expected nodes
              kubectl -n kubesphere-system scale sts openldap --replicas=2
              kubectl get po -n kubesphere-system -o wide |grep open
              
              # Once openldap is up, restart the account service
              kubectl rollout restart deployment ks-account -n kubesphere-system
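Because the affinity above is `preferredDuringSchedulingIgnoredDuringExecution`, it is a soft preference and the scheduler may still place a pod on the broken node, so it is worth confirming the placement instead of assuming it. A minimal sketch follows; the function names are hypothetical and the node name passed in is whatever your broken master is called.

```shell
#!/bin/sh
# Sketch only: both functions are hypothetical helpers.
NS="kubesphere-system"

# Print "pod node" pairs for the openldap pods.
openldap_pod_nodes() {
    kubectl -n "$NS" get pods --no-headers \
        -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName |
        awk '$1 ~ /^openldap-/ { print $1, $2 }'
}

# Succeed only if no openldap pod is running on the node named in $1.
openldap_avoids_node() {
    on_bad=$(openldap_pod_nodes | awk -v bad="$1" '$2 == bad { print $1 }')
    [ -z "$on_bad" ]
}
```

For example, `openldap_avoids_node smyk8s-h3c-master-01 || echo "openldap is still on the broken node"`. If a pod keeps landing there, switching the rule to `requiredDuringSchedulingIgnoredDuringExecution` would make it a hard constraint.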

              2. Reset the administrator password

              # Find the pod name of the account service
              kubectl get po -n kubesphere-system  |grep ks-account
              
              # Open a shell in the container
              kubectl exec -it ks-account-5d8c49d4bc-4rz29 -n kubesphere-system -- sh
              
              # Send the request that re-initializes the admin account
              packet='PUT /kapis/iam.kubesphere.io/v1alpha2/users/admin HTTP/1.1\r\nHost: ks-account.kubesphere-system.svc:9090\r\nUser-Agent: curl/7.54.0\r\nAccept: */*\r\nContent-Type: application/json\r\nContent-Length: 105\r\n\r\n{"username": "admin","email":"admin@kubesphere.io","cluster_role": "cluster-admin","password":"P@88w0rd"}'; echo -ne $packet | nc ks-account.kubesphere-system.svc 80

              After that, the admin console can be logged in to again.
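One detail of the reset request above is easy to miss: `Content-Length: 105` is the exact byte length of the JSON body with the password `P@88w0rd`, so it has to be recalculated if a different password is used. The sketch below builds the same request with a computed length. The function names are hypothetical, and it assumes the password contains no characters that need JSON escaping.

```shell
#!/bin/sh
# Sketch only: these helpers are hypothetical and mirror the request above.

# JSON body for the admin user; $1 is the new password
# (assumed to need no JSON escaping).
admin_reset_body() {
    printf '{"username": "admin","email":"admin@kubesphere.io","cluster_role": "cluster-admin","password":"%s"}' "$1"
}

# Byte length of a string, as required by Content-Length.
body_length() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

# Full HTTP request with a Content-Length that matches the body.
build_reset_packet() {
    body=$(admin_reset_body "$1")
    printf 'PUT /kapis/iam.kubesphere.io/v1alpha2/users/admin HTTP/1.1\r\nHost: ks-account.kubesphere-system.svc:9090\r\nAccept: */*\r\nContent-Type: application/json\r\nContent-Length: %s\r\n\r\n%s' \
        "$(body_length "$body")" "$body"
}
```

Inside the ks-account container this could be used as `build_reset_packet 'P@88w0rd' | nc ks-account.kubesphere-system.svc 80`, with the host and port taken unchanged from the command above.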