Installed KubeSphere 3.0 with three masters. After the first installation, openldap would not start: the port timed out after startup (the etcd cluster was healthy at that point). Unable to resolve it, I reset the cluster with kk delete -f xx.yaml and reinstalled, but the second installation then failed to start etcd on master2 and master3:

WARN[13:49:30 CST] Task failed ...
WARN[13:49:30 CST] error: Failed to start etcd cluster: Failed to exec command: sudo -E /bin/sh -c "export ETCDCTL_API=2;export ETCDCTL_CERT_FILE='/etc/ssl/etcd/ssl/admin-master2.pem';export ETCDCTL_KEY_FILE='/etc/ssl/etcd/ssl/admin-master2-key.pem';export ETCDCTL_CA_FILE='/etc/ssl/etcd/ssl/ca.pem';/usr/local/bin/etcdctl --endpoints=https://172.22.215.70:2379,https://172.22.215.71:2379,https://172.22.215.72:2379 cluster-health | grep -q 'cluster is healthy'"
Error:  tls: failed to find any PEM data in certificate input: Process exited with status 1
Error: Failed to start etcd cluster: Failed to start etcd cluster: Failed to exec command: sudo -E /bin/sh -c "export ETCDCTL_API=2;export ETCDCTL_CERT_FILE='/etc/ssl/etcd/ssl/admin-master2.pem';export ETCDCTL_KEY_FILE='/etc/ssl/etcd/ssl/admin-master2-key.pem';export ETCDCTL_CA_FILE='/etc/ssl/etcd/ssl/ca.pem';/usr/local/bin/etcdctl --endpoints=https://172.22.215.70:2379,https://172.22.215.71:2379,https://172.22.215.72:2379 cluster-health | grep -q 'cluster is healthy'"
Error:  tls: failed to find any PEM data in certificate input: Process exited with status 1
Usage:
  kk create cluster [flags]

Flags:
  -f, --filename string          Path to a configuration file
  -h, --help                     help for cluster
      --skip-pull-images         Skip pre pull images
      --with-kubernetes string   Specify a supported version of kubernetes
      --with-kubesphere          Deploy a specific version of kubesphere (default v3.0.0)
  -y, --yes                      Skip pre-check of the installation

Global Flags:
      --debug   Print detailed information (default true)

Failed to start etcd cluster: Failed to start etcd cluster: Failed to exec command: sudo -E /bin/sh -c "export ETCDCTL_API=2;export ETCDCTL_CERT_FILE='/etc/ssl/etcd/ssl/admin-master2.pem';export ETCDCTL_KEY_FILE='/etc/ssl/etcd/ssl/admin-master2-key.pem';export ETCDCTL_CA_FILE='/etc/ssl/etcd/ssl/ca.pem';/usr/local/bin/etcdctl --endpoints=https://172.22.215.70:2379,https://172.22.215.71:2379,https://172.22.215.72:2379 cluster-health | grep -q 'cluster is healthy'"
Error:  tls: failed to find any PEM data in certificate input: Process exited with status 1

The master2 system log shows the following error:

Sep 10 11:00:12 0002 systemd: Started etcd docker wrapper.
Sep 10 11:00:12 0002 dockerd: time="2020-09-10T11:00:12.715170562+08:00" level=warning msg="Your kernel does not support Block I/O weight or the cgroup is not mounted. Weight discarded."
Sep 10 11:00:12 0002 etcd: WARNING: Your kernel does not support Block I/O weight or the cgroup is not mounted. Weight discarded.
Sep 10 11:00:12 0002 containerd: time="2020-09-10T11:00:12.753300735+08:00" level=info msg="shim containerd-shim started" address="/containerd-shim/moby/160cccd59de17ca7f77b73550f27d0bda3416cce051b51b89176659ccf10da54/shim.sock" debug=false pid=2837
Sep 10 11:00:12 0002 systemd: Started libcontainer container 160cccd59de17ca7f77b73550f27d0bda3416cce051b51b89176659ccf10da54.
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850325 I | pkg/flags: recognized and used environment variable ETCD_ADVERTISE_CLIENT_URLS=https://172.22.215.71:2379
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850389 I | pkg/flags: recognized and used environment variable ETCD_AUTO_COMPACTION_RETENTION=8
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850396 I | pkg/flags: recognized and used environment variable ETCD_CERT_FILE=/etc/ssl/etcd/ssl/member-master2.pem
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850403 I | pkg/flags: recognized and used environment variable ETCD_CLIENT_CERT_AUTH=true
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850412 I | pkg/flags: recognized and used environment variable ETCD_DATA_DIR=/var/lib/etcd
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850420 I | pkg/flags: recognized and used environment variable ETCD_ELECTION_TIMEOUT=5000
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850435 I | pkg/flags: recognized and used environment variable ETCD_HEARTBEAT_INTERVAL=250
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850441 I | pkg/flags: recognized and used environment variable ETCD_INITIAL_ADVERTISE_PEER_URLS=https://172.22.215.71:2380
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850461 I | pkg/flags: recognized and used environment variable ETCD_INITIAL_CLUSTER=etcd1=https://172.22.215.70:2380,etcd2=https://172.22.215.71:2380,etcd3=https://172.22.215.72:2380
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850468 I | pkg/flags: recognized and used environment variable ETCD_INITIAL_CLUSTER_STATE=new
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850472 I | pkg/flags: recognized and used environment variable ETCD_INITIAL_CLUSTER_TOKEN=k8s_etcd
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850477 I | pkg/flags: recognized and used environment variable ETCD_KEY_FILE=/etc/ssl/etcd/ssl/member-master2-key.pem
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850492 I | pkg/flags: recognized and used environment variable ETCD_LISTEN_CLIENT_URLS=https://172.22.215.71:2379,https://127.0.0.1:2379
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850514 I | pkg/flags: recognized and used environment variable ETCD_LISTEN_PEER_URLS=https://172.22.215.71:2380
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850524 I | pkg/flags: recognized and used environment variable ETCD_METRICS=basic
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850531 I | pkg/flags: recognized and used environment variable ETCD_NAME=etcd2
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850547 I | pkg/flags: recognized and used environment variable ETCD_PEER_CERT_FILE=/etc/ssl/etcd/ssl/member-master2.pem
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850552 I | pkg/flags: recognized and used environment variable ETCD_PEER_CLIENT_CERT_AUTH=True
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850561 I | pkg/flags: recognized and used environment variable ETCD_PEER_KEY_FILE=/etc/ssl/etcd/ssl/member-master2-key.pem
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850565 I | pkg/flags: recognized and used environment variable ETCD_PEER_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850569 I | pkg/flags: recognized and used environment variable ETCD_PROXY=off
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850578 I | pkg/flags: recognized and used environment variable ETCD_SNAPSHOT_COUNT=10000
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850584 I | pkg/flags: recognized and used environment variable ETCD_TRUSTED_CA_FILE=/etc/ssl/etcd/ssl/ca.pem
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850625 I | etcdmain: etcd Version: 3.3.12
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850632 I | etcdmain: Git SHA: d57e8b8
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850636 I | etcdmain: Go Version: go1.10.8
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850639 I | etcdmain: Go OS/Arch: linux/amd64
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850644 I | etcdmain: setting maximum number of CPUs to 8, total number of available CPUs is 8
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850695 I | embed: peerTLS: cert = /etc/ssl/etcd/ssl/member-master2.pem, key = /etc/ssl/etcd/ssl/member-master2-key.pem, ca = , trusted-ca = /etc/ssl/etcd/ssl/ca.pem, client-cert-auth = true, crl-file =
Sep 10 11:00:12 0002 etcd: 2020-09-10 03:00:12.850780 C | etcdmain: tls: failed to find any PEM data in certificate input
Sep 10 11:00:12 0002 containerd: time="2020-09-10T11:00:12.918680707+08:00" level=info msg="shim reaped" id=160cccd59de17ca7f77b73550f27d0bda3416cce051b51b89176659ccf10da54
Sep 10 11:00:12 0002 dockerd: time="2020-09-10T11:00:12.928728144+08:00" level=info msg="ignoring event" module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Sep 10 11:00:12 0002 systemd: etcd.service: main process exited, code=exited, status=1/FAILURE
Sep 10 11:00:13 0002 docker: etcd2
Sep 10 11:00:13 0002 systemd: Unit etcd.service entered failed state.
Sep 10 11:00:13 0002 systemd: etcd.service failed.

Error: tls: failed to find any PEM data in certificate input
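The quickest way to confirm this is to inspect the generated certificate files themselves; an empty or truncated .pem file produces exactly this message. A minimal sketch with openssl (the temp paths are stand-ins; on master2 the real files are under /etc/ssl/etcd/ssl/, as the error message shows):

```shell
# A throwaway self-signed cert stands in for a healthy member certificate.
tmp=$(mktemp -d)
openssl req -x509 -newkey rsa:2048 -nodes -keyout "$tmp/member-key.pem" \
  -out "$tmp/member.pem" -days 1 -subj "/CN=etcd-test" 2>/dev/null
: > "$tmp/empty.pem"   # an empty file, like one that triggers the PEM error
openssl x509 -in "$tmp/member.pem" -noout -subject        # prints the subject
openssl x509 -in "$tmp/empty.pem" -noout -subject 2>/dev/null \
  || echo "empty.pem: no PEM data found"
```

If any file under /etc/ssl/etcd/ssl/ fails the openssl check, removing that directory before re-running the installer should force the certificates to be regenerated (a guess based on the symptom, not a documented KubeKey step).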

Error 2:
kubelet cannot be found (possibly related to etcd not having started?)
Failed at step EXEC spawning /usr/local/bin/kubelet: No such file or directory
-- Subject: Process /usr/local/bin/kubelet could not be executed

[root@master2 ~]# ll /usr/local/bin/
total 54884
-rwxr-xr-x 1 kube root      347 Sep 10 13:47 etcd
-rwxr-xr-x 1 root root 15817472 Sep 10 13:47 etcdctl
-rwxr-xr-x 1 kube root 40378368 Sep  8 15:44 helm
drwxr-xr-x 2 kube root       23 Sep  8 15:40 kube-scripts
#journalctl -xe
Sep 10 13:57:11 master2 systemd[1]: etcd.service failed.
Sep 10 13:57:16 master2 systemd[1]: kubelet.service holdoff time over, scheduling restart.
Sep 10 13:57:16 master2 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
-- Subject: Unit kubelet.service has finished shutting down
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit kubelet.service has finished shutting down.
Sep 10 13:57:16 master2 systemd[1]: Started kubelet: The Kubernetes Node Agent.
-- Subject: Unit kubelet.service has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit kubelet.service has finished starting up.
--
-- The start-up result is done.
Sep 10 13:57:16 master2 systemd[20376]: Failed at step EXEC spawning /usr/local/bin/kubelet: No such file or directory
-- Subject: Process /usr/local/bin/kubelet could not be executed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- The process /usr/local/bin/kubelet could not be executed and failed.
--
-- The error number returned by this process is 2.
Sep 10 13:57:16 master2 systemd[1]: kubelet.service: main process exited, code=exited, status=203/EXEC
Sep 10 13:57:16 master2 systemd[1]: Unit kubelet.service entered failed state.
Sep 10 13:57:16 master2 systemd[1]: kubelet.service failed.

Could anyone advise how to fix this?

  • .bashrc must not contain commands that fail to execute; they will cause the installation to fail. Also pay attention to the NFS server configuration.

    Recommended nfs server configuration: *(rw,insecure,sync,no_subtree_check,no_root_squash)

    Thanks for the guidance. As hongming said, the cause was that after the first installation I had installed the kubectl command-completion tool. When the cluster was uninstalled with kubekey, kubectl was deleted, but the completion hook remained in the shell environment, which made the second installation fail. After commenting out the completion line in the shell configuration, the reinstall succeeded.
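A quick way to verify the fix is to source the file in a fresh non-interactive shell, which is roughly how KubeKey's SSH-executed commands will see it (a sketch; the exact completion line depends on how completion was installed):

```shell
# KubeKey runs commands over SSH, so any line in ~/.bashrc that fails
# (e.g. invoking the now-deleted kubectl for completion) can break the install.
bash -c 'source ~/.bashrc' 2>/dev/null \
  && echo "bashrc OK" \
  || echo "bashrc contains a failing command"
```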

    hongming We confirmed the NAS attributes today:

    It is indeed in no_root_squash mode. Could anything else cause the installation to fail?

      rysinal Judging from the symptoms, it is most likely a storage problem. Please paste the mysql/ldap crash logs again and we'll see if there is a fix.

        hongming

        [root@master1 ~]# kubectl -n kubesphere-system get po
        NAME                                    READY   STATUS             RESTARTS   AGE
        etcd-65796969c7-hjxnd                   1/1     Running            0          18h
        ks-apiserver-98484f67f-54q2j            1/1     Running            0          18h
        ks-apiserver-98484f67f-bhlnt            1/1     Running            0          18h
        ks-apiserver-98484f67f-t6b4m            1/1     Running            0          18h
        ks-console-786b9846d4-2hz26             1/1     Running            0          18h
        ks-console-786b9846d4-brpb4             1/1     Running            0          18h
        ks-console-786b9846d4-tqvc5             1/1     Running            0          18h
        ks-controller-manager-646595cb4-bss4w   1/1     Running            0          18h
        ks-controller-manager-646595cb4-dqn47   1/1     Running            0          18h
        ks-controller-manager-646595cb4-mpm6t   1/1     Running            0          18h
        ks-installer-7cb866bd-9fpzc             1/1     Running            0          18h
        minio-7bfdb5968b-pwbh4                  1/1     Running            0          18h
        mysql-7f64d9f584-4j82n                  0/1     CrashLoopBackOff   224        18h
        openldap-0                              1/1     Running            1          18h
        openldap-1                              0/1     CrashLoopBackOff   267        18h
        redis-ha-haproxy-5c6559d588-97kvx       1/1     Running            1          18h
        redis-ha-haproxy-5c6559d588-bf5wt       1/1     Running            0          18h
        redis-ha-haproxy-5c6559d588-jzxbq       1/1     Running            0          18h
        redis-ha-server-0                       2/2     Running            0          18h
        redis-ha-server-1                       2/2     Running            0          18h
        redis-ha-server-2                       2/2     Running            0          18h

        # mysql logs:

        Initializing database
        2020-09-11T03:47:27.502006Z 0 [Warning] [MY-011070] [Server] 'Disabling symbolic links using --skip-symbolic-links (or equivalent) is the default. Consider not using this option as it' is deprecated and will be removed in a future release.
        2020-09-11T03:47:27.502110Z 0 [Warning] [MY-011068] [Server] The syntax 'expire-logs-days' is deprecated and will be removed in a future release. Please use binlog_expire_logs_seconds instead.
        2020-09-11T03:47:27.502235Z 0 [System] [MY-013169] [Server] /usr/sbin/mysqld (mysqld 8.0.11) initializing of server in progress as process 36
        2020-09-11T03:47:27.507361Z 0 [ERROR] [MY-010457] [Server] --initialize specified but the data directory has files in it. Aborting.
        2020-09-11T03:47:27.507444Z 0 [ERROR] [MY-010119] [Server] Aborting
        2020-09-11T03:47:27.507985Z 0 [System] [MY-010910] [Server] /usr/sbin/mysqld: Shutdown complete (mysqld 8.0.11)  MySQL Community Server - GPL.
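The "--initialize specified but the data directory has files in it" error means the PVC already holds leftovers from the earlier crash loop. A simulated illustration (a temp dir stands in for the directory backing the mysql PVC, whose real path depends on the NFS export):

```shell
# Any pre-existing file in the data directory makes mysqld's --initialize abort.
datadir=$(mktemp -d)
touch "$datadir/ibdata1"   # leftover file, as in the crash loop above
ls -A "$datadir"           # non-empty output -> --initialize would abort
```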

        # openldap-0 logs:

        5f5af49e <= mdb_equality_candidates: (uniqueMember) not indexed
        5f5af49e <= mdb_equality_candidates: (memberUid) not indexed
        5f5af49e conn=1082 op=676 SEARCH RESULT tag=101 err=0 nentries=0 text=
        5f5af49e conn=1082 op=677 SRCH base="ou=Users,dc=kubesphere,dc=io" scope=2 deref=3 filter="(&(objectClass=inetOrgPerson)(|(uid=admin)(mail=admin)))"
        5f5af49e conn=1082 op=677 SEARCH RESULT tag=101 err=0 nentries=1 text=
        5f5af49e conn=1082 op=678 SRCH base="ou=Groups,dc=kubesphere,dc=io" scope=2 deref=3 filter="(|(member=uid=admin,ou=users,dc=kubesphere,dc=io)(uniqueMember=uid=admin,ou=users,dc=kubesphere,dc=io)(memberUid=admin))"
        5f5af49e conn=1082 op=678 SRCH attr=cn
        5f5af49e <= mdb_equality_candidates: (member) not indexed
        5f5af49e <= mdb_equality_candidates: (uniqueMember) not indexed
        5f5af49e <= mdb_equality_candidates: (memberUid) not indexed
        5f5af49e conn=1082 op=678 SEARCH RESULT tag=101 err=0 nentries=0 text=
        5f5af4aa conn=10015 fd=13 ACCEPT from IP=172.22.215.71:25500 (IP=0.0.0.0:389)
        5f5af4aa conn=10015 fd=13 closed (connection lost)
        5f5af4ac conn=10016 fd=13 ACCEPT from IP=172.22.215.71:25568 (IP=0.0.0.0:389)
        5f5af4ac conn=10016 fd=13 closed (connection lost)
        5f5af4b2 slap_client_connect: URI=ldap://openldap-1.openldap DN="cn=admin,dc=kubesphere,dc=io" ldap_sasl_bind_s failed (-1)
        5f5af4b2 slap_client_connect: URI=ldap://openldap-1.openldap DN="cn=admin,cn=config" ldap_sasl_bind_s failed (-1)
        5f5af4b2 do_syncrepl: rid=102 rc -1 retrying
        5f5af4b2 do_syncrepl: rid=002 rc -1 retrying
        5f5af4b9 conn=10017 fd=13 ACCEPT from IP=172.22.215.71:25886 (IP=0.0.0.0:389)
        5f5af4b9 conn=10017 fd=13 closed (connection lost)
        5f5af4bb conn=10018 fd=13 ACCEPT from IP=172.22.215.71:25968 (IP=0.0.0.0:389)
        5f5af4bb conn=10018 fd=13 closed (connection lost)
        5f5af4c8 conn=10019 fd=13 ACCEPT from IP=172.22.215.71:26294 (IP=0.0.0.0:389)
        5f5af4c8 conn=10019 fd=13 closed (connection lost)

        # openldap-1 logs:

        *** An error occurred. Aborting.
        *** Init system aborted.
        *** Not all processes have exited in time. Forcing them to exit.

        hongming Update:
        1. mysql recovered after the ibdata1 file in its PVC was deleted.
        2. openldap reports a new error after restarting:

        5f5b4726 @(#) $OpenLDAP: slapd 2.4.50+dfsg-1~bpo10+1 (May 4 2020 05:25:06) $
        Debian OpenLDAP Maintainers
        5f5b4726 slapd starting
        5f5b4726 <= mdb_equality_candidates: (entryUUID) not indexed
        5f5b4728 syncrepl_message_to_entry: rid=001 mods check (objectClass: value #1 invalid per syntax)
        5f5b4728 do_syncrepl: rid=001 rc 21 retrying
        5f5b4728 <= mdb_equality_candidates: (entryUUID) not indexed
        5f5b4728 <= mdb_equality_candidates: (entryUUID) not indexed
        5f5b4728 <= mdb_equality_candidates: (entryUUID) not indexed
        5f5b4728 <= mdb_equality_candidates: (entryUUID) not indexed

          hongming openldap recovered on its own... but I see mysql is restarting again; I'll try deleting the file again.

          7 days later

          hongming A question: I'm trying to set NFS mount options, but it keeps failing with a format error. Is there an example of the correct format?

            addons:
            - name: nfs-client
              namespace: kube-system
              sources:
                chart:
                  name: nfs-client-provisioner
                  repo: https://charts.kubesphere.io/main
                  values:
                  - storageClass.defaultClass=true
                  - nfs.server=xxxxxx
                  - nfs.path=/xxxx
                  - nfs.mountOptions="nfsvers=3,timeo=600,nolock"

          The problem is the line - nfs.mountOptions="nfsvers=3,timeo=600,nolock", mainly the nolock setting.
          The installation fails with:

          Failed to deploy addons: failed parsing --set data: key "nolock\"" has no value
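For context: Helm's --set grammar splits values on unescaped commas, which is why nolock ends up as a key with no value. Plain Helm accepts a brace-wrapped list form for list-typed values; whether KubeKey passes these addon values entries through to Helm unchanged is an assumption worth testing:

```yaml
values:
- storageClass.defaultClass=true
- nfs.server=xxxxxx
- nfs.path=/xxxx
- nfs.mountOptions={nfsvers=3,timeo=600,nolock}
```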

            rysinal

            Looking at the nfs-client-provisioner chart, nfs.mountOptions is a list, so it can be set through a values.yaml configuration file.

            Create a custom-nfs-client-values.yaml on your own machine, edit it with your configuration, then point the values field at that file path.

        # custom-nfs-client-values.yaml
        nfs:
          server: xxx
          path: xxx
          mountOptions:
          - nfsvers=3
          - timeo=600
          - nolock
        storageClass:
          defaultClass: true

        # cluster configuration, addons section
        addons:
        - name: nfs-client
          namespace: kube-system
          sources:
            chart:
              name: nfs-client-provisioner
              repo: https://charts.kubesphere.io/main
              values: /xxx/custom-nfs-client-values.yaml

              Cauchy Thanks, it's working now. The second problem (mysql and openldap restarting) was indeed caused by the nolock option.

              10 months later

              Cauchy Hi, the external values file approach no longer works for me now. Is it no longer allowed?

              Error: Failed to download cluster config: Unable to convert file to yaml: yaml: unmarshal errors:
                line 52: cannot unmarshal !!str `/root/k...` into []string
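The unmarshal error indicates that this KubeKey version now parses values as a list of strings ([]string), so a file path no longer fits in that field. Later KubeKey releases carry a separate valuesFile field for exactly this purpose (a sketch; verify the field name against the addon schema of your KubeKey release):

```yaml
addons:
- name: nfs-client
  namespace: kube-system
  sources:
    chart:
      name: nfs-client-provisioner
      repo: https://charts.kubesphere.io/main
      valuesFile: /xxx/custom-nfs-client-values.yaml
```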