Notes from installing KubeSphere in a customer environment
The above covers the 2.1.0 online installation; 2.1.1 works the same way.
Complete walkthrough of an automated offline KubeSphere installation
The above covers the 2.1.1 offline installation.

The tutorials above have been used across several cloud environments: Alibaba X-Dragon (Shenlong) Cloud, Tencent Cloud, Huawei Cloud, and the company's internal VM environment. The vendor-specific tweaks each cloud needs are covered as well, so the write-up is fairly complete.

Thanks also to the QingCloud platform, which lets someone who has only just taken over k8s maintain six cloud environments at once, probably more later, so the pressure is real; I'll keep relying on QingCloud. I'd love to drop an emoji here, but the editing experience of this forum could use some work. Haha!

PS: since I'm still new to this, some steps may be redundant, but for the same reason the notes are fairly thorough. The article was written a while ago, so some later optimizations may not be reflected yet; I will keep improving it at the original address.

Below is the complete 2.1.0 installation process in the customer environment:

[TOC]

Notes from installing KubeSphere in a customer environment

Notes when requesting servers

  • All k8s machines and the NFS machine need Internet access to download packages during installation; access can be cut off afterwards
  • Make sure the 10.233 IP range is not in use; the k8s cluster needs it for virtual IPs
  • The k8s machines' own IP range (e.g. 192.168.3.xxx) must be mutually reachable and must also be able to reach kube_pods_subnet: 10.233.64.0/18 and kube_service_addresses: 10.233.0.0/18; on Huawei Cloud this requires removing the communication verification between all k8s hosts, and other vendors have similar security policies
  • Reserve the .222 address of the k8s IP range for the master load balancer. On X-Dragon Cloud this IP must be attached to a NIC but not bound to an instance; on H3C Cloud no NIC is needed, but MAC binding must be removed on the three master nodes or only one of them will be able to ping the .222 address; on Huawei Cloud a dedicated virtual IP has to be requested
  • Set the k8s machine passwords to their final values before handing them over, and avoid very special characters such as ! and $; some special characters make the installation fail
  • SSH on every k8s machine must allow login by account and password, the SSH timeout must be removed, and hopping to other machines over SSH has to be fast, otherwise you will hit timeout problems. If all else fails, set up passwordless SSH; it is a painful chore (well, I tried it and it works)
  • Give every server a 150 GB root partition
  • CPU, memory and OS must meet the requirements: at least 4 cores and 16 GB RAM recommended, OS CentOS Linux release 7.7.1908 (Core)
  • Ask the customer whether they already have cloud storage; if so, use it, which saves you from building your own storage and handling backups
  • Make sure the hardware clock and system time are identical and correct on every machine (see the sketch after this list), and also remember to fix the timezone in images: https://blog.csdn.net/aixiaoyang168/article/details/88341082
  • Tencent Cloud note: do not mount /var/lib/docker or /var/lib/kubelet on Tencent CFS storage; in my tests installing with those mounts caused problems, and removing the mounts made them go away.
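For the time check above, here is a minimal sketch that compares system time, hardware clock and timezone across the machines; the host list is only an example and root SSH access is assumed:

# compare system time, hardware clock and timezone on every host; adjust the list to your environment
HOSTS="192.168.100.57 192.168.100.56 192.168.100.58 192.168.100.61"
for h in $HOSTS; do
  echo "== $h =="
  ssh root@$h 'date; hwclock --show; timedatectl | grep "Time zone"'
done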

Background

Install k8s on the customer's servers, using the KubeSphere installer scripts.
Resource situation:

  • The servers are Alibaba X-Dragon (Shenlong) Cloud, a bare-metal cloud that differs somewhat from the other vendors' clouds
  • The OS is CentOS 7.7 64-bit
  • Server resources:
| Role | Server IP | Notes |
| --- | --- | --- |
| nfs2 | 192.168.100.68 | NFS server |
| master1 | 192.168.100.57 | master node |
| master2 | 192.168.100.56 | master node |
| master3 | 192.168.100.58 | master node |
| node1 | 192.168.100.61 | worker node |
| node2 | 192.168.100.60 | worker node |
| node3 | 192.168.100.62 | worker node |
| node4 | 192.168.100.59 | worker node |
| node5 | 192.168.100.71 | worker node |
| node6 | 192.168.100.72 | worker node |

Pre-installation server verification and configuration

  • Verification script: lists the Linux release, CPU count, disk and memory information (also see the batch pre-flight sketch at the end of this section)
    cat /etc/redhat-release && cat /proc/cpuinfo |grep "processor"|wc -l && cat /proc/cpuinfo |grep "physical id"|sort |uniq|wc -l && fdisk -l|grep /dev/sda | head -n 1 && cat /proc/meminfo | grep MemTotal
    The master used as the installation task machine should have at least 4 cores and 8 GB RAM; the other two masters can be 2-core / 8 GB machines, and the worker nodes should have at least 8 GB. Also, if you do not have many node machines and resources are tight, fewer nodes with more memory each (e.g. 32 GB) are preferable.
    Below is the output on node1:

    CentOS Linux release 7.7.1908 (Core)
    4
    4
    MemTotal:       16262128 kB
  • Check that the servers can SSH into one another, and remove the SSH timeout

    # login check
    ssh xxx.xxx.xxx.xxx
    # remove the ssh timeout
    comment out the export TMOUT=300 line at the bottom of /etc/profile
    and make it take effect immediately:
    source /etc/profile
  • Check that the firewall is disabled (it was)
    firewall-cmd --state

  • Check that the servers can reach the Internet (they could)
    ping baidu.com

  • View and, if needed, adjust the network configuration; normally nothing to change here, the server provider will have set it up for you.

    # how to generate a NIC uuid
    uuidgen eth0
    # view the network configuration
    cd /etc/sysconfig/network-scripts
    cat ifcfg-ens160
    # node1's configuration
    [root@node1 network-scripts]# cat ifcfg-eth0 
    TYPE="Ethernet"
    PROXY_METHOD="none"
    BROWSER_ONLY="no"
    BOOTPROTO="static"
    DEFROUTE="yes"
    IPV4_FAILURE_FATAL="no"
    IPV6INIT="yes"
    IPV6_AUTOCONF="yes"
    IPV6_DEFROUTE="yes"
    IPV6_FAILURE_FATAL="no"
    IPV6_ADDR_GEN_MODE="stable-privacy"
    NAME="eth0"
    UUID="04d22cbb-d66d-407d-9c96-9283a669d271"
    DEVICE="eth0"
    ONBOOT="yes"
    IPADDR="192.168.100.61"
    PREFIX="24"
    GATEWAY="192.168.100.1"
    IPV6_PRIVACY="no"
    DNS1="8.8.8.8"
    DNS2="144.144.144.144"
    # restart the network service
    service network restart
  • Network speed test

cd /home/tools
wget https://raw.githubusercontent.com/sivel/speedtest-cli/master/speedtest.py
chmod a+rx speedtest.py
mv speedtest.py /usr/local/bin/speedtest-cli
chown root:root /usr/local/bin/speedtest-cli

speedtest-cli
  • Check disk information
# list the disks
fdisk -l
# show partitions and mount points
lsblk
# to grow the root partition, see the Linux topic on disk expansion
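To avoid repeating the checks above machine by machine, a small batch pre-flight sketch can run them over SSH; the host list and working SSH access are assumptions, and it only collects the information already gathered above:

# run the basic checks on every node in one go
HOSTS="192.168.100.57 192.168.100.56 192.168.100.58 192.168.100.61 192.168.100.60 192.168.100.62 192.168.100.59 192.168.100.71 192.168.100.72"
for h in $HOSTS; do
  echo "===== $h ====="
  ssh root@$h 'cat /etc/redhat-release; nproc; free -m | grep Mem; df -h /; firewall-cmd --state'
done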

Other things to make sure of

  • Besides the OS firewall, cloud servers usually sit behind an external security group that is configured in the cloud console. This environment was particularly special: in the end the provider disabled the NIC security group at the virtualization layer and removed the NIC flow-table policy before the 192.* and 10.* networks could reach each other.
  • Make sure the 10.233 range is not already in use, that the 192.* hosts can ping IPs in the 10.233 range, and that the servers can ping one another (see the sketch after this list).
  • A multi-master install needs a reserved virtual IP for the master load balancer, 192.168.100.222 in my case. On this cloud you add a NIC in the console and assign this IP to it, but do not bind it to an instance (an X-Dragon Cloud peculiarity; in the company lab environment no NIC binding was required).
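A quick sanity check for the 10.233 requirement above, run on each node; the grep pattern and the ping target are only examples:

# nothing on the host should already use the 10.233 cluster ranges
ip route | grep '10\.233\.' && echo "WARNING: a 10.233 route already exists" || echo "no existing 10.233 routes"
ip addr  | grep '10\.233\.' && echo "WARNING: a 10.233 address is already assigned" || echo "no existing 10.233 addresses"
# and the nodes should reach each other on the 192 network, e.g. the nfs machine
ping -c 2 192.168.100.68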

NFS installation and connectivity checks between the k8s nodes and NFS

Setting up the NFS server

hostnamectl set-hostname nfs2
mkdir /home/nfs2 -p
yum install nfs-utils rpcbind -y
# edit the rpcbind socket configuration
vi /etc/systemd/system/sockets.target.wants/rpcbind.socket
# comment out all the :111 lines
#ListenStream=0.0.0.0:111
#ListenDatagram=0.0.0.0:111
#ListenStream=[::]:111
#ListenDatagram=[::]:111

# add the shared export
vi /etc/exports
add the following lines
#share /home/nfs2 by noteshare for bingbing at 2020-1-17
/home/nfs2 192.168.100.0/24(rw,sync,no_root_squash)

# reload and (re)start the services
systemctl daemon-reload
systemctl restart rpcbind.socket
systemctl restart rpcbind
systemctl start nfs
systemctl start nfs-server
# enable at boot
systemctl enable nfs-server

# check the listening ports
[root@vm ~]# netstat -tnulp|grep rpc
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      26754/rpcbind       
tcp        0      0 0.0.0.0:20048           0.0.0.0:*               LISTEN      21385/rpc.mountd    
tcp        0      0 0.0.0.0:36435           0.0.0.0:*               LISTEN      26764/rpc.statd     
tcp6       0      0 :::111                  :::*                    LISTEN      26754/rpcbind       
tcp6       0      0 :::20048                :::*                    LISTEN      21385/rpc.mountd    
tcp6       0      0 :::51189                :::*                    LISTEN      26764/rpc.statd     
udp        0      0 0.0.0.0:20048           0.0.0.0:*                           21385/rpc.mountd    
udp        0      0 0.0.0.0:111             0.0.0.0:*                           26754/rpcbind       
udp        0      0 0.0.0.0:49757           0.0.0.0:*                           26764/rpc.statd     
udp        0      0 0.0.0.0:639             0.0.0.0:*                           26754/rpcbind       
udp        0      0 127.0.0.1:659           0.0.0.0:*                           26764/rpc.statd     
udp6       0      0 :::20048                :::*                                21385/rpc.mountd    
udp6       0      0 :::111                  :::*                                26754/rpcbind       
udp6       0      0 :::639                  :::*                                26754/rpcbind       
udp6       0      0 :::54093                :::*                                26764/rpc.statd
# next, check the port mappings (the nfs ports do not show up until the nfs service is started)
[root@vm ~]# rpcinfo -p localhost 
   program vers proto   port  service
   100000    4   tcp    111  portmapper
   100000    3   tcp    111  portmapper
   100000    2   tcp    111  portmapper
   100000    4   udp    111  portmapper
   100000    3   udp    111  portmapper
   100000    2   udp    111  portmapper
   100024    1   udp  49757  status
   100024    1   tcp  36435  status

# verify the export
[root@nfs2 ~]# showmount -e localhost
Export list for localhost:
/home/nfs2 192.168.100.0/24

# client-side check: pick another machine, e.g. 192.168.100.57, to verify from; if it cannot connect, open the firewall ports (see the port-opening requirements section below)
yum install nfs-utils rpcbind -y
/sbin/rpcbind
[root@node2 ~]# showmount -e 192.168.100.56
Export list for 192.168.100.56:
/home/nfs2 192.168.100.0/24
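Beyond showmount, a more direct client-side check is to actually mount the export once and write a test file; a minimal sketch, where the mount point /mnt/nfstest is just an example:

# on any k8s node: mount the export, write a file, then clean up
yum install -y nfs-utils
mkdir -p /mnt/nfstest
mount -t nfs 192.168.100.68:/home/nfs2 /mnt/nfstest
touch /mnt/nfstest/write-test && ls -l /mnt/nfstest
umount /mnt/nfstest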

Firewall and security-group port requirements (not used this time)

The internal firewall is disabled; it is mainly the security group that needs configuring.
Below is a template I put together. (These port rules were not used this time because the security group was simply switched off; I will verify them in a later test. The list is based on what the official documentation provides and has not been fully verified yet.)

![port](http://www.itnoteshare.com/articlePic/getArticlePic.htm?fileName=25_386_37ebfb65-2f71-4978-aafd-87f98dd2f12a.jpg "port")
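If you keep firewalld enabled and rely on port rules instead of switching the security group off, the rules would be applied roughly as below. This is only an illustration using the ports that actually appear in this article (apiserver 6443, HAProxy 8443, console 30880, and the NFS-related ports 111/2049/20048); the complete list should come from the official port requirements:

# example only: open the ports referenced in this article, then reload
firewall-cmd --permanent --add-port=6443/tcp
firewall-cmd --permanent --add-port=8443/tcp
firewall-cmd --permanent --add-port=30880/tcp
firewall-cmd --permanent --add-port=111/tcp --add-port=111/udp
firewall-cmd --permanent --add-port=2049/tcp
firewall-cmd --permanent --add-port=20048/tcp
firewall-cmd --reload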

Cluster installation

master1 is the task machine

  • Install keepalived + haproxy on the 3 master machines; the steps are identical on all of them, shown below for master1
yum install -y keepalived && yum install -y haproxy

# edit /etc/haproxy/haproxy.cfg

global
	log /dev/log    local0
	log /dev/log    local1 notice
	chroot /var/lib/haproxy
	#stats socket /run/haproxy/admin.sock mode 660 level admin
	stats timeout 30s
	user haproxy
	group haproxy
	daemon
	nbproc 1

defaults
	log     global
	timeout connect 5000
	timeout client  50000
	timeout server  50000

listen kube-master
	bind 0.0.0.0:8443
	mode tcp
	option tcplog
	balance roundrobin
	server master1 192.168.100.57:6443  check inter 10000 fall 2 rise 2 weight 1
	server master2 192.168.100.56:6443  check inter 10000 fall 2 rise 2 weight 1
	server master3 192.168.100.58:6443  check inter 10000 fall 2 rise 2 weight 1

# edit /etc/keepalived/keepalived.conf; change interface eth0 below to match your NIC

global_defs {
    router_id lb-backup
}

vrrp_instance VI-kube-master {
    state MASTER
    priority 110
    dont_track_primary
    interface eth0
    virtual_router_id 90
    advert_int 3
    virtual_ipaddress {
        192.168.100.222
    }
}

# enable at boot and start the services; do this on all 3 masters
systemctl enable keepalived && systemctl restart keepalived && systemctl enable haproxy && systemctl restart haproxy
# from each of the 3 masters, ping the virtual IP 192.168.100.222; it must respond (a short verification sketch follows)
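A quick way to confirm the VIP and HAProxy are working before continuing (run on the masters):

# the VIP should answer from every master
ping -c 2 192.168.100.222
# the VIP should be bound on exactly one master at a time
ip addr | grep 192.168.100.222
# haproxy should be listening on 8443 on every master
ss -tnlp | grep haproxy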
  • On the master1 task machine, download the KubeSphere installer, configure it and run the installation
mkdir /home/tools/kubesphere -p && cd /home/tools/kubesphere && curl -L https://kubesphere.io/download/stable/v2.1.0 > installer.tar.gz && tar -zxf installer.tar.gz
  • Edit common.yaml and hosts.ini in the installer downloaded above; below are the files I used for this installation

common.yaml

# 
# Copyright 2018 The KubeSphere Authors.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#     http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# 
# KubeSphere Installer Sample Configuration File
#
# Note that below sample configuration could be reference to install
# both Kubernetes and KubeSphere together.
# For the users who want to install KubeSphere upon an existing Kubernetes cluster
# please visit https://github.com/kubesphere/ks-installer for more information

######################### Kubernetes #########################

# The supported Kubernetes to install. Note: not all
# Kubernetes versions are supported by KubeSphere, visit
# https://kubesphere.io/docs to get full support list.
kube_version: v1.15.5

# The supported etcd to install. Note: not all
# etcd versions are supported by KubeSphere, visit
# https://kubesphere.io/docs to get full support list.
etcd_version: v3.2.18

# Configure a cron job to backup etcd data, which is running on etcd host machines.
# Period of running backup etcd job, the unit is minutes.
# 30 as default, means backup etcd every 30 minutes.
etcd_backup_period: 30

# How many backup replicas to keep.
# 5 means to keep latest 5 backups, older ones will be deleted by order.
keep_backup_number: 5

# The location to store etcd backups files on etcd host machines.
etcd_backup_dir: "/var/backups/kube_etcd"

## Add other registry.
docker_registry_mirrors:
  - https://docker.mirrors.ustc.edu.cn
  - https://registry.docker-cn.com
  - https://mirror.aliyuncs.com
  - https://wixr7yss.mirror.aliyuncs.com
  - http://xxx.xxx.xxx.xxx:30280
  - http://harbor.powerdnoteshareata.com.cn:30280

docker_insecure_registries:
  - 192.168.100.57:5000
  - xxx.xxx.xxx.xxx:30280
  - harbor.noteshare.com.cn:30280

# Kubernetes network plugin. Note that calico and flannel
# are recommended plugins, which are tested and verified by KubeSphere.
kube_network_plugin: calico

# A valid CIDR range for Kubernetes services,
# 1. should not overlap with node subnet
# 2. should not overlap with Kubernetes pod subnet
kube_service_addresses: 10.233.0.0/18

# A valid CIDR range for Kubernetes pod subnet,
# 1. should not overlap with node subnet
# 2. should not overlap with Kubernetes services subnet
kube_pods_subnet: 10.233.64.0/18

# Kube-proxy proxyMode configuration, either ipvs, or iptables
kube_proxy_mode: ipvs

# Maximum pods allowed to run on every node.
kubelet_max_pods: 110

# Enable nodelocal dns cache
enable_nodelocaldns: true

# HA(Highly Available) loadbalancer example config
# apiserver_loadbalancer_domain_name: "lb.kubesphere.local"
loadbalancer_apiserver:
  address: 192.168.100.222
  port: 8443

######################### Common Storage #########################

# This section will configure storage to use in Kubernetes.
# For full supported storage list, please check
# https://docs.kubesphere.io/v2.1/zh-CN/installation/storage-configuration

# LOCAL VOLUME
# KubeSphere will use local volume as storage by default.
# This is just for demostration and testing purpose, and highly not
# recommended in production environment.
# For production environment, please change to other storage type.
local_volume_enabled: false
local_volume_is_default_class: false
local_volume_storage_class: local


# CEPH RBD
# KubeSphere can use an existing ceph as backend storage service.
# change to true to use ceph,
# MUST disable other storage types in configuration file.
ceph_rbd_enabled: false
ceph_rbd_is_default_class: false
ceph_rbd_storage_class: rbd

# Ceph rbd monitor endpoints, for example
#
# ceph_rbd_monitors:
#   - 172.24.0.1:6789
#   - 172.24.0.2:6789
#   - 172.24.0.3:6789
ceph_rbd_monitors:
  - SHOULD_BE_REPLACED

# ceph admin account name
ceph_rbd_admin_id: admin

# ceph admin secret, for example,
# ceph_rbd_admin_secret: AQAnwihbXo+uDxAAD0HmWziVgTaAdai90IzZ6Q==
ceph_rbd_admin_secret: TYPE_ADMIN_ACCOUNT_HERE
ceph_rbd_pool: rbd
ceph_rbd_user_id: admin
# e.g. ceph_rbd_user_secret: AQAnwihbXo+uDxAAD0HmWziVgTaAdai90IzZ6Q==
ceph_rbd_user_secret: TYPE_ADMIN_SECRET_HERE
ceph_rbd_fsType: ext4
ceph_rbd_imageFormat: 1

# Additional ceph configurations
# ceph_rbd_imageFeatures: layering


# NFS CONFIGURATION
# KubeSphere can use existing nfs service as backend storage service.
# change to true to use nfs.
nfs_client_enabled: true
nfs_client_is_default_class: true

# Hostname of the NFS server(ip or hostname)
nfs_server: 192.168.100.68

# Basepath of the mount point
nfs_path: /home/nfs2/k8sdata
nfs_vers3_enabled: false
nfs_archiveOnDelete: false


# GLUSTERFS CONFIGURATION
# change to true to use glusterfs as backend storage service.
# for more detailed configuration, please check
# https://docs.kubesphere.io/v2.1/zh-CN/installation/storage-configuration
glusterfs_provisioner_enabled: false
glusterfs_provisioner_is_default_class: false
glusterfs_provisioner_storage_class: glusterfs
glusterfs_provisioner_restauthenabled: true
# e.g. glusterfs_provisioner_resturl: http://192.168.0.4:8080
glusterfs_provisioner_resturl: SHOULD_BE_REPLACED
# e.g. glusterfs_provisioner_clusterid: 6a6792ed25405eaa6302da99f2f5e24b
glusterfs_provisioner_clusterid: SHOULD_BE_REPLACED
glusterfs_provisioner_restuser: admin
glusterfs_provisioner_secretName: heketi-secret
glusterfs_provisioner_gidMin: 40000
glusterfs_provisioner_gidMax: 50000
glusterfs_provisioner_volumetype: replicate:2
# e.g. jwt_admin_key: 123456
jwt_admin_key: SHOULD_BE_REPLACED


######################### KubeSphere #########################

# Version of KubeSphere
ks_version: v2.1.0

# KubeSphere console port, range 30000-32767,
# but 30180/30280/30380 are reserved for internal service
console_port: 30880

# Enable Multi users login.
# false means allowing only one active session to access KubeSphere for same account.
# Duplicated login action will cause previous session invalid and 
# that user will be logged out by force.
enable_multi_login: true

# devops/openpitrix/notification/alerting components depend on
# mysql/minio/etcd/openldap/redis to store credentials.
# Configure parameters below to set how much storage they use.
mysql_volume_size: 20Gi
minio_volume_size: 20Gi
etcd_volume_size: 20Gi
openldap_volume_size: 2Gi
redis_volume_size: 2Gi


# MONITORING CONFIGURATION
# monitoring is a MUST required component for KubeSphere,
# monitoring deployment configuration
# prometheus replicas numbers,
# 2 means better availability, but more resource consumption
prometheus_replicas: 2

# prometheus pod memory requests
prometheus_memory_request: 400Mi

# prometheus storage size,
# 20Gi means every prometheus replica consumes 20Gi storage
prometheus_volume_size: 20Gi

# whether to install a grafana
grafana_enabled: false

# LOGGING CONFIGURATION
# logging is an optional component when installing KubeSphere, and
# Kubernetes builtin logging APIs will be used if logging_enabled is set to false. 
# Builtin logging only provides limited functions, so recommend to enable logging.
logging_enabled: true
elasticsearch_master_replicas: 1
elasticsearch_data_replicas: 2
logsidecar_replicas: 2
elasticsearch_volume_size: 50Gi
log_max_age: 30
elk_prefix: logstash
kibana_enabled: false
#external_es_url: SHOULD_BE_REPLACED
#external_es_port: SHOULD_BE_REPLACED

# DEVOPS CONFIGURATION
# Devops is an optional component for KubeSphere.
devops_enabled: false
jenkins_memory_lim: 8Gi
jenkins_memory_req: 4Gi
jenkins_volume_size: 8Gi
jenkinsJavaOpts_Xms: 3g
jenkinsJavaOpts_Xmx: 6g
jenkinsJavaOpts_MaxRAM: 8g
sonarqube_enabled: false
#sonar_server_url: SHOULD_BE_REPLACED
#sonar_server_token: SHOULD_BE_REPLACED

# Following components are all optional for KubeSphere,
# Which could be turned on to install it before installation or later by updating its value to true
openpitrix_enabled: true
metrics_server_enabled: false
servicemesh_enabled: true
notification_enabled: true
alerting_enabled: true

# Harbor is an optional component for KubeSphere.
# Which could be turned on to install it before installation or later by updating its value to true
harbor_enabled: false
harbor_domain: harbor.devops.kubesphere.local
# GitLab is an optional component for KubeSphere.
# Which could be turned on to install it before installation or later by updating its value to true
gitlab_enabled: false
gitlab_hosts_domain: devops.kubesphere.local


# Container Engine Acceleration
# Use nvidia gpu acceleration in containers
# KubeSphere currently support Nvidia GPU V100 P100 1060 1080 1080Ti
# The driver version is 387.26,cuda is 9.1
# nvidia_accelerator_enabled: true
# nvidia_gpu_nodes:
#   - kube-gpu-001

hosts.ini

; Parameters:
;  ansible_connection: connection type to the target machine
;  ansible_host: the host name of the target machine
;  ip: ip address of the target machine
;  ansible_user: the default user name for ssh connection
;  ansible_ssh_pass: the password for ssh connection
;  ansible_become_pass: the privilege escalation password to grant access
;  ansible_port: the ssh port number, if not 22

; If installer is ran as non-root user who has sudo privilege, refer to the following sample configuration:
; e.g 
;  master ansible_connection=local  ip=192.168.0.5  ansible_user=ubuntu  ansible_become_pass=Qcloud@123 
;  node1  ansible_host=192.168.0.6  ip=192.168.0.6  ansible_user=ubuntu  ansible_become_pass=Qcloud@123
;  node2  ansible_host=192.168.0.8  ip=192.168.0.8  ansible_user=ubuntu  ansible_become_pass=Qcloud@123

; As recommended as below sample configuration, use root account by default to install


[all]
master1 		ansible_connection=local  		ip=192.168.100.57
master2  		ansible_host=192.168.100.56  	ip=192.168.100.56  	ansible_ssh_pass=noteshare@568
master3  		ansible_host=192.168.100.58  	ip=192.168.100.58  	ansible_ssh_pass=noteshare@568
node1  			ansible_host=192.168.100.61  	ip=192.168.100.61  	ansible_ssh_pass=noteshare@568
node2  			ansible_host=192.168.100.60  	ip=192.168.100.60  	ansible_ssh_pass=noteshare@568
node3  			ansible_host=192.168.100.62  	ip=192.168.100.62  	ansible_ssh_pass=noteshare@568
node4  			ansible_host=192.168.100.59  	ip=192.168.100.59  	ansible_ssh_pass=noteshare@568
node5  			ansible_host=192.168.100.71  	ip=192.168.100.71  	ansible_ssh_pass=noteshare@568
node6  			ansible_host=192.168.100.72  	ip=192.168.100.72  	ansible_ssh_pass=noteshare@568

[kube-master]
master1
master2
master3

[kube-node]
node1
node2
node3
node4
node5
node6

[etcd]
master1
master2
master3

[k8s-cluster:children]
kube-node
kube-master
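Before running install.sh it can save time to confirm that the password in hosts.ini really works from master1 to every host. A minimal sketch using sshpass (on CentOS 7 sshpass comes from the EPEL repository; the password below is the one from the inventory above):

yum install -y sshpass
for ip in 192.168.100.56 192.168.100.58 192.168.100.61 192.168.100.60 192.168.100.62 192.168.100.59 192.168.100.71 192.168.100.72; do
  sshpass -p 'noteshare@568' ssh -o StrictHostKeyChecking=no root@$ip hostname || echo "SSH to $ip FAILED"
done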
  • Run the installation
    Run the install on the master1 node:
    cd /home/tools/kubesphere/kubesphere-all-v2.1.0/scripts
    ./install.sh
    2
    yes
    (answer 2 to pick the multi-node install in the menu, then yes to confirm) ... and wait
    If it fails, just run the installation again; rerunning it another 2-3 times often gets through, since some failures are only caused by timeouts.
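Once install.sh reports success, a quick sanity check from master1 might look like this (30880 is the console_port set in common.yaml; the installer prints the console address and default account at the end of a successful run):

# every node should be Ready
kubectl get nodes -o wide
# nothing should be stuck outside Running/Completed
kubectl get pods --all-namespaces | grep -vE 'Running|Completed'
# the console should answer on any node IP
curl -s -o /dev/null -w '%{http_code}\n' http://192.168.100.57:30880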

Troubleshooting installation issues

  • Issue 1: FAILED - RETRYING: KubeSphere Waiting for ks-console (30 retries left), and it stays stuck there.

    Answer: check CPU and memory; a single-machine deployment currently needs at least 8 cores and 16 GB. If free -m shows a lot of memory tied up in buff/cache, release it with echo 3 > /proc/sys/vm/drop_caches.
  • Issue 2:
    The namespace page pops up the error:
    Internal Server Error
    rpc error: code = Internal desc = describe resources failed: Error 1146: Table 'cluster.cluster' doesn't exist
    Fix:
    check the jobs with
    kubectl get job -n openpitrix-system -o wide
    then recreate every job showing 0/1 (i.e. the ones that failed to complete), for example:
    kubectl -n openpitrix-system get job openpitrix-task-db-ctrl-job -o json | jq 'del(.spec.selector)' | jq 'del(.spec.template.metadata.labels)' | kubectl replace --force -f -
    That resolved the issue.

  • Issue 3:
    Error message:

2020-02-27 19:14:12,895 p=28985 u=root |  FAILED - RETRYING: download_file | Download item (1 retries left).
2020-02-27 19:16:35,899 p=28985 u=root |  An exception occurred during task execution. To see the full traceback, use -vvv. The error was: SSLError: ('The read operation timed out',)
2020-02-27 19:16:35,900 p=28985 u=root |  fatal: [node1 -> 10.16.3.17]: FAILED! => {
    "attempts": 4, 
    "changed": false
}

MSG:

failed to create temporary content file: ('The read operation timed out',)

2020-02-27 19:16:42,724 p=28985 u=root |  An exception occurred during task execution. To see the full traceback, use -vvv. The error was: SSLError: ('The read operation timed out',)
2020-02-27 19:16:42,725 p=28985 u=root |  fatal: [node3 -> 10.16.3.19]: FAILED! => {
    "attempts": 4, 
    "changed": false
}

MSG:

failed to create temporary content file: ('The read operation timed out',)

The image downloads are too slow; configure the Aliyun image accelerator.
Add it to common.yaml under
docker_registry_mirrors:

  • Aliyun accelerator address: log in to the Aliyun console, open the container image registry service and copy the address from the "Image Accelerator" page
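After adding the accelerator address and rerunning the installer, one way to check that Docker on the nodes actually picked the mirrors up:

# the configured mirrors are listed near the bottom of docker info
docker info | grep -A 5 'Registry Mirrors'
# the mirrors are typically written to the docker daemon config as well
cat /etc/docker/daemon.json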

  • Issue 4
    Uninstalling kept failing with the error below. It is caused by SSH connection timeouts. I asked the network admins to fix it and they said they couldn't, so fine, no more shortcuts: I set up passwordless SSH on every machine (see the sketch below), and it did work. I really didn't want to do this, it is exhausting.

    Timeout (12s) waiting for privilege escalation prompt
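    The passwordless SSH setup mentioned above, in its minimal form (run on master1; the host list is an example, adjust to your environment):

    # generate a key once on the task machine (skip if ~/.ssh/id_rsa already exists)
    ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
    # push it to every other machine (you are prompted for each password once)
    for ip in 192.168.100.56 192.168.100.58 192.168.100.61 192.168.100.60 192.168.100.62 192.168.100.59 192.168.100.71 192.168.100.72; do
      ssh-copy-id root@$ip
    done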
  • Issue 5
    get-pip.py failed on some nodes; the error message is below

2020-02-28 22:39:39,675 p=80528 u=root |  fatal: [node5]: FAILED! => {
    "changed": true, 
    "cmd": "sudo python /tmp/pip/get-pip.py", 
    "delta": "0:02:20.094835", 
    "end": "2020-02-28 22:39:39.655732", 
    "rc": 2, 
    "start": "2020-02-28 22:37:19.560897"
}

STDOUT:

Collecting pip
  Downloading https://files.pythonhosted.org/packages/54/0c/d01aa759fdc501a58f431eb594a17495f15b88da142ce14b5845662c13f3/pip-20.0.2-py2.py3-none-any.whl (1.4MB)


STDERR:

Fix: after that installation run, log in to the affected machine and run sudo python /tmp/pip/get-pip.py manually, then rerun the installation directly without uninstalling first.

Other auxiliary verification commands used

# list the IPVS virtual servers created by kube-proxy (kube_proxy_mode is ipvs)
ipvsadm
# show the routing table, including the routes added by Calico
ip route