Notes on installing KubeSphere in a customer environment
The above covers the online installation of 2.1.0; 2.1.1 is installed the same way.
Complete automated offline installation of KubeSphere
The above covers the offline installation of 2.1.1.
The guides above have been used across several cloud environments: Alibaba X-Dragon (shenlong) cloud, Tencent Cloud, Huawei Cloud, and our in-house VM environment. The vendor-specific tweaks each cloud requires are covered as well, so the write-up should be fairly complete.
Also, thanks to the QingCloud platform: it lets someone who only just took over k8s maintain six cloud environments at once, probably more in the future. The pressure is real, and I will keep leaning on QingCloud. I would love to drop an emoji here, but the editing experience of this forum could use some work. Haha!
PS: I am new to this, so some steps may be redundant, but being new also means I wrote everything down in full. The article was written a while ago, so some later optimizations may not have made it in yet; I will keep improving it at the original address.
Below is the complete installation process of 2.1.0 in the customer environment:
[TOC]
Notes on installing KubeSphere in a customer environment
Things to note when requesting servers
- All of the k8s machines and the NFS machine must be able to reach the internet, which is needed to download the installation packages; internet access can be removed after the installation.
- Make sure the 10.233.x.x range is not already in use; the k8s cluster uses it for virtual IPs.
- The k8s machines' own IP range (e.g. 192.168.3.xxx) must be fully routable between the machines, and it must also be able to reach kube_pods_subnet: 10.233.64.0/18 and kube_service_addresses: 10.233.0.0/18. On Huawei Cloud this requires disabling the communication verification on all k8s hosts; other vendors have similar security policies.
- The .222 address of the k8s machines' IP range must be reserved as the virtual IP for the master load balancer. On X-Dragon cloud this IP must be attached to a NIC but not bound to an instance; on H3C Cloud no NIC is needed, but MAC binding must be removed on the three master nodes, otherwise only one of them can ping the .222 address; on Huawei Cloud a virtual IP has to be requested separately.
- Finalize the k8s machines' passwords before handing them over, and avoid very special characters such as ! and $; some special characters make the installation fail.
- SSH on all k8s machines must allow username/password login, the SSH timeout must be removed, and hopping to other machines over SSH should be fast, otherwise you will run into switch timeouts. If nothing else works, set up passwordless SSH; it is a painful chore (well, I tried it and it works).
- Give every server 150G for the root file system.
- CPU, memory and OS must meet the requirements: at least 4 CPU cores and at least 16G memory are recommended, and the OS should be
CentOS Linux release 7.7.1908 (Core)
- Ask the customer whether they have cloud storage; if they do, use it, so you do not have to build and back up storage yourself.
- Make sure the hardware clock and system time are identical and correct on every machine, and remember to fix the time zone inside the images as well (see the pre-flight sketch after this list).
https://blog.csdn.net/aixiaoyang168/article/details/88341082
- Tencent Cloud note: /var/lib/docker and /var/lib/kubelet must not live on Tencent CFS storage; in my tests installing with them mounted there caused problems, and unmounting them made the problems go away.
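A minimal pre-flight sketch for the time and subnet requirements above, to be run on every k8s machine (the addresses are this environment's and are only examples, adjust them to yours):
# time zone, NTP sync status and clock
timedatectl
# nothing should already answer or be routed in the service/pod ranges (expect 100% packet loss)
ping -c 2 -W 1 10.233.0.1
ping -c 2 -W 1 10.233.64.1
ip route | grep 10.233 || echo "no existing 10.233 routes"
# the other k8s machines must be reachable
ping -c 2 192.168.100.57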
Background
Install k8s on the customer's servers, using the KubeSphere installer scripts.
Resources:
- The servers are Alibaba X-Dragon (shenlong) cloud machines, which differ somewhat from the other vendors' clouds; it is a bare-metal cloud.
- The operating system is CentOS 7.7, 64-bit.
- Server inventory:
Role | Server IP | Notes |
---|---|---|
nfs2 | 192.168.100.68 | NFS server |
master1 | 192.168.100.57 | Master node |
master2 | 192.168.100.56 | Master node |
master3 | 192.168.100.58 | Master node |
node1 | 192.168.100.61 | Worker node |
node2 | 192.168.100.60 | Worker node |
node3 | 192.168.100.62 | Worker node |
node4 | 192.168.100.59 | Worker node |
node5 | 192.168.100.71 | Worker node |
node6 | 192.168.100.72 | Worker node |
Pre-installation server verification and configuration
Verification script: prints the distribution release, logical and physical CPU counts, the first disk, and total memory
cat /etc/redhat-release && cat /proc/cpuinfo |grep "processor"|wc -l && cat /proc/cpuinfo |grep "physical id"|sort |uniq|wc -l && fdisk -l|grep /dev/sda | head -n 1 && cat /proc/meminfo | grep MemTotal
It is recommended that the master used as the installation task machine has at least 4 CPU cores and 8G of memory; the other two masters can be 2-core 8G machines; node machines should have at least 8G. If there are not many node machines and resources are tight, it is better to have fewer nodes with more memory each, e.g. 32G.
Output for node1: CentOS Linux release 7.7.1908 (Core) 4 4 MemTotal: 16262128 kB
Check that the servers can SSH into each other, and remove the SSH timeout
# Login check
ssh xxx.xxx.xxx.xxx
# SSH timeout: comment out the line export TMOUT=300 at the bottom of /etc/profile, then apply it
source /etc/profile
Verify that the firewall is disabled (it is)
firewall-cmd --state
Check that the servers can reach the internet; in this case they can
ping baidu.com
View and modify the network configuration. This usually does not need touching; the server provider normally sets it up for you.
# Generate a UUID for a NIC
uuidgen eth0
# View the network configuration
cd /etc/sysconfig/network-scripts
cat ifcfg-ens160
# node1:
[root@node1 network-scripts]# cat ifcfg-eth0
TYPE="Ethernet"
PROXY_METHOD="none"
BROWSER_ONLY="no"
BOOTPROTO="static"
DEFROUTE="yes"
IPV4_FAILURE_FATAL="no"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
IPV6_DEFROUTE="yes"
IPV6_FAILURE_FATAL="no"
IPV6_ADDR_GEN_MODE="stable-privacy"
NAME="eth0"
UUID="04d22cbb-d66d-407d-9c96-9283a669d271"
DEVICE="eth0"
ONBOOT="yes"
IPADDR="192.168.100.61"
PREFIX="24"
GATEWAY="192.168.100.1"
IPV6_PRIVACY="no"
DNS1="8.8.8.8"
DNS2="144.144.144.144"
# Restart networking
service network restart
Network speed test
cd /home/tools
wget https://raw.githubusercontent.com/sivel/speedtest-cli/master/speedtest.py
chmod a+rx speedtest.py
mv speedtest.py /usr/local/bin/speedtest-cli
chown root:root /usr/local/bin/speedtest-cli
speedtest-cli
- Check disk information
# Show disk information
fdisk -l
# Show partitions and mount points
lsblk
# To expand the root file system, see the disk expansion notes in the Linux topic
Other things to make sure of
- Besides the local firewall, cloud servers usually have external security-group restrictions, normally configured in the console. This environment was special: the provider eventually disabled the NIC security groups at the virtualization layer and removed the NIC flow-table policy; only then could the 192 subnet and the 10 subnet reach each other.
- Make sure the 10.233 range is unused, that the 192 hosts can ping addresses in the 10.233 range, and that the servers can ping one another.
- A multi-master installation needs a reserved virtual IP; I reserved 192.168.100.222 as the VIP for the master load balancer. On this cloud you add a NIC in the console and assign it this IP, but do not bind the IP to a server instance (this is an X-Dragon quirk; when I installed in our in-house lab the IP did not have to be bound to a NIC).
NFS installation and connectivity checks between the k8s nodes and NFS
NFS server setup
hostnamectl set-hostname nfs2
mkdir /home/nfs2 -p
yum install nfs-utils rpcbind -y
# Edit the configuration
vi /etc/systemd/system/sockets.target.wants/rpcbind.socket
# Comment out all the :111 lines
#ListenStream=0.0.0.0:111
#ListenDatagram=0.0.0.0:111
#ListenStream=[::]:111
#ListenDatagram=[::]:111
# Add the export
vi /etc/exports
Add the following:
#share /home/nfs2 by noteshare for bingbing at 2020-1-17
/home/nfs2 192.168.100.0/24(rw,sync,no_root_squash)
# Reload systemd and (re)start the services
systemctl daemon-reload
systemctl restart rpcbind.socket
systemctl restart rpcbind
systemctl start nfs
systemctl start nfs-server
# Enable on boot
systemctl enable nfs-server
# Verify the ports
[root@vm ~]# netstat -tnulp|grep rpc
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 26754/rpcbind
tcp 0 0 0.0.0.0:20048 0.0.0.0:* LISTEN 21385/rpc.mountd
tcp 0 0 0.0.0.0:36435 0.0.0.0:* LISTEN 26764/rpc.statd
tcp6 0 0 :::111 :::* LISTEN 26754/rpcbind
tcp6 0 0 :::20048 :::* LISTEN 21385/rpc.mountd
tcp6 0 0 :::51189 :::* LISTEN 26764/rpc.statd
udp 0 0 0.0.0.0:20048 0.0.0.0:* 21385/rpc.mountd
udp 0 0 0.0.0.0:111 0.0.0.0:* 26754/rpcbind
udp 0 0 0.0.0.0:49757 0.0.0.0:* 26764/rpc.statd
udp 0 0 0.0.0.0:639 0.0.0.0:* 26754/rpcbind
udp 0 0 127.0.0.1:659 0.0.0.0:* 26764/rpc.statd
udp6 0 0 :::20048 :::* 21385/rpc.mountd
udp6 0 0 :::111 :::* 26754/rpcbind
udp6 0 0 :::639 :::* 26754/rpcbind
udp6 0 0 :::54093 :::* 26764/rpc.statd
# Also check the port mappings (the nfs ports will not show up here until the nfs service has been started)
[root@vm ~]# rpcinfo -p localhost
program vers proto port service
100000 4 tcp 111 portmapper
100000 3 tcp 111 portmapper
100000 2 tcp 111 portmapper
100000 4 udp 111 portmapper
100000 3 udp 111 portmapper
100000 2 udp 111 portmapper
100024 1 udp 49757 status
100024 1 tcp 36435 status
# Verify the export
[root@nfs2 ~]# showmount -e localhost
Export list for localhost:
/home/nfs2 192.168.100.0/24
# Client-side check: pick another machine, e.g. 192.168.100.57; if it cannot connect, open the required firewall ports (see the port requirements section)
yum install nfs-utils rpcbind -y
/sbin/rpcbind
[root@node2 ~]# showmount -e 192.168.100.68
Export list for 192.168.100.68:
/home/nfs2 192.168.100.0/24
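Beyond showmount, a quick end-to-end check is to mount the export from a client and write a file; a small sketch (the /mnt/nfstest mount point is just an example):
mkdir -p /mnt/nfstest
mount -t nfs 192.168.100.68:/home/nfs2 /mnt/nfstest
touch /mnt/nfstest/write-test && ls -l /mnt/nfstest
umount /mnt/nfstest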
Firewall and security-group port requirements (not used this time)
The internal firewall is turned off; the security groups are what need configuring.
Below is a template I put together. (The port list was not actually used this time, since the security group was simply disabled; I will test and add it later. It is based on the official documentation and has not been fully verified.)

Cluster installation
master1 is the task machine
- Install keepalived + haproxy on the 3 master machines. The steps are identical on each; below is the installation on master1 (a note on master2/master3 follows at the end of this block).
yum install -y keepalived && yum install -y haproxy
# Edit /etc/haproxy/haproxy.cfg
global
log /dev/log local0
log /dev/log local1 notice
chroot /var/lib/haproxy
#stats socket /run/haproxy/admin.sock mode 660 level admin
stats timeout 30s
user haproxy
group haproxy
daemon
nbproc 1
defaults
log global
timeout connect 5000
timeout client 50000
timeout server 50000
listen kube-master
bind 0.0.0.0:8443
mode tcp
option tcplog
balance roundrobin
server master1 192.168.100.57:6443 check inter 10000 fall 2 rise 2 weight 1
server master2 192.168.100.56:6443 check inter 10000 fall 2 rise 2 weight 1
server master3 192.168.100.58:6443 check inter 10000 fall 2 rise 2 weight 1
# Edit /etc/keepalived/keepalived.conf; adjust interface eth0 below to your NIC name
global_defs {
router_id lb-backup
}
vrrp_instance VI-kube-master {
state MASTER
priority 110
dont_track_primary
interface eth0
virtual_router_id 90
advert_int 3
virtual_ipaddress {
192.168.100.222
}
}
# Enable on boot and start the services; do this on all 3 masters
systemctl enable keepalived && systemctl restart keepalived && systemctl enable haproxy && systemctl restart haproxy
# From each of the 3 masters, ping the virtual IP 192.168.100.222; it must be reachable from all of them
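I installed keepalived the same way on all three masters. As a side note, a common keepalived pattern is to mark the other two instances as BACKUP with lower priorities so the VIP has a deterministic owner; this is only a sketch of that variant, not necessarily what you need here:
# /etc/keepalived/keepalived.conf on master2 / master3 (sketch)
global_defs {
    router_id lb-backup
}
vrrp_instance VI-kube-master {
    state BACKUP
    priority 100        # e.g. 90 on master3
    dont_track_primary
    interface eth0
    virtual_router_id 90
    advert_int 3
    virtual_ipaddress {
        192.168.100.222
    }
}
# check which node currently holds the VIP
ip addr show eth0 | grep 192.168.100.222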
- On the master1 task machine, download the KubeSphere installer, then configure and run the installation
mkdir /home/tools/kubesphere -p && cd /home/tools/kubesphere && curl -L https://kubesphere.io/download/stable/v2.1.0 > installer.tar.gz && tar -zxf installer.tar.gz
- Edit the common.yaml and hosts.ini files in the downloaded installer; below are the files I used this time
common.yaml
#
# Copyright 2018 The KubeSphere Authors.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# KubeSphere Installer Sample Configuration File
#
# Note that below sample configuration could be reference to install
# both Kubernetes and KubeSphere together.
# For the users who want to install KubeSphere upon an existing Kubernetes cluster
# please visit https://github.com/kubesphere/ks-installer for more information
######################### Kubernetes #########################
# The supported Kubernetes to install. Note: not all
# Kubernetes versions are supported by KubeSphere, visit
# https://kubesphere.io/docs to get full support list.
kube_version: v1.15.5
# The supported etcd to install. Note: not all
# etcd versions are supported by KubeSphere, visit
# https://kubesphere.io/docs to get full support list.
etcd_version: v3.2.18
# Configure a cron job to backup etcd data, which is running on etcd host machines.
# Period of running backup etcd job, the unit is minutes.
# 30 as default, means backup etcd every 30 minutes.
etcd_backup_period: 30
# How many backup replicas to keep.
# 5 means to keep latest 5 backups, older ones will be deleted by order.
keep_backup_number: 5
# The location to store etcd backups files on etcd host machines.
etcd_backup_dir: "/var/backups/kube_etcd"
## Add other registry.
docker_registry_mirrors:
- https://docker.mirrors.ustc.edu.cn
- https://registry.docker-cn.com
- https://mirror.aliyuncs.com
- https://wixr7yss.mirror.aliyuncs.com
- http://xxx.xxx.xxx.xxx:30280
- http://harbor.powerdnoteshareata.com.cn:30280
docker_insecure_registries:
- 192.168.100.57:5000
- xxx.xxx.xxx.xxx:30280
- harbor.noteshare.com.cn:30280
# Kubernetes network plugin. Note that calico and flannel
# are recommended plugins, which are tested and verified by KubeSphere.
kube_network_plugin: calico
# A valid CIDR range for Kubernetes services,
# 1. should not overlap with node subnet
# 2. should not overlap with Kubernetes pod subnet
kube_service_addresses: 10.233.0.0/18
# A valid CIDR range for Kubernetes pod subnet,
# 1. should not overlap with node subnet
# 2. should not overlap with Kubernetes services subnet
kube_pods_subnet: 10.233.64.0/18
# Kube-proxy proxyMode configuration, either ipvs, or iptables
kube_proxy_mode: ipvs
# Maximum pods allowed to run on every node.
kubelet_max_pods: 110
# Enable nodelocal dns cache
enable_nodelocaldns: true
# HA(Highly Available) loadbalancer example config
# apiserver_loadbalancer_domain_name: "lb.kubesphere.local"
loadbalancer_apiserver:
  address: 192.168.100.222
  port: 8443
######################### Common Storage #########################
# This section will configure storage to use in Kubernetes.
# For full supported storage list, please check
# https://docs.kubesphere.io/v2.1/zh-CN/installation/storage-configuration
# LOCAL VOLUME
# KubeSphere will use local volume as storage by default.
# This is just for demostration and testing purpose, and highly not
# recommended in production environment.
# For production environment, please change to other storage type.
local_volume_enabled: false
local_volume_is_default_class: false
local_volume_storage_class: local
# CEPH RBD
# KubeSphere can use an existing ceph as backend storage service.
# change to true to use ceph,
# MUST disable other storage types in configuration file.
ceph_rbd_enabled: false
ceph_rbd_is_default_class: false
ceph_rbd_storage_class: rbd
# Ceph rbd monitor endpoints, for example
#
# ceph_rbd_monitors:
# - 172.24.0.1:6789
# - 172.24.0.2:6789
# - 172.24.0.3:6789
ceph_rbd_monitors:
- SHOULD_BE_REPLACED
# ceph admin account name
ceph_rbd_admin_id: admin
# ceph admin secret, for example,
# ceph_rbd_admin_secret: AQAnwihbXo+uDxAAD0HmWziVgTaAdai90IzZ6Q==
ceph_rbd_admin_secret: TYPE_ADMIN_ACCOUNT_HERE
ceph_rbd_pool: rbd
ceph_rbd_user_id: admin
# e.g. ceph_rbd_user_secret: AQAnwihbXo+uDxAAD0HmWziVgTaAdai90IzZ6Q==
ceph_rbd_user_secret: TYPE_ADMIN_SECRET_HERE
ceph_rbd_fsType: ext4
ceph_rbd_imageFormat: 1
# Additional ceph configurations
# ceph_rbd_imageFeatures: layering
# NFS CONFIGURATION
# KubeSphere can use existing nfs service as backend storage service.
# change to true to use nfs.
nfs_client_enabled: true
nfs_client_is_default_class: true
# Hostname of the NFS server(ip or hostname)
nfs_server: 192.168.100.68
# Basepath of the mount point
nfs_path: /home/nfs2/k8sdata
nfs_vers3_enabled: false
nfs_archiveOnDelete: false
# GLUSTERFS CONFIGURATION
# change to true to use glusterfs as backend storage service.
# for more detailed configuration, please check
# https://docs.kubesphere.io/v2.1/zh-CN/installation/storage-configuration
glusterfs_provisioner_enabled: false
glusterfs_provisioner_is_default_class: false
glusterfs_provisioner_storage_class: glusterfs
glusterfs_provisioner_restauthenabled: true
# e.g. glusterfs_provisioner_resturl: http://192.168.0.4:8080
glusterfs_provisioner_resturl: SHOULD_BE_REPLACED
# e.g. glusterfs_provisioner_clusterid: 6a6792ed25405eaa6302da99f2f5e24b
glusterfs_provisioner_clusterid: SHOULD_BE_REPLACED
glusterfs_provisioner_restuser: admin
glusterfs_provisioner_secretName: heketi-secret
glusterfs_provisioner_gidMin: 40000
glusterfs_provisioner_gidMax: 50000
glusterfs_provisioner_volumetype: replicate:2
# e.g. jwt_admin_key: 123456
jwt_admin_key: SHOULD_BE_REPLACED
######################### KubeSphere #########################
# Version of KubeSphere
ks_version: v2.1.0
# KubeSphere console port, range 30000-32767,
# but 30180/30280/30380 are reserved for internal service
console_port: 30880
# Enable Multi users login.
# false means allowing only one active session to access KubeSphere for same account.
# Duplicated login action will cause previous session invalid and
# that user will be logged out by force.
enable_multi_login: true
# devops/openpitrix/notification/alerting components depend on
# mysql/minio/etcd/openldap/redis to store credentials.
# Configure parameters below to set how much storage they use.
mysql_volume_size: 20Gi
minio_volume_size: 20Gi
etcd_volume_size: 20Gi
openldap_volume_size: 2Gi
redis_volume_size: 2Gi
# MONITORING CONFIGURATION
# monitoring is a MUST required component for KubeSphere,
# monitoring deployment configuration
# prometheus replicas numbers,
# 2 means better availability, but more resource consumption
prometheus_replicas: 2
# prometheus pod memory requests
prometheus_memory_request: 400Mi
# prometheus storage size,
# 20Gi means every prometheus replica consumes 20Gi storage
prometheus_volume_size: 20Gi
# whether to install a grafana
grafana_enabled: false
# LOGGING CONFIGURATION
# logging is an optional component when installing KubeSphere, and
# Kubernetes builtin logging APIs will be used if logging_enabled is set to false.
# Builtin logging only provides limited functions, so recommend to enable logging.
logging_enabled: true
elasticsearch_master_replicas: 1
elasticsearch_data_replicas: 2
logsidecar_replicas: 2
elasticsearch_volume_size: 50Gi
log_max_age: 30
elk_prefix: logstash
kibana_enabled: false
#external_es_url: SHOULD_BE_REPLACED
#external_es_port: SHOULD_BE_REPLACED
# DEVOPS CONFIGURATION
# Devops is an optional component for KubeSphere.
devops_enabled: false
jenkins_memory_lim: 8Gi
jenkins_memory_req: 4Gi
jenkins_volume_size: 8Gi
jenkinsJavaOpts_Xms: 3g
jenkinsJavaOpts_Xmx: 6g
jenkinsJavaOpts_MaxRAM: 8g
sonarqube_enabled: false
#sonar_server_url: SHOULD_BE_REPLACED
#sonar_server_token: SHOULD_BE_REPLACED
# Following components are all optional for KubeSphere,
# Which could be turned on to install it before installation or later by updating its value to true
openpitrix_enabled: true
metrics_server_enabled: false
servicemesh_enabled: true
notification_enabled: true
alerting_enabled: true
# Harbor is an optional component for KubeSphere.
# Which could be turned on to install it before installation or later by updating its value to true
harbor_enabled: false
harbor_domain: harbor.devops.kubesphere.local
# GitLab is an optional component for KubeSphere.
# Which could be turned on to install it before installation or later by updating its value to true
gitlab_enabled: false
gitlab_hosts_domain: devops.kubesphere.local
# Container Engine Acceleration
# Use nvidia gpu acceleration in containers
# KubeSphere currently support Nvidia GPU V100 P100 1060 1080 1080Ti
# The driver version is 387.26,cuda is 9.1
# nvidia_accelerator_enabled: true
# nvidia_gpu_nodes:
# - kube-gpu-001
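Before running the installer it is worth making sure the edited file is still valid YAML, since indentation mistakes are easy to make; a minimal sketch, assuming the file sits at conf/common.yaml inside the extracted installer and that PyYAML is available on the task machine:
cd /home/tools/kubesphere/kubesphere-all-v2.1.0
python -c "import yaml; yaml.safe_load(open('conf/common.yaml')); print('common.yaml parses OK')"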
hosts.ini
; Parameters:
; ansible_connection: connection type to the target machine
; ansible_host: the host name of the target machine
; ip: ip address of the target machine
; ansible_user: the default user name for ssh connection
; ansible_ssh_pass: the password for ssh connection
; ansible_become_pass: the privilege escalation password to grant access
; ansible_port: the ssh port number, if not 22
; If installer is ran as non-root user who has sudo privilege, refer to the following sample configuration:
; e.g
; master ansible_connection=local ip=192.168.0.5 ansible_user=ubuntu ansible_become_pass=Qcloud@123
; node1 ansible_host=192.168.0.6 ip=192.168.0.6 ansible_user=ubuntu ansible_become_pass=Qcloud@123
; node2 ansible_host=192.168.0.8 ip=192.168.0.8 ansible_user=ubuntu ansible_become_pass=Qcloud@123
; As recommended as below sample configuration, use root account by default to install
[all]
master1 ansible_connection=local ip=192.168.100.57
master2 ansible_host=192.168.100.56 ip=192.168.100.56 ansible_ssh_pass=noteshare@568
master3 ansible_host=192.168.100.58 ip=192.168.100.58 ansible_ssh_pass=noteshare@568
node1 ansible_host=192.168.100.61 ip=192.168.100.61 ansible_ssh_pass=noteshare@568
node2 ansible_host=192.168.100.60 ip=192.168.100.60 ansible_ssh_pass=noteshare@568
node3 ansible_host=192.168.100.62 ip=192.168.100.62 ansible_ssh_pass=noteshare@568
node4 ansible_host=192.168.100.59 ip=192.168.100.59 ansible_ssh_pass=noteshare@568
node5 ansible_host=192.168.100.71 ip=192.168.100.71 ansible_ssh_pass=noteshare@568
node6 ansible_host=192.168.100.72 ip=192.168.100.72 ansible_ssh_pass=noteshare@568
[kube-master]
master1
master2
master3
[kube-node]
node1
node2
node3
node4
node5
node6
[etcd]
master1
master2
master3
[k8s-cluster:children]
kube-node
kube-master
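Before kicking off install.sh, connectivity to every host in the inventory can be checked with Ansible's ping module; a sketch, assuming ansible is already present on the task machine and that the inventory lives at conf/hosts.ini (password-based SSH also needs the sshpass package installed):
cd /home/tools/kubesphere/kubesphere-all-v2.1.0
ansible -i conf/hosts.ini all -m ping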
- Run the installation
On the master1 node:
cd /home/tools/kubesphere/kubesphere-all-v2.1.0/scripts
./install.sh
Choose 2, answer yes, then wait. If it fails, just run the installation again; running it another 2-3 times is fine, since failures are sometimes only caused by timeouts.
Troubleshooting the installation
Issue 1: the installation hangs at FAILED - RETRYING: KubeSphere Waiting for ks-console (30 retries left).
Answer: check CPU and memory; a single-machine deployment currently needs at least 8 cores and 16G. If free -m shows a lot of memory in buff/cache, run echo 3 > /proc/sys/vm/drop_caches to release it.
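A small sketch of that check and release; the sync beforehand is just standard practice to flush dirty pages before dropping the caches:
free -m                                    # see how much memory sits in buff/cache
sync && echo 3 > /proc/sys/vm/drop_caches  # flush, then drop page cache, dentries and inodes
free -m                                    # confirm it has been released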
Issue 2:
The namespace page pops up this error:
Internal Server Error
rpc error: code = Internal desc = describe resources failed: Error 1146: Table ‘cluster.cluster’ doesn’t exist
Fix:
Check the job status with:
kubectl get job -n openpitrix-system -o wide
Then restart every job showing 0/1, i.e. the ones that failed to complete, with a command like:
kubectl -n openpitrix-system get job openpitrix-task-db-ctrl-job -o json | jq 'del(.spec.selector)' | jq 'del(.spec.template.metadata.labels)' | kubectl replace --force -f -
That solved the issue.
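The single-job command above can be generalized to sweep every job in openpitrix-system that has not completed; a sketch, assuming the COMPLETIONS column (0/1) is what marks the failed ones and that jq is installed:
for j in $(kubectl -n openpitrix-system get job --no-headers | awk '$2=="0/1"{print $1}'); do
  kubectl -n openpitrix-system get job "$j" -o json \
    | jq 'del(.spec.selector)' | jq 'del(.spec.template.metadata.labels)' \
    | kubectl replace --force -f -
done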
Issue 3:
Error messages:
2020-02-27 19:14:12,895 p=28985 u=root | FAILED - RETRYING: download_file | Download item (1 retries left).
2020-02-27 19:16:35,899 p=28985 u=root | An exception occurred during task execution. To see the full traceback, use -vvv. The error was: SSLError: ('The read operation timed out',)
2020-02-27 19:16:35,900 p=28985 u=root | fatal: [node1 -> 10.16.3.17]: FAILED! => {
"attempts": 4,
"changed": false
}
MSG:
failed to create temporary content file: ('The read operation timed out',)
2020-02-27 19:16:42,724 p=28985 u=root | An exception occurred during task execution. To see the full traceback, use -vvv. The error was: SSLError: ('The read operation timed out',)
2020-02-27 19:16:42,725 p=28985 u=root | fatal: [node3 -> 10.16.3.19]: FAILED! => {
"attempts": 4,
"changed": false
}
MSG:
failed to create temporary content file: ('The read operation timed out',)
Images were downloading slowly; configure an Alibaba registry mirror.
Add to common.yaml:
docker_registry_mirrors:
- your Alibaba image accelerator address; log in to the Alibaba Cloud site and find it in the image registry section under "Image Accelerator"
Issue 4
Uninstalling kept failing with: Timeout (12s) waiting for privilege escalation prompt. This is the SSH connection timeout problem again. I asked the network admins to fix it; they said they could not. Ugh. Fine, no more being lazy: I set up passwordless SSH on all machines, and it did work. I really did not want to do this, it is tiring (a sketch of the setup follows).
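For the record, the passwordless SSH setup itself is just a key pair on the task machine plus ssh-copy-id to every other host; a sketch (adjust the IP list to your inventory):
# on master1
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
for h in 192.168.100.56 192.168.100.58 192.168.100.61 192.168.100.60 \
         192.168.100.62 192.168.100.59 192.168.100.71 192.168.100.72; do
  ssh-copy-id root@$h   # prompts once for each machine's password
done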
Issue 5
Running get-pip.py failed on some nodes; the error is below:
2020-02-28 22:39:39,675 p=80528 u=root | fatal: [node5]: FAILED! => {
"changed": true,
"cmd": "sudo python /tmp/pip/get-pip.py",
"delta": "0:02:20.094835",
"end": "2020-02-28 22:39:39.655732",
"rc": 2,
"start": "2020-02-28 22:37:19.560897"
}
STDOUT:
Collecting pip
Downloading https://files.pythonhosted.org/packages/54/0c/d01aa759fdc501a58f431eb594a17495f15b88da142ce14b5845662c13f3/pip-20.0.2-py2.py3-none-any.whl (1.4MB)
STDERR:
Fix: after the run, log in to the affected machine and manually execute sudo python /tmp/pip/get-pip.py, then run the installation again without uninstalling first.
Other helper commands used for verification
ipvsadm
ip route
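Typical usage of the two, for reference:
# list the IPVS virtual servers and backends that kube-proxy created (ipvs mode)
ipvsadm -Ln
# show the routing table, including the 10.233.x.x routes added for the pod network
ip route
# see which route/interface a given 10.233 address would take
ip route get 10.233.0.1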