Environment: Huawei Cloud
OS version: CentOS 7.6
Server resources: 3 ECS instances, 8 cores / 16 GB each
KubeSphere version: 3.1.1
Kubernetes version: 1.20.4
Application stack: Java, built with Maven
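Problem description:
1. Starting from three clean servers A, B, and C, I set server A as the master and B and C as workers, used KubeKey to install Kubernetes and DevOps, then created and ran a pipeline. It succeeded without any problems.
2. On top of that setup, I added server D with the same configuration. On server D I installed socat, conntrack, and chrony, set up passwordless SSH login with the other three servers, synchronized the time, installed Docker manually, and confirmed that its Docker version matches the version on the other three servers.
3. Edited config-sample.yaml and added server D's information to hosts and roleGroups.
4. Ran ./kk add nodes -f config-sample.yaml to add the new node to the existing cluster.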
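The prerequisites from step 2 can be double-checked on server D before running kk. The following is only a minimal sketch: the executable names in REQUIRED_TOOLS are my assumption of what the socat, conntrack, chrony, and docker packages install, and missing_tools is a hypothetical helper, not part of KubeKey.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Assumed executable names for the packages listed in step 2.
REQUIRED_TOOLS = ("socat", "conntrack", "chronyd", "docker")


def missing_tools(
    required: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names in `required` that `which` cannot find on PATH."""
    return [name for name in required if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing on this host: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

Running it on server D and on one of the original workers gives a quick comparison. It only checks that the executables exist on PATH, not their versions or whether the services are running.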
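Result:
The newly added fourth server appears in both Overview and Node Management / Cluster Nodes.
After waiting about 20 minutes, I reran the pipeline and observed the following:
Case 1: The pipeline starts a maven container. If the maven container is scheduled to server B or C, it runs normally.
Case 2: If the maven container is scheduled to the newly added server D, it cannot run. The maven container gets stuck in a create, run, destroy, create, run... loop.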
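Troubleshooting:
1. Manual scheduling: I created a container and manually scheduled it to the new node. It runs normally.
Steps: under Application Workloads, created an nginx container and pinned it to the newly added server D. It runs normally.
2. The running maven pod contains two containers, jnlp and maven. The jnlp container reports the following error: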
SEVERE: Failed to connect to http://ks-jenkins.kubesphere-devops-system:80/tcpSlaveAgentListener/: ks-jenkins.kubesphere-devops-system
java.io.IOException: Failed to connect to http://ks-jenkins.kubesphere-devops-system:80/tcpSlaveAgentListener/: ks-jenkins.kubesphere-devops-system
at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:196)
at hudson.remoting.Engine.innerRun(Engine.java:523)
at hudson.remoting.Engine.run(Engine.java:474)
Caused by: java.net.UnknownHostException: ks-jenkins.kubesphere-devops-system
The ks-jenkins container reports the following error:
2021-08-31 01:33:08.647+0000 [id=5342] WARNING o.c.j.p.k.KubernetesLauncher#launch: Error in provisioning; agent=KubernetesSlave name: maven-8zmk0, template=PodTemplate{id='12420dcc-ecba-420a-a6fc-d59826d1d968', name='maven', namespace='kubesphere-devops-system', label='maven', nodeUsageMode=EXCLUSIVE, workspaceVolume=EmptyDirWorkspaceVolume [memory=false], volumes=[HostPathVolume [mountPath=/var/run/docker.sock, hostPath=/var/run/docker.sock], HostPathVolume [mountPath=/root/.m2, hostPath=/var/data/jenkins_maven_cache], HostPathVolume [mountPath=/root/.sonar/cache, hostPath=/var/data/jenkins_sonar_cache]], containers=[ContainerTemplate{name='maven', image='registry.cn-beijing.aliyuncs.com/kubesphereio/builder-maven:v3.1.0', command='cat', args='', ttyEnabled=true, resourceRequestCpu='100m', resourceRequestMemory='100Mi', resourceLimitCpu='4000m', resourceLimitMemory='8192Mi'}, ContainerTemplate{name='jnlp', image='registry.cn-beijing.aliyuncs.com/kubesphereio/jnlp-slave:3.27-1', command='jenkins-slave', args='${computer.jnlpmac} ${computer.name}', resourceRequestCpu='50m', resourceRequestMemory='400Mi', resourceLimitMemory='1536Mi'}]}
java.lang.IllegalStateException: Node was deleted, computer is null
at org.csanchez.jenkins.plugins.kubernetes.KubernetesLauncher.launch(KubernetesLauncher.java:175)
at hudson.slaves.SlaveComputer.lambda$_connect$0(SlaveComputer.java:294)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2021-08-31 01:33:08.647+0000 [id=5342] INFO o.c.j.p.k.KubernetesSlave#_terminate: Terminating Kubernetes instance for agent maven-8zmk0
FATAL: Computer for agent is null: maven-8zmk0
3. After I cordoned server D (disabled scheduling on it), the pipeline runs normally.
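The java.net.UnknownHostException in the jnlp log suggests that pods on server D may not be able to resolve cluster service names. Below is a minimal sketch of a check that could be run inside a pod pinned to server D and again inside a pod on B or C. It assumes the pod image has Python 3. can_resolve is a hypothetical helper, and kubernetes.default is included only as a control name.

```python
import socket
from typing import Callable

# Service name taken from the jnlp error log above.
JENKINS_HOST = "ks-jenkins.kubesphere-devops-system"


def can_resolve(hostname: str, resolver: Callable = socket.getaddrinfo) -> bool:
    """Return True if `resolver` can look up `hostname`, False on a DNS failure."""
    try:
        resolver(hostname, None)
    except socket.gaierror:
        return False
    return True


if __name__ == "__main__":
    # Compare the output from a pod on server D with a pod on server B or C.
    for name in (JENKINS_HOST, "kubernetes.default"):
        status = "resolves" if can_resolve(name) else "does NOT resolve"
        print(name + ": " + status)
```

If both names fail only on server D, the problem is probably pod DNS or pod-to-service networking on that node, not Jenkins itself. That would also fit the nginx test passing, because that container never had to look up a service name.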