• DevOps
  • Installing DevOps on 4.1.2: cluster Agent error

OS information
Virtual machines, CentOS 7, 16C/32G × 3

Kubernetes version
v1.31.0, 3 nodes.

Container runtime
containerd, version 1.6.22

KubeSphere version
v4.1.2, online installation, full installation.

What is the problem
Installing DevOps fails with an error.

Error message:

2024-10-31T17:13:57.888938803+08:00 Error: Get "https://10.211.0.1:443/api/v1/namespaces/argocd/services/devops-agent-argocd-repo-server": dial tcp 10.211.0.1:443: connect: connection refused

2024-10-31T17:13:57.889034935+08:00 helm.go:84: [debug] Get "https://10.211.0.1:443/api/v1/namespaces/argocd/services/devops-agent-argocd-repo-server": dial tcp 10.211.0.1:443: connect: connection refused

Both Argo CD and Jenkins deployed successfully and their web pages load fine, but the agent error above leaves DevOps unusable.

2024-11-01T14:27:16.634541258+08:00 Error: Get "https://10.211.0.1:443/api/v1/namespaces/argocd/services/devops-agent-argocd-repo-server": dial tcp 10.211.0.1:443: connect: connection refused - error from a previous attempt: http2: server sent GOAWAY and closed the connection; LastStreamID=1961, ErrCode=NO_ERROR, debug=""

2024-11-01T14:27:16.634657666+08:00 helm.go:84: [debug] Get "https://10.211.0.1:443/api/v1/namespaces/argocd/services/devops-agent-argocd-repo-server": dial tcp 10.211.0.1:443: connect: connection refused - error from a previous attempt: http2: server sent GOAWAY and closed the connection; LastStreamID=1961, ErrCode=NO_ERROR, debug=""

Re-running this task fails with:

2024-11-01T14:46:41.573531713+08:00 WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: kube.config

2024-11-01T14:46:41.573601428+08:00 WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: kube.config

2024-11-01T14:46:41.598947422+08:00 history.go:56: [debug] getting history for release devops-agent

2024-11-01T14:46:42.350201444+08:00 upgrade.go:155: [debug] preparing upgrade for devops-agent

2024-11-01T14:46:42.681432355+08:00 Error: UPGRADE FAILED: another operation (install/upgrade/rollback) is in progress

2024-11-01T14:46:42.681513867+08:00 helm.go:84: [debug] another operation (install/upgrade/rollback) is in progress

2024-11-01T14:46:42.681539322+08:00 helm.sh/helm/v3/pkg/action.init

2024-11-01T14:46:42.681559948+08:00 helm.sh/helm/v3/pkg/action/action.go:52

2024-11-01T14:46:42.681580350+08:00 runtime.doInit1

2024-11-01T14:46:42.681627968+08:00 runtime/proc.go:6735

2024-11-01T14:46:42.681647336+08:00 runtime.doInit

2024-11-01T14:46:42.681696425+08:00 runtime/proc.go:6702

2024-11-01T14:46:42.681729478+08:00 runtime.main

2024-11-01T14:46:42.681745982+08:00 runtime/proc.go:249

2024-11-01T14:46:42.681760621+08:00 runtime.goexit

2024-11-01T14:46:42.681774903+08:00 runtime/asm_amd64.s:1650

2024-11-01T14:46:42.681788437+08:00 UPGRADE FAILED

2024-11-01T14:46:42.681803309+08:00 main.newUpgradeCmd.func2

2024-11-01T14:46:42.681817144+08:00 helm.sh/helm/v3/cmd/helm/upgrade.go:229

2024-11-01T14:46:42.681830576+08:00 github.com/spf13/cobra.(*Command).execute

2024-11-01T14:46:42.681845488+08:00 github.com/spf13/cobra@v1.8.0/command.go:983

2024-11-01T14:46:42.681862769+08:00 github.com/spf13/cobra.(*Command).ExecuteC

2024-11-01T14:46:42.681880411+08:00 github.com/spf13/cobra@v1.8.0/command.go:1115

2024-11-01T14:46:42.681895093+08:00 github.com/spf13/cobra.(*Command).Execute

2024-11-01T14:46:42.681908660+08:00 github.com/spf13/cobra@v1.8.0/command.go:1039

2024-11-01T14:46:42.681922206+08:00 main.main

2024-11-01T14:46:42.681935909+08:00 helm.sh/helm/v3/cmd/helm/helm.go:83

2024-11-01T14:46:42.681949478+08:00 runtime.main

2024-11-01T14:46:42.681966315+08:00 runtime/proc.go:267

2024-11-01T14:46:42.681984208+08:00 runtime.goexit

2024-11-01T14:46:42.682003073+08:00 runtime/asm_amd64.s:1650

9 days later

pky Is it the same error? What is the status of the devops-jenkins pod?

  • pky replied to this thread

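    yudong I checked: it is the startup probe and the readiness probe that fail. I tried extending the probe wait times, but that didn't help either. Then I deleted the probes directly in the YAML and found that the Jenkins container still did not start successfully; the log inside the container ends here: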

    I'm sure I remember the Jenkins startup log being much longer than this, not just these few lines.

      pky If this is all the log there is, Jenkins probably has not started yet; check whether the resources configured for devops-jenkins are too low.

      • pky replied to this thread

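        yudong Startup becomes very slow when it reaches the step that extracts the war package, taking several minutes, and the probes simply restart the container in the meantime. In other words, the probes restart Jenkins before it ever gets a chance to serve requests. My suggestion is to lengthen the probe timings or change the check method.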
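Kubernetes restarts a container once its startupProbe has failed failureThreshold times in a row, so the startup window is roughly initialDelaySeconds + failureThreshold × periodSeconds. A minimal Python sketch of that arithmetic, assuming the roughly 2372 s between the first timestamp (09:40:04) and "Started all plugins" (10:19:36) in the log posted later in this thread as the startup time; the helper names and the probe values are hypothetical, not taken from the devops-jenkins chart:

```python
import math

# Approximation: the kubelet restarts the container after the startup probe
# fails failureThreshold times in a row, i.e. after about
# initialDelaySeconds + failureThreshold * periodSeconds (timeoutSeconds ignored).


def startup_window_seconds(initial_delay: int, period: int, failure_threshold: int) -> int:
    """Approximate time a startupProbe allows before the container is restarted."""
    return initial_delay + period * failure_threshold


def min_failure_threshold(startup_seconds: float, initial_delay: int, period: int) -> int:
    """Smallest failureThreshold whose window covers the observed startup time."""
    remaining = max(0.0, startup_seconds - initial_delay)
    return max(1, math.ceil(remaining / period))


# Observed startup from the Jenkins log in this thread (~2372 s).
# initial_delay=0 and period=10 are hypothetical example values.
observed = 2372
threshold = min_failure_threshold(observed, initial_delay=0, period=10)
print(f"failureThreshold >= {threshold} (window {startup_window_seconds(0, 10, threshold)} s)")
```

Startup time varies between nodes, so leaving some margin above the computed threshold is sensible.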

          15 days later

          I ran into the same problem. Does anyone know how to fix it?

          Try restarting containerd, restarting kubelet, and deleting kube-proxy so that it gets recreated. If I remember correctly, that is how I eventually solved it.
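Before and after restarts like these, it can help to confirm whether the Kubernetes service IP from the original error (10.211.0.1:443) accepts TCP connections at all. A minimal Python sketch, assuming it runs from a pod or node that shows the failure; can_connect is a hypothetical helper, and a plain TCP connect only checks reachability, not TLS or API authentication:

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds.

    "Connection refused", timeouts and other socket errors all count as failure.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `can_connect("10.211.0.1", 443)` and getting False mirrors the `dial tcp 10.211.0.1:443: connect: connection refused` symptom; getting True after the restarts suggests service routing to the API server is working again.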

          pky What kind of CI scenario needs to extract a war package?

          • pky replied to this thread

            klj890

            Picked up JAVA_TOOL_OPTIONS: -XX:InitialRAMPercentage=70 -XX:MaxRAMPercentage=70 -Dhudson.slaves.NodeProvisioner.initialDelay=20 -Dhudson.slaves.NodeProvisioner.MARGIN=50 -Dhudson.slaves.NodeProvisioner.MARGIN0=0.85 -Dhudson.model.LoadStatistics.clock=5000 -Dhudson.model.LoadStatistics.decay=0.2 -Dhudson.slaves.NodeProvisioner.recurrencePeriod=5000 -Dhudson.security.csrf.DefaultCrumbIssuer.EXCLUDE_SESSION_ID=true -Dhudson.plugins.git.GitStatus.NOTIFY_COMMIT_ACCESS_CONTROL=disabled -Dio.jenkins.plugins.casc.ConfigurationAsCode.initialDelay=10000 -Djenkins.install.runSetupWizard=false -XX:+AlwaysPreTouch -XX:+HeapDumpOnOutOfMemoryError -XX:+UseG1GC -XX:+UseStringDeduplication -XX:+ParallelRefProcEnabled -XX:+DisableExplicitGC -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions

            Running from: /usr/share/jenkins/jenkins.war

            webroot: EnvVars.masterEnvVars.get("JENKINS_HOME")

            2024-11-30 09:40:04.451+0000 [id=1] INFO org.eclipse.jetty.util.log.Log#initialized: Logging initialized @1046ms to org.eclipse.jetty.util.log.JavaUtilLog

            2024-11-30 09:40:04.562+0000 [id=1] INFO winstone.Logger#logInternal: Beginning extraction from war file

            2024-11-30 09:49:03.230+0000 [id=1] WARNING o.e.j.s.handler.ContextHandler#setContextPath: Empty contextPath

            2024-11-30 09:49:03.305+0000 [id=1] INFO org.eclipse.jetty.server.Server#doStart: jetty-9.4.45.v20220203; built: 2022-02-03T09:14:34.105Z; git: 4a0c91c0be53805e3fcffdcdcc9587d5301863db; jvm 11.0.16+8

            2024-11-30 09:49:03.703+0000 [id=1] INFO o.e.j.w.StandardDescriptorProcessor#visitServlet: NO JSP Support for /, did not find org.eclipse.jetty.jsp.JettyJspServlet

            2024-11-30 09:49:03.747+0000 [id=1] INFO o.e.j.s.s.DefaultSessionIdManager#doStart: DefaultSessionIdManager workerName=node0

            2024-11-30 09:49:03.748+0000 [id=1] INFO o.e.j.s.s.DefaultSessionIdManager#doStart: No SessionScavenger set, using defaults

            2024-11-30 09:49:03.749+0000 [id=1] INFO o.e.j.server.session.HouseKeeper#startScavenging: node0 Scavenging every 660000ms

            2024-11-30 09:49:04.474+0000 [id=1] INFO hudson.WebAppMain#contextInitialized: Jenkins home directory: /var/jenkins_home found at: EnvVars.masterEnvVars.get("JENKINS_HOME")

            2024-11-30 09:49:06.756+0000 [id=1] INFO o.e.j.s.handler.ContextHandler#doStart: Started w.@4acb2510{Jenkins v2.346.3,/,file:///var/jenkins_home/war/,AVAILABLE}{/var/jenkins_home/war}

            2024-11-30 09:49:06.798+0000 [id=1] INFO o.e.j.server.AbstractConnector#doStart: Started ServerConnector@260e86a1{HTTP/1.1, (http/1.1)}{0.0.0.0:8080}

            2024-11-30 09:49:06.799+0000 [id=1] INFO org.eclipse.jetty.server.Server#doStart: Started @543396ms

            2024-11-30 09:49:06.800+0000 [id=23] INFO winstone.Logger#logInternal: Winstone Servlet Engine running: controlPort=disabled

            2024-11-30 09:49:08.801+0000 [id=30] INFO jenkins.InitReactorRunner$1#onAttained: Started initialization

            2024-11-30 09:58:36.286+0000 [id=28] WARNING hudson.ClassicPluginStrategy#createClassJarFromWebInfClasses: Created /var/jenkins_home/plugins/job-dsl/WEB-INF/lib/classes.jar; update plugin to a version created with a newer harness

            2024-11-30 10:19:20.847+0000 [id=29] WARNING hudson.ClassicPluginStrategy#createClassJarFromWebInfClasses: Created /var/jenkins_home/plugins/node-iterator-api/WEB-INF/lib/classes.jar; update plugin to a version created with a newer harness

            2024-11-30 10:19:26.289+0000 [id=29] INFO jenkins.InitReactorRunner$1#onAttained: Listed all plugins

            2024-11-30 10:19:36.544+0000 [id=29] INFO jenkins.InitReactorRunner$1#onAttained: Prepared all plugins

            2024-11-30 10:19:36.565+0000 [id=29] INFO jenkins.InitReactorRunner$1#onAttained: Started all plugins

            WARNING: An illegal reflective access operation has occurred

            WARNING: Illegal reflective access by org.codehaus.groovy.vmplugin.v7.Java7$1 (file:/var/jenkins_home/war/WEB-INF/lib/groovy-all-2.4.21.jar) to constructor java.lang.invoke.MethodHandles$Lookup(java.lang.Class,int)

            WARNING: Please consider reporting this to the maintainers of org.codehaus.groovy.vmplugin.v7.Java7$1

            WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations

            WARNING: All illegal access operations will be denied in a future release

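            This is part of the Jenkins startup log. Extracting the war file alone took about nine minutes (09:40:04.562 to 09:49:03.230), and initialization took about half an hour (from "Started initialization" at 09:49:08.801 to "Started all plugins" at 10:19:36.565), including a gap of more than nine minutes before the next log line at 09:58:36.286. That is far too long: if the probe window is too short, Jenkins will never finish starting, so I simply removed the probes. Maybe my second-hand server just performs poorly, but I doubt other people's servers are much better, since most servers nowadays are virtual machines. Mine has an Intel Xeon E5-2698 v4 processor with 20 cores and 40 threads.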
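The phase durations can be checked directly from the log timestamps. A minimal Python sketch, assuming the `YYYY-MM-DD HH:MM:SS.mmm+0000` prefix format of the lines above (four lines copied verbatim); parse_ts and seconds_between are hypothetical helper names:

```python
from datetime import datetime

# Four lines copied from the Jenkins startup log above.
LOG_LINES = [
    "2024-11-30 09:40:04.562+0000 [id=1] INFO winstone.Logger#logInternal: Beginning extraction from war file",
    "2024-11-30 09:49:03.230+0000 [id=1] WARNING o.e.j.s.handler.ContextHandler#setContextPath: Empty contextPath",
    "2024-11-30 09:49:08.801+0000 [id=30] INFO jenkins.InitReactorRunner$1#onAttained: Started initialization",
    "2024-11-30 10:19:36.565+0000 [id=29] INFO jenkins.InitReactorRunner$1#onAttained: Started all plugins",
]


def parse_ts(line: str) -> datetime:
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.mmm+0000' timestamp of a log line."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S.%f%z")


def seconds_between(start_line: str, end_line: str) -> float:
    """Elapsed seconds between the timestamps of two log lines."""
    return (parse_ts(end_line) - parse_ts(start_line)).total_seconds()


phases = {
    "war extraction": (LOG_LINES[0], LOG_LINES[1]),
    "initialization until all plugins started": (LOG_LINES[2], LOG_LINES[3]),
}
for name, (start, end) in phases.items():
    print(f"{name}: {seconds_between(start, end) / 60:.1f} min")
```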

              7 days later

              I have run into the same problem too. Has anyone solved it?

              Also, the DevOps component in my Extensions Center has been stuck in the "Installing" state, so I can't do anything else with it.

              7 days later
              3 months later

              Mystery solved, this should be the cause.
              The JVM parameters in the devops-jenkins pod are set as follows:
              -XX:InitialRAMPercentage=70 -XX:MaxRAMPercentage=70 -Dhudson.slaves.NodeProvisioner.initialDelay=20 -Dhudson.slaves.NodeProvisioner.MARGIN=50 -Dhudson.slaves.NodeProvisioner.MARGIN0=0.85 -Dhudson.model.LoadStatistics.clock=5000 -Dhudson.model.LoadStatistics.decay=0.2 -Dhudson.slaves.NodeProvisioner.recurrencePeriod=5000 -Dhudson.security.csrf.DefaultCrumbIssuer.EXCLUDE_SESSION_ID=true -Dhudson.plugins.git.GitStatus.NOTIFY_COMMIT_ACCESS_CONTROL=disabled -Dio.jenkins.plugins.casc.ConfigurationAsCode.initialDelay=10000 -Djenkins.install.runSetupWizard=false -XX:+AlwaysPreTouch -XX:+HeapDumpOnOutOfMemoryError -XX:+UseG1GC -XX:+UseStringDeduplication -XX:+ParallelRefProcEnabled -XX:+DisableExplicitGC -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions

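              In other words, at startup the JVM claims 70% of the available memory (and -XX:+AlwaysPreTouch makes it touch the whole initial heap during startup). The base for this percentage is most likely the total memory of the node the pod lands on. For example, on a node with 32 GB, startup takes 32 × 0.7 = 22.4 GB, while the pod's default limit is 6 GB, so Kubernetes OOM-kills it immediately. That is why even setting the limit to 12 GB still causes trouble.
              The fix, then, is to change these settings.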
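The arithmetic above can be checked with a short Python sketch that reads InitialRAMPercentage from the flag string and compares the resulting initial heap with the pod limit. Which base memory the JVM really uses is an assumption here: the post assumes node memory, while a container-aware JVM (-XX:+UseContainerSupport, on by default since JDK 10) sizes the heap from the container's memory limit when it detects one. The sketch therefore takes the base as an input and shows both cases; the helper names are hypothetical.

```python
import re

# Memory-related JVM flags from the devops-jenkins pod above (others omitted).
JAVA_TOOL_OPTIONS = (
    "-XX:InitialRAMPercentage=70 -XX:MaxRAMPercentage=70 "
    "-XX:+AlwaysPreTouch -XX:+UseG1GC"
)


def ram_percentage(opts: str, flag: str) -> float:
    """Return the value N of -XX:<flag>=N from a JVM options string."""
    match = re.search(rf"-XX:{flag}=([0-9]+(?:\.[0-9]+)?)", opts)
    if match is None:
        raise ValueError(f"{flag} is not set")
    return float(match.group(1))


def initial_heap_gb(base_gb: float, opts: str) -> float:
    """Initial heap = base memory * InitialRAMPercentage / 100."""
    return base_gb * ram_percentage(opts, "InitialRAMPercentage") / 100


def heap_exceeds_limit(base_gb: float, limit_gb: float, opts: str) -> bool:
    # Heap only: metaspace, thread stacks, etc. come on top, so False here
    # does not guarantee the pod stays under its memory limit.
    return initial_heap_gb(base_gb, opts) > limit_gb


# (base memory, pod limit): node-sized base vs. container-limit base.
for base, limit in [(32, 6), (32, 12), (6, 6)]:
    heap = initial_heap_gb(base, JAVA_TOOL_OPTIONS)
    verdict = "exceeds" if heap_exceeds_limit(base, limit, JAVA_TOOL_OPTIONS) else "within"
    print(f"base {base} GB, limit {limit} GB: initial heap {heap:.1f} GB ({verdict} limit)")
```

With a 32 GB base the 22.4 GB initial heap overshoots both a 6 GB and a 12 GB limit, matching the post; if the JVM instead sizes from a 6 GB container limit, the heap alone (4.2 GB) fits, which is why the base matters when choosing new percentages.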