hongming OK, thank you.
Stress-Free Transition! A Complete Guide to Smoothly Upgrading KubeSphere from v3.4.x to v4.x
I0421 13:14:14.563237 1 filepath.go:71] [Storage] LocalFileStorage File directory /tmp/ks-upgrade already exists
I0421 13:14:14.563378 1 executor.go:158] [Job] kubeedge is disabled
I0421 13:14:14.563390 1 executor.go:158] [Job] kubefed is disabled
I0421 13:14:14.563394 1 executor.go:158] [Job] servicemesh is disabled
I0421 13:14:14.563396 1 executor.go:158] [Job] storage-utils is disabled
I0421 13:14:14.563397 1 executor.go:158] [Job] tower is disabled
I0421 13:14:14.563399 1 executor.go:158] [Job] whizard-telemetry is disabled
I0421 13:14:14.563401 1 executor.go:158] [Job] whizard-alerting is disabled
I0421 13:14:14.563409 1 executor.go:155] [Job] application is enabled, priority 100
I0421 13:14:14.563413 1 executor.go:155] [Job] devops is enabled, priority 800
I0421 13:14:14.563421 1 executor.go:155] [Job] iam is enabled, priority 999
I0421 13:14:14.563428 1 executor.go:158] [Job] whizard-logging is disabled
I0421 13:14:14.563431 1 executor.go:158] [Job] metrics-server is disabled
I0421 13:14:14.563435 1 executor.go:155] [Job] network is enabled, priority 100
I0421 13:14:14.563443 1 executor.go:155] [Job] core is enabled, priority 10000
I0421 13:14:14.563446 1 executor.go:158] [Job] whizard-events is disabled
I0421 13:14:14.563457 1 executor.go:155] [Job] gateway is enabled, priority 90
I0421 13:14:14.563460 1 executor.go:158] [Job] opensearch is disabled
I0421 13:14:14.563462 1 executor.go:158] [Job] vector is disabled
I0421 13:14:14.563464 1 executor.go:158] [Job] whizard-monitoring is disabled
I0421 13:14:14.563466 1 executor.go:158] [Job] whizard-notification is disabled
I0421 13:14:14.568650 1 helm.go:145] getting history for release [ks-core]
I0421 13:14:14.633176 1 validator.go:57] [Validator] Current release's version is v3.3.2
I0421 13:14:14.633200 1 executor.go:220] [Job] core prepare-upgrade start
I0421 13:14:14.633209 1 executor.go:58] [Job] Detected that the plugin core is true
I0421 13:14:14.658523 1 core.go:314] scale down deployment kubesphere-system/ks-apiserver unchanged
I0421 13:14:14.668097 1 core.go:314] scale down deployment kubesphere-system/ks-console unchanged
I0421 13:14:14.680227 1 core.go:314] scale down deployment kubesphere-system/ks-controller-manager unchanged
I0421 13:14:14.690029 1 core.go:314] scale down deployment kubesphere-system/ks-installer unchanged
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x2025ccf]
goroutine 1 [running]:
kubesphere.io/ks-upgrade/pkg/jobs/core.(*upgradeJob).deleteKubeSphereWebhook(0xc0009def00, {0x2ba2f40, 0x40e7c00})
/workspace/pkg/jobs/core/core.go:429 +0x22f
kubesphere.io/ks-upgrade/pkg/jobs/core.(*upgradeJob).PrepareUpgrade(0xc0009def00, {0x2ba2f40, 0x40e7c00})
/workspace/pkg/jobs/core/core.go:118 +0xcc
kubesphere.io/ks-upgrade/pkg/executor.(*Executor).PrepareUpgrade(0xc000403560, {0x2ba2f40, 0x40e7c00})
/workspace/pkg/executor/executor.go:227 +0x275
main.init.func5(0xc000158600?, {0x26e5683?, 0x4?, 0x26e5687?})
/workspace/cmd/ks-upgrade.go:102 +0x26
github.com/spf13/cobra.(*Command).execute(0x4095a80, {0xc00081ce40, 0x3, 0x3})
/workspace/vendor/github.com/spf13/cobra/command.go:985 +0xaaa
github.com/spf13/cobra.(*Command).ExecuteC(0x4094c20)
/workspace/vendor/github.com/spf13/cobra/command.go:1117 +0x3ff
github.com/spf13/cobra.(*Command).Execute(...)
/workspace/vendor/github.com/spf13/cobra/command.go:1041
main.main()
/workspace/cmd/ks-upgrade.go:136 +0x4e
After updating helm to 3.17, the upgrade fails with the error above.
hongming
kubectl get validatingwebhookconfiguration -o json | jq '.items[] | .webhooks[] | select(.clientConfig.service == null)'
Check with the command above. As a workaround, you can first change the service url in the clientConfig of these validatingwebhookconfigurations to a ServiceReference.
A fix has been submitted: kubesphere/ks-upgrade#27
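For reference, a webhook's clientConfig accepts either a url or a service (a ServiceReference with name, namespace, port and path). The sketch below shows how the parts of an in-cluster URL might map onto those fields. It assumes the URL follows the usual <service>.<namespace>.svc form; url_to_service_ref is a hypothetical helper and gatekeeper-system is a made-up namespace, neither comes from this thread.

```shell
#!/bin/sh
# Sketch only: split https://<service>.<namespace>.svc[.<domain>][:<port>][/<path>]
# into the fields of a ServiceReference. The helper name is an assumption.
url_to_service_ref() {
  url="$1"
  rest="${url#https://}"            # drop the scheme
  hostport="${rest%%/*}"            # everything before the first "/"
  if [ "$hostport" = "$rest" ]; then
    path=""                         # URL had no path component
  else
    path="/${rest#*/}"
  fi
  host="${hostport%%:*}"
  if [ "$host" = "$hostport" ]; then
    port=443                        # ServiceReference port defaults to 443
  else
    port="${hostport##*:}"
  fi
  name="${host%%.*}"                # first DNS label is the service name
  rest="${host#*.}"
  namespace="${rest%%.*}"           # second DNS label is the namespace
  printf '{"service":{"name":"%s","namespace":"%s","port":%s,"path":"%s"}}\n' \
    "$name" "$namespace" "$port" "$path"
}

# Example with a made-up namespace:
url_to_service_ref "https://gatekeeper-webhook-service.gatekeeper-system.svc.cluster.local:18443/v1/admit"
# prints {"service":{"name":"gatekeeper-webhook-service","namespace":"gatekeeper-system","port":18443,"path":"/v1/admit"}}
```

The printed object would take the place of the url key under clientConfig; the caBundle field next to it stays where it is.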
I connect to the cluster with a kubeconfig file, and the url contains the cluster name. When changing to a ServiceReference, how do I convert a url in this format: "url": "https://gatekeeper-webhook-service.XXXtke集群.svc.cluster.local:18443/v1/admit"?
hongming
Change the imagePullPolicy of ks-upgrade and pull the image again.
Edit ks-core-values.yaml and set pullPolicy to Always:
upgrade:
  enabled: true
  image:
    registry: ""
    repository: kubesphere/ks-upgrade
    tag: ""
    pullPolicy: Always
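hongming After making that change, is there a command I need to re-run to load it, or should I run the upgrade again?
I re-ran the upgrade and it still fails. My computer is a Mac M1, but the client machine's OS shouldn't matter, right? Judging from the error, it is still a nil pointer. What is wrong?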
kubectl logs -f -n kubesphere-system prepare-upgrade-rnxs8
I0422 05:07:44.886017 1 filepath.go:71] [Storage] LocalFileStorage File directory /tmp/ks-upgrade already exists
I0422 05:07:44.886187 1 executor.go:158] [Job] whizard-alerting is disabled
I0422 05:07:44.886201 1 executor.go:158] [Job] whizard-logging is disabled
I0422 05:07:44.886205 1 executor.go:158] [Job] whizard-notification is disabled
I0422 05:07:44.886209 1 executor.go:158] [Job] tower is disabled
I0422 05:07:44.886212 1 executor.go:158] [Job] whizard-telemetry is disabled
I0422 05:07:44.886216 1 executor.go:158] [Job] whizard-events is disabled
I0422 05:07:44.886220 1 executor.go:158] [Job] kubefed is disabled
I0422 05:07:44.886224 1 executor.go:158] [Job] servicemesh is disabled
I0422 05:07:44.886227 1 executor.go:158] [Job] storage-utils is disabled
I0422 05:07:44.886240 1 executor.go:155] [Job] devops is enabled, priority 800
I0422 05:07:44.886261 1 executor.go:155] [Job] iam is enabled, priority 999
I0422 05:07:44.886272 1 executor.go:158] [Job] metrics-server is disabled
I0422 05:07:44.886276 1 executor.go:158] [Job] opensearch is disabled
I0422 05:07:44.886279 1 executor.go:158] [Job] whizard-monitoring is disabled
I0422 05:07:44.886287 1 executor.go:155] [Job] network is enabled, priority 100
I0422 05:07:44.886295 1 executor.go:158] [Job] vector is disabled
I0422 05:07:44.886304 1 executor.go:155] [Job] application is enabled, priority 100
I0422 05:07:44.886311 1 executor.go:155] [Job] core is enabled, priority 10000
I0422 05:07:44.886323 1 executor.go:155] [Job] gateway is enabled, priority 90
I0422 05:07:44.886327 1 executor.go:158] [Job] kubeedge is disabled
I0422 05:07:44.898462 1 helm.go:145] getting history for release [ks-core]
I0422 05:07:44.951846 1 validator.go:57] [Validator] Current release's version is v3.3.2
I0422 05:07:44.951869 1 executor.go:220] [Job] core prepare-upgrade start
I0422 05:07:44.951878 1 executor.go:58] [Job] Detected that the plugin core is true
I0422 05:07:44.977148 1 core.go:314] scale down deployment kubesphere-system/ks-apiserver unchanged
I0422 05:07:45.000332 1 core.go:314] scale down deployment kubesphere-system/ks-console unchanged
I0422 05:07:45.006025 1 core.go:314] scale down deployment kubesphere-system/ks-controller-manager unchanged
I0422 05:07:45.028711 1 core.go:314] scale down deployment kubesphere-system/ks-installer unchanged
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x2025ccf]
goroutine 1 [running]:
kubesphere.io/ks-upgrade/pkg/jobs/core.(*upgradeJob).deleteKubeSphereWebhook(0xc000a2f630, {0x2ba2f40, 0x40e7c00})
/workspace/pkg/jobs/core/core.go:429 +0x22f
kubesphere.io/ks-upgrade/pkg/jobs/core.(*upgradeJob).PrepareUpgrade(0xc000a2f630, {0x2ba2f40, 0x40e7c00})
/workspace/pkg/jobs/core/core.go:118 +0xcc
kubesphere.io/ks-upgrade/pkg/executor.(*Executor).PrepareUpgrade(0xc000491710, {0x2ba2f40, 0x40e7c00})
/workspace/pkg/executor/executor.go:227 +0x275
main.init.func5(0xc00021a800?, {0x26e5683?, 0x4?, 0x26e5687?})
/workspace/cmd/ks-upgrade.go:102 +0x26
github.com/spf13/cobra.(*Command).execute(0x4095a80, {0xc000898420, 0x3, 0x3})
/workspace/vendor/github.com/spf13/cobra/command.go:985 +0xaaa
github.com/spf13/cobra.(*Command).ExecuteC(0x4094c20)
/workspace/vendor/github.com/spf13/cobra/command.go:1117 +0x3ff
github.com/spf13/cobra.(*Command).Execute(...)
/workspace/vendor/github.com/spf13/cobra/command.go:1041
main.main()
/workspace/cmd/ks-upgrade.go:136 +0x4e
The script gets stuck here:
deployment.apps/ks-installer scaled
etcd endpointIps is empty or localhost, will be filled with
clusterconfiguration.installer.kubesphere.io/ks-installer patched (no change)
remove redis
No resources found
No resources found
No resources found
No resources found
No resources found
No resources found
apply CRDs
job.batch "prepare-upgrade" deleted
configmap/ks-upgrade-prepare-config unchanged
job.batch/prepare-upgrade created
Checking the status of the resources
hongming
Check the imagePullPolicy of the prepare-upgrade-xxxx pod. Has the image actually been updated?
image: docker.io/kubesphere/ks-upgrade:v4.1.3
imageID: docker.io/kubesphere/ks-upgrade@sha256:bbdc80bbcab3f87b020af43d177c28425af593e103133ec6defdab9488dfb3a3
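hongming It has been updated, and the problem is still there. This is getting frustrating.
hongming Is there any other way to solve this? The upgrade path is really hard.
hongming
Confirm that the image hash matches. I can reproduce the problem you hit on my side, and I re-verified after the fix. The fix for this issue is recorded here: https://github.com/kubesphere/ks-upgrade/pull/27/files#diff-a539a85bdf77e6a269d56942c9ce3f56b0e1a0d33cf1ccd03ba6081f582a17dbR429
A guard check was added before the nil pointer dereference on this line, so this configuration difference in the environment is now ignored: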
/workspace/pkg/jobs/core/core.go:429 +0x22f
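Please share the latest logs; the error may no longer be coming from this location.
hongming
We can only reduce environment differences as far as possible. Some special configurations may not be covered by the upgrade process, but the error can be located from the exception log and the source at https://github.com/kubesphere/ks-upgrade. For example, the nil pointer problem you hit can be bypassed by slightly adjusting the validatingwebhook configuration.
The image is identical. I only retagged it and pushed it to my own registry, because a Mac M1 cannot pull an arm image locally, so I had to push it to my registry.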
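hongming
Could something have gone wrong during the retag, so that the updated image was not pulled even after changing imagePullPolicy? The error in the log is enough to locate the problem, and you can also try building the image yourself. If that is too much trouble, you can skip it this way: back up and delete the validatingwebhook that causes the exception, then restore it after the upgrade completes.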
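The backup, delete and restore workaround described above might look like the following sketch. The function names and the ./webhook-backup directory are hypothetical; which webhook configurations to remove depends on what the jq check earlier in the thread reports. It assumes kubectl and jq are installed and pointed at the right cluster.

```shell
#!/bin/sh
# Sketch only: helper names and the backup directory are illustrative assumptions.

# List webhook configurations that contain at least one webhook whose
# clientConfig uses a url instead of a service reference.
list_url_webhooks() {
  kubectl get validatingwebhookconfiguration -o json |
    jq -r '.items[] | select(any(.webhooks[]?; .clientConfig.service == null)) | .metadata.name'
}

# Save one webhook configuration to a file, then delete it.
# The delete only runs if the backup file was written and is non-empty.
backup_and_delete_webhook() {
  name="$1"
  dir="${2:-./webhook-backup}"
  mkdir -p "$dir" || return 1
  kubectl get validatingwebhookconfiguration "$name" -o yaml > "$dir/$name.yaml" || return 1
  [ -s "$dir/$name.yaml" ] || return 1
  kubectl delete validatingwebhookconfiguration "$name"
}

# Re-create a webhook configuration from its backup after the upgrade.
restore_webhook() {
  name="$1"
  dir="${2:-./webhook-backup}"
  kubectl apply -f "$dir/$name.yaml"
}

# Example (not run here):
#   for n in $(list_url_webhooks); do backup_and_delete_webhook "$n"; done
#   ... run the upgrade ...
#   for f in ./webhook-backup/*.yaml; do kubectl apply -f "$f"; done
```

The saved YAML still carries server-populated metadata such as resourceVersion and uid. If kubectl rejects the file on restore, removing those fields from it first is one option.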
hongming The tag shouldn't be the problem, since the imageID is identical. Backing up and deleting the faulty validatingwebhook is a bit troublesome. How exactly do I skip it?
hongming
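This imageID does look identical and correct, but it is the x86 image; kubesphere/ks-upgrade does not provide an arm image. Check which image the prepare-upgrade pod actually pulled, and whether the exception log has changed, to rule out an interruption caused by some other problem.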
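One way to see which image the prepare-upgrade pod actually pulled is to read image and imageID from the pod status and compare the digest with the expected one. The sketch below assumes the pod is created by a Job named prepare-upgrade in kubesphere-system, as the script output earlier in the thread suggests, and relies on the job-name label that Kubernetes sets on Job pods. The helper names are hypothetical.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative assumptions.

# Print name, image and imageID of every pod created by the prepare-upgrade Job.
show_upgrade_images() {
  kubectl get pod -n kubesphere-system -l job-name=prepare-upgrade \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.containerStatuses[0].image}{"\t"}{.status.containerStatuses[0].imageID}{"\n"}{end}'
}

# Extract the digest part (sha256:...) from an imageID such as
# docker.io/kubesphere/ks-upgrade@sha256:<hex>.
image_digest() {
  printf '%s\n' "${1##*@}"
}

# Succeeds when the imageID carries the expected digest.
same_digest() {
  [ "$(image_digest "$1")" = "$2" ]
}

# Example (not run here):
#   show_upgrade_images
#   same_digest "<imageID from the pod>" "sha256:<expected digest>" && echo match
```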
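hongming I checked the log and the error is still the same nil pointer. Otherwise I would have to delete these, but I think skipping is the better option. I'll try building the image from the GitHub source.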
kubectl get validatingwebhookconfigurations | grep kubesphere
cluster.kubesphere.io 1 19h
network.kubesphere.io 1 19h
resourcesquotas.quota.kubesphere.io 1 19h
rulegroups.alerting.kubesphere.io 3 19h
storageclass-accessor.storage.kubesphere.io 1 19h
users.iam.kubesphere.io 1 19h
xingxing122 I pulled the master branch and built an image from it, but it still has the problem. Which branch was the fix made on?
hongming