KubeSphere 3.0 ships Istio 1.4.8. Starting with Istio 1.5, the control-plane components were merged into a single Istiod, so the architecture changed significantly between versions.

Below is a manual hot-upgrade procedure to 1.6.10. It has the following characteristics:

  1. User workloads are not affected: after the upgrade, both versions (1.4.8 and 1.6.10) are installed side by side, and existing business Pods keep using the old Istio version. To switch a workload to the new version, its Pods must be restarted. To minimize the impact on traffic, you can skip the restart, or restart when traffic is low.

  2. Data is synchronized automatically: service governance and canary-release policies are stored in CRDs and are not affected; after moving to the new version, the policies are synchronized automatically.

Pre-upgrade steps

  1. Confirm that the Istio component is enabled
# helm -n istio-system list
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
istio           istio-system    1               2020-10-19 15:03:04.554414064 +0800 CST deployed        istio-1.4.8             1.4.8
istio-init      istio-system    2               2020-10-19 15:02:58.123980442 +0800 CST deployed        istio-init-1.4.8        1.4.8
  2. Deploy the BookInfo sample application

Continuously access the service to generate traffic, confirm that the application governance topology graph looks normal, and create a canary release of one version so that you can verify the data is still correct after the upgrade:
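
For example, a simple curl loop keeps traffic flowing (a minimal sketch; <bookinfo-address> is a placeholder for your actual BookInfo gateway address or route):

# Replace <bookinfo-address> with the gateway address/route of your BookInfo app
while true; do
  curl -s -o /dev/null -w "%{http_code}\n" http://<bookinfo-address>/productpage
  sleep 1
done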

Download the installation package

curl -L https://istio.io/downloadIstio | ISTIO_VERSION=1.6.10 sh -
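
Optionally verify the downloaded client (--remote=false prints only the local istioctl version):

./istio-1.6.10/bin/istioctl version --remote=false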

Disable istio-galley

Before installing the new version, galley's validation must be disabled first; otherwise the new version's resources fail validation and the installation cannot proceed:

# Add the argument - --enable-validation=false to the galley container's command
kubectl edit deployment -n istio-system istio-galley
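
A quick way to confirm the flag took effect (simple grep; the exact output may vary):

kubectl -n istio-system get deployment istio-galley -oyaml | grep enable-validation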

Confirm that galley validation has been disabled:

# kubectl get validatingwebhookconfigurations.admissionregistration.k8s.io  istio-galley
Error from server (NotFound): validatingwebhookconfigurations.admissionregistration.k8s.io "istio-galley" not found

Install

Set the new revision to 1-6-10, so the new control plane will be named istiod-1-6-10:

./istio-1.6.10/bin/istioctl install --set hub=istio --set tag=1.6.10 --set addonComponents.prometheus.enabled=false --set values.global.jwtPolicy=first-party-jwt --set values.global.proxy.autoInject=disabled --set values.global.tracer.zipkin.address="jaeger-collector.istio-system.svc:9411" --set values.sidecarInjectorWebhook.enableNamespacesByDefault=true --set values.global.imagePullPolicy=IfNotPresent --set values.global.controlPlaneSecurityEnabled=false --set revision=1-6-10

Wait for the installation to succeed:

Detected that your cluster does not support third party JWT authentication. Falling back to less secure first party JWT. See https://istio.io/docs/ops/best-practices/security/#configure-third-party-service-account-tokens for details.
✔ Istio core installed
✔ Istiod installed
- Pruning removed resources                                                                                                                                                                    2020-10-10T08:36:44.032602Z     warn    installer       retrieving resources to prune type security.istio.io/v1beta1, Kind=PeerAuthentication: peerauthentications.security.istio.io is forbidden: User "system:serviceaccount:kubesphere-system:ks-installer" cannot list resource "peerauthentications" in API group "security.istio.io" at the cluster scope not found
✔ Installation complete

When "Installation complete" is printed, the installation has succeeded.

Confirm the installation

After the command above completes, two versions of Istio exist in the cluster, i.e. Istio 1.4.8 and Istio 1.6.10:

# kubectl -n istio-system get po
NAME                                       READY   STATUS      RESTARTS   AGE
istio-citadel-7f676f76d7-qtsnf             1/1     Running     0          21m
istio-galley-7b5ffd58fd-wmtsd              1/1     Running     0          8m21s
istio-ingressgateway-8569f8dcb-j8x4l       1/1     Running     0          21m
istio-init-crd-10-1.4.8-lkc8s              0/1     Completed   0          21m
istio-init-crd-11-1.4.8-6qdq9              0/1     Completed   0          21m
istio-init-crd-12-1.4.8-b5nvf              0/1     Completed   0          21m
istio-init-crd-14-1.4.8-wpvbj              0/1     Completed   0          21m
istio-pilot-67fd55d974-xdxhw               2/2     Running     0          21m
istio-policy-668894cffc-hkc9x              2/2     Running     0          21m
istio-sidecar-injector-9c4d79658-dtkrt     1/1     Running     0          21m
istio-telemetry-57fc886bf8-rcswr           2/2     Running     0          21m
istiod-1-6-10-7db56f875b-hwwtk             1/1     Running     0          8m1s
jaeger-collector-76bf54b467-zfz9v          1/1     Running     16         41d
jaeger-es-index-cleaner-1603036500-zmdpt   0/1     Completed   0          15h
jaeger-operator-7559f9d455-gl4tj           1/1     Running     2          41d
jaeger-query-b478c5655-gjh7s               2/2     Running     17         41d

istiod-1-6-10-7db56f875b-hwwtk is the new version.

Check the injectors

# kubectl get mutatingwebhookconfigurations istio-sidecar-injector-1-6-10
NAME                            CREATED AT
istio-sidecar-injector-1-6-10   2020-10-19T07:16:11Z

# kubectl get mutatingwebhookconfigurations istio-sidecar-injector
NAME                     CREATED AT
istio-sidecar-injector   2020-10-19T07:03:05Z

As shown above, two injectors exist and can coexist; by default the new version's injector is used.
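
If you want to pin a namespace to a specific control-plane revision explicitly, the standard Istio canary-upgrade labels can be used (an optional sketch; with the configuration above this may not be required):

# Switch the test namespace from the default injector to the 1-6-10 revision injector
kubectl label namespace test istio-injection- istio.io/rev=1-6-10 --overwrite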

Update the Prometheus configuration

The traffic governance topology graph is essentially drawn from Prometheus data, so the Prometheus configuration needs to be updated.

The new Prometheus configuration is compatible with both 1.4.8 and 1.6.10.

Delete the old configuration:

kubectl -n kubesphere-monitoring-system delete secret additional-scrape-configs

Apply the new Prometheus configuration:

curl -O https://raw.githubusercontent.com/zackzhangkai/ks-installer/master/roles/ks-istio/files/prometheus/prometheus-additional.yaml

kubectl -n kubesphere-monitoring-system create secret generic additional-scrape-configs --from-file=prometheus-additional.yaml
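
Confirm the secret was recreated before moving on:

kubectl -n kubesphere-monitoring-system get secret additional-scrape-configs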

Expose the Prometheus service so you can check online whether the configuration has taken effect:

# kubectl -n kubesphere-monitoring-system get svc prometheus-k8s
NAME             TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
prometheus-k8s   NodePort   10.233.51.22   <none>        9090:31034/TCP   70d

Open http://IP:NodePort in a browser.
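
If you prefer not to use the NodePort, a temporary port-forward works as well (then open http://localhost:9090):

kubectl -n kubesphere-monitoring-system port-forward svc/prometheus-k8s 9090:9090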

Check that the envoy_stats job is working; the new configuration includes this job, so confirm it looks normal.

Check that istio_requests_total returns data:

The PromQL query executed by the traffic topology page is:

sum(rate(istio_requests_total{reporter="source",source_workload_namespace!="test",source_workload!="unknown",destination_service_namespace="test"} [60s])) by (source_workload_namespace,source_workload,source_app,source_version,destination_service_namespace,destination_service_name,destination_workload,destination_app,destination_version,request_protocol,response_code)

If the query returns data, everything is working correctly.
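
The same check can be done from the command line through the Prometheus HTTP API (a minimal sketch; substitute your node IP and NodePort):

curl -sG 'http://IP:NodePort/api/v1/query' --data-urlencode 'query=sum(istio_requests_total) by (source_workload, destination_workload)'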

Data plane upgrade

Business workloads are currently still using Istio 1.4.8:

# kubectl -n test get po -oyaml | grep 'image: '
    - image: istio/proxyv2:1.4.8
    - image: kubesphere/examples-bookinfo-reviews-v2:1.13.0
      image: istio/proxyv2:1.4.8
      ....

The old Istio version is still in use here; since the workloads have not been restarted, they are not affected.

Newly registered services will use the new version.

If a workload needs to use the new Istio version, its Pods must be restarted, for example productpage-v1 here:

kubectl -n test rollout restart deploy  productpage-v1
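
Wait for the rollout to finish before checking the images:

kubectl -n test rollout status deploy productpage-v1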

If the project-level service governance Tracing feature is enabled, the corresponding ingress controller in kubesphere-controls-system also needs to be restarted:

kubectl -n kubesphere-controls-system rollout restart deploy kubesphere-router-test 

Verify that the new Istio version is now in use:

# kubectl -n test get po productpage-v1-77bbd8f85d-8njf8 -oyaml | grep 'image: '
  - image: kubesphere/examples-bookinfo-productpage-v1:1.13.0
    image: istio/proxyv2:1.6.10
    image: istio/proxyv2:1.6.10
    image: istio/proxyv2:1.6.10
    image: kubesphere/examples-bookinfo-productpage-v1:1.13.0
    image: istio/proxyv2:1.6.10

Finally, check that application service governance and canary releases work correctly, and that traffic tracing also works when Tracing is enabled. The upgrade is now complete.

Note: restarting a service causes a brief interruption; schedule restarts for periods with low traffic.

Notes

After the upgrade is complete, it is recommended to pick a service with low impact and upgrade its sidecar first:

kubectl rollout restart deploy xxx -n test 

Then check whether that service still works. If there is a problem, temporarily disable istiod 1.6.10 as follows:

  1. Back up the istiod injector
kubectl get mutatingwebhookconfigurations.admissionregistration.k8s.io istio-sidecar-injector-1-6-10 -oyaml > istio-sidecar-injector-1-6-10.yaml
  2. Delete the istiod injector
kubectl delete mutatingwebhookconfigurations.admissionregistration.k8s.io istio-sidecar-injector-1-6-10

From this point on, newly created workloads will go back to using the old version 1.4.8.

  3. Re-enable Istio 1.6.10

To re-enable 1.6.10, simply restore the istiod injector from the backup. One field needs to be changed before applying the YAML (alternatively, apply it directly first and check the resulting error):

sed -i 's/sideEffects: Unknown/sideEffects: None/' istio-sidecar-injector-1-6-10.yaml

After that, the apply succeeds:

kubectl apply -f istio-sidecar-injector-1-6-10.yaml
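
The webhook should then be present again:

kubectl get mutatingwebhookconfigurations istio-sidecar-injector-1-6-10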

Finally, if you run into any problems, don't forget to share them with the community ^-^

    Nice, very nice. It would be great to also cover how to uninstall the old 1.4 once the upgrade to 1.6 has been tested and works.

      370569218

      helm -n istio-system uninstall istio-init
      helm -n istio-system uninstall istio
      kubectl delete mutatingwebhookconfigurations.admissionregistration.k8s.io istio-sidecar-injector

      When will KubeSphere officially update Istio to a newer version?

      KubeSphere 3.1 will update to Istio 1.6.10. Our development build already supports it; you can try kubespheredev/ks-installer:latest. After switching to this image, the upgrade happens automatically. Commands:

      kubectl -n kubesphere-system patch cc ks-installer --type merge --patch '{"status":{"servicemesh":{"status":"none"}}}' 
      
      kubectl -n kubesphere-system set image deployment/ks-installer installer=kubespheredev/ks-installer
      
       kubectl -n kubesphere-system rollout restart deploy/ks-installer

        zackzhang Hi, I deployed with the kubespheredev/ks-installer:latest image, but the ks-core components cannot start properly, the multi-cluster components are abnormal too, and starting the Pods individually also has problems. It looks like it is caused by Istio's admission policy. Is extra configuration needed to deploy it correctly?

          weekyuan Could you provide more details?

          1. Which Pods exactly are unhealthy; the relevant logs; describe the Pods and check the events and failure reasons.
          2. The environment: multi-cluster or single-cluster, and whether any other operations were performed.

            zackzhang I originally hit this when deploying multiple nodes directly from a configuration file. Later I used the All-in-One method and then added nodes, and it came up smoothly without problems.

            Also, with the new image, logging in via LDAP authentication reports the following error, please take a look!
            The page shows: Internal Server Error
            request to http://ks-apiserver.kubesphere-system.svc/kapis/iam.kubesphere.io/v1alpha2/users failed, reason: socket hang up

            Here is the ks-apiserver error:
            `
            2020/12/24 00:02:07 http: panic serving 192.168.17.21:51934: assignment to entry in nil map
            goroutine 4032 [running]:
            net/http.(*conn).serve.func1(0xc0005d0320)
            /opt/hostedtoolcache/go/1.13.15/x64/src/net/http/server.go:1795 +0x139
            panic(0x2d6b7e0, 0x38941e0)
            /opt/hostedtoolcache/go/1.13.15/x64/src/runtime/panic.go:679 +0x1b2
            kubesphere.io/kubesphere/pkg/apiserver/auditing.(*auditing).LogRequestObject(0xc000f4af90, 0xc000e00200, 0xc000aca000, 0x2b15c20)
            /home/runner/work/kubesphere/kubesphere/pkg/apiserver/auditing/types.go:187 +0x89a
            kubesphere.io/kubesphere/pkg/apiserver/filters.WithAuditing.func1(0x393b1a0, 0xc000bdc2a0, 0xc000e00200)
            /home/runner/work/kubesphere/kubesphere/pkg/apiserver/filters/auditing.go:51 +0x100
            net/http.HandlerFunc.ServeHTTP(0xc000f4afc0, 0x393b1a0, 0xc000bdc2a0, 0xc000e00200)
            /opt/hostedtoolcache/go/1.13.15/x64/src/net/http/server.go:2036 +0x44
            kubesphere.io/kubesphere/pkg/apiserver/filters.WithAuthorization.func1(0x393b1a0, 0xc000bdc2a0, 0xc000e00200)
            /home/runner/work/kubesphere/kubesphere/pkg/apiserver/filters/authorization.go:50 +0x37c
            net/http.HandlerFunc.ServeHTTP(0xc0000c9740, 0x393b1a0, 0xc000bdc2a0, 0xc000e00200)
            /opt/hostedtoolcache/go/1.13.15/x64/src/net/http/server.go:2036 +0x44
            kubesphere.io/kubesphere/pkg/apiserver/filters.WithMultipleClusterDispatcher.func1(0x393b1a0, 0xc000bdc2a0, 0xc000e00200)
            /home/runner/work/kubesphere/kubesphere/pkg/apiserver/filters/dispatch.go:43 +0xd9
            net/http.HandlerFunc.ServeHTTP(0xc000f4b830, 0x393b1a0, 0xc000bdc2a0, 0xc000e00200)
            /opt/hostedtoolcache/go/1.13.15/x64/src/net/http/server.go:2036 +0x44
            kubesphere.io/kubesphere/pkg/apiserver/filters.WithAuthentication.func1(0x393b1a0, 0xc000bdc2a0, 0xc000e00200)
            /home/runner/work/kubesphere/kubesphere/pkg/apiserver/filters/authentication.go:68 +0x5ce
            net/http.HandlerFunc.ServeHTTP(0xc0000c9800, 0x393b1a0, 0xc000bdc2a0, 0xc00221f200)
            /opt/hostedtoolcache/go/1.13.15/x64/src/net/http/server.go:2036 +0x44
            kubesphere.io/kubesphere/pkg/apiserver/filters.WithRequestInfo.func1(0x393b1a0, 0xc000bdc2a0, 0xc00221e000)
            /home/runner/work/kubesphere/kubesphere/pkg/apiserver/filters/requestinfo.go:67 +0x3c5
            net/http.HandlerFunc.ServeHTTP(0xc000f7a000, 0x393b1a0, 0xc000bdc2a0, 0xc00221e000)
            /opt/hostedtoolcache/go/1.13.15/x64/src/net/http/server.go:2036 +0x44
            net/http.serverHandler.ServeHTTP(0xc0009fc0e0, 0x393b1a0, 0xc000bdc2a0, 0xc00221e000)
            /opt/hostedtoolcache/go/1.13.15/x64/src/net/http/server.go:2831 +0xa4
            net/http.(*conn).serve(0xc0005d0320, 0x394da20, 0xc002a16480)
            /opt/hostedtoolcache/go/1.13.15/x64/src/net/http/server.go:1919 +0x875
            created by net/http.(*Server).Serve
            /opt/hostedtoolcache/go/1.13.15/x64/src/net/http/server.go:2957 +0x384
            2020/12/24 00:02:07 http: panic serving 192.168.17.21:51942: assignment to entry in nil map

            `

              weekyuan The cause here is that the HTTP server failed to bind the IP and port. Check whether 192.168.17.21:51934 is available and whether the port is already in use.

                zackzhang Thanks for the reply. The error log above is from ks-apiserver, but the 192.168.17.21:51934 involved is actually the console Pod's address. The console only listens on port 8000; nothing listens on 51934. In principle the address ks-apiserver binds should not be the console's IP.

                Also attached is the console error log. From the console log, the login itself succeeds, but the call to ks-apiserver during confirm gets a socket hang up:
                <-- POST /login 2020/12/24T11:15:43.055
                --> POST /login 302 88ms 59b 2020/12/24T11:15:43.143
                <-- GET /login/confirm 2020/12/24T11:15:43.169
                --> GET /login/confirm 200 2ms 11.97kb 2020/12/24T11:15:43.171
                <-- GET /login/confirm 2020/12/24T11:15:43.209
                --> GET /login/confirm 200 3ms 11.97kb 2020/12/24T11:15:43.212
                <-- GET /kapis/iam.kubesphere.io/v1alpha2/users/yuanyuan 2020/12/24T11:15:46.305
                --> GET /kapis/iam.kubesphere.io/v1alpha2/users/yuanyuan 200 6ms 15b 2020/12/24T11:15:46.311
                <-- GET / 2020/12/24T11:15:48.538
                UnauthorizedError: Not Login
                at Object.throw (/opt/kubesphere/console/server/server.js:23953:11)
                at getCurrentUser (/opt/kubesphere/console/server/server.js:7995:14)
                at renderView (/opt/kubesphere/console/server/server.js:93770:7)
                at dispatch (/opt/kubesphere/console/server/server.js:5198:32)
                at next (/opt/kubesphere/console/server/server.js:5199:18)
                at /opt/kubesphere/console/server/server.js:64227:16
                at dispatch (/opt/kubesphere/console/server/server.js:5198:32)
                at next (/opt/kubesphere/console/server/server.js:5199:18)
                at /opt/kubesphere/console/server/server.js:72222:37
                at dispatch (/opt/kubesphere/console/server/server.js:5198:32)
                at next (/opt/kubesphere/console/server/server.js:5199:18)
                at /opt/kubesphere/console/server/server.js:64227:16
                at dispatch (/opt/kubesphere/console/server/server.js:5198:32)
                at next (/opt/kubesphere/console/server/server.js:5199:18)
                at /opt/kubesphere/console/server/server.js:72222:37
                at dispatch (/opt/kubesphere/console/server/server.js:5198:32)
                --> GET / 302 2ms 43b 2020/12/24T11:15:48.540
                <-- GET /login 2020/12/24T11:15:48.541
                --> GET /login 200 9ms 11.92kb 2020/12/24T11:15:48.549
                <-- GET /kapis/iam.kubesphere.io/v1alpha2/users?email=yuanyuan-g%40360.cn 2020/12/24T11:15:49.171
                <-- GET /kapis/iam.kubesphere.io/v1alpha2/users/yuanyuan 2020/12/24T11:15:49.173
                --> GET /kapis/iam.kubesphere.io/v1alpha2/users?email=yuanyuan-g%40360.cn 200 7ms 15b 2020/12/24T11:15:49.178
                --> GET /kapis/iam.kubesphere.io/v1alpha2/users/yuanyuan 200 6ms 15b 2020/12/24T11:15:49.179
                <-- POST /login/confirm 2020/12/24T11:15:49.192
                FetchError: request to http://ks-apiserver.kubesphere-system.svc/kapis/iam.kubesphere.io/v1alpha2/users failed, reason: socket hang up
                at ClientRequest.<anonymous> (/opt/kubesphere/console/server/server.js:74611:11)
                at ClientRequest.emit (events.js:314:20)
                at Socket.socketOnEnd (_http_client.js:458:9)
                at Socket.emit (events.js:326:22)
                at endReadableNT (_stream_readable.js:1241:12)
                at processTicksAndRejections (internal/process/task_queues.js:84:21) {
                type: 'system',
                errno: 'ECONNRESET',
                code: 'ECONNRESET'
                }
                --> POST /login/confirm 500 5ms 193b 2020/12/24T11:15:49.197


                  weekyuan

                  Change the image pull policy of ks-installer and ks-apiserver to Always, then:

                  kubectl -n kubesphere-system patch cc ks-installer --type merge --patch '{"status":{"servicemesh":{"status":"none"}}}' 
                  kubectl -n kubesphere-system rollout restart deploy/ks-installer
                  kubectl -n kubesphere-system rollout restart deploy/ks-apiserver

                  @zackzhang Hi, as you suggested I changed the ks-installer image to kubespheredev/ks-installer:latest, set the pull policy of ks-installer and ks-apiserver to Always, and ran the commands you gave. Now the ks-installer Pod cannot start; it stays in CrashLoopBackOff and produces no logs.

                    dylan I just verified this image and it works fine.

                    1. Run kubectl describe deploy/ks-installer -n kubesphere-system and check for any messages.

                    2. Check the kubelet logs with journalctl -xe for errors.

                    3. Check the node health.

                    dylan Thanks. I uninstalled KubeSphere, replaced the image directly in the YAML, and installing again worked. I will try the online replacement on a 3.0 installation later.

                    @zackzhang The error above was probably caused by copying the command in your post directly; as the screenshot shows, it should be ks-installer.

                    One more question: if I use image: kubespheredev/ks-installer:latest, do other components such as ks-apiserver also need to be switched to the dev images, or should they keep using the original release images?