weili520

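The first screenshot I posted was wrong: I run in multi-cluster mode, and it showed the parameters of the host cluster, which is why the components were not enabled there.
The member cluster's parameters are shown in the screenshot above; everything is enabled.
The output of kubectl -n kubesphere-system get cm kubesphere-config -oyaml is as follows: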

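    Editing the ConfigMap is not enough; you have to change the cc (ClusterConfiguration) for it to take effect.
    kubectl get cc ks-installer -n kubesphere-system -o yaml
    Check whether the change is actually present in your cc.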
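A quick way to check a single flag in the cc, rather than reading the whole YAML, is a jsonpath query. The following is only a sketch: the function name `cc_component_enabled` is made up for illustration, and it assumes the component keeps its switch at `.spec.<component>.enabled`, as the components in this thread do.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the `enabled` flag of one component from the
# ClusterConfiguration (cc) named ks-installer.
# Assumes the flag lives at .spec.<component>.enabled.
cc_component_enabled() {
  local component="$1"
  kubectl get cc ks-installer -n kubesphere-system \
    -o "jsonpath={.spec.${component}.enabled}"
}

# Usage (run against the member cluster's kubeconfig):
#   cc_component_enabled logging
#   cc_component_enabled auditing
```

The query reads the live object, so it reflects what the installer will see on its next run, not what the generated ConfigMap currently contains.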

      ruiyaoOps
      I did change it. As I said above, I also deleted the relevant parts of status and restarted ks-installer and ks-apiserver; none of it helped.
      [root@kc21m01 ~]# kubectl get cc ks-installer -n kubesphere-system -o yaml
      apiVersion: installer.kubesphere.io/v1alpha1
      kind: ClusterConfiguration
      metadata:
        annotations:
          kubectl.kubernetes.io/last-applied-configuration: |
            {"apiVersion":"installer.kubesphere.io/v1alpha1","kind":"ClusterConfiguration","metadata":{"annotations":{},"labels":{"version":"v3.2.0"},"name":"ks-installer","namespace":"kubesphere-system"},"spec":{"alerting":{"enabled":true},"auditing":{"enabled":true},"authentication":{"jwtSecret":"khUff6ToH2b5EVHTR2a8kPEKWkHpBeBj"},"common":{"core":{"console":{"enableMultiLogin":true,"port":30880,"type":"NodePort"}},"es":{"basicAuth":{"enabled":true,"password":"esTic456","username":"elastic"},"elkPrefix":"logstash","externalElasticsearchPort":"9200","externalElasticsearchUrl":"192.168.120.85","logMaxAge":7},"gpu":{"kinds":[{"default":true,"resourceName":"nvidia.com/gpu","resourceType":"GPU"}]},"minio":{"volumeSize":"20Gi"},"monitoring":{"GPUMonitoring":{"enabled":true},"endpoint":"http://prometheus-operated.kubesphere-monitoring-system.svc:9090"},"openldap":{"enabled":true,"volumeSize":"2Gi"},"redis":{"enabled":true,"volumeSize":"2Gi"}},"devops":{"enabled":true,"jenkinsJavaOpts_MaxRAM":"2g","jenkinsJavaOpts_Xms":"512m","jenkinsJavaOpts_Xmx":"512m","jenkinsMemoryLim":"2Gi","jenkinsMemoryReq":"1500Mi","jenkinsVolumeSize":"8Gi"},"etcd":{"endpointIps":"localhost","monitoring":false,"port":2379,"tlsEnable":true},"events":{"enabled":true},"kubeedge":{"cloudCore":{"cloudHub":{"advertiseAddress":[""],"nodeLimit":"100"},"cloudhubHttpsPort":"10002","cloudhubPort":"10000","cloudhubQuicPort":"10001","cloudstreamPort":"10003","nodeSelector":{"node-role.kubernetes.io/worker":""},"service":{"cloudhubHttpsNodePort":"30002","cloudhubNodePort":"30000","cloudhubQuicNodePort":"30001","cloudstreamNodePort":"30003","tunnelNodePort":"30004"},"tolerations":[],"tunnelPort":"10004"},"edgeWatcher":{"edgeWatcherAgent":{"nodeSelector":{"node-role.kubernetes.io/worker":""},"tolerations":[]},"nodeSelector":{"node-role.kubernetes.io/worker":""},"tolerations":[]},"enabled":false},"local_registry":"","logging":{"containerruntime":"containerd","enabled":true,"logsidecar":{"enabled":true,"replicas":2}},"metrics_server":{"enabled":true},"monitoring":{"gpu":{"nvidia_dcgm_exporter":{"enabled":false}},"storageClass":"managed-nfs-storage"},"multicluster":{"clusterRole":"member"},"network":{"ippool":{"type":"none"},"networkpolicy":{"enabled":false},"topology":{"type":"none"}},"openpitrix":{"store":{"enabled":false}},"persistence":{"storageClass":"managed-nfs-storage"},"servicemesh":{"enabled":true}}}
        creationTimestamp: "2021-11-08T02:09:45Z"
        generation: 47
        labels:
          version: v3.2.0
        name: ks-installer
        namespace: kubesphere-system
        resourceVersion: "8400030"
        selfLink: /apis/installer.kubesphere.io/v1alpha1/namespaces/kubesphere-system/clusterconfigurations/ks-installer
        uid: ddf6e2b0-dd0f-4345-9937-4f1b69585da5
      spec:
        alerting:
          enabled: true
        auditing:
          enabled: true
        authentication:
          jwtSecret: khUff6ToH2b5EVHTR2a8adsakPEKWkHpBeBj
        common:
          core:
            console:
              enableMultiLogin: true
              port: 30880
              type: NodePort
          es:
            basicAuth:
              enabled: true
              password: esTic456
              username: elastic
            elkPrefix: logstash
            externalElasticsearchPort: "9200"
            externalElasticsearchUrl: 192.168.120.85
            logMaxAge: 7
          gpu:
            kinds:
            - default: true
              resourceName: nvidia.com/gpu
              resourceType: GPU
          minio:
            volumeSize: 20Gi
          monitoring:
            GPUMonitoring:
              enabled: true
            endpoint: http://prometheus-operated.kubesphere-monitoring-system.svc:9090
          openldap:
            enabled: true
            volumeSize: 2Gi
          redis:
            enabled: true
            volumeSize: 2Gi
        devops:
          enabled: true
          jenkinsJavaOpts_MaxRAM: 2g
          jenkinsJavaOpts_Xms: 512m
          jenkinsJavaOpts_Xmx: 512m
          jenkinsMemoryLim: 2Gi
          jenkinsMemoryReq: 1500Mi
          jenkinsVolumeSize: 8Gi
        etcd:
          endpointIps: localhost
          monitoring: false
          port: 2379
          tlsEnable: true
        events:
          enabled: true
        kubeedge:
          cloudCore:
            cloudHub:
              advertiseAddress:
              - ""
              nodeLimit: "100"
            cloudhubHttpsPort: "10002"
            cloudhubPort: "10000"
            cloudhubQuicPort: "10001"
            cloudstreamPort: "10003"
            nodeSelector:
              node-role.kubernetes.io/worker: ""
            service:
              cloudhubHttpsNodePort: "30002"
              cloudhubNodePort: "30000"
              cloudhubQuicNodePort: "30001"
              cloudstreamNodePort: "30003"
              tunnelNodePort: "30004"
            tolerations: []
            tunnelPort: "10004"
          edgeWatcher:
            edgeWatcherAgent:
              nodeSelector:
                node-role.kubernetes.io/worker: ""
              tolerations: []
            nodeSelector:
              node-role.kubernetes.io/worker: ""
            tolerations: []
          enabled: false
        local_registry: ""
        logging:
          containerruntime: docker
          enabled: true
          logsidecar:
            enabled: true
            replicas: 2
        metrics_server:
          enabled: true
        monitoring:
          gpu:
            nvidia_dcgm_exporter:
              enabled: false
          storageClass: managed-nfs-storage
        multicluster:
          clusterRole: member
        network:
          ippool:
            type: none
          networkpolicy:
            enabled: false
          topology:
            type: none
        openpitrix:
          store:
            enabled: false
        persistence:
          storageClass: managed-nfs-storage
        servicemesh:
          enabled: true
      status:
        alerting:
          enabledTime: 2021-11-08T11:32:23CST
          status: enabled
        auditing:
          enabledTime: 2021-11-08T11:29:19CST
          status: enabled
        clusterId: 34b14c4b-2834-463c-8d6a-1b1ca6571013-1636342382
        core:
          enabledTime: 2021-11-08T11:27:17CST
          status: enabled
          version: v3.2.0
        devops:
          enabledTime: 2021-11-08T11:30:48CST
          status: enabled
        events:
          enabledTime: 2021-11-08T11:29:53CST
          status: enabled
        fluentbit:
          enabledTime: 2021-11-08T11:26:15CST
          status: enabled
        logging:
          enabledTime: 2021-11-08T11:30:06CST
          status: enabled
        metricsServer:
          enabledTime: 2021-11-08T11:24:43CST
          status: enabled
        minio:
          enabledTime: 2021-11-08T11:25:57CST
          status: enabled
        monitoring:
          enabledTime: 2021-11-08T11:32:20CST
          status: enabled
        openldap:
          enabledTime: 2021-11-08T11:25:44CST
          status: enabled
        redis:
          enabledTime: 2021-11-08T11:25:35CST
          status: enabled
        servicemesh:
          enabledTime: 2021-11-08T11:30:36CST
          status: enabled
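One detail in the output above is easy to miss: the last-applied-configuration annotation records containerruntime as containerd, while the live spec shows docker. The sketch below pulls every containerruntime value out of such a dump so the two can be compared at a glance. The function name `list_container_runtimes` is hypothetical, and the pattern assumes the value consists of lowercase letters only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get cc ... -o yaml` output on stdin and
# print every containerruntime value, one per line, in order of appearance.
# Handles both the JSON form inside the annotation and the YAML form in spec.
list_container_runtimes() {
  grep -oE 'containerruntime"?: ?"?[a-z]+' \
    | sed -E 's/.*[:" ]([a-z]+)$/\1/'
}

# Usage:
#   kubectl get cc ks-installer -n kubesphere-system -o yaml | list_container_runtimes
```

If the two lines printed differ, it is worth confirming which value the installer actually ran with; as far as I understand, the annotation is only bookkeeping and the `spec` is what gets read.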

      kubectl rollout restart deploy ks-installer -n kubesphere-system
      Restart ks-installer, then check its logs to see what it does during the run.
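The restart and the log check can be chained so that the log stream starts only once the new pod is up. This is a sketch under assumptions: the function name is made up, and the 300 second timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: restart ks-installer, wait for the rollout to finish,
# then follow the installer log. Stops at the first failing step.
restart_installer_and_follow_logs() {
  kubectl rollout restart deploy ks-installer -n kubesphere-system &&
    kubectl rollout status deploy ks-installer -n kubesphere-system --timeout=300s &&
    kubectl logs -n kubesphere-system deploy/ks-installer -f
}

# Usage:
#   restart_installer_and_follow_logs
```

Following the log from the start makes it easier to spot a failed task before the final summary scrolls past.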

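      This is really frustrating. In the same environment, the tools in 3.1.1 worked without problems, but after upgrading to 3.2.0 there has been one issue after another. I then uninstalled 3.2 and reinstalled from scratch, with the same result. There are other problems too: for example, with auditing enabled I can see audit logs in the pod "kube-auditing-webhook-deploy-xxxx", yet no index is ever created in ES. I have spent many days troubleshooting these small issues, and the headache makes me hesitant to go to production. The official documentation is very brief, just a few parameters to change, so there is nothing to check against.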
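To confirm whether an auditing index exists at all, the Elasticsearch `_cat/indices` endpoint can be queried directly. This is a minimal sketch: `ES_URL`, `ES_USER` and `ES_PASS` are placeholders to fill in from your own cc, the function name is hypothetical, and the index naming in the usage comment (something like `ks-logstash-auditing-<date>`) is an assumption that may not match your version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list Elasticsearch index names containing a substring.
# ES_URL, ES_USER and ES_PASS are placeholders, e.g. ES_URL=http://<es-host>:9200.
# Returns non-zero when no index matches.
list_es_indices() {
  local pattern="$1"
  curl -s -u "${ES_USER}:${ES_PASS}" "${ES_URL}/_cat/indices?h=index" \
    | grep -- "$pattern"
}

# Usage (index naming is an assumption):
#   list_es_indices auditing || echo "no auditing index found"
#   list_es_indices events
```

If logging or events indices show up but auditing does not, the problem is more likely in the auditing pipeline than in the Elasticsearch connection itself.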

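        You probably need someone to look at this remotely. Do you have a QingCloud support plan? If so, open a ticket there and someone can take a look.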

        ruiyaoOps
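        Many thanks to everyone for helping debug!
        I saw someone on the forum say that the field in the red box of the filter should be changed to message, followed by a restart of ks-apiserver. I tried all of that and it did not help.

        I plan to use KubeSphere as a one-stop solution covering logging, auditing, Istio, DevOps and so on, but so far I have run into far too many painful problems.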

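          morriszs Clear your browser cache and try again, then press F12, print globals.ksConfig in the console, and post a screenshot like this one:

          morriszs You do not need to change this, because your environment is currently Docker and the pods are all running normally. This is probably a front-end display issue. @weili520 can help you take a look.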

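            My clusters are as follows:

            The parameters below were also printed while logged in to the host cluster, so I assume they belong to the host cluster and have nothing to do with the member cluster.

            I currently cannot log in to the member cluster with the default password, which is yet another sad problem:

            Backend error:

            DehaoCheng It is not a Docker environment. My member cluster runs Kubernetes v1.21.4 without Docker; it uses containerd. That is exactly why I upgraded to 3.2.0: it supports containerd's text log format and pipelines.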

              DehaoCheng
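              At initial installation I did set it to containerd, but ks-installer kept failing and the installation never completed. I had no choice but to switch back to the default, docker, and only then did the installation succeed. I looked through the official GitHub repository: there is no explanation of the containerruntime parameter and no list of accepted values; the default is simply docker.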
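For reference, the parameter can be switched in the cc without opening an editor by using a merge patch. This is only a sketch: the function name is hypothetical, and `docker` and `containerd` are simply the two values that appear in this thread, not a confirmed list of accepted values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set spec.logging.containerruntime on the ks-installer cc.
# Only "docker" and "containerd" appear in this thread; other values are untested.
set_container_runtime() {
  local runtime="$1"
  kubectl patch cc ks-installer -n kubesphere-system --type merge \
    -p "{\"spec\":{\"logging\":{\"containerruntime\":\"${runtime}\"}}}"
}

# Usage:
#   set_container_runtime containerd
```

A merge patch touches only the named field, so the rest of the spec stays as it is.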

                DehaoCheng
                After changing it to containerd, the ks-installer log shows no errors:

                PLAY RECAP *********************************************************************
                localhost : ok=28 changed=17 unreachable=0 failed=0 skipped=11 rescued=0 ignored=0

                Start installing monitoring
                Start installing multicluster
                Start installing openpitrix
                Start installing network
                Start installing alerting
                Start installing auditing
                Start installing devops
                Start installing events
                Start installing logging
                Start installing servicemesh


                Waiting for all tasks to be completed …
                task openpitrix status is successful (1/10)
                task alerting status is successful (2/10)
                task network status is successful (3/10)
                task multicluster status is successful (4/10)
                task auditing status is successful (5/10)
                task events status is successful (6/10)
                task logging status is successful (7/10)
                task servicemesh status is successful (8/10)
                task devops status is successful (9/10)
                task monitoring status is successful (10/10)


                Collecting installation results …
                #####################################################

                Welcome to KubeSphere!

                #####################################################
                However, fluentbit reported an error, the same as before: