ruiyaoOps
I did change it; as I mentioned above, I deleted the relevant parts under status and restarted ks-installer and ks-apiserver, but neither helped.
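A minimal sketch of the "delete the status entries" step, assuming it is done with `kubectl patch` rather than `kubectl edit`. It builds an RFC 6902 JSON Patch that removes `/status/<component>` for each named component. The helper name `status_removal_patch` is hypothetical. Whether ks-installer actually re-runs a component whose status entry is missing is an assumption based on common forum advice; this thread does not confirm it.

```python
import json


def status_removal_patch(components):
    """Return a JSON Patch (RFC 6902) string removing /status/<component> for each name.

    Hypothetical helper; component names are escaped per RFC 6901 JSON Pointer rules.
    """
    ops = []
    for name in components:
        escaped = name.replace("~", "~0").replace("/", "~1")
        ops.append({"op": "remove", "path": f"/status/{escaped}"})
    return json.dumps(ops)


if __name__ == "__main__":
    # Component names taken from the status section of the ClusterConfiguration in this thread.
    print(status_removal_patch(["logging", "auditing"]))
    # Assumed usage (not verified against ks-installer):
    # kubectl patch cc ks-installer -n kubesphere-system --type json -p "$PATCH"
```

Per RFC 6902, a `remove` op fails if its path does not exist, so list only components still present under `status`. If the ClusterConfiguration CRD serves `status` as a subresource, a patch against the main resource will not change it. That is worth checking first.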
[root@kc21m01 ~]# kubectl get cc ks-installer -n kubesphere-system -o yaml
apiVersion: installer.kubesphere.io/v1alpha1
kind: ClusterConfiguration
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"installer.kubesphere.io/v1alpha1","kind":"ClusterConfiguration","metadata":{"annotations":{},"labels":{"version":"v3.2.0"},"name":"ks-installer","namespace":"kubesphere-system"},"spec":{"alerting":{"enabled":true},"auditing":{"enabled":true},"authentication":{"jwtSecret":"khUff6ToH2b5EVHTR2a8kPEKWkHpBeBj"},"common":{"core":{"console":{"enableMultiLogin":true,"port":30880,"type":"NodePort"}},"es":{"basicAuth":{"enabled":true,"password":"esTic456","username":"elastic"},"elkPrefix":"logstash","externalElasticsearchPort":"9200","externalElasticsearchUrl":"192.168.120.85","logMaxAge":7},"gpu":{"kinds":[{"default":true,"resourceName":"nvidia.com/gpu","resourceType":"GPU"}]},"minio":{"volumeSize":"20Gi"},"monitoring":{"GPUMonitoring":{"enabled":true},"endpoint":"http://prometheus-operated.kubesphere-monitoring-system.svc:9090"},"openldap":{"enabled":true,"volumeSize":"2Gi"},"redis":{"enabled":true,"volumeSize":"2Gi"}},"devops":{"enabled":true,"jenkinsJavaOpts_MaxRAM":"2g","jenkinsJavaOpts_Xms":"512m","jenkinsJavaOpts_Xmx":"512m","jenkinsMemoryLim":"2Gi","jenkinsMemoryReq":"1500Mi","jenkinsVolumeSize":"8Gi"},"etcd":{"endpointIps":"localhost","monitoring":false,"port":2379,"tlsEnable":true},"events":{"enabled":true},"kubeedge":{"cloudCore":{"cloudHub":{"advertiseAddress":[""],"nodeLimit":"100"},"cloudhubHttpsPort":"10002","cloudhubPort":"10000","cloudhubQuicPort":"10001","cloudstreamPort":"10003","nodeSelector":{"node-role.kubernetes.io/worker":""},"service":{"cloudhubHttpsNodePort":"30002","cloudhubNodePort":"30000","cloudhubQuicNodePort":"30001","cloudstreamNodePort":"30003","tunnelNodePort":"30004"},"tolerations":[],"tunnelPort":"10004"},"edgeWatcher":{"edgeWatcherAgent":{"nodeSelector":{"node-role.kubernetes.io/worker":""},"tolerations":[]},"nodeSelector":{"node-role.kubernetes.io/worker":""},"tolerations":[]},"enabled":false},"local_registry":"","logging":{"containerruntime":"containerd","enabled":true,"logsidecar":{"enabled":true,"replicas":2}},"metrics_server":{"enabled":true},"monitoring":{"gpu":{"nvidia_dcgm_exporter":{"enabled":false}},"storageClass":"managed-nfs-storage"},"multicluster":{"clusterRole":"member"},"network":{"ippool":{"type":"none"},"networkpolicy":{"enabled":false},"topology":{"type":"none"}},"openpitrix":{"store":{"enabled":false}},"persistence":{"storageClass":"managed-nfs-storage"},"servicemesh":{"enabled":true}}}
  creationTimestamp: "2021-11-08T02:09:45Z"
  generation: 47
  labels:
    version: v3.2.0
  name: ks-installer
  namespace: kubesphere-system
  resourceVersion: "8400030"
  selfLink: /apis/installer.kubesphere.io/v1alpha1/namespaces/kubesphere-system/clusterconfigurations/ks-installer
  uid: ddf6e2b0-dd0f-4345-9937-4f1b69585da5
spec:
  alerting:
    enabled: true
  auditing:
    enabled: true
  authentication:
    jwtSecret: khUff6ToH2b5EVHTR2a8adsakPEKWkHpBeBj
  common:
    core:
      console:
        enableMultiLogin: true
        port: 30880
        type: NodePort
    es:
      basicAuth:
        enabled: true
        password: esTic456
        username: elastic
      elkPrefix: logstash
      externalElasticsearchPort: "9200"
      externalElasticsearchUrl: 192.168.120.85
      logMaxAge: 7
    gpu:
      kinds:
      - default: true
        resourceName: nvidia.com/gpu
        resourceType: GPU
    minio:
      volumeSize: 20Gi
    monitoring:
      GPUMonitoring:
        enabled: true
      endpoint: http://prometheus-operated.kubesphere-monitoring-system.svc:9090
    openldap:
      enabled: true
      volumeSize: 2Gi
    redis:
      enabled: true
      volumeSize: 2Gi
  devops:
    enabled: true
    jenkinsJavaOpts_MaxRAM: 2g
    jenkinsJavaOpts_Xms: 512m
    jenkinsJavaOpts_Xmx: 512m
    jenkinsMemoryLim: 2Gi
    jenkinsMemoryReq: 1500Mi
    jenkinsVolumeSize: 8Gi
  etcd:
    endpointIps: localhost
    monitoring: false
    port: 2379
    tlsEnable: true
  events:
    enabled: true
  kubeedge:
    cloudCore:
      cloudHub:
        advertiseAddress:
        - ""
        nodeLimit: "100"
      cloudhubHttpsPort: "10002"
      cloudhubPort: "10000"
      cloudhubQuicPort: "10001"
      cloudstreamPort: "10003"
      nodeSelector:
        node-role.kubernetes.io/worker: ""
      service:
        cloudhubHttpsNodePort: "30002"
        cloudhubNodePort: "30000"
        cloudhubQuicNodePort: "30001"
        cloudstreamNodePort: "30003"
        tunnelNodePort: "30004"
      tolerations: []
      tunnelPort: "10004"
    edgeWatcher:
      edgeWatcherAgent:
        nodeSelector:
          node-role.kubernetes.io/worker: ""
        tolerations: []
      nodeSelector:
        node-role.kubernetes.io/worker: ""
      tolerations: []
    enabled: false
  local_registry: ""
  logging:
    containerruntime: docker
    enabled: true
    logsidecar:
      enabled: true
      replicas: 2
  metrics_server:
    enabled: true
  monitoring:
    gpu:
      nvidia_dcgm_exporter:
        enabled: false
    storageClass: managed-nfs-storage
  multicluster:
    clusterRole: member
  network:
    ippool:
      type: none
    networkpolicy:
      enabled: false
    topology:
      type: none
  openpitrix:
    store:
      enabled: false
  persistence:
    storageClass: managed-nfs-storage
  servicemesh:
    enabled: true
status:
  alerting:
    enabledTime: 2021-11-08T11:32:23CST
    status: enabled
  auditing:
    enabledTime: 2021-11-08T11:29:19CST
    status: enabled
  clusterId: 34b14c4b-2834-463c-8d6a-1b1ca6571013-1636342382
  core:
    enabledTime: 2021-11-08T11:27:17CST
    status: enabled
    version: v3.2.0
  devops:
    enabledTime: 2021-11-08T11:30:48CST
    status: enabled
  events:
    enabledTime: 2021-11-08T11:29:53CST
    status: enabled
  fluentbit:
    enabledTime: 2021-11-08T11:26:15CST
    status: enabled
  logging:
    enabledTime: 2021-11-08T11:30:06CST
    status: enabled
  metricsServer:
    enabledTime: 2021-11-08T11:24:43CST
    status: enabled
  minio:
    enabledTime: 2021-11-08T11:25:57CST
    status: enabled
  monitoring:
    enabledTime: 2021-11-08T11:32:20CST
    status: enabled
  openldap:
    enabledTime: 2021-11-08T11:25:44CST
    status: enabled
  redis:
    enabledTime: 2021-11-08T11:25:35CST
    status: enabled
  servicemesh:
    enabledTime: 2021-11-08T11:30:36CST
    status: enabled

kubectl rollout restart deploy ks-installer -n kubesphere-system
Restart ks-installer, then check its logs to see what it runs.

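It's really frustrating. In the same environment, 3.1.1 had no problems, but after upgrading to 3.2.0 all sorts of issues appeared. So I uninstalled 3.2 entirely and reinstalled it, and the result was the same. There are other issues too: for example, with auditing enabled, I can see auditing logs in the pod "kube-auditing-webhook-deploy-xxxx", but no index is ever created in ES. I've spent many days troubleshooting these small issues and it's a headache; it makes me reluctant to go to production. The official documentation is very brief (just change a few parameters), so there's nothing to go on when troubleshooting.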

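    This one probably needs someone to take a look remotely. Have you purchased QingCloud services? If so, you can file a ticket there and have someone look into it.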

    ruiyaoOps
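    Thank you all very much for helping me debug!
    I saw someone on the forum say that the filter in the red box should be changed to message and then ks-apiserver restarted. I've tried all of that; it didn't help.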

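    My plan is to use KubeSphere as a one-stop solution covering logging, auditing, Istio, DevOps and so on, but right now I've run into far too many frustrating problems.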

      morriszs Clear your browser cache and try again, then press F12, print globals.ksConfig in the console, and take a screenshot like this one:

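      morriszs You don't need to change this, because you're currently on a Docker environment and all the pods are running normally. This is probably a front-end display issue. @weili520 can help you look into it.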

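        My clusters are as follows:

        The parameters below were also printed while logged in to the host cluster, so I assume they belong to the host cluster and have nothing to do with the member cluster.

        The member cluster currently can't be logged into with the default password, which is another sad story:

        Backend error: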

        DehaoCheng It's not a Docker environment. My member cluster runs Kubernetes v1.21.4 with containerd and no Docker at all. That's exactly why I upgraded to 3.2.0: it supports containerd's text log format and pipelines.

          DehaoCheng
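          During the initial installation I did set it to containerd, but ks-installer kept failing and the installation never completed, so I had no choice but to switch back to the default, docker; only then did it install. I looked through the official GitHub repo: there is no explanation of this containerruntime parameter and no list of accepted values. The default is simply docker.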
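Since `containerruntime` is undocumented, one sanity check is whether the value under `spec.logging` matches the runtime the nodes actually report. Kubernetes exposes it per node as `status.nodeInfo.containerRuntimeVersion`, in the form `<runtime>://<version>`. The sketch below compares the two. The function names and node data are hypothetical, and the assumption that the setting should equal the node's runtime name is ours, not something from the KubeSphere docs.

```python
def runtime_name(container_runtime_version):
    """Extract the runtime name from a node's status.nodeInfo.containerRuntimeVersion.

    The field looks like "<runtime>://<version>", e.g. "containerd://1.4.9".
    Returns None if the string has no "://" separator.
    """
    name, sep, _ = container_runtime_version.partition("://")
    return name if sep else None


def runtime_mismatches(configured, node_versions):
    """Return, sorted, the node names whose runtime differs from the configured value."""
    return sorted(
        node
        for node, version in node_versions.items()
        if runtime_name(version) != configured
    )


if __name__ == "__main__":
    # Hypothetical node data; in practice it could come from
    # kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}={.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}'
    nodes = {"member-node-1": "containerd://1.4.9", "member-node-2": "containerd://1.4.9"}
    print(runtime_mismatches("docker", nodes))  # both nodes mismatch a "docker" setting
```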

            DehaoCheng
            After switching to containerd, the ks-installer log shows no errors:

            PLAY RECAP *********************************************************************
            localhost : ok=28 changed=17 unreachable=0 failed=0 skipped=11 rescued=0 ignored=0

            Start installing monitoring
            Start installing multicluster
            Start installing openpitrix
            Start installing network
            Start installing alerting
            Start installing auditing
            Start installing devops
            Start installing events
            Start installing logging
            Start installing servicemesh


            Waiting for all tasks to be completed ...
            task openpitrix status is successful (1/10)
            task alerting status is successful (2/10)
            task network status is successful (3/10)
            task multicluster status is successful (4/10)
            task auditing status is successful (5/10)
            task events status is successful (6/10)
            task logging status is successful (7/10)
            task servicemesh status is successful (8/10)
            task devops status is successful (9/10)
            task monitoring status is successful (10/10)


            Collecting installation results ...
            #####################################################

            Welcome to KubeSphere!

            #####################################################
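To check an installer run like the one above without reading the whole log, the `task <name> status is <status> (n/10)` lines can be pulled out with a regex. This is a minimal sketch. Only the `successful` wording appears in this thread, so assuming that failures use the same line shape is a guess, and both function names are hypothetical.

```python
import re

# Matches ks-installer progress lines such as "task logging status is successful (7/10)".
TASK_LINE = re.compile(r"task (\S+) status is (\w+) \(\d+/\d+\)")


def task_statuses(log_text):
    """Map each task name found in a ks-installer log to its reported status."""
    return {m.group(1): m.group(2) for m in TASK_LINE.finditer(log_text)}


def unsuccessful(log_text):
    """Return, sorted, the task names whose status is not 'successful'."""
    return sorted(
        name for name, status in task_statuses(log_text).items() if status != "successful"
    )


if __name__ == "__main__":
    sample = (
        "task openpitrix status is successful (1/10)\n"
        "task alerting status is successful (2/10)\n"
    )
    print(task_statuses(sample))  # {'openpitrix': 'successful', 'alerting': 'successful'}
    print(unsuccessful(sample))   # []
```

The log text itself could come from `kubectl logs -n kubesphere-system deploy/ks-installer`.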
            However, fluentbit reported an error, the same as before:

              morriszs This error is probably a containerd issue: crictl cannot be found. You can run

              kubectl edit fluentbits.logging.kubesphere.io  -n kubesphere-logging-system

              and add containerLogRealPath; that should be enough.
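The same change can be made non-interactively with `kubectl patch --type merge` instead of `kubectl edit`. This is a minimal sketch that builds the merge patch. Two things are assumptions: the FluentBit resource name `fluent-bit`, and `/var/log/pods` as the containerd log path (the value actually used here is only in the missing screenshot). `container_log_path_patch` is a hypothetical helper.

```python
import json


def container_log_path_patch(real_path):
    """Build a JSON merge patch setting spec.containerLogRealPath on a FluentBit resource."""
    if not real_path.startswith("/"):
        raise ValueError("containerLogRealPath should be an absolute host path")
    return json.dumps({"spec": {"containerLogRealPath": real_path}})


if __name__ == "__main__":
    # Assumption: on containerd nodes the CRI log files live under /var/log/pods.
    print(container_log_path_patch("/var/log/pods"))
    # Hypothetical usage; confirm the resource name with
    # `kubectl get fluentbits.logging.kubesphere.io -n kubesphere-logging-system` first:
    # kubectl patch fluentbits.logging.kubesphere.io fluent-bit -n kubesphere-logging-system \
    #     --type merge -p '{"spec": {"containerLogRealPath": "/var/log/pods"}}'
```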

                DehaoCheng
                Using the command "kubectl edit fluentbits.logging.kubesphere.io -n kubesphere-logging-system", I changed it as shown in the screenshot below:

                The earlier crictl error is gone, but many fluentbit pods now report buffer errors:

                In the output configuration, I raised the tail-related limit from the default 5M to 10M, and the memory errors disappeared.
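The thread does not name the exact key that went from 5M to 10M. It is most likely Fluent Bit's `Mem_Buf_Limit` on the `tail` input, which caps how much memory that input may use while appending data to the engine. To confirm such a change actually reached the configuration Fluent Bit loads, the rendered classic-format config can be parsed. This is a minimal sketch. It assumes the key is `Mem_Buf_Limit` and that the config text has already been extracted (we believe the operator stores it in a Secret in `kubesphere-logging-system`). `mem_buf_limits` is hypothetical.

```python
def mem_buf_limits(config_text):
    """Map each [INPUT] section's Name to its Mem_Buf_Limit in a classic-format Fluent Bit config.

    Section headers and keys are matched case-insensitively; inputs without the key are skipped.
    """
    sections = []  # list of (section_name, {lowercased_key: value})
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].strip().lower(), {}))
        elif sections:
            parts = line.split(None, 1)  # classic format: key, whitespace, value
            if len(parts) == 2:
                sections[-1][1][parts[0].lower()] = parts[1].strip()
    return {
        entries.get("name", "?"): entries["mem_buf_limit"]
        for section, entries in sections
        if section == "input" and "mem_buf_limit" in entries
    }


if __name__ == "__main__":
    sample = (
        "[Service]\n    Flush    1\n"
        "[Input]\n    Name    tail\n    Path    /var/log/containers/*.log\n    Mem_Buf_Limit    10MB\n"
    )
    print(mem_buf_limits(sample))  # {'tail': '10MB'}
```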
                When I checked the fluentbit logs again, auditing logs showed up. I checked the ES indices, and the index was created successfully.
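To check the index from a script rather than by eye, Elasticsearch's `_cat/indices` API with `format=json` returns one object per index, each including its `index` name. This is a minimal sketch using only the standard library. The `ks-logstash-auditing-*` pattern (a KubeSphere prefix plus `elkPrefix: logstash`) is an assumption, the password is a placeholder, and both helpers are hypothetical.

```python
import base64
import json
import urllib.request


def cat_indices_request(host, port, user, password, pattern):
    """Build (but do not send) a basic-auth request to Elasticsearch's _cat/indices API."""
    url = f"http://{host}:{port}/_cat/indices/{pattern}?format=json"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def index_names(cat_json, prefix):
    """Return sorted index names from a _cat/indices JSON response that start with prefix."""
    return sorted(row["index"] for row in json.loads(cat_json) if row["index"].startswith(prefix))


if __name__ == "__main__":
    # Assumed index pattern; the password is a placeholder, not the real one.
    req = cat_indices_request("192.168.120.85", 9200, "elastic", "<password>", "ks-logstash-auditing-*")
    print(req.full_url)
    # To actually query: body = urllib.request.urlopen(req, timeout=10).read().decode()
    # then: index_names(body, "ks-logstash-auditing-")
```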

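                The auditing log generation problem is solved.
                The problem of the front-end Toolbox not showing Log Search or Auditing Log Search still remains.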