tzghost The data in a secret must be base64-encoded (this is encoding, not encryption), so base64-encode your apisecret.
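A minimal sketch of the encoding step, assuming a `base64` binary that accepts `-d` (GNU coreutils or similar). The value `example-secret` and the variable names are placeholders chosen for illustration, not anything from the actual cluster. `printf '%s'` is used instead of `echo` so that no trailing newline ends up inside the encoded value.

```shell
# Placeholder value for illustration; substitute the real WeChat apisecret.
api_secret='example-secret'

# printf '%s' avoids the trailing newline that echo would add to the input.
encoded=$(printf '%s' "$api_secret" | base64)
echo "$encoded"

# Decode again to confirm that the round trip is lossless.
decoded=$(printf '%s' "$encoded" | base64 -d)
echo "$decoded"
```

The string printed first is what goes into the secret's `data` field.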

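    tzghost Notification Manager receives its data from Alertmanager, so the data format is kept consistent with Alertmanager's.
    You can open an issue at https://github.com/kubesphere/notification-manager describing the data format you want, and everyone can discuss it there.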

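      wanjunlei OK. The content format could be similar to the email alerts, which include the node, monitored item, abnormal value, and time; that would be more readable. Is there a reference solution for this requirement?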

      You can switch the WeChat format to the email one.
      Run
      kubectl edit notificationmanagers.notification.kubesphere.io -n kubesphere-monitoring-system notification-manager
      and add this:

      spec:
        receivers:
          options:
            wechat:
              template: '{{ template "email.default.html" . }}'

      Then wait for Notification Manager to restart.

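        Alerts sent by Alertmanager carry richer information than the node, monitored item, abnormal value, and time found in the old monitoring alert emails. That is because alerts can now cover not only nodes but also workloads such as pods, deployments, daemonsets, and statefulsets, abnormal container states, and key system components such as apiserver and etcd. The alert types differ as well: alerts from Prometheus usually carry an abnormal value and a threshold, while alerts from kube-events, like the event alert in your screenshot, have no threshold, only a description of the abnormality.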

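        In the alert you received, the title is admittedly a bit messy: Alertmanager-style alerts put the label values in parentheses in the title. The rest should be fairly clear; the lack of Chinese in the content may be what makes it feel messy to you. We will improve this.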

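          wanjunlei benjaminhuo Thank you both for the replies. We found the current monitoring fairly basic, and we plan to define custom monitoring items that are closer to our business. How do we expose the Prometheus that ships with 3.0 for external access? An external gateway cannot be configured for it in the console.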

            tzghost You need to edit the svc below, not the operator's svc:

            kubectl -n kubesphere-monitoring-system get svc prometheus-k8s
            NAME             TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
            prometheus-k8s   NodePort   10.233.9.200   <none>        9090:31193/TCP   19d
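If the service is not yet of type NodePort, a non-interactive alternative to `kubectl edit` might look like the sketch below. The function name `expose_prometheus` is made up for this example, and the patch assumes that changing only `spec.type` is acceptable in your cluster; nothing runs until the function is called.

```shell
ns='kubesphere-monitoring-system'
svc='prometheus-k8s'

# Patch body that changes only the service type.
patch='{"spec":{"type":"NodePort"}}'

# Hypothetical helper: switch the service to NodePort, then show the assigned port.
expose_prometheus() {
  kubectl -n "$ns" patch svc "$svc" -p "$patch"
  kubectl -n "$ns" get svc "$svc"
}
```

After calling `expose_prometheus`, Prometheus should be reachable on any node's IP at the node port shown in the PORT(S) column (31193 in the output above).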
              6 days later

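              benjaminhuo Can the alertmanager-main configuration not be modified? When I change replicas to 1 or edit the configuration template, the configuration gets reverted. Also, alertmanager.yaml in alertmanager-main is encrypted; how should I go about changing the configuration in that file?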

              Configuration entries added to prometheus-k8s-rulefiles-0 were reverted as well.

              11 days later

              All of these have to be changed through the CRD objects; you cannot edit the workload or configmap directly:

              # Adjust the Alertmanager replicas
              kubectl -n kubesphere-monitoring-system edit alertmanagers.monitoring.coreos.com main
              # Adjust the Alertmanager configuration: copy the content out, base64-decode it, make your changes, then base64-encode it and write it back
              kubectl -n kubesphere-monitoring-system edit secrets alertmanager-main
              # Changing rules also requires editing the CRD
              kubectl -n kubesphere-monitoring-system edit prometheusrules.monitoring.coreos.com prometheus-k8s-rules
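The decode, edit, and re-encode cycle for the Alertmanager configuration could be scripted roughly as follows. This is a sketch under assumptions: the helper names (`fetch_config`, `encode_file`, `push_config`) and the local file name are invented for illustration, the secret key is taken to be `alertmanager.yaml` as mentioned in the question, and `base64 -d` assumes GNU coreutils or similar.

```shell
ns='kubesphere-monitoring-system'
secret='alertmanager-main'

# Hypothetical helper: read the encoded alertmanager.yaml from the secret and decode it to a local file.
fetch_config() {
  kubectl -n "$ns" get secret "$secret" -o jsonpath='{.data.alertmanager\.yaml}' \
    | base64 -d > alertmanager.yaml
}

# Encode a file as one base64 line; line wraps are stripped so the value fits in a JSON patch.
encode_file() {
  base64 < "$1" | tr -d '\n'
}

# Hypothetical helper: write the edited local file back into the secret.
push_config() {
  kubectl -n "$ns" patch secret "$secret" \
    -p "{\"data\":{\"alertmanager.yaml\":\"$(encode_file alertmanager.yaml)\"}}"
}
```

A typical session would be `fetch_config`, editing `alertmanager.yaml` locally, then `push_config`. Keeping a copy of the original file makes it easy to roll back.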
                5 days later

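                It works after the adjustment, but I have run into a new problem: a ContainerBackoff alert keeps firing. I have already deleted and recreated this pod, yet the alert keeps coming. What is the problem? benjaminhuo
                =====Monitoring Alert=====
                Level: warning
                Name: ContainerBackoff
                Message: Back-off restarting failed container
                Container: notification-manager-operator
                Pod: notification-manager-operator-6958786cd6-qmck2
                Namespace: kubesphere-monitoring-system
                Alert time: 2020-10-15 15:49:50
                =======end========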

                wanjunlei I have already deleted the pod notification-manager-operator-6958786cd6-qmck2, but the alert keeps firing.
                [root@bg-003-kvm004-vms003 ~]# kubectl -n kubesphere-monitoring-system get pods
                NAME                                               READY   STATUS    RESTARTS   AGE
                alertmanager-main-0                                2/2     Running   0          16d
                alertmanager-main-1                                2/2     Running   0          16d
                alertmanager-main-2                                2/2     Running   0          16d
                kube-state-metrics-95c974544-5bnm5                 3/3     Running   0          37d
                node-exporter-6n5ld                                2/2     Running   0          37d
                node-exporter-8vs2v                                2/2     Running   0          37d
                node-exporter-kjsp5                                2/2     Running   0          37d
                node-exporter-m6ql6                                2/2     Running   0          4d21h
                node-exporter-x7bmr                                2/2     Running   0          37d
                node-exporter-x8wpd                                2/2     Running   0          37d
                notification-manager-deployment-7c8df68d94-f4g97   1/1     Running   0          37d
                notification-manager-deployment-7c8df68d94-qb49z   1/1     Running   0          37d
                notification-manager-operator-6958786cd6-djsbt     2/2     Running   0          157m
                prometheus-k8s-0                                   3/3     Running   1          94m
                prometheus-k8s-1                                   3/3     Running   1          94m
                prometheus-operator-84d58bf775-269pk               2/2     Running   0          37d

                =====Monitoring Alert=====
                Level: warning
                Name: ContainerBackoff
                Message: Back-off restarting failed container
                Container: notification-manager-operator
                Pod: notification-manager-operator-6958786cd6-qmck2
                Namespace: kubesphere-monitoring-system
                Alert time: 2020-10-15 16:56:21
                =======end========

                  xulai
                  metadata:
                    name: notification-manager-operator-6958786cd6-qmck2.163921a5a979e695
                    namespace: kubesphere-monitoring-system
                    selfLink: /api/v1/namespaces/kubesphere-monitoring-system/events/notification-manager-operator-6958786cd6-qmck2.163921a5a979e695
                    uid: 827e396e-8c57-4232-9b48-d94e12e5d12d
                    resourceVersion: 12336096
                    creationTimestamp: 2020-10-14T11:35:18Z
                    managedFields: []
                  involvedObject:
                    kind: Pod
                    namespace: kubesphere-monitoring-system
                    name: notification-manager-operator-6958786cd6-qmck2
                    uid: fb6064d4-67b1-4c05-9a7a-264a1d649ccf
                    apiVersion: v1
                    resourceVersion: 4665
                    fieldPath: spec.containers{notification-manager-operator}
                  reason: BackOff
                  message: Back-off restarting failed container
                  source:
                    component: kubelet
                    host: bg-003-kvm006-vms004
                  firstTimestamp: 2020-09-29T02:55:37Z
                  lastTimestamp: 2020-10-14T11:38:23Z
                  count: 28
                  type: Warning
                  eventTime: null
                  reportingComponent:
                  reportingInstance:
                  logStripANSI: undefined

                    tzghost
                    Check whether Alertmanager still holds an alert for notification-manager-operator-6958786cd6-qmck2 by running:
                    curl alertmanager-main.kubesphere-monitoring-system.svc:9093/api/v2/alerts?filter=pod=notification-manager-operator-6958786cd6-qmck2
                    Alternatively, expose port 9093 of the kubesphere-monitoring-system/alertmanager-main service and check the page served on that port.
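A slightly more defensive form of that query is sketched below. Quoting matters because an unquoted `?` in a URL can be interpreted as a glob character by some shells, and the `=` inside the filter value is safer when URL-encoded. The function name `query_alerts` and the variable names are made up for this example; the filter expression is the same `pod=...` matcher as in the command above, and whether your Alertmanager version expects the value in double quotes is an open assumption.

```shell
am_url='http://alertmanager-main.kubesphere-monitoring-system.svc:9093/api/v2/alerts'
pod='notification-manager-operator-6958786cd6-qmck2'
filter="pod=$pod"

# Hypothetical helper: list the alerts that match the pod label.
query_alerts() {
  # -G sends the data as a query string; --data-urlencode escapes the filter value.
  curl -sG --data-urlencode "filter=$filter" "$am_url"
}
```

The service DNS name resolves only from inside the cluster, so `query_alerts` would have to run from a pod or a node that can reach the cluster network.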