xiaoyao wanjunlei There are errors in the log:

kubectl logs notification-manager-deployment-6dc77b644c-khcn8 -n kubesphere-monitoring-system

level=info ts=2021-03-09T10:57:21.348653452+08:00 caller=main.go:100 msg="Starting notification manager..." addr=:19093 timeout=5s
level=info ts=2021-03-09T10:57:23.967666571+08:00 caller=config.go:679 msg="resource change" op=add type=wechat name=global-wechat-receiver namespace=kubesphere-monitoring-system
level=info ts=2021-03-09T10:57:24.067891365+08:00 caller=config.go:340 msg="Setting up informers successfully"
level=info ts=2021-03-09T10:57:24.068183364+08:00 caller=config.go:679 msg="resource change" op=add type=wechat name=default-wechat-config namespace=kubesphere-monitoring-system
level=info ts=2021-03-09T10:57:24.075098906+08:00 caller=config.go:679 msg="resource change" op=add type=wechat name=global-wechat-receiver namespace=kubesphere-monitoring-system
level=info ts=2021-03-09T10:57:24.08361082+08:00 caller=config.go:679 msg="resource change" op=add type=wechat name=default-wechat-config namespace=kubesphere-monitoring-system
level=info ts=2021-03-09T20:04:39.399555541+08:00 caller=config.go:679 msg="resource change" op=add type=wechat name=global-wechat-receiver namespace=kubesphere-monitoring-system
level=info ts=2021-03-09T20:04:39.407563203+08:00 caller=config.go:679 msg="resource change" op=add type=wechat name=default-wechat-config namespace=kubesphere-monitoring-system
level=info ts=2021-03-10T05:11:55.027145065+08:00 caller=config.go:679 msg="resource change" op=add type=wechat name=global-wechat-receiver namespace=kubesphere-monitoring-system
level=info ts=2021-03-10T05:11:55.031688763+08:00 caller=config.go:679 msg="resource change" op=add type=wechat name=default-wechat-config namespace=kubesphere-monitoring-system
level=info ts=2021-03-10T05:29:28.087800947+08:00 caller=config.go:679 msg="resource change" op=add type=wechat name=default-wechat-config namespace=kubesphere-monitoring-system
level=info ts=2021-03-10T06:06:49.711918397+08:00 caller=config.go:679 msg="resource change" op=add type=wechat name=global-wechat-receiver namespace=kubesphere-monitoring-system
level=info ts=2021-03-10T14:19:10.655853976+08:00 caller=config.go:679 msg="resource change" op=add type=wechat name=global-wechat-receiver namespace=kubesphere-monitoring-system
level=info ts=2021-03-10T14:19:10.662390219+08:00 caller=config.go:679 msg="resource change" op=add type=wechat name=default-wechat-config namespace=kubesphere-monitoring-system
level=info ts=2021-03-10T14:45:30.14750318+08:00 caller=config.go:679 msg="resource change" op=add type=wechat name=default-wechat-config namespace=kubesphere-monitoring-system
level=error ts=2021-03-10T14:45:32.498190702+08:00 caller=handler.go:201 msg=EOF
level=error ts=2021-03-10T14:46:15.105251675+08:00 caller=handler.go:201 msg=EOF
level=error ts=2021-03-10T14:47:01.31948205+08:00 caller=handler.go:201 msg=EOF
level=error ts=2021-03-10T14:47:11.258597024+08:00 caller=handler.go:201 msg=EOF
level=error ts=2021-03-10T14:47:58.639380057+08:00 caller=handler.go:201 msg=EOF
level=error ts=2021-03-10T14:48:10.664693303+08:00 caller=handler.go:201 msg=EOF
level=error ts=2021-03-10T14:48:53.573766358+08:00 caller=handler.go:201 msg=EOF
level=error ts=2021-03-10T14:49:15.363254158+08:00 caller=handler.go:201 msg=EOF
level=error ts=2021-03-10T14:49:34.043604744+08:00 caller=handler.go:201 msg=EOF
level=error ts=2021-03-10T14:51:36.778389869+08:00 caller=handler.go:201 msg=EOF
level=error ts=2021-03-10T14:51:41.3490185+08:00 caller=handler.go:201 msg=EOF
level=error ts=2021-03-10T14:52:00.043378967+08:00 caller=handler.go:201 msg=EOF
level=error ts=2021-03-10T14:52:12.198599381+08:00 caller=handler.go:201 msg=EOF
level=error ts=2021-03-10T14:52:55.708103897+08:00 caller=handler.go:201 msg=EOF
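A log like the one above is easier to read once it is grouped by level, caller and message. The following is a minimal sketch, assuming the lines are in logfmt form with straight double quotes around values that contain spaces; the function names `parse_logfmt` and `summarize` are made up for this illustration and are not part of Notification Manager.

```python
import shlex
from collections import Counter


def parse_logfmt(line):
    """Split one logfmt line into a dict of key/value pairs."""
    fields = {}
    # shlex.split keeps a quoted value such as msg="resource change" in one token.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def summarize(lines):
    """Count records per (level, caller, msg) triple."""
    counts = Counter()
    for line in lines:
        record = parse_logfmt(line)
        if record:
            counts[(record.get("level"), record.get("caller"), record.get("msg"))] += 1
    return counts


# Three lines copied from the log in this thread.
sample = [
    'level=info ts=2021-03-10T14:45:30.14750318+08:00 caller=config.go:679 '
    'msg="resource change" op=add type=wechat name=default-wechat-config '
    'namespace=kubesphere-monitoring-system',
    'level=error ts=2021-03-10T14:45:32.498190702+08:00 caller=handler.go:201 msg=EOF',
    'level=error ts=2021-03-10T14:46:15.105251675+08:00 caller=handler.go:201 msg=EOF',
]

summary = summarize(sample)
for key, count in summary.most_common():
    print(count, key)
```

To run it against the real pod, the output of the kubectl logs command can be saved to a file and passed to `summarize` line by line.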
xiaoyao wanjunlei The Alertmanager configuration also includes the "notification-manager" receiver:

- name: notification-manager
  webhook_configs:
  - send_resolved: true
    http_config: {}
    url: http://notification-manager-svc.kubesphere-monitoring-system.svc:19093/api/v2/alerts
    max_alerts: 0
templates: []
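wanjunlei Run

kubectl edit notificationmanagers.notification.kubesphere.io -n kubesphere-monitoring-system notification-manager

and add:

spec:
  args:
  - --log.level=debug

This sets the log level to debug. If an email notification is sent, a log entry will be printed.

There are only a few possible reasons for not receiving notifications:

1. Alertmanager did not send the alert. Check whether Alertmanager has any alerts and look at the Alertmanager log. At startup it prints the configured webhooks, so you can see whether the webhook was not configured successfully or whether sending failed.
2. Notification Manager failed to send the notification. Check the notification-manager-deploy log for this.
3. The WeChat Work API has a limit on the number of calls. Once the limit is exceeded, the API call does not return an error, and the notification is not delivered either.
4. The WeChat Work configuration is incorrect. Check this yourself.

Please go through these checks first.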
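One way to separate cause 1 from cause 2 in the checklist above is to post a hand-built alert directly to the Notification Manager webhook and then watch its debug log. The sketch below builds a request in the Alertmanager webhook payload layout (version "4"). The alert name, labels and annotation text are placeholders invented for this example, and the assumption that Notification Manager accepts this layout at /api/v2/alerts is based only on the fact that Alertmanager is configured to post there in this thread.

```python
import json
import urllib.request
from datetime import datetime, timezone

# URL taken from the Alertmanager receiver shown in this thread.
NM_URL = "http://notification-manager-svc.kubesphere-monitoring-system.svc:19093/api/v2/alerts"


def build_test_payload(alertname="ManualTest", namespace="default"):
    """Build an Alertmanager-style webhook payload with one firing alert.

    All label and annotation values here are hypothetical test data.
    """
    now = datetime.now(timezone.utc).isoformat()
    labels = {"alertname": alertname, "namespace": namespace, "severity": "warning"}
    annotations = {"message": "manual test alert"}
    return {
        "version": "4",
        "groupKey": "manual-test",
        "status": "firing",
        "receiver": "notification-manager",
        "groupLabels": {"alertname": alertname},
        "commonLabels": labels,
        "commonAnnotations": annotations,
        "externalURL": "",
        "alerts": [
            {
                "status": "firing",
                "labels": labels,
                "annotations": annotations,
                "startsAt": now,
                "endsAt": "0001-01-01T00:00:00Z",
                "generatorURL": "",
            }
        ],
    }


def build_request(payload, url=NM_URL):
    """Wrap the payload in a JSON POST request without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request(build_test_payload())
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# The .svc host name only resolves inside the cluster, so the actual send is
# left commented out; run it from a pod in the cluster:
# with urllib.request.urlopen(request, timeout=5) as response:
#     print(response.status, response.read().decode("utf-8"))
```

If the debug log shows the test alert arriving but no WeChat message follows, the problem is more likely on the Notification Manager or WeChat Work side than in Alertmanager.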
xiaoyao wanjunlei Last night I went back and rewatched your video on configuring WeChat alerts. My findings are as follows:

1. I set the log level to debug, but after creating a ConfigMap as a test I still did not receive a WeChat message.
2. The Alertmanager log is as follows (everything in the log looks normal):
3. The notification-manager-deploy log is as follows:
4. The WeChat Work application is newly created and has only been called a few times.
5. For the WeChat Work configuration I followed the global configuration from your video:

apiVersion: notification.kubesphere.io/v1alpha1
kind: WechatConfig
metadata:
  name: default-wechat-config
  namespace: kubesphere-monitoring-system
  labels:
    app: notification-manager
    type: default
spec:
  wechatApiUrl: https://qyapi.weixin.qq.com/cgi-bin/
  wechatApiSecret:
    key: wechat
    name: default-wechat-secret
  wechatApiCorpId: ww69a126a1fe458507
  wechatApiAgentId: "1000144"
---
apiVersion: notification.kubesphere.io/v1alpha1
kind: WechatReceiver
metadata:
  name: global-wechat-receiver
  namespace: kubesphere-monitoring-system
  labels:
    app: notification-manager
    type: global
spec:
  # wechatConfigSelector needn't to be configured for a global receiver
  # optional
  # One of toUser, toParty, toTag should be specified.
  toUser: ww69a126a1fe458507
  #toParty: global
  #toTag: global
---
apiVersion: v1
data:
  wechat: M2RQMmxREnpTQ2VOSmp5VTJWLWFlTTh6bUltWER6OFhaOHJFSG8yNV9ycw==
kind: Secret
metadata:
  labels:
    app: notification-manager
  name: default-wechat-secret
  namespace: kubesphere-monitoring-system
type: Opaque
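The Secret in point 5 stores the application secret base64-encoded under data.wechat, and a value that was encoded with a stray newline or pasted incorrectly is an easy thing to miss. Below is a small sketch for checking such a value offline. The helper name `check_secret_value` and the rule "the secret should be printable ASCII without surrounding whitespace" are assumptions made for this illustration, not requirements stated by Notification Manager or WeChat Work.

```python
import base64
import binascii


def check_secret_value(encoded):
    """Return (ok, detail) for one base64 value from a Secret's data field."""
    try:
        raw = base64.b64decode(encoded, validate=True)
    except (binascii.Error, ValueError) as exc:
        return False, f"not valid base64: {exc}"
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        return False, "decoded bytes are not ASCII"
    if text != text.strip():
        # Typical cause: `echo secret | base64` without `-n`.
        return False, "decoded value has leading or trailing whitespace"
    if not text.isprintable():
        return False, "decoded value contains non-printable characters"
    return True, f"{len(text)} printable characters"


# Made-up values for demonstration; substitute the data.wechat value to test it.
good = base64.b64encode(b"example-app-secret").decode("ascii")
bad = base64.b64encode(b"example-app-secret\n").decode("ascii")
print(check_secret_value(good))
print(check_secret_value(bad))
```

A failed check only shows that the stored value looks suspicious; whether WeChat Work accepts the secret still has to be confirmed against the WeChat Work API itself.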
xiaoyao wanjunlei Is the following configuration set in the alertmanager-main Secret?

"receivers":
- "name": "notification-manager"
  "webhook_configs":
  - "url": "http://notification-manager-svc.kubesphere-monitoring-system.svc:19093/api/v2/alerts"
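wanjunlei xiaoyao Creating a ConfigMap does not generate an alert on its own. You need to enable auditing, then run kubectl edit awh kube-auditing-webhook and change alertingPriority to INFO. After that, creating a ConfigMap through the UI will generate an alert.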
xiaoyao wanjunlei

"global":
  "resolve_timeout": "5m"
"inhibit_rules":
- "equal":
  - "namespace"
  - "alertname"
  "source_match":
    "severity": "critical"
  "target_match_re":
    "severity": "warning|info"
- "equal":
  - "namespace"
  - "alertname"
  "source_match":
    "severity": "warning"
  "target_match_re":
    "severity": "info"
"receivers":
- "name": "Default"
- "name": "Watchdog"
- "name": "Critical"
- "name": "prometheus"
  "webhook_configs":
  - "url": "http://notification-manager-svc.kubesphere-monitoring-system.svc:19093/api/v2/alerts"
- "name": "event"
  "webhook_configs":
  - "url": "http://notification-manager-svc.kubesphere-monitoring-system.svc:19093/api/v2/alerts"
- "name": "auditing"
  "webhook_configs":
  - "send_resolved": false
    "url": "http://notification-manager-svc.kubesphere-monitoring-system.svc:19093/api/v2/alerts"
- "name": "notification-manager"
  "webhook_configs":
  - "url": "http://notification-manager-svc.kubesphere-monitoring-system.svc:19093/api/v2/alerts"
"route":
  "group_by":
  - "namespace"
  - "alertname"
  "group_interval": "5m"
  "group_wait": "30s"
  "receiver": "Default"
  "repeat_interval": "12h"
  "routes":
  - "match":
      "alertname": "Watchdog"
    "receiver": "Watchdog"
  - "match":
      "severity": "critical"
    "receiver": "Critical"
  - "match":
      "alerttype": ""
    "receiver": "prometheus"
  - "group_interval": "30s"
    "match":
      "alerttype": "event"
    "receiver": "event"
  - "group_interval": "30s"
    "match":
      "alerttype": "auditing"
    "receiver": "auditing"
  - "group_interval": "30s"
    "match":
      "alerttype": "notification-manager"
    "receiver": "notification-manager"
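To see which receiver a given alert would reach under the routing tree above, the matching can be simulated offline. This is a simplified sketch, assuming Alertmanager's usual behaviour for this kind of tree: child routes are tried in order, the first one whose match labels all agree wins (none of these routes sets continue), an empty match value also matches a missing label, and the top-level receiver is the fallback. Only the route entries are copied from the thread; `pick_receiver` and the sample label sets are hypothetical.

```python
# Child routes copied from the "route.routes" section of the config in this thread.
ROUTES = [
    {"match": {"alertname": "Watchdog"}, "receiver": "Watchdog"},
    {"match": {"severity": "critical"}, "receiver": "Critical"},
    {"match": {"alerttype": ""}, "receiver": "prometheus"},
    {"match": {"alerttype": "event"}, "receiver": "event"},
    {"match": {"alerttype": "auditing"}, "receiver": "auditing"},
    {"match": {"alerttype": "notification-manager"}, "receiver": "notification-manager"},
]
DEFAULT_RECEIVER = "Default"


def pick_receiver(labels):
    """Return the receiver of the first child route that matches the labels."""
    for route in ROUTES:
        # A missing label is treated as the empty string, so "alerttype": ""
        # matches alerts that carry no alerttype label at all.
        if all(labels.get(name, "") == value for name, value in route["match"].items()):
            return route["receiver"]
    return DEFAULT_RECEIVER


examples = [
    {"alerttype": "auditing"},
    {"alerttype": "auditing", "severity": "critical"},
    {"alertname": "SomeRule", "severity": "warning"},
    {"alerttype": "other"},
]
for labels in examples:
    print(labels, "->", pick_receiver(labels))
```

Under these assumptions, an alert that carries severity critical lands on Critical before the alerttype routes are considered, and Critical, Default and Watchdog have no webhook_configs in the pasted config, so such alerts would not be forwarded to Notification Manager.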
wanjunlei Call http://notification-manager-svc.kubesphere-monitoring-system.svc:19093/receivers. It returns your receivers, so you can check whether the receiver was actually set up successfully.
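The call suggested above can be scripted so that the check is repeatable. The sketch below assumes it runs somewhere that can resolve the in-cluster .svc host name (for example a pod in the cluster), and it treats the response as opaque text because the thread does not show the response format. `fetch_receivers` and `mentions` are names invented for this example.

```python
import urllib.request

# Endpoint quoted in the thread.
RECEIVERS_URL = "http://notification-manager-svc.kubesphere-monitoring-system.svc:19093/receivers"


def fetch_receivers(url=RECEIVERS_URL, timeout=5):
    """GET the receivers endpoint and return the raw response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def mentions(body, names):
    """Report, for each expected resource name, whether it appears in the body."""
    return {name: name in body for name in names}


if __name__ == "__main__":
    # Resource names taken from the WechatConfig and WechatReceiver in this thread.
    expected = ["global-wechat-receiver", "default-wechat-config"]
    try:
        print(mentions(fetch_receivers(), expected))
    except OSError as exc:
        # Raised when the host cannot be resolved or reached from this machine.
        print("request failed:", exc)
```

A plain substring check is deliberately crude; once the real response is visible, it is better to inspect it by eye or parse it properly.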
xiaoyao This is my NotificationManager configuration file:

apiVersion: notification.kubesphere.io/v1alpha1
kind: NotificationManager
metadata:
  name: notification-manager
  namespace: kubesphere-monitoring-system
spec:
  replicas: 1
  resources:
    limits:
      cpu: 500m
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 20Mi
  image: kubesphere/notification-manager:v0.6.0
  imagePullPolicy: IfNotPresent
  serviceAccountName: notification-manager-sa
  portName: webhook
  defaultConfigSelector:
    matchLabels:
      type: default
  receivers:
    tenantKey: user
    globalReceiverSelector:
      matchLabels:
        type: global
    tenantReceiverSelector:
      matchLabels:
        type: tenant
    options:
      global:
        templateFile:
        - /etc/notification-manager/template
      email:
        notificationTimeout: 5
        deliveryType: bulk
      slack:
        notificationTimeout: 5
      wechat:
        notificationTimeout: 5
      webhook:
        notificationTimeout: 5
      dingtalk:
        notificationTimeout: 5
  volumeMounts:
  - mountPath: /etc/notification-manager/
    name: noification-manager-template
  volumes:
  - configMap:
      defaultMode: 420
      name: noification-manager-template
    name: noification-manager-template
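The selectors in this spec decide which WechatConfig and WechatReceiver objects are picked up, so it is worth confirming that the labels on the resources posted earlier actually satisfy them. The sketch below applies the standard Kubernetes matchLabels rule (every key/value pair in the selector must be present on the object). The selector and label values are copied from this thread; the `matches` helper is hypothetical and ignores matchExpressions.

```python
def matches(selector, labels):
    """Return True if every matchLabels pair in the selector is present in labels."""
    return all(labels.get(key) == value
               for key, value in selector.get("matchLabels", {}).items())


# Selectors from the NotificationManager spec in this thread.
default_config_selector = {"matchLabels": {"type": "default"}}
global_receiver_selector = {"matchLabels": {"type": "global"}}
tenant_receiver_selector = {"matchLabels": {"type": "tenant"}}

# Labels from the WechatConfig and WechatReceiver posted earlier in the thread.
wechat_config_labels = {"app": "notification-manager", "type": "default"}
wechat_receiver_labels = {"app": "notification-manager", "type": "global"}

print("config selected as default:", matches(default_config_selector, wechat_config_labels))
print("receiver selected as global:", matches(global_receiver_selector, wechat_receiver_labels))
print("receiver selected as tenant:", matches(tenant_receiver_selector, wechat_receiver_labels))
```

With the values shown in the thread both resources satisfy their selectors, which suggests looking elsewhere in the checklist (for example the WeChat Work settings) rather than at label selection.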