• Alert notifications
  • KubeSphere 4.1.2: Feishu alerts carry too few labels. Is there a way to customize them?

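Below is the raw data received by the Feishu alert receiver. The labels do not include namespace or similar fields, and the startsAt time is also wrong (0001-01-01T00:00:00Z). What controls these fields, and is there a way to debug this?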

[Raw alert data] map[NotifySuccessful:false annotations:map[alerttime:2025-01-22 16:12:03.506997924 +0800 CST description:Ingress kubernetes-dashboard 99th-percentile request latency exceeded 1 second summary:Ingress kubernetes-dashboard request latency is too high] endsAt:0001-01-01T00:00:00Z id:7276856115694654218 labels:map[alertname:NginxIngressHighLatency alerttype:metric cluster:host ingress:kubernetes-dashboard receiver:global-webhook-receiver rule_group:ingress-nginx rule_id:e490e64e-28f0-47db-9809-f7020824386a rule_level:cluster rule_type:custom severity:info] startsAt:0001-01-01T00:00:00Z status:firing]

Was this triggered by one of your own alert rules?

    NullFox

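    Yes, it is a custom rule, and the monitoring target was also added through a ServiceMonitor. However, the startsAt time is wrong for the built-in alerts as well.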

    apiVersion: alerting.kubesphere.io/v2beta1
    kind: ClusterRuleGroup
    metadata:
      annotations:
      labels:
    #    alerting.kubesphere.io/builtin: "true"
        alerting.kubesphere.io/data_source: default
        alerting.kubesphere.io/enable: "true"
        alerting.kubesphere.io/owner_cluster: host
      name: host.ingress-nginx
    spec:
      rules:
      - alert: NginxIngressHigh5xxRate
        id: 9i0j1k2l-3m4n-5o6p-7q8r-9s0t1u2v3w4x
        expr: |
          sum(rate(nginx_ingress_controller_requests{status=~"5.*"}[5m])) by (ingress) / sum(rate(nginx_ingress_controller_requests[5m])) by (ingress) > 0.05
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Ingress {{ $labels.ingress }} 5xx error rate is too high"
          description: "Ingress {{ $labels.ingress }} 5xx error rate exceeded 5%"
    
      - alert: NginxIngressHighLatency
        id: 2l3m4n5o-6p7q-8r9s-0t1u-2v3w4x5y6z7a
        expr: |
          histogram_quantile(0.99, sum(rate(nginx_ingress_controller_request_duration_seconds_bucket[5m])) by (le, ingress)) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Ingress {{ $labels.ingress }} request latency is too high"
          description: "Ingress {{ $labels.ingress }} 99th-percentile request latency exceeded 1 second"
    
      - alert: NginxIngressHigh4xxRate
        id: 1k2l3m4n-5o6p-7q8r-9s0t-1u2v3w4x5y6z
        expr: |
          (sum(rate(nginx_ingress_controller_requests{status=~"4.*"}[5m])) by (ingress) / sum(rate(nginx_ingress_controller_requests[5m])) by (ingress) > 0.10)
          and
          (sum(rate(nginx_ingress_controller_requests{status=~"4.*"}[5m])) by (ingress) / sum(rate(nginx_ingress_controller_requests[5m])) by (ingress) < 1)
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Ingress {{ $labels.ingress }} 4xx error rate is too high"
          description: "Ingress {{ $labels.ingress }} 4xx error rate exceeded 10%"
    
      - alert: NginxIngressPodRestarting
        id: 7g8h9i0j-1k2l-3m4n-5o6p-7q8r9s0t1u2v
        expr: |
          increase(kube_pod_container_status_restarts_total{namespace="ingress-nginx", container="controller"}[10m]) > 4
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Nginx Ingress controller Pod is restarting frequently"
          description: "The Nginx Ingress controller Pod in the ingress-nginx namespace restarted more than 4 times in the past 10 minutes"
    
      - alert: NginxIngressDown
        id: 6f7g8h9i-0j1k-2l3m-4n5o-6p7q8r9s0t1u
        expr: |
          absent(up{job="nginx-ingress"} == 1)
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Nginx Ingress controller is down"
          description: "The Nginx Ingress controller has been down for more than 1 minute"
    
      - alert: NginxIngressHighUV
        id: 4d5e6f7g-8h9i-0j1k-2l3m-4n5o6p7q8r9s
        expr: |
          sum(rate(nginx_ingress_controller_requests[1m])) by (ingress) > 20
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Ingress {{ $labels.ingress }} request volume is too high"
          description: "Ingress {{ $labels.ingress }} request rate exceeded 20 requests per second (averaged over 1 minute)"
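
One likely reason the alert carries no namespace label, assuming ClusterRuleGroup rules are evaluated with standard Prometheus semantics: an alert's labels are the labels of the expression result plus the rule's own `labels:` block, and `sum(...) by (ingress)` discards every label except `ingress`. The fragment below is a sketch of the latency rule with `namespace` kept in the aggregation. It assumes the ingress-nginx metric exposes a `namespace` label; depending on the ServiceMonitor's label handling it may appear as `exported_namespace` instead. The `team` label is a hypothetical example of a static custom label.

```yaml
# Sketch only: NginxIngressHighLatency with namespace preserved in the result.
- alert: NginxIngressHighLatency
  expr: |
    histogram_quantile(0.99, sum(rate(nginx_ingress_controller_request_duration_seconds_bucket[5m])) by (le, namespace, ingress)) > 1
  for: 5m
  labels:
    severity: warning
    team: platform   # hypothetical static label, added to every alert from this rule
  annotations:
    summary: "Ingress {{ $labels.namespace }}/{{ $labels.ingress }} request latency is too high"
    description: "Ingress {{ $labels.namespace }}/{{ $labels.ingress }} 99th-percentile request latency exceeded 1 second"
```

Running the expression without the `> 1` threshold in the monitoring query console shows exactly which labels survive the aggregation, which is a quick way to check what the notification will receive. This does not explain the zero-valued startsAt, which would need to be checked separately in the notification pipeline.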