

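KubeSphere version information

v4.1.2. Offline installation, on an existing Kubernetes cluster.

What is the problem

Alertmanager configuration on the host cluster: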

kind: ConfigMap
apiVersion: v1
metadata:
  name: whizard-notification-alertmanager
  namespace: kubesphere-monitoring-system
  labels:
    app.kubernetes.io/instance: whizard-notification
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/version: v0.27.0
    helm.sh/chart: alertmanager-1.11.3
    kubesphere.io/extension-ref: whizard-notification
  annotations:
    meta.helm.sh/release-name: whizard-notification
    meta.helm.sh/release-namespace: kubesphere-monitoring-system
data:
  alertmanager.yml: |
    global: {}
    inhibit_rules:
    - equal:
      - cluster
      - namespace
      - alertname
      source_matchers:
      - severity = "critical"
      target_matchers:
      - severity =~ "warning|info"
    - equal:
      - cluster
      - namespace
      - alertname
      source_matchers:
      - severity = "warning"
      target_matchers:
      - severity = "info"
    - equal:
      - cluster
      - namespace
      source_matchers:
      - alertname = "InfoInhibitor"
      target_matchers:
      - severity = "info"
    receivers:
    - name: Default
    - name: "null"
    - name: Watchdog
    - name: prometheus
      webhook_configs:
      - url: http://notification-manager-svc.kubesphere-monitoring-system.svc:19093/api/v2/alerts
    - name: event
      webhook_configs:
      - send_resolved: false
        url: http://notification-manager-svc.kubesphere-monitoring-system.svc:19093/api/v2/alerts
    - name: auditing
      webhook_configs:
      - send_resolved: false
        url: http://notification-manager-svc.kubesphere-monitoring-system.svc:19093/api/v2/alerts
    - name: mcp-receiver
      webhook_configs:
      - send_resolved: true
        url: https://alarm-uat.demo.com/api/alerts/prometheus/callback
    route:
      group_by:
      - cluster
      - namespace
      - alertname
      - rule_id
      group_interval: 5m
      group_wait: 30s
      receiver: Default
      repeat_interval: 12h
      routes:
      - matchers:
        - alertname = "Watchdog"
        receiver: Watchdog
      - matchers:
        - alertname = "InfoInhibitor"
        receiver: "null"
      - group_interval: 30s
        matchers:
        - alerttype = "event"
        receiver: event
      - group_interval: 30s
        matchers:
        - alerttype = "auditing"
        receiver: auditing
      - matchers:
        - alerttype =~ ".*"
        receiver: prometheus
        continue: true
      - matchers:
        - alerttype =~ ".*"
        receiver: mcp-receiver
    templates:
    - /etc/alertmanager/*.tmpl
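
To make the routing in the config above easier to follow, here is a minimal Python sketch of how an alert's labels would walk the child routes. It is a simplification and not Alertmanager's actual implementation: it assumes one matcher per route, no nested routes, and Python's `re` in place of Go's regexp engine. The names `ROUTES`, `matches`, and `receivers_for` are hypothetical. It relies on documented Alertmanager behavior: regex matchers are fully anchored, a missing label is treated as an empty string, and evaluation stops at the first matching sibling unless `continue: true` is set.

```python
import re

# Child routes copied from `route.routes` in the config above:
# (label, operator, value, receiver, continue)
ROUTES = [
    ("alertname", "=", "Watchdog", "Watchdog", False),
    ("alertname", "=", "InfoInhibitor", "null", False),
    ("alerttype", "=", "event", "event", False),
    ("alerttype", "=", "auditing", "auditing", False),
    ("alerttype", "=~", ".*", "prometheus", True),
    ("alerttype", "=~", ".*", "mcp-receiver", False),
]
DEFAULT_RECEIVER = "Default"  # receiver of the root route


def matches(labels, name, op, value):
    """Evaluate a single `=` or `=~` matcher against an alert's labels."""
    actual = labels.get(name, "")  # a missing label behaves like an empty value
    if op == "=":
        return actual == value
    return re.fullmatch(value, actual) is not None  # regex matchers are anchored


def receivers_for(labels):
    """Return the receivers an alert would reach, in route order."""
    hit = []
    for name, op, value, receiver, cont in ROUTES:
        if matches(labels, name, op, value):
            hit.append(receiver)
            if not cont:  # without `continue: true`, stop at the first match
                break
    return hit or [DEFAULT_RECEIVER]


if __name__ == "__main__":
    print(receivers_for({"alertname": "KubePodNotReady", "alerttype": "metric"}))
```

Under these assumptions, a metric alert such as KubePodNotReady reaches both `prometheus` and `mcp-receiver`, which agrees with the `"receiver":"mcp-receiver"` field in the callback logs further down.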

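Alert rule groups: only the alert severity levels were changed, and the duration of KubeDeploymentReplicasMismatch was changed to 5 minutes.

Custom rule group

Demo:

Create an nginx-deployment with the image set to a nonexistent version. Created at 16:28.

Alert logs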

2025-03-21 16:45:12.544  INFO 1 --- [nio-8082-exec-5] c.j.a.m.s.s.controller.AlertsController  : method: [prometheusCallBack], args: ["{\"receiver\":\"mcp-receiver\",\"status\":\"firing\",\"alerts\":[{\"status\":\"firing\",\"labels\":{\"alertname\":\"KubePodNotReady\",\"alerttype\":\"metric\",\"cluster\":\"wh-member\",\"namespace\":\"trip\",\"pod\":\"nginx-deployment-5f7f7777dc-nfxs9\",\"rule_group\":\"kubernetes-apps\",\"rule_id\":\"0c524b2b-39b3-4eb3-992f-f0959f9286d7\",\"rule_level\":\"cluster\",\"rule_type\":\"custom\",\"severity\":\"warning\"},\"annotations\":{\"description\":\"Pod trip/nginx-deployment-5f7f7777dc-nfxs9 has been in a non-ready state for longer than 15 minutes.\",\"runbook_url\":\"https://alert-runbooks.kubesphere.io/latest/runbooks/kubernetes/kubepodnotready\",\"summary\":\"Pod has been in a non-ready state for more than 15 minutes.\"},\"startsAt\":\"2025-03-21T08:44:42.375Z\",\"endsAt\":\"0001-01-01T00:00:00Z\",\"generatorURL\":\"\",\"fingerprint\":\"9ed5e2a90a3fce0b\"}],\"groupLabels\":{\"alertname\":\"KubePodNotReady\",\"cluster\":\"wh-member\",\"namespace\":\"trip\",\"rule_id\":\"0c524b2b-39b3-4eb3-992f-f0959f9286d7\"},\"commonLabels\":{\"alertname\":\"KubePodNotReady\",\"alerttype\":\"metric\",\"cluster\":\"wh-member\",\"namespace\":\"trip\",\"pod\":\"nginx-deployment-5f7f7777dc-nfxs9\",\"rule_group\":\"kubernetes-apps\",\"rule_id\":\"0c524b2b-39b3-4eb3-992f-f0959f9286d7\",\"rule_level\":\"cluster\",\"rule_type\":\"custom\",\"severity\":\"warning\"},\"commonAnnotations\":{\"description\":\"Pod trip/nginx-deployment-5f7f7777dc-nfxs9 has been in a non-ready state for longer than 15 minutes.\",\"runbook_url\":\"https://alert-runbooks.kubesphere.io/latest/runbooks/kubernetes/kubepodnotready\",\"summary\":\"Pod has been in a non-ready state for more than 15 minutes.\"},\"externalURL\":\"http://whizard-notification-alertmanager-0:9093\",\"version\":\"4\",\"groupKey\":\"{}/{alerttype=~\\\".*\\\"}:{alertname=\\\"KubePodNotReady\\\", 
cluster=\\\"wh-member\\\", namespace=\\\"trip\\\", rule_id=\\\"0c524b2b-39b3-4eb3-992f-f0959f9286d7\\\"}\",\"truncatedAlerts\":0}\n"]


2025-03-21 16:46:34.611  INFO 1 --- [nio-8082-exec-9] c.j.a.m.s.s.controller.AlertsController  : method: [prometheusCallBack], args: ["{\"receiver\":\"mcp-receiver\",\"status\":\"firing\",\"alerts\":[{\"status\":\"firing\",\"labels\":{\"alertname\":\"ImagePullFailed\",\"alerttype\":\"metric\",\"cluster\":\"wh-member\",\"container\":\"nginx\",\"instance\":\"172.16.175.67:8080\",\"job\":\"kube-state-metrics\",\"namespace\":\"trip\",\"pod\":\"nginx-deployment-5f7f7777dc-nfxs9\",\"reason\":\"ImagePullBackOff\",\"rule_group\":\"jiajiayue-k8s\",\"rule_id\":\"9656ad8c-8cd6-4990-8adf-7734a76d4ead\",\"rule_level\":\"cluster\",\"rule_type\":\"custom\",\"severity\":\"info\",\"uid\":\"6daea5ab-149b-4fb2-8b51-2b86c5f7178c\"},\"annotations\":{\"message\":\"Pod nginx-deployment-5f7f7777dc-nfxs9 in Namespace trip is failing to pull image, Reason: ImagePullBackOff.\",\"summary\":\"Image Pull Failed in Namespace trip\"},\"startsAt\":\"2025-03-21T08:31:04.51Z\",\"endsAt\":\"0001-01-01T00:00:00Z\",\"generatorURL\":\"\",\"fingerprint\":\"1077b323f8ccec86\"}],\"groupLabels\":{\"alertname\":\"ImagePullFailed\",\"cluster\":\"wh-member\",\"namespace\":\"trip\",\"rule_id\":\"9656ad8c-8cd6-4990-8adf-7734a76d4ead\"},\"commonLabels\":{\"alertname\":\"ImagePullFailed\",\"alerttype\":\"metric\",\"cluster\":\"wh-member\",\"container\":\"nginx\",\"instance\":\"172.16.175.67:8080\",\"job\":\"kube-state-metrics\",\"namespace\":\"trip\",\"pod\":\"nginx-deployment-5f7f7777dc-nfxs9\",\"reason\":\"ImagePullBackOff\",\"rule_group\":\"jiajiayue-k8s\",\"rule_id\":\"9656ad8c-8cd6-4990-8adf-7734a76d4ead\",\"rule_level\":\"cluster\",\"rule_type\":\"custom\",\"severity\":\"info\",\"uid\":\"6daea5ab-149b-4fb2-8b51-2b86c5f7178c\"},\"commonAnnotations\":{\"message\":\"Pod nginx-deployment-5f7f7777dc-nfxs9 in Namespace trip is failing to pull image, Reason: ImagePullBackOff.\",\"summary\":\"Image Pull Failed in Namespace 
trip\"},\"externalURL\":\"http://whizard-notification-alertmanager-0:9093\",\"version\":\"4\",\"groupKey\":\"{}/{alerttype=~\\\".*\\\"}:{alertname=\\\"ImagePullFailed\\\", cluster=\\\"wh-member\\\", namespace=\\\"trip\\\", rule_id=\\\"9656ad8c-8cd6-4990-8adf-7734a76d4ead\\\"}\",\"truncatedAlerts\":0}\n"]


2025-03-21 16:49:12.481  INFO 1 --- [nio-8082-exec-3] c.j.a.m.s.s.controller.AlertsController  : method: [prometheusCallBack], args: ["{\"receiver\":\"mcp-receiver\",\"status\":\"firing\",\"alerts\":[{\"status\":\"firing\",\"labels\":{\"alertname\":\"KubeDeploymentReplicasMismatch\",\"alerttype\":\"metric\",\"cluster\":\"wh-member\",\"deployment\":\"nginx-deployment\",\"instance\":\"172.16.175.67:8080\",\"job\":\"kube-state-metrics\",\"namespace\":\"trip\",\"rule_group\":\"kubernetes-apps\",\"rule_id\":\"f2b17680-2a26-45ba-a88b-d0a7393ca8c0\",\"rule_level\":\"cluster\",\"rule_type\":\"custom\",\"severity\":\"info\"},\"annotations\":{\"description\":\"Deployment trip/nginx-deployment has not matched the expected number of replicas for longer than 5 minutes.\",\"runbook_url\":\"https://alert-runbooks.kubesphere.io/latest/runbooks/kubernetes/kubedeploymentreplicasmismatch\",\"summary\":\"Deployment has not matched the expected number of replicas.\"},\"startsAt\":\"2025-03-21T08:43:42.375Z\",\"endsAt\":\"0001-01-01T00:00:00Z\",\"generatorURL\":\"\",\"fingerprint\":\"afcb34f93dc5624f\"}],\"groupLabels\":{\"alertname\":\"KubeDeploymentReplicasMismatch\",\"cluster\":\"wh-member\",\"namespace\":\"trip\",\"rule_id\":\"f2b17680-2a26-45ba-a88b-d0a7393ca8c0\"},\"commonLabels\":{\"alertname\":\"KubeDeploymentReplicasMismatch\",\"alerttype\":\"metric\",\"cluster\":\"wh-member\",\"deployment\":\"nginx-deployment\",\"instance\":\"172.16.175.67:8080\",\"job\":\"kube-state-metrics\",\"namespace\":\"trip\",\"rule_group\":\"kubernetes-apps\",\"rule_id\":\"f2b17680-2a26-45ba-a88b-d0a7393ca8c0\",\"rule_level\":\"cluster\",\"rule_type\":\"custom\",\"severity\":\"info\"},\"commonAnnotations\":{\"description\":\"Deployment trip/nginx-deployment has not matched the expected number of replicas for longer than 5 minutes.\",\"runbook_url\":\"https://alert-runbooks.kubesphere.io/latest/runbooks/kubernetes/kubedeploymentreplicasmismatch\",\"summary\":\"Deployment has not matched the expected 
number of replicas.\"},\"externalURL\":\"http://whizard-notification-alertmanager-0:9093\",\"version\":\"4\",\"groupKey\":\"{}/{alerttype=~\\\".*\\\"}:{alertname=\\\"KubeDeploymentReplicasMismatch\\\", cluster=\\\"wh-member\\\", namespace=\\\"trip\\\", rule_id=\\\"f2b17680-2a26-45ba-a88b-d0a7393ca8c0\\\"}\",\"truncatedAlerts\":0}\n"]
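
The payload in each log line above is hard to read because it is JSON encoded twice: the text after `args:` looks like a JSON array holding one string, and that string is itself the webhook JSON. Below is a minimal Python sketch for pulling out the fields that matter for the timing question. It assumes the `args:` text is valid JSON once the display line wraps are removed. The sample payload is a shortened stand-in built from values in the first log line, and `summarize` is a hypothetical helper name.

```python
import json

# Shortened stand-in for the first log line's payload; most fields are dropped
# for brevity, and the values are copied from the log above.
sample_payload = {
    "receiver": "mcp-receiver",
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "KubePodNotReady", "severity": "warning"},
            "startsAt": "2025-03-21T08:44:42.375Z",
        }
    ],
}
# Rebuild the double-encoded form seen after "args: " in the log.
raw_args = json.dumps([json.dumps(sample_payload) + "\n"])


def summarize(raw_args_text):
    """Decode the double-encoded args and list (receiver, alertname, severity, startsAt)."""
    (payload_text,) = json.loads(raw_args_text)  # outer layer: array with one string
    payload = json.loads(payload_text)           # inner layer: the webhook body
    return [
        (
            payload["receiver"],
            alert["labels"]["alertname"],
            alert["labels"]["severity"],
            alert["startsAt"],
        )
        for alert in payload["alerts"]
    ]


if __name__ == "__main__":
    for row in summarize(raw_args):
        print(row)
```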

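From the logs (startsAt is in UTC; the log timestamps appear to be local time, UTC+8):

KubePodNotReady alert: startsAt 2025-03-21T08:44:42.375Z; notification sent at 2025-03-21 16:45:12.544.

16:28-16:44 (15-minute validation), sent at 16:45 (about 30 seconds after startsAt, which matches group_wait: 30s). Normal.

ImagePullFailed alert (reason ImagePullBackOff): startsAt 2025-03-21T08:31:04.51Z; notification sent at 2025-03-21 16:46:34.611.

16:28-16:31 (3-minute validation), sent at 16:46 (more than 15 minutes after startsAt). Not normal.

KubeDeploymentReplicasMismatch alert: startsAt 2025-03-21T08:43:42.375Z; notification sent at 2025-03-21 16:49:12.481.

16:28-16:43 (10-minute query window plus 5-minute validation), sent at 16:49 (about 5.5 minutes after startsAt). Taking group_interval and the other settings into account, this is normal.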
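
The delays discussed above can be recomputed from the quoted timestamps. The Python sketch below does that under one assumption: the receiving application logs in UTC+8, which is what the 08:xx to 16:xx correspondence suggests. `LOCAL_TZ`, `ALERTS`, and `delay_seconds` are hypothetical names used only for this illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the receiver application logs in UTC+8.
LOCAL_TZ = timezone(timedelta(hours=8))

# alertname -> (startsAt from the payload, log timestamp of the callback)
ALERTS = {
    "KubePodNotReady": ("2025-03-21T08:44:42.375Z", "2025-03-21 16:45:12.544"),
    "ImagePullFailed": ("2025-03-21T08:31:04.51Z", "2025-03-21 16:46:34.611"),
    "KubeDeploymentReplicasMismatch": ("2025-03-21T08:43:42.375Z", "2025-03-21 16:49:12.481"),
}


def delay_seconds(starts_at, logged_at):
    """Seconds between the alert's startsAt (UTC) and the callback log time (local)."""
    start = datetime.strptime(starts_at, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    logged = datetime.strptime(logged_at, "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=LOCAL_TZ)
    return (logged - start).total_seconds()


delays = {name: delay_seconds(*stamps) for name, stamps in ALERTS.items()}

if __name__ == "__main__":
    for name, seconds in delays.items():
        print(f"{name}: {seconds:.1f}s after startsAt")
```

With that timezone assumption, the three delays come out to roughly 30 s, 930 s (15.5 minutes), and 330 s (5.5 minutes).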

Could someone please help take a look at why the ImagePullFailed (ImagePullBackOff) alert was sent so late?