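When creating a deployment question, please follow the template below. The more information you provide, the easier it is to get a timely answer. If a question does not follow the template, administrators may close it.
Make sure the post is clearly formatted and easy to read, and use markdown code block syntax for code.
Do not expect others to spend half an hour answering a question that you spent only a minute writing.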
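KubeSphere version information
v4.1.2. Offline installation on an existing Kubernetes cluster.
What is the problem
Alertmanager configuration on the host cluster: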
kind: ConfigMap
apiVersion: v1
metadata:
  name: whizard-notification-alertmanager
  namespace: kubesphere-monitoring-system
  labels:
    app.kubernetes.io/instance: whizard-notification
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/version: v0.27.0
    helm.sh/chart: alertmanager-1.11.3
    kubesphere.io/extension-ref: whizard-notification
  annotations:
    meta.helm.sh/release-name: whizard-notification
    meta.helm.sh/release-namespace: kubesphere-monitoring-system
data:
  alertmanager.yml: |
    global: {}
    inhibit_rules:
    - equal:
      - cluster
      - namespace
      - alertname
      source_matchers:
      - severity = "critical"
      target_matchers:
      - severity =~ "warning|info"
    - equal:
      - cluster
      - namespace
      - alertname
      source_matchers:
      - severity = "warning"
      target_matchers:
      - severity = "info"
    - equal:
      - cluster
      - namespace
      source_matchers:
      - alertname = "InfoInhibitor"
      target_matchers:
      - severity = "info"
    receivers:
    - name: Default
    - name: "null"
    - name: Watchdog
    - name: prometheus
      webhook_configs:
      - url: http://notification-manager-svc.kubesphere-monitoring-system.svc:19093/api/v2/alerts
    - name: event
      webhook_configs:
      - send_resolved: false
        url: http://notification-manager-svc.kubesphere-monitoring-system.svc:19093/api/v2/alerts
    - name: auditing
      webhook_configs:
      - send_resolved: false
        url: http://notification-manager-svc.kubesphere-monitoring-system.svc:19093/api/v2/alerts
    - name: mcp-receiver
      webhook_configs:
      - send_resolved: true
        url: https://alarm-uat.demo.com/api/alerts/prometheus/callback
    route:
      group_by:
      - cluster
      - namespace
      - alertname
      - rule_id
      group_interval: 5m
      group_wait: 30s
      receiver: Default
      repeat_interval: 12h
      routes:
      - matchers:
        - alertname = "Watchdog"
        receiver: Watchdog
      - matchers:
        - alertname = "InfoInhibitor"
        receiver: "null"
      - group_interval: 30s
        matchers:
        - alerttype = "event"
        receiver: event
      - group_interval: 30s
        matchers:
        - alerttype = "auditing"
        receiver: auditing
      - matchers:
        - alerttype =~ ".*"
        receiver: prometheus
        continue: true
      - matchers:
        - alerttype =~ ".*"
        receiver: mcp-receiver
    templates:
    - /etc/alertmanager/*.tmpl
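To make it easier to see which receivers an alert reaches under the `route` section above, here is a minimal Python sketch of the route tree. It is a simplified model and not Alertmanager's actual code: the `Route` class, `matches`, and `resolve` are hypothetical names, only the `=` and `=~` matcher operators are modelled, and grouping, inhibition, and timing are ignored. It assumes child routes are tried in order and that evaluation stops at the first match unless `continue: true` is set.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Route:
    receiver: str
    matchers: list = field(default_factory=list)  # (label, op, value) tuples
    cont: bool = False  # mirrors `continue: true` in the config
    routes: list = field(default_factory=list)


def matches(route, labels):
    """True if every matcher of the route accepts the alert's labels."""
    for name, op, value in route.matchers:
        actual = labels.get(name, "")  # assumption: a missing label counts as empty
        if op == "=" and actual != value:
            return False
        if op == "=~" and re.fullmatch(value, actual) is None:
            return False
    return True


def resolve(route, labels):
    """Return the receivers for an alert, visiting child routes in order."""
    if not matches(route, labels):
        return []
    found = []
    for child in route.routes:
        hit = resolve(child, labels)
        found.extend(hit)
        if hit and not child.cont:
            break  # first match wins unless the child sets continue
    return found or [route.receiver]


# Route tree transcribed from the ConfigMap above.
ROOT = Route("Default", routes=[
    Route("Watchdog", [("alertname", "=", "Watchdog")]),
    Route("null", [("alertname", "=", "InfoInhibitor")]),
    Route("event", [("alerttype", "=", "event")]),
    Route("auditing", [("alerttype", "=", "auditing")]),
    Route("prometheus", [("alerttype", "=~", ".*")], cont=True),
    Route("mcp-receiver", [("alerttype", "=~", ".*")]),
])

metric_alert = {"alertname": "ImagePullFailed", "alerttype": "metric", "severity": "info"}
print(resolve(ROOT, metric_alert))
```

Under this model a metric alert is delivered to both `prometheus` and `mcp-receiver`, while an `InfoInhibitor` alert stops at the `"null"` receiver.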
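Alert rule groups: only the alert severity levels were changed, and the duration of KubeDeploymentReplicasMismatch was changed to 5 minutes.
Custom rule group
demo:
Create an nginx-deployment with its image set to a nonexistent version. Creation time: 16:28.
Alert logs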
2025-03-21 16:45:12.544 INFO 1 --- [nio-8082-exec-5] c.j.a.m.s.s.controller.AlertsController : method: [prometheusCallBack], args: ["{\"receiver\":\"mcp-receiver\",\"status\":\"firing\",\"alerts\":[{\"status\":\"firing\",\"labels\":{\"alertname\":\"KubePodNotReady\",\"alerttype\":\"metric\",\"cluster\":\"wh-member\",\"namespace\":\"trip\",\"pod\":\"nginx-deployment-5f7f7777dc-nfxs9\",\"rule_group\":\"kubernetes-apps\",\"rule_id\":\"0c524b2b-39b3-4eb3-992f-f0959f9286d7\",\"rule_level\":\"cluster\",\"rule_type\":\"custom\",\"severity\":\"warning\"},\"annotations\":{\"description\":\"Pod trip/nginx-deployment-5f7f7777dc-nfxs9 has been in a non-ready state for longer than 15 minutes.\",\"runbook_url\":\"https://alert-runbooks.kubesphere.io/latest/runbooks/kubernetes/kubepodnotready\",\"summary\":\"Pod has been in a non-ready state for more than 15 minutes.\"},\"startsAt\":\"2025-03-21T08:44:42.375Z\",\"endsAt\":\"0001-01-01T00:00:00Z\",\"generatorURL\":\"\",\"fingerprint\":\"9ed5e2a90a3fce0b\"}],\"groupLabels\":{\"alertname\":\"KubePodNotReady\",\"cluster\":\"wh-member\",\"namespace\":\"trip\",\"rule_id\":\"0c524b2b-39b3-4eb3-992f-f0959f9286d7\"},\"commonLabels\":{\"alertname\":\"KubePodNotReady\",\"alerttype\":\"metric\",\"cluster\":\"wh-member\",\"namespace\":\"trip\",\"pod\":\"nginx-deployment-5f7f7777dc-nfxs9\",\"rule_group\":\"kubernetes-apps\",\"rule_id\":\"0c524b2b-39b3-4eb3-992f-f0959f9286d7\",\"rule_level\":\"cluster\",\"rule_type\":\"custom\",\"severity\":\"warning\"},\"commonAnnotations\":{\"description\":\"Pod trip/nginx-deployment-5f7f7777dc-nfxs9 has been in a non-ready state for longer than 15 minutes.\",\"runbook_url\":\"https://alert-runbooks.kubesphere.io/latest/runbooks/kubernetes/kubepodnotready\",\"summary\":\"Pod has been in a non-ready state for more than 15 minutes.\"},\"externalURL\":\"http://whizard-notification-alertmanager-0:9093\",\"version\":\"4\",\"groupKey\":\"{}/{alerttype=~\\\".*\\\"}:{alertname=\\\"KubePodNotReady\\\", 
cluster=\\\"wh-member\\\", namespace=\\\"trip\\\", rule_id=\\\"0c524b2b-39b3-4eb3-992f-f0959f9286d7\\\"}\",\"truncatedAlerts\":0}\n"]
2025-03-21 16:46:34.611 INFO 1 --- [nio-8082-exec-9] c.j.a.m.s.s.controller.AlertsController : method: [prometheusCallBack], args: ["{\"receiver\":\"mcp-receiver\",\"status\":\"firing\",\"alerts\":[{\"status\":\"firing\",\"labels\":{\"alertname\":\"ImagePullFailed\",\"alerttype\":\"metric\",\"cluster\":\"wh-member\",\"container\":\"nginx\",\"instance\":\"172.16.175.67:8080\",\"job\":\"kube-state-metrics\",\"namespace\":\"trip\",\"pod\":\"nginx-deployment-5f7f7777dc-nfxs9\",\"reason\":\"ImagePullBackOff\",\"rule_group\":\"jiajiayue-k8s\",\"rule_id\":\"9656ad8c-8cd6-4990-8adf-7734a76d4ead\",\"rule_level\":\"cluster\",\"rule_type\":\"custom\",\"severity\":\"info\",\"uid\":\"6daea5ab-149b-4fb2-8b51-2b86c5f7178c\"},\"annotations\":{\"message\":\"Pod nginx-deployment-5f7f7777dc-nfxs9 in Namespace trip is failing to pull image, Reason: ImagePullBackOff.\",\"summary\":\"Image Pull Failed in Namespace trip\"},\"startsAt\":\"2025-03-21T08:31:04.51Z\",\"endsAt\":\"0001-01-01T00:00:00Z\",\"generatorURL\":\"\",\"fingerprint\":\"1077b323f8ccec86\"}],\"groupLabels\":{\"alertname\":\"ImagePullFailed\",\"cluster\":\"wh-member\",\"namespace\":\"trip\",\"rule_id\":\"9656ad8c-8cd6-4990-8adf-7734a76d4ead\"},\"commonLabels\":{\"alertname\":\"ImagePullFailed\",\"alerttype\":\"metric\",\"cluster\":\"wh-member\",\"container\":\"nginx\",\"instance\":\"172.16.175.67:8080\",\"job\":\"kube-state-metrics\",\"namespace\":\"trip\",\"pod\":\"nginx-deployment-5f7f7777dc-nfxs9\",\"reason\":\"ImagePullBackOff\",\"rule_group\":\"jiajiayue-k8s\",\"rule_id\":\"9656ad8c-8cd6-4990-8adf-7734a76d4ead\",\"rule_level\":\"cluster\",\"rule_type\":\"custom\",\"severity\":\"info\",\"uid\":\"6daea5ab-149b-4fb2-8b51-2b86c5f7178c\"},\"commonAnnotations\":{\"message\":\"Pod nginx-deployment-5f7f7777dc-nfxs9 in Namespace trip is failing to pull image, Reason: ImagePullBackOff.\",\"summary\":\"Image Pull Failed in Namespace 
trip\"},\"externalURL\":\"http://whizard-notification-alertmanager-0:9093\",\"version\":\"4\",\"groupKey\":\"{}/{alerttype=~\\\".*\\\"}:{alertname=\\\"ImagePullFailed\\\", cluster=\\\"wh-member\\\", namespace=\\\"trip\\\", rule_id=\\\"9656ad8c-8cd6-4990-8adf-7734a76d4ead\\\"}\",\"truncatedAlerts\":0}\n"]
2025-03-21 16:49:12.481 INFO 1 --- [nio-8082-exec-3] c.j.a.m.s.s.controller.AlertsController : method: [prometheusCallBack], args: ["{\"receiver\":\"mcp-receiver\",\"status\":\"firing\",\"alerts\":[{\"status\":\"firing\",\"labels\":{\"alertname\":\"KubeDeploymentReplicasMismatch\",\"alerttype\":\"metric\",\"cluster\":\"wh-member\",\"deployment\":\"nginx-deployment\",\"instance\":\"172.16.175.67:8080\",\"job\":\"kube-state-metrics\",\"namespace\":\"trip\",\"rule_group\":\"kubernetes-apps\",\"rule_id\":\"f2b17680-2a26-45ba-a88b-d0a7393ca8c0\",\"rule_level\":\"cluster\",\"rule_type\":\"custom\",\"severity\":\"info\"},\"annotations\":{\"description\":\"Deployment trip/nginx-deployment has not matched the expected number of replicas for longer than 5 minutes.\",\"runbook_url\":\"https://alert-runbooks.kubesphere.io/latest/runbooks/kubernetes/kubedeploymentreplicasmismatch\",\"summary\":\"Deployment has not matched the expected number of replicas.\"},\"startsAt\":\"2025-03-21T08:43:42.375Z\",\"endsAt\":\"0001-01-01T00:00:00Z\",\"generatorURL\":\"\",\"fingerprint\":\"afcb34f93dc5624f\"}],\"groupLabels\":{\"alertname\":\"KubeDeploymentReplicasMismatch\",\"cluster\":\"wh-member\",\"namespace\":\"trip\",\"rule_id\":\"f2b17680-2a26-45ba-a88b-d0a7393ca8c0\"},\"commonLabels\":{\"alertname\":\"KubeDeploymentReplicasMismatch\",\"alerttype\":\"metric\",\"cluster\":\"wh-member\",\"deployment\":\"nginx-deployment\",\"instance\":\"172.16.175.67:8080\",\"job\":\"kube-state-metrics\",\"namespace\":\"trip\",\"rule_group\":\"kubernetes-apps\",\"rule_id\":\"f2b17680-2a26-45ba-a88b-d0a7393ca8c0\",\"rule_level\":\"cluster\",\"rule_type\":\"custom\",\"severity\":\"info\"},\"commonAnnotations\":{\"description\":\"Deployment trip/nginx-deployment has not matched the expected number of replicas for longer than 5 minutes.\",\"runbook_url\":\"https://alert-runbooks.kubesphere.io/latest/runbooks/kubernetes/kubedeploymentreplicasmismatch\",\"summary\":\"Deployment has not matched the expected 
number of replicas.\"},\"externalURL\":\"http://whizard-notification-alertmanager-0:9093\",\"version\":\"4\",\"groupKey\":\"{}/{alerttype=~\\\".*\\\"}:{alertname=\\\"KubeDeploymentReplicasMismatch\\\", cluster=\\\"wh-member\\\", namespace=\\\"trip\\\", rule_id=\\\"f2b17680-2a26-45ba-a88b-d0a7393ca8c0\\\"}\",\"truncatedAlerts\":0}\n"]
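From these logs:
KubePodNotReady: startsAt 2025-03-21T08:44:42.375Z; notification sent at 2025-03-21 16:45:12.544.
16:28 to 16:44 is the 15-minute pending period, and the notification went out at 16:45, about 30 seconds after startsAt. Normal.
ImagePullBackOff (alertname ImagePullFailed): startsAt 2025-03-21T08:31:04.51Z; notification sent at 2025-03-21 16:46:34.611.
16:28 to 16:31 is the 3-minute pending period, but the notification went out at 16:46, more than 15 minutes after startsAt. Not normal.
KubeDeploymentReplicasMismatch: startsAt 2025-03-21T08:43:42.375Z; notification sent at 2025-03-21 16:49:12.481.
16:28 to 16:43 covers the 10-minute query window plus the 5-minute pending period, and the notification went out at 16:49, about 5.5 minutes after startsAt. Taking group_interval and the other settings into account, this is normal.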
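The gaps between startsAt and the time the receiver logged each notification can be computed directly from the log lines. The following Python sketch does that for the three alerts. It assumes the receiver's log timestamps are in UTC+8 (which matches the 08:xx to 16:xx conversion used in this post); `parse_starts_at`, `parse_log_time`, and `delay` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone

LOCAL = timezone(timedelta(hours=8))  # assumption: receiver logs use UTC+8


def parse_starts_at(value):
    """Parse an RFC 3339 UTC timestamp such as 2025-03-21T08:31:04.51Z."""
    head, _, frac = value.rstrip("Z").partition(".")
    base = datetime.strptime(head, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    return base + timedelta(seconds=float("0." + frac) if frac else 0.0)


def parse_log_time(value):
    """Parse a receiver log timestamp such as 2025-03-21 16:46:34.611."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f").replace(tzinfo=LOCAL)


def delay(starts_at, logged_at):
    """Time between the alert's startsAt and the receiver logging the callback."""
    return parse_log_time(logged_at) - parse_starts_at(starts_at)


# (startsAt from the payload, timestamp of the receiver log line)
ALERTS = {
    "KubePodNotReady": ("2025-03-21T08:44:42.375Z", "2025-03-21 16:45:12.544"),
    "ImagePullFailed": ("2025-03-21T08:31:04.51Z", "2025-03-21 16:46:34.611"),
    "KubeDeploymentReplicasMismatch": ("2025-03-21T08:43:42.375Z", "2025-03-21 16:49:12.481"),
}

for name, (starts_at, logged_at) in ALERTS.items():
    print(f"{name}: {delay(starts_at, logged_at)}")
```

With the timestamps from the logs, this gives roughly 30 seconds for KubePodNotReady, 15 minutes 30 seconds for ImagePullFailed, and 5 minutes 30 seconds for KubeDeploymentReplicasMismatch.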
Could someone help figure out why the ImagePullBackOff alert was sent so late?