Operating system information
CentOS 8.0 virtual machine, 8 GB
Kubernetes version information
Kubernetes v1.20.7
Container runtime
docker version 23.0.1
crictl version 0.1.0
KubeSphere version information
v3.3.1, online installation, installed on an existing Kubernetes cluster.
What is the problem
The page shows:

Describing the pod with: kubectl describe pod alertmanager-main-2 -n kubesphere-monitoring-system
The output is as follows:
Name:         alertmanager-main-2
Namespace:    kubesphere-monitoring-system
Priority:     0
Node:         node18/172.32.173.18
Start Time:   Fri, 24 Mar 2023 10:26:42 -0400
Labels:       alertmanager=main
              app.kubernetes.io/component=alert-router
              app.kubernetes.io/instance=main
              app.kubernetes.io/managed-by=prometheus-operator
              app.kubernetes.io/name=alertmanager
              app.kubernetes.io/part-of=kube-prometheus
              app.kubernetes.io/version=0.23.0
              controller-revision-hash=alertmanager-main-f67787f9b
              statefulset.kubernetes.io/pod-name=alertmanager-main-2
Annotations:  cni.projectcalico.org/containerID: 9ec179df6447cb6687c5f35d6d67a2361a1a2bc8ef1fa7d389c12d3e3e5dce41
              cni.projectcalico.org/podIP: 10.244.193.76/32
              cni.projectcalico.org/podIPs: 10.244.193.76/32
              kubectl.kubernetes.io/default-container: alertmanager
Status:       Running
IP:           10.244.193.76
IPs:
  IP:           10.244.193.76
Controlled By:  StatefulSet/alertmanager-main
Containers:
  alertmanager:
    Container ID:  docker://bdf6304bb02442231c2e15d38519f32f89ffbc4091b2e7f7071f377aa16510b1
    Image:         prom/alertmanager:v0.23.0
    Image ID:      docker-pullable://prom/alertmanager@sha256:9ab73a421b65b80be072f96a88df756fc5b52a1bc8d983537b8ec5be8b624c5a
    Ports:         9093/TCP, 9094/TCP, 9094/UDP
    Host Ports:    0/TCP, 0/TCP, 0/UDP
    Args:
      --config.file=/etc/alertmanager/config/alertmanager.yaml
      --storage.path=/alertmanager
      --data.retention=120h
      --cluster.listen-address=[$(POD_IP)]:9094
      --web.listen-address=:9093
      --web.route-prefix=/
      --cluster.peer=alertmanager-main-0.alertmanager-operated:9094
      --cluster.peer=alertmanager-main-1.alertmanager-operated:9094
      --cluster.peer=alertmanager-main-2.alertmanager-operated:9094
      --cluster.reconnect-timeout=5m
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Message:      level=info ts=2023-03-28T08:05:59.391Z caller=main.go:225 msg="Starting Alertmanager" version="(version=0.23.0, branch=HEAD, revision=61046b17771a57cfd4c4a51be370ab930a4d7d54)"
level=info ts=2023-03-28T08:05:59.391Z caller=main.go:226 build_context="(go=go1.16.7, user=root@e21a959be8d2, date=20210825-10:48:55)"
      Exit Code:    2
      Started:      Tue, 28 Mar 2023 04:05:59 -0400
      Finished:     Tue, 28 Mar 2023 04:07:39 -0400
    Ready:          False
    Restart Count:  1289
    Limits:
      cpu:     200m
      memory:  200Mi
    Requests:
      cpu:      20m
      memory:   30Mi
    Liveness:   http-get http://:web/-/healthy delay=0s timeout=3s period=10s #success=1 #failure=10
    Readiness:  http-get http://:web/-/ready delay=3s timeout=3s period=5s #success=1 #failure=10
    Environment:
      POD_IP:   (v1:status.podIP)
    Mounts:
      /alertmanager from alertmanager-main-db (rw)
      /etc/alertmanager/certs from tls-assets (ro)
      /etc/alertmanager/config from config-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from alertmanager-main-token-l9p7m (ro)
  config-reloader:
    Container ID:  docker://26417689463cb72a84ccd088b3020b11442c73d9276f6deaf2fa78aa76fb053f
    Image:         kubesphere/prometheus-config-reloader:v0.55.1
    Image ID:      docker-pullable://kubesphere/prometheus-config-reloader@sha256:77c5a31bd7ac72a4b3ba3a6d3aa8e593eb070bd61384c49e96ad3d4aa0aa185d
    Port:          8080/TCP
    Host Port:     0/TCP
    Command:
      /bin/prometheus-config-reloader
    Args:
      --listen-address=:8080
      --reload-url=http://localhost:9093/-/reload
      --watched-dir=/etc/alertmanager/config
    State:          Running
      Started:      Fri, 24 Mar 2023 10:26:43 -0400
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  50Mi
    Requests:
      cpu:     100m
      memory:  50Mi
    Environment:
      POD_NAME:  alertmanager-main-2 (v1:metadata.name)
      SHARD:     -1
    Mounts:
      /etc/alertmanager/config from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from alertmanager-main-token-l9p7m (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  alertmanager-main-generated
    Optional:    false
  tls-assets:
    Type:                Projected (a volume that contains injected data from multiple sources)
    SecretName:          alertmanager-main-tls-assets-0
    SecretOptionalName:  <nil>
  alertmanager-main-db:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  alertmanager-main-token-l9p7m:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  alertmanager-main-token-l9p7m
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                      From     Message
  ----     ------     ----                     ----     -------
  Warning  BackOff    16m (x15665 over 3d17h)  kubelet  Back-off restarting failed container
  Warning  Unhealthy  11m (x25108 over 3d17h)  kubelet  Readiness probe failed: Get "http://10.244.193.76:9093/-/ready": dial tcp 10.244.193.76:9093: connect: connection refused
Checking the logs with: kubectl logs alertmanager-main-2 -n kubesphere-monitoring-system -c alertmanager
The output is as follows:
level=info ts=2023-03-29T06:52:29.438Z caller=main.go:225 msg="Starting Alertmanager" version="(version=0.23.0, branch=HEAD, revision=61046b17771a57cfd4c4a51be370ab930a4d7d54)"
level=info ts=2023-03-29T06:52:29.438Z caller=main.go:226 build_context="(go=go1.16.7, user=root@e21a959be8d2, date=20210825-10:48:55)"
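After deleting the pod from the KubeSphere page and letting it be recreated, the same problem persists. It looks as if port 9093 never comes up (the readiness probe gets "connection refused"). How should I go about troubleshooting this?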
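
One detail in the describe output may be worth checking. The last alertmanager container ran from 04:05:59 to 04:07:39, and the liveness probe is configured with period=10s and #failure=10. The sketch below does only that arithmetic in plain POSIX shell; the helper name `to_seconds` is made up for illustration. If the two numbers match, that would be consistent with the kubelet restarting the container once the liveness probe has failed ten times in a row. This is only a guess, and it does not explain why nothing listens on port 9093.

```shell
# Hypothetical helper: convert HH:MM:SS to seconds since midnight.
# A single leading zero is stripped from each field so values such as
# "05" or "08" are not read as octal inside $(( )).
to_seconds() {
  old_ifs=$IFS
  IFS=:
  set -- $1          # split the argument on ":" into $1 $2 $3
  IFS=$old_ifs
  echo $(( ${1#0} * 3600 + ${2#0} * 60 + ${3#0} ))
}

started="04:05:59"     # "Started" of the last terminated container
finished="04:07:39"    # "Finished" of the last terminated container
period=10              # liveness probe: period=10s
failure_threshold=10   # liveness probe: #failure=10

runtime=$(( $(to_seconds "$finished") - $(to_seconds "$started") ))
liveness_window=$(( period * failure_threshold ))

echo "container runtime: ${runtime}s, liveness window: ${liveness_window}s"
```

With the values from this post both figures come out to 100s, so the container seems to stay up for roughly as long as the liveness probe tolerates failures, without ever serving /-/healthy.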