KubeSphere deployed successfully, but alertmanager shows as abnormal

Operating system information
Virtual machine, CentOS 8.0, 8 GB

Kubernetes version information
Kubernetes v1.20.7

Container runtime
Docker version 23.0.1

crictl version 0.1.0

KubeSphere version information
v3.3.1, online installation, installed on an existing Kubernetes cluster.

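What is the problem
The console page shows alertmanager in an abnormal state.

Inspect the pod with: kubectl describe pod alertmanager-main-0 -n kubesphere-monitoring-system

The output is as follows: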

Name:         alertmanager-main-2
Namespace:    kubesphere-monitoring-system
Priority:     0
Node:         node18/172.32.173.18
Start Time:   Fri, 24 Mar 2023 10:26:42 -0400
Labels:       alertmanager=main
              app.kubernetes.io/component=alert-router
              app.kubernetes.io/instance=main
              app.kubernetes.io/managed-by=prometheus-operator
              app.kubernetes.io/name=alertmanager
              app.kubernetes.io/part-of=kube-prometheus
              app.kubernetes.io/version=0.23.0
              controller-revision-hash=alertmanager-main-f67787f9b
              statefulset.kubernetes.io/pod-name=alertmanager-main-2
Annotations:  cni.projectcalico.org/containerID: 9ec179df6447cb6687c5f35d6d67a2361a1a2bc8ef1fa7d389c12d3e3e5dce41
              cni.projectcalico.org/podIP: 10.244.193.76/32
              cni.projectcalico.org/podIPs: 10.244.193.76/32
              kubectl.kubernetes.io/default-container: alertmanager
Status:       Running
IP:           10.244.193.76
IPs:
  IP:           10.244.193.76
Controlled By:  StatefulSet/alertmanager-main
Containers:
  alertmanager:
    Container ID:  docker://bdf6304bb02442231c2e15d38519f32f89ffbc4091b2e7f7071f377aa16510b1
    Image:         prom/alertmanager:v0.23.0
    Image ID:      docker-pullable://prom/alertmanager@sha256:9ab73a421b65b80be072f96a88df756fc5b52a1bc8d983537b8ec5be8b624c5a
    Ports:         9093/TCP, 9094/TCP, 9094/UDP
    Host Ports:    0/TCP, 0/TCP, 0/UDP
    Args:
      --config.file=/etc/alertmanager/config/alertmanager.yaml
      --storage.path=/alertmanager
      --data.retention=120h
      --cluster.listen-address=[$(POD_IP)]:9094
      --web.listen-address=:9093
      --web.route-prefix=/
      --cluster.peer=alertmanager-main-0.alertmanager-operated:9094
      --cluster.peer=alertmanager-main-1.alertmanager-operated:9094
      --cluster.peer=alertmanager-main-2.alertmanager-operated:9094
      --cluster.reconnect-timeout=5m
    State:       Waiting
      Reason:    CrashLoopBackOff
    Last State:  Terminated
      Reason:    Error
      Message:   level=info ts=2023-03-28T08:05:59.391Z caller=main.go:225 msg="Starting Alertmanager" version="(version=0.23.0, branch=HEAD, revision=61046b17771a57cfd4c4a51be370ab930a4d7d54)"
level=info ts=2023-03-28T08:05:59.391Z caller=main.go:226 build_context="(go=go1.16.7, user=root@e21a959be8d2, date=20210825-10:48:55)"

      Exit Code:    2
      Started:      Tue, 28 Mar 2023 04:05:59 -0400
      Finished:     Tue, 28 Mar 2023 04:07:39 -0400
    Ready:          False
    Restart Count:  1289
    Limits:
      cpu:     200m
      memory:  200Mi
    Requests:
      cpu:      20m
      memory:   30Mi
    Liveness:   http-get http://:web/-/healthy delay=0s timeout=3s period=10s #success=1 #failure=10
    Readiness:  http-get http://:web/-/ready delay=3s timeout=3s period=5s #success=1 #failure=10
    Environment:
      POD_IP:   (v1:status.podIP)
    Mounts:
      /alertmanager from alertmanager-main-db (rw)
      /etc/alertmanager/certs from tls-assets (ro)
      /etc/alertmanager/config from config-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from alertmanager-main-token-l9p7m (ro)
  config-reloader:
    Container ID:  docker://26417689463cb72a84ccd088b3020b11442c73d9276f6deaf2fa78aa76fb053f
    Image:         kubesphere/prometheus-config-reloader:v0.55.1
    Image ID:      docker-pullable://kubesphere/prometheus-config-reloader@sha256:77c5a31bd7ac72a4b3ba3a6d3aa8e593eb070bd61384c49e96ad3d4aa0aa185d
    Port:          8080/TCP
    Host Port:     0/TCP
    Command:
      /bin/prometheus-config-reloader
    Args:
      --listen-address=:8080
      --reload-url=http://localhost:9093/-/reload
      --watched-dir=/etc/alertmanager/config
    State:          Running
      Started:      Fri, 24 Mar 2023 10:26:43 -0400
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     100m
      memory:  50Mi
    Requests:
      cpu:     100m
      memory:  50Mi
    Environment:
      POD_NAME:  alertmanager-main-2 (v1:metadata.name)
      SHARD:     -1
    Mounts:
      /etc/alertmanager/config from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from alertmanager-main-token-l9p7m (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  config-volume:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  alertmanager-main-generated
    Optional:    false
  tls-assets:
    Type:                Projected (a volume that contains injected data from multiple sources)
    SecretName:          alertmanager-main-tls-assets-0
    SecretOptionalName:  <nil>
  alertmanager-main-db:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  alertmanager-main-token-l9p7m:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  alertmanager-main-token-l9p7m
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  kubernetes.io/os=linux
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                      From     Message
  ----     ------     ----                     ----     -------
  Warning  BackOff    16m (x15665 over 3d17h)  kubelet  Back-off restarting failed container
  Warning  Unhealthy  11m (x25108 over 3d17h)  kubelet  Readiness probe failed: Get "http://10.244.193.76:9093/-/ready": dial tcp 10.244.193.76:9093: connect: connection refused 
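
One detail in the output above may be worth checking: how long the alertmanager container ran before it was terminated, compared with the liveness probe window (period=10s, #failure=10). The sketch below does that arithmetic with the timestamps copied from the "Last State" section. The helper name `to_seconds` is made up for illustration, and the calculation assumes both timestamps fall on the same day and in the same time zone, as they do here.

```shell
#!/usr/bin/env bash
# Clock times copied from "Started" and "Finished" under Last State.
started="04:05:59"
finished="04:07:39"

# Liveness probe settings copied from the describe output.
period=10     # period=10s
failures=10   # #failure=10

# Hypothetical helper: convert HH:MM:SS to seconds since midnight.
# The 10# prefix forces base 10 so values like "08" are not read as octal.
to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

run_seconds=$(( $(to_seconds "$finished") - $(to_seconds "$started") ))
liveness_window=$(( period * failures ))

echo "container ran for ${run_seconds}s; liveness window is ${liveness_window}s"
```

Both values come out to 100 seconds. That is consistent with the kubelet restarting the container after ten consecutive liveness failures, but it does not prove it: the events shown list only readiness failures and back-off restarts.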

View the logs: kubectl logs alertmanager-main-2 -n kubesphere-monitoring-system -c alertmanager

The output is as follows:

level=info ts=2023-03-29T06:52:29.438Z caller=main.go:225 msg="Starting Alertmanager" version="(version=0.23.0, branch=HEAD, revision=61046b17771a57cfd4c4a51be370ab930a4d7d54)"
level=info ts=2023-03-29T06:52:29.438Z caller=main.go:226 build_context="(go=go1.16.7, user=root@e21a959be8d2, date=20210825-10:48:55)"

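After deleting the pod from the KubeSphere console and letting it be recreated, the same problem persists. It looks like the service on port 9093 fails to start. How should I troubleshoot this?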
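
For anyone picking this up, a few generic next steps might narrow things down. They are a sketch, not a confirmed fix. The helper names `run` and `diagnose_alertmanager` and the `DRY_RUN` switch are made up for illustration. The namespace, pod name, and label come from the output above, and the service name `alertmanager-operated` is inferred from the `--cluster.peer` arguments.

```shell
#!/usr/bin/env bash
# Hypothetical helper script, not part of KubeSphere.
NS=kubesphere-monitoring-system
POD=alertmanager-main-2

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

diagnose_alertmanager() {
  # Logs of the previous (terminated) container instance.
  run kubectl logs "$POD" -n "$NS" -c alertmanager --previous
  # State and node placement of all alertmanager replicas.
  run kubectl get pods -n "$NS" -l app.kubernetes.io/name=alertmanager -o wide
  # Endpoints behind the service named in the --cluster.peer flags.
  run kubectl get endpoints alertmanager-operated -n "$NS"
}
```

Running `DRY_RUN=1 diagnose_alertmanager` prints the three commands without touching the cluster. Calling `diagnose_alertmanager` on its own runs them.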

1 year later

Did you ever get this resolved? I'm running into the same problem now.