prometheus-k8s-system-0 Error

53537

prometheus:
level=error ts=2019-11-05T09:33:25.626081908Z caller=main.go:625 err="opening storage failed: block dir: \"/prometheus/01DRV80DEB1QP47E09CR8NRSQC\": unexpected end of JSON input"

prometheus-config-reloader:
level=error ts=2019-11-05T09:33:11.690012168Z caller=runutil.go:43 msg="function failed. Retrying" err="trigger reload: reload request failed: Post http://localhost:9090/-/reload: dial tcp 127.0.0.1:9090: connect: connection refused"

huanggze

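53537 It looks like Prometheus restarted at some point. When it re-read the on-disk monitoring data during initialization, it found that one of the existing data blocks, the one named 01DRV80DEB1QP47E09CR8NRSQC, contains corrupt data. The storage module could not initialize, so the process exited. The prometheus-config-reloader error simply reflects the fact that Prometheus is unavailable.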
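"unexpected end of JSON input" is what Go's JSON decoder reports for empty or truncated input, so an empty or cut-off meta.json in that block is a likely cause. If the data directory can be reached some other way (for example from a helper pod that mounts the same volume), a minimal sketch like the one below could list suspect blocks. The function name `find_suspect_blocks` is made up for illustration, and the check only catches a missing or zero-length meta.json, not a file that is truncated but non-empty.

```shell
# Hypothetical helper: print block directories whose meta.json is missing
# or empty. The data directory defaults to /prometheus, the path in the log.
find_suspect_blocks() {
  data_dir="${1:-/prometheus}"
  for dir in "$data_dir"/*/; do
    [ -d "$dir" ] || continue            # glob matched nothing
    dir="${dir%/}"
    name=$(basename "$dir")
    # Block directories are named with 26-character ULIDs; skip wal etc.
    [ "${#name}" -eq 26 ] || continue
    if [ ! -s "$dir/meta.json" ]; then   # missing or zero-length
      echo "$name"
    fi
  done
}
```

Usage: `find_suspect_blocks /prometheus` prints one block name per line, or nothing if every block has a non-empty meta.json.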

Did you perform any operations on Prometheus, either by calling its API directly or through prometheus-operator?

Open a shell in the prometheus container and post the file and directory layout under /prometheus/01DRV80DEB1QP47E09CR8NRSQC, together with the contents of its meta.json.
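One way to collect that without an interactive shell is sketched below. It assumes the container is named `prometheus` (as the log heading suggests) and that it stays up long enough for `kubectl exec` to attach, which may not hold while the pod is in CrashLoopBackOff. `show_block` is a hypothetical helper name.

```shell
# Hypothetical helper: print the layout and meta.json of one block.
# Arguments: namespace, pod name, block name.
show_block() {
  ns="$1"; pod="$2"; block="$3"
  # Container name "prometheus" is an assumption; adjust if yours differs.
  kubectl exec -n "$ns" "$pod" -c prometheus -- ls -la "/prometheus/$block"
  kubectl exec -n "$ns" "$pod" -c prometheus -- cat "/prometheus/$block/meta.json"
}
```

Usage: `show_block kubesphere-monitoring-system prometheus-k8s-system-0 01DRV80DEB1QP47E09CR8NRSQC`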

53537

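The network in this environment is slow, and there were timeouts and restarts during deployment, so Prometheus may well have exited abnormally during initialization. Prometheus never starts, so I could not see what is in the directory.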

53537

Is there a way to fix this?

kubesphere-monitoring-system   prometheus-k8s-0          2/3   CrashLoopBackOff   194   19h
kubesphere-monitoring-system   prometheus-k8s-1          3/3   Running            0     20h
kubesphere-monitoring-system   prometheus-k8s-system-0   2/3   CrashLoopBackOff   233   19h
kubesphere-monitoring-system   prometheus-k8s-system-1   3/3   Running            0     19h

huanggze

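53537 Delete prometheus-k8s-0 and prometheus-k8s-system-0 along with their PVCs and PVs; the data is already corrupt.
The other replica, xxx-1, is still running, so the service will not go down. Delete them as described above: scale the StatefulSet down to 0 and then back up to 2.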
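A minimal sketch of that scale-down, delete, scale-up sequence follows. `recreate_prometheus_storage` is a hypothetical helper name, and the PVC names passed to it are placeholders to look up in your own cluster. Two caveats, both assumptions about the setup: if prometheus-operator reverts the replica count on the StatefulSet, the count may have to be changed on the Prometheus custom resource instead; and whether the PV goes away with the PVC depends on the volume's reclaim policy, so a retained PV may need to be deleted by hand. Pass only the claims of the broken replicas so the healthy replica keeps its data.

```shell
# Hypothetical helper: scale a StatefulSet to 0, delete the given PVCs,
# then scale back up.
# Arguments: namespace, StatefulSet name, replica count, PVC name(s).
recreate_prometheus_storage() {
  if [ "$#" -lt 4 ]; then
    echo "usage: recreate_prometheus_storage NS STATEFULSET REPLICAS PVC..." >&2
    return 1
  fi
  ns="$1"; sts="$2"; replicas="$3"
  shift 3
  kubectl -n "$ns" scale statefulset "$sts" --replicas=0
  for pvc in "$@"; do
    # By default this waits until the claim is actually removed.
    kubectl -n "$ns" delete pvc "$pvc"
  done
  kubectl -n "$ns" scale statefulset "$sts" --replicas="$replicas"
}
```

Usage: `recreate_prometheus_storage kubesphere-monitoring-system prometheus-k8s 2 <pvc-name>`, and likewise for `prometheus-k8s-system`.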

53537

Solved, thanks.

xyf123456

53537 How did you solve this? Did you just delete the pods, PVs, and PVCs related to prometheus-k8s?

cloudli

53537

How was this resolved?

leishengxn

xyf123456 Did you get it solved?
When I try to delete the PVC, it says it does not exist:
[root@node1 ~]# kubectl delete pvc -n kubesphere-monitoring-system prometheus-k8s-0
Error from server (NotFound): persistentvolumeclaims "prometheus-k8s-0" not found
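A NotFound here is expected if the claim is not named after the pod: claims created from a StatefulSet's volumeClaimTemplates are named `<template name>-<pod name>`, so `prometheus-k8s-0` is the pod name, not the PVC name. A small sketch for looking up the real claim names from the pod spec is below; `pvcs_of_pod` is a hypothetical helper name. Running `kubectl get pvc -n kubesphere-monitoring-system` and reading the list works just as well.

```shell
# Hypothetical helper: print the PVC name(s) a pod mounts.
# Arguments: namespace, pod name.
pvcs_of_pod() {
  ns="$1"; pod="$2"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.spec.volumes[*].persistentVolumeClaim.claimName}'
}
```

Usage: `pvcs_of_pod kubesphere-monitoring-system prometheus-k8s-0`, then pass the printed name to `kubectl delete pvc`.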