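Operating system information
Tencent Cloud TKE (node pool with 3 regular nodes, host OS CentOS 7.6, 16C*32G per node);

Kubernetes version information
Output of the kubectl version command is pasted below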

Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:45:37Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}

Server Version: version.Info{Major:"1", Minor:"24+", GitVersion:"v1.24.4-tke.15", GitCommit:"d87b3a619cb4eb4072f1c7706c5d5d0395415871", GitTreeState:"clean", BuildDate:"2024-06-14T03:22:26Z", GoVersion:"go1.18.8", Compiler:"gc", Platform:"linux/amd64"}

WARNING: version difference between client (1.22) and server (1.24) exceeds the supported minor version skew of +/-1

KubeSphere version information (and other details)

  1. KubeSphere version: V3.3.2;

  2. Kubernetes version: 1.24.4;

  3. Base environment: Tencent Cloud TKE (node pool with 3 regular nodes, host OS CentOS 7.6);

  4. Installation method: https://kubesphere.io/zh/docs/v3.4/installing-on-kubernetes/hosted-kubernetes/install-ks-on-tencent-tke/#%E9%80%9A%E8%BF%87-ks-installer-%E6%89%A7%E8%A1%8C%E6%9C%80%E5%B0%8F%E5%8C%96%E9%83%A8%E7%BD%B2

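What is the problem
None of the pods in any workload have monitoring data

  • In Overview, resource usage, Kubernetes status, nodes, and so on all have monitoring data;

  • Overview -> Nodes -> View More: cluster status has monitoring data;

  • Nodes -> Cluster Nodes: monitoring data is present;

  • Application Workloads -> Workloads: pick any workload at random and none of its pods have monitoring data. Neither the workload's monitoring view nor the pod's monitoring view shows anything. However, if you stay on the pod's dashboard page, the current real-time data does appear (watching the backend API shows that the live-refreshed data is returned, but the historical data is not);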

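Screenshots:

Right after opening the workload page:

After staying on the workload page for a while:

Viewing monitoring data:

Logs:

Log captured while viewing monitoring data (the service name is replaced with xxx-service)

Request:

Response: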

{

"results": [

    {

        "metric_name": "pod_memory_usage_wo_cache",

        "data": {},

        "error": "execution: found duplicate series for the match group {namespace=\\"default\\", pod=\\"xxx-service-58d568b877-8lk6z\\"} on the right hand-side of the operation: [{__name__=\\"kube_pod_owner\\", container=\\"kube-state-metrics\\", endpoint=\\"http-metrics\\", instance=\\"192.168.2.159:8180\\", job=\\"kube-state-metrics\\", namespace=\\"default\\", owner_is_controller=\\"true\\", owner_kind=\\"ReplicaSet\\", owner_name=\\"xxx-service-58d568b877\\", pod=\\"xxx-service-58d568b877-8lk6z\\", service=\\"tke-kube-state-metrics\\", uid=\\"b724e22d-449c-4dc5-baa6-4424e52bbacd\\"}, {__name__=\\"kube_pod_owner\\", container=\\"kube-rbac-proxy-main\\", instance=\\"192.168.2.92:8443\\", job=\\"kube-state-metrics\\", namespace=\\"default\\", owner_is_controller=\\"true\\", owner_kind=\\"ReplicaSet\\", owner_name=\\"xxx-service-58d568b877\\", pod=\\"xxx-service-58d568b877-8lk6z\\", uid=\\"b724e22d-449c-4dc5-baa6-4424e52bbacd\\"}];many-to-many matching not allowed: matching labels must be unique on one side"

    },

    {

        "metric_name": "pod_cpu_usage",

        "data": {},

        "error": "execution: found duplicate series for the match group {namespace=\\"default\\", pod=\\"xxx-service-58d568b877-8lk6z\\"} on the right hand-side of the operation: [{__name__=\\"kube_pod_owner\\", container=\\"kube-state-metrics\\", endpoint=\\"http-metrics\\", instance=\\"192.168.2.159:8180\\", job=\\"kube-state-metrics\\", namespace=\\"default\\", owner_is_controller=\\"true\\", owner_kind=\\"ReplicaSet\\", owner_name=\\"xxx-service-58d568b877\\", pod=\\"xxx-service-58d568b877-8lk6z\\", service=\\"tke-kube-state-metrics\\", uid=\\"b724e22d-449c-4dc5-baa6-4424e52bbacd\\"}, {__name__=\\"kube_pod_owner\\", container=\\"kube-rbac-proxy-main\\", instance=\\"192.168.2.92:8443\\", job=\\"kube-state-metrics\\", namespace=\\"default\\", owner_is_controller=\\"true\\", owner_kind=\\"ReplicaSet\\", owner_name=\\"xxx-service-58d568b877\\", pod=\\"xxx-service-58d568b877-8lk6z\\", uid=\\"b724e22d-449c-4dc5-baa6-4424e52bbacd\\"}];many-to-many matching not allowed: matching labels must be unique on one side"

    },

    {

        "metric_name": "pod_net_bytes_transmitted",

        "data": {},

        "error": "execution: found duplicate series for the match group {namespace=\\"default\\", pod=\\"xxx-service-58d568b877-8lk6z\\"} on the right hand-side of the operation: [{__name__=\\"kube_pod_owner\\", container=\\"kube-state-metrics\\", endpoint=\\"http-metrics\\", instance=\\"192.168.2.159:8180\\", job=\\"kube-state-metrics\\", namespace=\\"default\\", owner_is_controller=\\"true\\", owner_kind=\\"ReplicaSet\\", owner_name=\\"xxx-service-58d568b877\\", pod=\\"xxx-service-58d568b877-8lk6z\\", service=\\"tke-kube-state-metrics\\", uid=\\"b724e22d-449c-4dc5-baa6-4424e52bbacd\\"}, {__name__=\\"kube_pod_owner\\", container=\\"kube-rbac-proxy-main\\", instance=\\"192.168.2.92:8443\\", job=\\"kube-state-metrics\\", namespace=\\"default\\", owner_is_controller=\\"true\\", owner_kind=\\"ReplicaSet\\", owner_name=\\"xxx-service-58d568b877\\", pod=\\"xxx-service-58d568b877-8lk6z\\", uid=\\"b724e22d-449c-4dc5-baa6-4424e52bbacd\\"}];many-to-many matching not allowed: matching labels must be unique on one side"

    },

    {

        "metric_name": "pod_net_bytes_received",

        "data": {},

        "error": "execution: found duplicate series for the match group {namespace=\\"default\\", pod=\\"xxx-service-58d568b877-8lk6z\\"} on the right hand-side of the operation: [{__name__=\\"kube_pod_owner\\", container=\\"kube-state-metrics\\", endpoint=\\"http-metrics\\", instance=\\"192.168.2.159:8180\\", job=\\"kube-state-metrics\\", namespace=\\"default\\", owner_is_controller=\\"true\\", owner_kind=\\"ReplicaSet\\", owner_name=\\"xxx-service-58d568b877\\", pod=\\"xxx-service-58d568b877-8lk6z\\", service=\\"tke-kube-state-metrics\\", uid=\\"b724e22d-449c-4dc5-baa6-4424e52bbacd\\"}, {__name__=\\"kube_pod_owner\\", container=\\"kube-rbac-proxy-main\\", instance=\\"192.168.2.92:8443\\", job=\\"kube-state-metrics\\", namespace=\\"default\\", owner_is_controller=\\"true\\", owner_kind=\\"ReplicaSet\\", owner_name=\\"xxx-service-58d568b877\\", pod=\\"xxx-service-58d568b877-8lk6z\\", uid=\\"b724e22d-449c-4dc5-baa6-4424e52bbacd\\"}];many-to-many matching not allowed: matching labels must be unique on one side"

    }

]

}

1 month later
16 days later

dennis159753 @codingnow

The cause is duplicated kube-state-metrics metrics:

found duplicate series for the match group;
many-to-many matching not allowed: matching labels must be unique on one side

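Specifically, the environment is most likely scraping kube-state-metrics twice: once from the kube-state-metrics in kubesphere-monitoring-system, and once, judging from the log, from {service="tke-kube-state-metrics"}. The duplicate series then conflict when the query is evaluated. Keeping only one of the two scrape sources, whichever suits your needs, should resolve it.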
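
To make the error message concrete, here is a minimal Python sketch of the uniqueness check behind "many-to-many matching not allowed". It is a simplified model, not Prometheus's actual implementation. The helper name `duplicate_match_groups` is hypothetical, and the two label sets are abbreviated from the error log above. Dropping the series labelled service="tke-kube-state-metrics" is only one of the two possible choices and is used here purely for illustration.

```python
from collections import defaultdict


def duplicate_match_groups(series, on=("namespace", "pod")):
    """Group label sets by the matching labels and return groups with >1 series.

    Hypothetical helper: a simplified stand-in for the check that makes a
    PromQL join fail when one side is not unique per match group.
    """
    groups = defaultdict(list)
    for labels in series:
        key = tuple(labels.get(name, "") for name in on)
        groups[key].append(labels)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Two kube_pod_owner series for the same pod, abbreviated from the error log:
# one scraped via the TKE service, one via the kube-rbac-proxy-main container.
kube_pod_owner = [
    {
        "namespace": "default",
        "pod": "xxx-service-58d568b877-8lk6z",
        "container": "kube-state-metrics",
        "instance": "192.168.2.159:8180",
        "service": "tke-kube-state-metrics",
    },
    {
        "namespace": "default",
        "pod": "xxx-service-58d568b877-8lk6z",
        "container": "kube-rbac-proxy-main",
        "instance": "192.168.2.92:8443",
    },
]

# With both scrape sources present, the (namespace, pod) group is not unique.
dups = duplicate_match_groups(kube_pod_owner)
print(len(dups), "duplicated match group(s)")

# Assumption for illustration: keep only the series that does NOT come from
# the TKE service. After that, every match group has exactly one series.
kept = [s for s in kube_pod_owner if s.get("service") != "tke-kube-state-metrics"]
print(len(duplicate_match_groups(kept)), "duplicated match group(s) after filtering")
```

Once only one source remains per (namespace, pod), the right-hand side of the join is unique again, which is the condition the error message asks for.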

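    15 days later

    frezes Mine is a unit problem: the display shows Ki, but the number corresponds to G. Do you know where this unit is configured?

      frezes

      The Y-axis unit here is Ki. Monitoring shows 27.61Ki in use, while kubectl top reports 27.61G of memory actually in use.

        cqwang9

        Please paste the JSON returned by this request. The unit is not configurable; this may be a bug in the front end's unit conversion?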

          frezes

          {
           "results": [
            {
             "metric_name": "pod_net_bytes_received",
             "data": {
              "resultType": "vector"
             }
            },
            {
             "metric_name": "pod_memory_usage_wo_cache",
             "data": {
              "resultType": "vector",
              "result": [
               {
                "metric": {
                 "namespace": "phm-prod",
                 "pod": "phm-collect-app-6f44bfbcbc-jnfv7"
                },
                "value": [
                 1728699460,
                 "28292.3984375"
                ],
                "min_value": "",
                "max_value": "",
                "avg_value": "",
                "sum_value": "",
                "fee": "",
                "resource_unit": "",
                "currency_unit": ""
               }
              ]
             }
            },
            {
             "metric_name": "pod_cpu_usage",
             "data": {
              "resultType": "vector",
              "result": [
               {
                "metric": {
                 "namespace": "phm-prod",
                 "pod": "phm-collect-app-6f44bfbcbc-jnfv7"
                },
                "value": [
                 1728699460,
                 "1.005"
                ],
                "min_value": "",
                "max_value": "",
                "avg_value": "",
                "sum_value": "",
                "fee": "",
                "resource_unit": "",
                "currency_unit": ""
               }
              ]
             }
            },
            {
             "metric_name": "pod_net_bytes_transmitted",
             "data": {
              "resultType": "vector"
             }
            }
           ]
          }

          cqwang9

          Expose Prometheus port 9090 as a NodePort, open the web page, and run this expression:

          sum by (namespace,pod)(container_memory_working_set_bytes{image!="",job="kubelet",metrics_path="/metrics/cadvisor",pod="prom-agent-k8s-0"})

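          Replace pod="prom-agent-k8s-0" here with your pod name and run the raw expression. The backend returns the value without a unit by default; it returned 28292, so it gets rendered as 27Ki, which does not appear to match the top result. That is why the raw value needs to be checked.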
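
The check suggested above can also be scripted. The Python sketch below does two things: it builds the URL for an instant query against the Prometheus HTTP API (`/api/v1/query`) with the pod name substituted into the expression, and it formats a raw number as binary-prefixed bytes to show how 28292.3984375 ends up displayed in Ki. The base URL `http://NODE_IP:NODE_PORT` is a placeholder, and the helper names `build_query_url` and `human_bytes` are hypothetical. The formatting rule is an assumption about how a byte value is typically rendered, not the KubeSphere front end's actual code.

```python
from urllib.parse import urlencode

# Expression from the reply above, with the pod name left as a parameter.
EXPR_TEMPLATE = (
    "sum by (namespace,pod)(container_memory_working_set_bytes"
    '{{image!="",job="kubelet",metrics_path="/metrics/cadvisor",pod="{pod}"}})'
)


def build_query_url(base_url, pod):
    """Return the instant-query URL for the raw expression (hypothetical helper)."""
    expr = EXPR_TEMPLATE.format(pod=pod)
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": expr})


def human_bytes(value):
    """Format a number of bytes with binary prefixes (assumed display rule)."""
    units = ["", "Ki", "Mi", "Gi", "Ti"]
    v = float(value)
    i = 0
    while v >= 1024 and i < len(units) - 1:
        v /= 1024
        i += 1
    return f"{v:.2f} {units[i]}".strip()


# Placeholder address: replace with the node IP and NodePort of your Prometheus.
url = build_query_url("http://NODE_IP:NODE_PORT", "phm-collect-app-6f44bfbcbc-jnfv7")
print(url)

# Raw value taken from the pod_memory_usage_wo_cache result in the JSON above.
print(human_bytes("28292.3984375"))
```

If the raw expression returns a value on the order of 10^10 for the same pod, the number behind `pod_memory_usage_wo_cache` is not in bytes, and the mismatch lies in the data rather than in the formatting step.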