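Problem description:

Log data is missing for two 24-hour periods: March 13 08:00 to March 14 08:00, and March 16 08:00 to March 17 08:00. No logs at all were recorded for these periods. The log data is still visible when I download the container logs directly from inside the application container.

Software versions: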

  • kubernetes v1.18.6
  • kubesphere v3.0.0

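Check whether the fluent-bit logs show any errors, and describe the fluent-bit pod to see whether anything looks abnormal.
Then exec into the es pod and run
curl -XGET 'elasticsearch-logging-data.kubesphere-logging-system.svc:9200/_cat/indices?v&pretty'
to see whether the log files (indices) for those two days were created.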
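
The three checks suggested above can be wrapped in small helpers. This is only a sketch: the namespace and Elasticsearch service address come from this thread, while the pod names are arguments you supply, and the `run` helper and `DRY_RUN` switch are hypothetical conveniences for previewing a command before executing it.

```shell
# Namespace and service address as they appear in this thread.
NS="kubesphere-logging-system"
ES_URL="elasticsearch-logging-data.kubesphere-logging-system.svc:9200"

# Hypothetical helper: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Step 1: recent fluent-bit logs, to look for output errors.
# $1 = fluent-bit pod name
fluent_bit_logs() { run kubectl -n "$NS" logs "$1" --tail=200; }

# Step 2: pod details (restart count, last state, events).
# $1 = pod name
describe_pod() { run kubectl -n "$NS" describe pod "$1"; }

# Step 3: list the indices from inside an Elasticsearch pod.
# $1 = Elasticsearch pod name (assumes curl is available in that container)
list_indices() {
  run kubectl -n "$NS" exec "$1" -- curl -s -XGET "http://${ES_URL}/_cat/indices?v"
}
```

For example, `DRY_RUN=1 describe_pod fluent-bit-9dbnr` prints the kubectl command without touching the cluster.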

    wanjunlei
    Thank you for your reply. I checked today; the results are below.

    curl -XGET 'elasticsearch-logging-data.kubesphere-logging-system.svc:9200/_cat/indices?v&pretty'

    1. Many of the elasticsearch indices show a health status of red. Is that related?

    health status index                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
    red    open   ks-logstash-log-2021.03.16      Y0FjUrFiRiWy9cAfWUHpgg 5   1
    red    open   ks-logstash-log-2021.03.14      2XSxytTSSuKPdBr0FL6W4Q 5   1   5922651    0            3.3gb      1.6gb
    green  open   ks-logstash-events-2021.03.14   yF7anY2tTjCfzkRBk8Iphw 5   1   29580      0            28.7mb     14.1mb
    green  open   ks-logstash-log-2021.03.12      bVno4bDmTLuvAD1bPBOmaQ 5   1   18434271   0            6.7gb      3.3gb
    green  open   ks-logstash-events-2021.03.11   WzbEsVJfRDiI3pojRPwOUg 5   1   30816      0            31.7mb     15.8mb
    green  open   ks-logstash-auditing-2021.03.12 KLC2HckyQ0WZYyZjZ6GsOA 5   1   692        0            1.4mb      718kb
    green  open   ks-logstash-auditing-2021.03.17 zawsVkaMQW6ViKV7Wqf7Hg 5   1   549        0            1.1mb      612.6kb
    red    open   ks-logstash-log-2021.03.17      kgBpEq8OReOmi9YV3fvu6A 5   1
    green  open   ks-logstash-auditing-2021.03.16 Uc3xeiJfRqiXxucTejRlfQ 5   1   549        0            1.1mb      602.7kb
    green  open   ks-logstash-auditing-2021.03.14 XtEXtB5_Rr-uJVb_Inr8CQ 5   1   487        0            1.4mb      811.4kb
    green  open   ks-logstash-auditing-2021.03.13 yoXhgBL1RZeER5iJgcgDaw 5   1   520        0            1.4mb      674.8kb
    green  open   ks-logstash-events-2021.03.12   J8PwiPiBRHeXTHvfoBkp7Q 5   1   31213      0            32.6mb     16.3mb
    red    open   ks-logstash-log-2021.03.15      GqcUifwkTnu8Lr5aawCf4w 5   1   3337233    0            2.4gb      1.2gb
    green  open   ks-logstash-auditing-2021.03.11 2XT_F1u9TDOR1cu7g0fVkA 5   1   631        0            1.4mb      758.6kb
    green  open   ks-logstash-events-2021.03.15   szTux1M6TUWRU12Yi3tkaA 5   1   30018      0            30.1mb     15mb
    green  open   ks-logstash-events-2021.03.17   mGEiBV5hRIy7lPB2qqFgqA 5   1   31547      0            33mb       16.5mb
    red    open   ks-logstash-log-2021.03.13      H_ALKzDWQbC5dQZ1iIo31g 5   1
    green  open   ks-logstash-events-2021.03.18   CLzOOeKJQYWzWlOmIx5iHQ 5   1   1985       0            6.9mb      3.4mb
    green  open   ks-logstash-log-2021.03.18      sEjFbVuCSuGK7lON-d5SbA 5   1   421109     0            242.5mb    121.2mb
    green  open   ks-logstash-events-2021.03.13   EVXRZr_tRY-1R92VcskvHw 5   1   29626      0            29.1mb     14.5mb
    green  open   ks-logstash-auditing-2021.03.15 80McbVbpQ5CDxwndVKACEQ 5   1   530        0            1mb        554.2kb
    green  open   ks-logstash-auditing-2021.03.18 -dAHB1w_ROOIW5OM18XOPw 5   1   42         0            1mb        590.5kb
    red    open   ks-logstash-log-2021.03.11      dBBTnEuuQs-rFmFKGAXlWQ 5   1   17988973   0            6.5gb      3.2gb
    green  open   ks-logstash-events-2021.03.16   v3W_GV_nS--SPAQcV9Emhw 5   1   30963      0            32mb       16mb

    The text formatting is a bit messy, so I have added a screenshot.
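
With a listing this long, the unhealthy indices are easier to see once the green rows are filtered out. A minimal sketch, assuming the `_cat/indices?v` output is piped in on stdin; the function name `non_green_indices` is made up for illustration.

```shell
# Read `_cat/indices` output on stdin and print the health and name
# of every index whose health column is not "green".
# The header row (first field "health") is skipped.
non_green_indices() {
  awk '$1 != "green" && $1 != "health" { print $1, $3 }'
}
```

Fed the listing above, this would print only the six red `ks-logstash-log-*` indices (03.11 and 03.13 through 03.17). Elasticsearch also accepts a `health` query parameter on `_cat/indices` (for example `?v&health=red`), which does the same filtering on the server side.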

    2. fluentbit-operator appears to have been OOM-killed. Is that the cause? If so, how should it be adjusted?

    kubectl describe pod -n kubesphere-logging-system fluentbit-operator-855d4b977d-xwjxs

    Name:           fluentbit-operator-855d4b977d-xwjxs
    Namespace:      kubesphere-logging-system
    Priority:       0
    Node:           k8s-node2/192.168.0.175
    Start Time:     Fri, 25 Sep 2020 02:30:14 +0800
    Labels:         app.kubernetes.io/component=operator
                    app.kubernetes.io/name=fluentbit-operator
                    pod-template-hash=855d4b977d
    Annotations:    cni.projectcalico.org/podIP: 10.233.76.223/32
                    cni.projectcalico.org/podIPs: 10.233.76.223/32
    Status:         Running
    IP:             10.233.76.223
    IPs:
      IP:           10.233.76.223
    Controlled By:  ReplicaSet/fluentbit-operator-855d4b977d
    Init Containers:
      setenv:
        Container ID:  docker://a121b6a47de7a842a45be2a45792ecebcb5b3bf487a809116e8edfec9c487417
        Image:         docker:19.03
        Image ID:      docker-pullable://docker@sha256:57ddfc5b9f4f89f1598440cd1d6d97b87532b0bce1315e7880ae6843e3583529
        Port:          <none>
        Host Port:     <none>
        Command:
          /bin/sh
          -c
          set -ex; echo DOCKER_ROOT_DIR=$(docker info -f {{.DockerRootDir}}) > /fluentbit-operator/fluent-bit.env
        State:          Terminated
          Reason:       Completed
          Exit Code:    0
          Started:      Mon, 04 Jan 2021 07:41:53 +0800
          Finished:     Mon, 04 Jan 2021 07:42:00 +0800
        Ready:          True
        Restart Count:  4
        Environment:    <none>
        Mounts:
          /fluentbit-operator from env (rw)
          /var/run/docker.sock from dockersock (ro)
          /var/run/secrets/kubernetes.io/serviceaccount from fluentbit-operator-token-kslrx (ro)
    Containers:
      fluentbit-operator:
        Container ID:   docker://8bcad2f7d504bb24aca63b0613f960756af768e67d24ea56f27361a8532f674a
        Image:          kubesphere/fluentbit-operator:v0.2.0
        Image ID:       docker-pullable://kubesphere/fluentbit-operator@sha256:914864f8d56931274554432d6e6674799c05284aa8c88ff72ae1a9a04a4dc873
        Port:           <none>
        Host Port:      <none>
        State:          Running
          Started:      Wed, 17 Mar 2021 11:23:43 +0800
        Last State:     Terminated
          Reason:       OOMKilled
          Exit Code:    137
          Started:      Fri, 12 Mar 2021 16:17:26 +0800
          Finished:     Wed, 17 Mar 2021 11:23:42 +0800
        Ready:          True
        Restart Count:  74
        Limits:
          cpu:     100m
          memory:  30Mi
        Requests:
          cpu:        100m
          memory:     20Mi
        Environment:  <none>
        Mounts:
          /fluentbit-operator from env (rw)
          /var/run/secrets/kubernetes.io/serviceaccount from fluentbit-operator-token-kslrx (ro)
    Conditions:
      Type              Status
      Initialized       True
      Ready             True
      ContainersReady   True
      PodScheduled      True
    Volumes:
      env:
        Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
        Medium:
        SizeLimit:  <unset>
      dockersock:
        Type:          HostPath (bare host directory volume)
        Path:          /var/run/docker.sock
        HostPathType:
      fluentbit-operator-token-kslrx:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  fluentbit-operator-token-kslrx
        Optional:    false
    QoS Class:       Burstable
    Node-Selectors:  <none>
    Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                     node.kubernetes.io/unreachable:NoExecute for 300s
    Events:          <none>
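
On the "how should it be adjusted" part of the question: the output above shows the operator container with a 30Mi memory limit, 74 restarts, and a last state of OOMKilled. One way to raise the limit is `kubectl set resources`. The sketch below only builds the command so it can be reviewed first. The Deployment name `fluentbit-operator` is inferred from the ReplicaSet name, and the default values (200Mi limit, 50Mi request) are illustrative assumptions, not recommendations from this thread. Nothing in the thread confirms that the operator's OOM kills caused the missing logs.

```shell
# Print (but do not run) a command that raises the memory limit and request
# of the fluentbit-operator container.
# $1 = memory limit   (assumed default: 200Mi)
# $2 = memory request (assumed default: 50Mi)
operator_memory_cmd() {
  printf 'kubectl -n %s set resources deployment %s --containers=%s --limits=memory=%s --requests=memory=%s\n' \
    "kubesphere-logging-system" "fluentbit-operator" "fluentbit-operator" \
    "${1:-200Mi}" "${2:-50Mi}"
}
```

Running `operator_memory_cmd 128Mi 64Mi` prints the command; pipe it to `sh` once it looks right. If the Deployment is managed by an installer or another controller, a manual change like this may be reverted.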

    kubectl describe pod -n kubesphere-logging-system fluent-bit-9dbnr
    Name:           fluent-bit-9dbnr
    Namespace:      kubesphere-logging-system
    Priority:       0
    Node:           k8s-master1/192.168.0.171
    Start Time:     Tue, 08 Sep 2020 10:01:13 +0800
    Labels:         app.kubernetes.io/name=fluent-bit
                    controller-revision-hash=d8f95598f
                    pod-template-generation=2
    Annotations:    cni.projectcalico.org/podIP: 10.233.68.104/32
                    cni.projectcalico.org/podIPs: 10.233.68.104/32
    Status:         Running
    IP:             10.233.68.104
    IPs:
      IP:           10.233.68.104
    Controlled By:  DaemonSet/fluent-bit
    Containers:
      fluent-bit:
        Container ID:   docker://1356ca4635d396ee9251a5992991e0b8337e9d3969607e9e899678f3eba5d5e9
        Image:          kubesphere/fluent-bit:v1.4.6
        Image ID:       docker-pullable://kubesphere/fluent-bit@sha256:1007b7cb7090435bf5b5d04f07cf6982d841597218fb67e291b2606c4e25b3e2
        Port:           2020/TCP
        Host Port:      0/TCP
        State:          Running
          Started:      Thu, 18 Mar 2021 03:00:01 +0800
        Last State:     Terminated
          Reason:       Error
          Exit Code:    2
          Started:      Wed, 17 Mar 2021 03:00:02 +0800
          Finished:     Thu, 18 Mar 2021 03:00:01 +0800
        Ready:          True
        Restart Count:  72
        Environment:    <none>
        Mounts:
          /fluent-bit/config from config (ro)
          /fluent-bit/tail from positions (rw)
          /var/lib/docker/containers from varlibcontainers (ro)
          /var/log/ from varlogs (ro)
          /var/run/secrets/kubernetes.io/serviceaccount from fluent-bit-token-wc9x4 (ro)
    Conditions:
      Type              Status
      Initialized       True
      Ready             True
      ContainersReady   True
      PodScheduled      True
    Volumes:
      varlibcontainers:
        Type:          HostPath (bare host directory volume)
        Path:          /var/lib/docker/containers
        HostPathType:
      config:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  fluent-bit-config
        Optional:    false
      varlogs:
        Type:          HostPath (bare host directory volume)
        Path:          /var/log
        HostPathType:
      positions:
        Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
        Medium:
        SizeLimit:  <unset>
      fluent-bit-token-wc9x4:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  fluent-bit-token-wc9x4
        Optional:    false
    QoS Class:       BestEffort
    Node-Selectors:  <none>
    Tolerations:
                     node.kubernetes.io/disk-pressure:NoSchedule
                     node.kubernetes.io/memory-pressure:NoSchedule
                     node.kubernetes.io/not-ready:NoExecute
                     node.kubernetes.io/pid-pressure:NoSchedule
                     node.kubernetes.io/unreachable:NoExecute
                     node.kubernetes.io/unschedulable:NoSchedule
    Events:          <none>

    The es index status is abnormal, so logs cannot be written. Check whether the es storage space is full.

      wanjunlei

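      Thanks for the reply. Only 2.28g is left, so that is probably the problem. I will expand the storage and keep watching. Thank you very much!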
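
To keep an eye on disk usage after expanding the storage, the `_cat/allocation?v` endpoint reports per-node disk figures. The sketch below flags nodes at or above a usage threshold. It assumes the output is piped in on stdin and has `disk.percent` and `node` columns; the function name `nodes_low_on_disk` is hypothetical. The default threshold of 85 matches Elasticsearch's default low disk watermark.

```shell
# Read `_cat/allocation?v` output on stdin and print every node whose
# disk.percent is at or above the threshold.
# $1 = threshold in percent (default: 85)
# Columns are located by header name rather than by fixed position.
nodes_low_on_disk() {
  awk -v limit="${1:-85}" '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    col["disk.percent"] && $(col["disk.percent"]) + 0 >= limit {
      print $(col["node"]), $(col["disk.percent"]) "%"
    }'
}
```

A possible invocation, with `<es-pod>` standing in for the name of one of your Elasticsearch pods:

```shell
kubectl -n kubesphere-logging-system exec <es-pod> -- \
  curl -s 'elasticsearch-logging-data.kubesphere-logging-system.svc:9200/_cat/allocation?v' \
  | nodes_low_on_disk 85
```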