fluent-bit log format stops loggly from parsing json log body


I'm using fluent-bit 2.1.4 in an AWS EKS cluster to ship container logs to loggly. When given properly formatted json in the 'log' field, loggly will parse it out so the fields can be easily used to filter, search, generate metrics, and some other nice things.

The trouble is, when shipping logs with fluent-bit, the 'log' field arrives with some values prepended that are just raw text - they're not json key/value pairs. So now the 'log' field looks like this:

"log":"2023-05-31T12:11:40.459220575Z stdout F {<properly-formatted json>}"

I have confirmed that when I manually tail the container logs myself, those values before the properly-formatted json aren't there. In reading about inputs, outputs, parsers, and filters in fluent-bit, everything I might use to remove these values seems to assume you're working on the json-formatted part of the log line (e.g. it expects that the field(s) you want to alter or remove can be addressed with a key).

How can I get rid of the parts of the log line here that are not json?

Here's my configuration, taken from a running fluent-bit deployment using kubectl describe configmap:

====
custom_parsers.conf:
----
[PARSER]
    Name        docker-local
    Format      json
    Time_Key    asctime
    Time_Format %FT%T.%L%Z
    Time_Keep   Off

fluent-bit.conf:
----
[SERVICE]
    Flush         1
    Log_Level     info
    Daemon        off
    Parsers_File  custom_parsers.conf
    HTTP_Server   Off

[INPUT]
    Name              tail
    Tag               kube.*
    Path              /var/log/containers/*.log
    Parser            docker-local
    multiline.parser  docker
    DB                /var/log/flb_kube.db
    Mem_Buf_Limit     512MB
    Skip_Long_Lines   On
    Refresh_Interval  10
    Ignore_Older      10m

[FILTER]
    Name                kubernetes
    Match               kube.*
    Kube_URL            https://kubernetes.default.svc.cluster.local:443
    Merge_Log           Off
    Keep_Log            Off
    K8S-Logging.Exclude Off
    K8S-Logging.Parser  Off

[OUTPUT]
    Name             http
    Match            *
    Host             logs-01.loggly.com
    Port             443
    Tls              On
    URI              /bulk/<token>/tag/testing/
    Format           json_lines
    Json_Date_Key    timestamp
    Json_Date_Format iso8601
    Retry_Limit      False

Best answer, by jonesy:

After reading more docs & a lot of trial and error, I came up with the following solution, which works perfectly:

    [SERVICE]
        Flush         1
        Log_Level     info
        Daemon        off
        Parsers_File  custom_parsers.conf
        HTTP_Server   On
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020
        Health_Check  On
    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
        Exclude_Path      /var/log/containers/fluent*
        DB                /var/log/flb_kube.db
        Tag               kube.*
        Mem_Buf_Limit     512MB
        Skip_Long_Lines   On
        Refresh_Interval  10
        Ignore_Older      10m
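    # Step 1: split the raw line into time, stream and logtag, leaving the message itself in 'log'.
    [FILTER]
        Name         parser
        Match        *
        Key_Name     log
        Parser       custom-parser
    # Step 2: parse what is now left in 'log' as json.
    [FILTER]
        Name         parser
        Parser       docker
        Match        *
        Key_Name     log
        Reserve_Data Off
        Preserve_Key Off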
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc.cluster.local:443
        Merge_Log           On
        Keep_Log            On
        Merge_Log_Trim      On
        K8S-Logging.Exclude On
        K8S-Logging.Parser  On
    [OUTPUT]
        Name             http
        Match            *
        Host             ${loggly_host}
        Port             443
        Tls              On
        URI              ${loggly_uri}
        Format           json_lines
        Json_Date_Key    timestamp
        Json_Date_Format iso8601
        Retry_Limit      False
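    # The [PARSER] sections below go in the file named by Parsers_File (custom_parsers.conf), not in fluent-bit.conf.
    [PARSER]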
        Name custom-parser
        Format regex
        Regex ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<log>.*)$
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z
    [PARSER]
        Name docker
        Format json
        Time_key time
        Time_Format %Y-%m-%dT%H:%M:%S.%L %z

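As a bonus, this also resolved another issue where livenessProbe and readinessProbe were failing, causing the fluent-bit pods to be restarted when the probes timed out. My best guess is that this is down to HTTP_Server now being On, since it was Off in the original config.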
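
For anyone wondering why two parser filters are needed, here is a rough Python sketch of what the chain does to one line. It is not how fluent-bit is implemented, just an illustration under some assumptions: the regex is the custom-parser one rewritten with Python's `(?P<name>...)` group syntax, the function names `strip_cri_prefix` and `parse_payload` are made up for this example, the sample payload is hypothetical, and time handling (Time_Key, Time_Format) is left out entirely.

```python
import json
import re

# Python spelling of the custom-parser regex. Python writes named groups as
# (?P<name>...), where the fluent-bit config above uses (?<name>...).
CRI_LINE = re.compile(
    r"^(?P<time>[^ ]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) (?P<log>.*)$"
)


def strip_cri_prefix(raw_log):
    """Stage 1 (custom-parser): split the text prefix from the message."""
    match = CRI_LINE.match(raw_log)
    if match is None:
        return None  # this sketch just gives up on lines without the prefix
    return match.groupdict()


def parse_payload(raw_log):
    """Stage 2 (docker parser, Format json): decode the message as json."""
    fields = strip_cri_prefix(raw_log)
    if fields is None:
        return None
    try:
        return json.loads(fields["log"])
    except json.JSONDecodeError:
        return None  # message was not json after all


if __name__ == "__main__":
    # Hypothetical payload standing in for <properly-formatted json>.
    sample = '2023-05-31T12:11:40.459220575Z stdout F {"level": "info", "msg": "started"}'
    print(strip_cri_prefix(sample))
    print(parse_payload(sample))
```

The first stage only makes the json addressable as its own 'log' key; the second stage is what turns that string into real fields that loggly can index.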