Defining multiple outputs in Logstash whilst handling potential unavailability of an Elasticsearch instance


I have two outputs configured for Logstash as I need the data to be delivered to two separate Elasticsearch nodes in different locations.

A snippet of the configuration is below (redacted where required):

output {
  elasticsearch {
    hosts => [ "https://host1.local:9200" ]
    cacert => '/etc/logstash/config/certs/ca.crt'
    user => "XXXXX"
    password => "XXXXX"
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
}

output {
  elasticsearch {
    hosts => [ "https://host2.local:9200" ]
    cacert => '/etc/logstash/config/certs/ca.crt'
    user => "XXXXX"
    password => "XXXXX"
    index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
  }
}

During testing I've noticed that if one of the ES instances (host1.local or host2.local) is unavailable, Logstash stops processing and delivering data to the other instance, even though it's still available.

Is there a modification I can make to the configuration that will allow data to be delivered to the available Elasticsearch instance, even if the other dies?


Badger (accepted answer):

Logstash has an at-least-once delivery model. If persistent queues are not enabled, data can be lost across a restart; otherwise, Logstash will deliver events to all of the outputs at least once. As a result, if one output becomes unreachable, the queue (either in-memory or persistent) backs up and blocks processing. You can use persistent queues and pipeline-to-pipeline communication with an output isolator pattern to avoid stalling one output when another is unavailable.
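
For reference, persistent queues can be switched on globally in logstash.yml (a per-pipeline queue.type in pipelines.yml, as in the answer below, overrides this). A minimal sketch; the size value is an assumption to tune against your disk budget:

# config/logstash.yml
queue.type: persisted    # buffer events on disk so they survive a restart
queue.max_bytes: 1gb     # per-queue disk cap before back-pressure is applied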

jjbskir:

To follow up on @Badger's answer, you will need to use the output isolator pattern they suggested. As you observed, when one output is blocked, it prevents the other outputs from functioning. This is also noted in the docs:

Logstash, by default, is blocked when any single output is down.

This solution can be used for any output and is not unique to Elasticsearch. Here is a code sample of how it would work for your example.

# config/pipelines.yml
- pipeline.id: intake
  config.string: |
    input { ... }
    output { pipeline { send_to => [es-host1, es-host2] } }
- pipeline.id: buffered-es-host1
  queue.type: persisted
  config.string: |
    input { pipeline { address => es-host1 } }
    output {
      elasticsearch {
        hosts => [ "https://host1.local:9200" ]
        cacert => '/etc/logstash/config/certs/ca.crt'
        user => "XXXXX"
        password => "XXXXX"
        index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
      }
    }
- pipeline.id: buffered-es-host2
  queue.type: persisted
  config.string: |
    input { pipeline { address => es-host2 } }
    output {
      elasticsearch {
        hosts => [ "https://host2.local:9200" ]
        cacert => '/etc/logstash/config/certs/ca.crt'
        user => "XXXXX"
        password => "XXXXX"
        index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
      }
    }

You can follow this document for running multiple pipelines, but note that the --path.settings parameter must point to the directory containing pipelines.yml (and logstash.yml), not to the pipelines.yml file itself:

bin/logstash --path.settings config
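
Assuming a settings directory laid out like this (the names are illustrative), Logstash reads pipelines.yml automatically as long as no -f or -e option is passed on the command line:

config/
├── logstash.yml     # general settings (e.g. the queue defaults above)
└── pipelines.yml    # the pipeline definitions shown above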