In Logstash, let's build the equivalent of Fluentd's (td-agent's) forest and copy plugins combined. With this setup, even as log types and senders increase, the configuration stays simple because you don't have to add output settings every time.
- Goal - Configuration example with Fluentd
- Configuration example with Logstash
- Conclusion - Multiple outputs in Logstash, like Fluentd's forest + copy
Goal - Configuration example with Fluentd
In Fluentd (td-agent), the output is generalized with forest so that it branches on the tag as a variable. The Fluentd configuration example is as follows; optimizations such as chunk and buffer tuning are not included.
Base configuration that includes the individual configuration files. The version used for verification is td-agent 0.12.31.
/etc/td-agent/td-agent.conf
<source>
@type forward
port 24224
</source>
@include ./conf/*.conf
This simply tails a local log file. The tag set here is used later for the Elasticsearch index and so on.
/etc/td-agent/conf/local_messages.conf
<source>
@type tail
path /var/log/messages
pos_file /var/log/messages.pos
tag "sys.messages.#{Socket.gethostname}"
format syslog
</source>
Output configuration to Elasticsearch, with the same data also saved to files for backup / long-term storage. Thanks to forest + copy, the tag is available as a variable and can be used for the index name and the file path.
/etc/td-agent/conf/elasticsearch.conf
<match *.*.**>
  type forest
  subtype copy
  <template>
    <store>
      @type elasticsearch
      host localhost
      port 9200
      logstash_format true
      logstash_prefix ${tag_parts[0]}.${tag_parts[1]}
      type_name ${tag_parts[0]}
      flush_interval 20
    </store>
    <store>
      @type file
      path /var/log/td-agent/${tag_parts[0]}/${tag_parts[1]}.log
      compress gzip
    </store>
  </template>
</match>
The plugin used here is fluent-plugin-forest. With this, there is no need to add settings to the match directive even when new log types appear; it is efficient when collecting and aggregating many kinds of logs from multiple servers.
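For example, to start shipping another local log it should be enough to drop one more source file into ./conf/ and restart td-agent; the match directive above stays untouched. A minimal sketch, assuming a hypothetical secure log (the path, pos_file and tag are placeholders, not part of the original setup):
/etc/td-agent/conf/local_secure.conf
<source>
  # hypothetical additional source; only the path and tag differ
  @type tail
  path /var/log/secure
  pos_file /var/log/secure.pos
  tag "sys.secure.#{Socket.gethostname}"
  format syslog
</source>
With the forest template above, this should end up in an index prefixed sys.secure and in /var/log/td-agent/sys/secure.log, without any change to elasticsearch.conf.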
Configuration example with Logstash
The Logstash version used for testing is 5.3.0.
$ /usr/share/logstash/bin/logstash --version
logstash 5.3.0
Make sure that path.config: /etc/logstash/conf.d is set in /etc/logstash/logstash.yml (it is by default).
The forest + copy equivalent in Logstash is configured as follows. Each part is described separately, but the settings can be combined into a single file such as /etc/logstash/conf.d/messages.conf.
Add tags in input and branch on them in the same way as Fluentd's tag. This part corresponds to the source directive. Since the hostname is referenced via the %{host} variable, there is no need to change the configuration per host. Tags are used here to stay close to the Fluentd example, but other fields such as id and type could be used instead.
input {
  file {
    path => "/var/log/messages"
    tags => ["sys", "logstash_messages", "%{host}"]
    type => syslog
  }
}
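The same applies on the Logstash side: adding another log should only require one more input with its own tags, for example as an extra file in /etc/logstash/conf.d/. A minimal sketch, assuming a hypothetical secure log (the path and tags are placeholders):
/etc/logstash/conf.d/secure.conf
input {
  file {
    # hypothetical additional log; only path and tags differ,
    # the shared filter and output below apply to it unchanged
    path => "/var/log/secure"
    tags => ["sys", "logstash_secure", "%{host}"]
    type => syslog
  }
}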
Field extraction with regular expressions and date parsing are done in filter; in Fluentd this corresponds to format and time_format in the source directive. Because the filter runs as a single block, it is applied to every log captured by input, so its targets are narrowed with if.
filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    date {
      match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}
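If a new log type needs different parsing, another if branch can be added without touching the existing one. A hedged sketch for a hypothetical nginx access log (the type name and patterns are assumptions, not part of the original setup):
filter {
  # hypothetical branch: applied only to events whose type is nginx_access;
  # syslog events continue to use the branch above
  if [type] == "nginx_access" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
  }
}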
Output to Elasticsearch and to a file is done in output, which corresponds to Fluentd's match directive. Logstash does not need plugins like copy or forest; multiple outputs and variables can simply be written inside output. This sample assumes Elasticsearch is running on the same server, so change hosts to match your environment.
Since there is no equivalent of <match <tag1>.<tag2>.*>, the output format needs to be standardized across all logs (if one log really needs different handling, a conditional can be used instead, as sketched after the example below).
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "%{tags[0]}.%{tags[1]}-%{+YYYY.MM.dd}"
  }
  file {
    path => "/var/log/logstash/%{tags[0]}/%{tags[1]}.log"
  }
}
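If one log really does need a different destination, the output can still stay in a single block by branching with a conditional instead of a Fluentd-style match pattern. A minimal sketch (the tag check and the fallback path are assumptions):
output {
  if "sys" in [tags] {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "%{tags[0]}.%{tags[1]}-%{+YYYY.MM.dd}"
    }
    file {
      path => "/var/log/logstash/%{tags[0]}/%{tags[1]}.log"
    }
  } else {
    # hypothetical fallback for events that do not follow the tag convention
    file {
      path => "/var/log/logstash/other/%{type}.log"
    }
  }
}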
Output result
The tags variable is expanded and used as the index name.
$ curl http://localhost:9200/sys.logstash_messages-*
{"sys.logstash_messages-2017.04.09":{"aliases":{},"mappings":{"syslog":{"properties":{"@timestamp":{"type":"date"},"@version":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"host":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"message":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"path":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"received_at":{"type":"date"},"received_from":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"syslog_hostname":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"syslog_message":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"syslog_program":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"syslog_timestamp":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"tags":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"type":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}}},"settings":{"index":{"creation_date":"1491720002393","number_of_shards":"5","number_of_replicas":"1","uuid":"AO_MMCiqQYqcNIIEG8XmQg","version":{"created":"5020299"},"provided_name":"sys.logstash_messages-2017.04.09"}}}}
For the file output as well, the tags variable was expanded, and a directory and log file were created based on the tag information.
$ tree /var/log/logstash/
/var/log/logstash/
├── logstash-plain.log
└── sys
    └── logstash_messages.log
Conclusion - Multiple outputs in Logstash, like Fluentd's forest + copy
In Logstash we built the same setup as Fluentd's forest + copy. As a result, even as log types and senders increase, the configuration stays simple because there is no need to add output settings every time.