Logstash Best Practices (WIP)

Fix at the source first if you can

  • Log in json_event format where possible
  • Add as much markup/context information as possible at log time; it is much easier to work with events when the context is already attached (see the example event below).
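
For illustration, an application logging in a JSON event format emits one self-describing object per line, with the context already attached (field names here are purely illustrative):

{ "@timestamp": "2014-11-05T12:11:00.000Z", "message": "GET /basket 200", "type": "nginx_access", "application": "webshop", "http_status_code": 200 }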

Generally, we need to do the following:

Use the 'type' field as specifically as possible:

  • nginx_access
  • nginx_error
  • sensu_error_message
  • syslog_rfc
  • syslog_nonstandard
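
For example, a specific type can be set on the input that reads the log (a sketch; the path is illustrative):

input {
  file {
    path => "/var/log/nginx/access.log"
    type => "nginx_access"
  }
}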

Add fields, ideally at the source, that identify what logged the message

In nginx logs, for example, we need a field that tells you which application served the request. It is not good enough to rely on tags, as we cannot drive statsd output from them.
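
Where the upstream logging cannot be changed, the field can at least be added on the input (a sketch; 'application' and 'webshop' are placeholder names):

input {
  file {
    path => "/var/log/nginx/webshop_access.log"
    type => "nginx_access"
    add_field => { "application" => "webshop" }
  }
}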

Time-box which events are sent to systems that deal with 'the right now'

Logstash is great for firing events into alerting and metric collection systems -- systems that rely on the event stream being current.

However, in the event of backlogs or event reprocessing, you do not want a flood of old data hitting your event/metric systems and producing false positives.

Instead, do a freshness test on each event before it reaches those outputs. One approach (a sketch -- the exact event API varies between Logstash versions) is to tag events that are less than five minutes old in a ruby filter, and only send tagged events to statsd, eg:

filter {
  ruby {
    code => "event.tag('fresh') if Time.now.to_f - event.get('@timestamp').to_f < 300"
  }
}
output {
  if "fresh" in [tags] {
    statsd { }
  }
}
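
The 300-second window is arbitrary; pick whatever 'recent enough' means for your alerting and metric systems.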

Think about what happens when your resulting events are replayed

Do not mutate or replace @message/message or any other original fields -- you will likely lose the original copy of your message, and reloads will not work.

Take care with non-idempotent operations:

  • adding a new field -- check that the field is not already present first (see the sketch after this list).
  • mutating a key value. Is it mutated from a source that remains?
  • deleting keys -- do you really need to?
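
A minimal sketch of an idempotent add_field, assuming a hypothetical 'environment' field:

filter {
  if ![environment] {
    mutate { add_field => { "environment" => "production" } }
  }
}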

Be very careful about rewriting what @message/message is if you subsequently break it up into other fields.

Do not use the 'kv' filter without specifying a prefix.

You can easily overwrite extremely important fields, like 'type'.

The prefix should be something like 'syslog_apparmor_', based on the conditionals that are used to apply the filter.
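
A sketch of what that looks like, assuming an upstream grok has already populated a 'program' field:

filter {
  if [type] == "syslog_nonstandard" and [program] == "apparmor" {
    kv {
      prefix => "syslog_apparmor_"
    }
  }
}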

Agree on key names and content for some fundamental top-level fields

If you know what is going to be in a field, then you can start to do some interesting things with correlation.

Similarly, if you can be sure that a field value is always of a specific form/data-type, you can adjust the ES schema to enrich the data - particularly with numeric values.
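
For example, numeric fields can be coerced in the pipeline so that Elasticsearch maps them as numbers rather than strings (a sketch; an explicit index template is another option):

filter {
  mutate {
    convert => { "http_status_code" => "integer" }
  }
}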

Try to avoid key names that are ambiguous or likely to collide, eg use 'http_status_code' rather than 'status'.

  • http_status_code -- always an integer
  • request_uri
  • ssl_version

As mentioned above, do not remove the fields that were used to create these common fields if you are dynamically generating them.
