@GaryRogers
Created June 11, 2015 17:01

Remove Syslog line headers from multi-line logs in logstash

Overview

Rather than run a log shipper on hosts, we use Syslog to ship logs out of Monolog. This works great for single-line logs, but it breaks when a long log message gets split up by syslog. When syslog does this, it duplicates the line header, like so:

2015-06-09T05:39:31.457042-05:00 host.example.edu : This is a really really really
2015-06-09T05:39:31.475414-05:00 host.example.edu : really long message

When Logstash combines these lines via the multiline input codec, the resulting message looks like this:

This is a really really really \n2015-06-09T05:39:31.475414-05:00 host.example.edu : really long message

Not great. The Syslog line header gets embedded in the message. If you've got JSON in the message, the embedded header breaks the JSON, which gives subsequent Logstash parsers fits.

So. We need to pull that header out of the message if we see it there. You can do this with mutate and gsub.

mutate { 
  gsub => [ 
    "message", "\n\d+-\d+-\d+T\d+:\d+:\d+\.\d+-\d+:\d+\s+.*?\s+:\s+", ""
  ] 
} 

Put this before your grok filters and things should work better.
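
If you want to sanity-check the expression outside of Logstash, here is a minimal Python sketch that applies the same regex to the combined sample message from above. The function name `strip_syslog_headers` is made up for this illustration, and Python's `re` module is not the regex engine Logstash uses, so treat it as a quick check of the pattern rather than proof of what Logstash will do.

```python
import re

# Same expression as the gsub above, written as a Python raw string.
HEADER_RE = re.compile(r"\n\d+-\d+-\d+T\d+:\d+:\d+\.\d+-\d+:\d+\s+.*?\s+:\s+")


def strip_syslog_headers(message):
    """Remove syslog line headers that were embedded by the multiline join."""
    return HEADER_RE.sub("", message)


# The two sample lines from the overview, joined with a real newline.
combined = (
    "This is a really really really \n"
    "2015-06-09T05:39:31.475414-05:00 host.example.edu : really long message"
)

print(strip_syslog_headers(combined))
# This is a really really really really long message
```

Note that the expression expects a `-` before the UTC offset, as in the sample above. A header carrying a `+` offset would not match and would be left in the message.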

One thing to note: gsub doesn't understand grok patterns in earlier versions of Logstash.

mutate { 
  gsub => [ 
    # Doesn't work in early versions of Logstash
    "message", "\n%{TIMESTAMP_ISO8601}\s+%{IPORHOST}\s+:\s+", ""
  ] 
} 

You'll have to unwrap those grok patterns on your own in earlier versions of Logstash.
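
One way to do the unwrapping is to build the expression from small named pieces and then paste the resulting string into gsub. The Python sketch below shows the idea. `TIMESTAMP` and `HOST` are hand-written approximations for this example, not the official grok definitions of `TIMESTAMP_ISO8601` and `IPORHOST`, and the names are hypothetical. Test the printed expression against your own Logstash version before relying on it.

```python
import re

# Hand-written approximations (assumptions, NOT the official grok patterns).
# Timestamp: date, 'T', time, optional fractional seconds, then Z or a +/- offset.
TIMESTAMP = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})"
# Host: letters, digits, dots and hyphens, starting with a letter or digit.
HOST = r"[0-9A-Za-z][0-9A-Za-z.\-]*"

# The full header as it appears after the multiline join: newline, timestamp,
# host, then the " : " separator.
HEADER = r"\n" + TIMESTAMP + r"\s+" + HOST + r"\s+:\s+"


def strip_headers(message):
    """Remove embedded syslog headers using the unwrapped expression."""
    return re.sub(HEADER, "", message)


# Print the single unwrapped expression so it can be copied into gsub.
print(HEADER)
```

Unlike the expression used earlier, this approximation also accepts `Z` and `+` offsets, which is a deliberate widening on my part rather than something the original filter did.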
