Rather than run a log shipper on hosts, we use syslog to ship logs out of Monolog. This works great for single-line logs. It breaks when syslog splits a long log message across several lines. When syslog does this, it repeats the line header on each piece, like so:
2015-06-09T05:39:31.457042-05:00 host.example.edu : This is a really really really
2015-06-09T05:39:31.475414-05:00 host.example.edu : really long message
When Logstash combines these lines via the multiline input codec, the resulting message looks like this:
This is a really really really \n2015-06-09T05:39:31.475414-05:00 host.example.edu : really long message
Not great. The syslog line header gets embedded in the message. If you've got JSON in the message, the embedded header breaks it, which gives subsequent Logstash parsers fits.
So we need to pull that header out of the message if we see it there. You can do this with the mutate filter and its gsub option.
mutate {
  gsub => [
    "message", "\n\d+-\d+-\d+T\d+:\d+:\d+\.\d+-\d+:\d+\s+.*?\s+:\s+", ""
  ]
}
Put this before your grok filters and things should work better.
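If you want to check the expression before touching a live pipeline, you can try it outside Logstash. Here is a minimal sketch in Python, assuming the joined message looks like the example above. The function name strip_embedded_headers is made up for illustration, and Python's regex engine is not the one Logstash uses, so treat this as a sanity check rather than proof.

```python
import re

# Same expression as the gsub above: a newline, an ISO 8601 timestamp with
# fractional seconds and a negative UTC offset, the host, then " : ".
HEADER_RE = re.compile(r"\n\d+-\d+-\d+T\d+:\d+:\d+\.\d+-\d+:\d+\s+.*?\s+:\s+")


def strip_embedded_headers(message: str) -> str:
    """Remove syslog headers that ended up inside a joined message.

    Hypothetical helper for illustration; it is not part of Logstash.
    """
    return HEADER_RE.sub("", message)


# The joined message from the example above.
joined = (
    "This is a really really really \n"
    "2015-06-09T05:39:31.475414-05:00 host.example.edu : really long message"
)
cleaned = strip_embedded_headers(joined)
print(cleaned)
```

Note that the expression only matches offsets written with a minus sign, like the -05:00 in the sample lines. Hosts that log with a positive offset would need that part of the expression adjusted.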
One thing to note: gsub doesn't understand grok patterns in earlier versions of Logstash.
mutate {
  gsub => [
    # Doesn't work in early versions of Logstash
    "message", "\n%{TIMESTAMP_ISO8601}\s+%{IPORHOST}\s+:\s+", ""
  ]
}
You'll have to unwrap those grok patterns on your own in earlier versions of Logstash.
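Unwrapping just means replacing each %{NAME} reference with the regular expression it stands for. A rough sketch of the idea in Python follows. The two definitions in PATTERNS are simplified stand-ins I wrote for this example, not the definitions Logstash ships, and unwrap is a made-up helper. It handles a single level of references only, while the real grok definitions nest.

```python
import re

# Simplified stand-ins for the grok definitions (assumptions for illustration,
# NOT the patterns that ship with Logstash).
PATTERNS = {
    "TIMESTAMP_ISO8601": (
        r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})"
    ),
    "IPORHOST": r"[0-9A-Za-z][0-9A-Za-z.:-]*",
}


def unwrap(expression: str) -> str:
    """Replace each %{NAME} reference with its regex, in a non-capturing group."""
    return re.sub(
        r"%\{(\w+)\}",
        lambda match: "(?:" + PATTERNS[match.group(1)] + ")",
        expression,
    )


# The grok-style expression from the example above, expanded to a plain regex.
header = unwrap(r"\n%{TIMESTAMP_ISO8601}\s+%{IPORHOST}\s+:\s+")

joined = (
    "This is a really really really \n"
    "2015-06-09T05:39:31.475414-05:00 host.example.edu : really long message"
)
cleaned = re.sub(header, "", joined)
print(cleaned)
```

The expanded string in header is the kind of thing you would paste into gsub by hand. For the real thing, expand the definitions from the grok patterns file that comes with your Logstash version rather than these stand-ins.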