
Logstash and Filebeat in 5 minutes

What/Why?

  • Filebeat is a log shipper: it captures log files and ships them to Logstash for processing and eventual indexing in Elasticsearch
  • Logstash is a heavyweight Swiss Army knife when it comes to log capture/processing
  • Centralized logging, a necessity for deployments with > 1 server
  • Super-easy to get set up, a little trickier to configure
  • Captured data is easy to visualize with Kibana
  • Why not just Logstash (ELK is so hot right now)?
    • Logstash is a heavyweight compared to Filebeat, prohibitively so for running a swarm of tiny server instances
    • ELK is definitely still part of the stack, but we're adding "beats" to the mix => BELK

Overview

Filebeat captures and ships file logs --> Logstash parses logs into documents --> Elasticsearch stores/indexes documents --> Kibana visualizes/aggregates

How?

The tough parts

Getting Filebeat and ELK set up was a breeze, but configuring Logstash to process logs correctly was more of a pain...enter GROK and logstash.conf

Logstash.conf

logstash.conf has 3 sections -- input / filter / output, simple enough, right?

Input section

In this case, the "input" section of logstash.conf opens a port for Filebeat using the lumberjack protocol (any Beat type should be able to connect):

input
{
    beats
    {
        ssl => false
        port => 5043
    }
}
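
For context, the Filebeat side of this connection is just a logstash output pointed at that port. A minimal sketch in Filebeat 5.x-style syntax (the paths and document_type values are assumptions; they line up with the type conditionals used in the filter below):

filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/nginx/access.log
    # Becomes the "type" field that the filter section matches on
    document_type: nginx_access
  - input_type: log
    paths:
      - /var/log/nginx/error.log
    document_type: nginx_error

output.logstash:
  # Matches the beats input above: plain TCP (ssl => false), port 5043
  hosts: ["127.0.0.1:5043"]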

Filter

This is where things get tricky. "Filter" does the log parsing, primarily using "GROK" patterns.

filter
{
    if [type] == "nginx_error" {
        grok {
            match => { "message" => "%{DATESTAMP:timestamp} \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER}: %{GREEDYDATA:errormessage}(?:, client: %{IPORHOST:client})(?:, server: %{IPORHOST:server})(?:, request: %{QS:request})(?:, host: %{QS:host})(?:, referrer: \"%{URI:referrer}\")" }
        }
    }

    # Using a custom nginx log format that also includes the request duration and X-Forwarded-For http header as "end_user_ip"
    if [type] == "nginx_access" {
        grok {
            match => { "message" => "%{COMBINEDAPACHELOG}+ %{NUMBER:request_length} %{NUMBER:request_duration} (%{IPV4:end_user_ip}|-)" }
        }

        geoip {
            source => "end_user_ip"
        }

        mutate {
            convert => {
                "request_duration" => "float"
            }
        }
    }
}
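
For reference, the grok pattern for nginx_access assumes a custom nginx log_format along these lines (a guess at the format mentioned in the comment above, not the author's exact directive); the three trailing fields map to request_length, request_duration, and end_user_ip:

log_format timed_combined '$remote_addr - $remote_user [$time_local] '
                          '"$request" $status $body_bytes_sent '
                          '"$http_referer" "$http_user_agent" '
                          '$request_length $request_time $http_x_forwarded_for';

access_log /var/log/nginx/access.log timed_combined;

nginx writes "-" when $http_x_forwarded_for is empty, which is why the pattern accepts either an IPV4 or a literal "-" at the end.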

Output

Pretty simple. NOTE: you can set the "beat" @metadata value via the "index" option in your Filebeat configuration, making things like separating dev/prod logs into separate indices easy (see the sketch after the output block below)

output
{

    elasticsearch
    {
        hosts => ["127.0.0.1:9200"]
        sniffing => true
        manage_template => true
        index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
        document_type => "%{[@metadata][type]}"
    }
}
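
A hedged filebeat.yml sketch of that index trick (the hostname and index name are hypothetical): the "index" option of Filebeat's logstash output is what surfaces as %{[@metadata][beat]} on the Logstash side.

output.logstash:
  hosts: ["logstash.internal:5043"]
  # Shows up as %{[@metadata][beat]} in Logstash, so these events land in
  # indices like filebeat-prod-2017.01.15 rather than filebeat-2017.01.15
  index: "filebeat-prod"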

What's next...

Multiline patterns are the way to go when capturing exception information and stack traces, so each trace arrives as a single event rather than one document per line.
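
A rough filebeat.yml sketch of multiline handling (the path and pattern are assumptions for an app log where every event starts with a date):

filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/myapp/app.log
    document_type: app_log
    # Any line that does NOT start with a YYYY-MM-DD date gets appended to
    # the previous line, so a stack trace stays in one event
    multiline.pattern: '^\d{4}-\d{2}-\d{2}'
    multiline.negate: true
    multiline.match: after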

Another similar system, Metricbeat, looks to be an awesome complement to Filebeat and an alternative to CloudWatch when it comes to system-level metrics. Personally, I'm going to dig into this next, as the granularity of metrics available for each application/system via Metricbeat's modules is pretty extensive.
