Logstash and Filebeat in 5 minutes

What/Why?

  • Filebeat is a log shipper: it captures log files and sends them to Logstash for processing and eventual indexing in Elasticsearch
  • Logstash is a heavyweight Swiss Army knife when it comes to log capture/processing
  • Centralized logging, necessary for deployments with > 1 server
  • Super-easy to get set up, a little trickier to configure
  • Captured data is easy to visualize with Kibana
  • Why not just Logstash (ELK is so hot right now)?
    • Logstash is a heavyweight compared to Filebeat, which is prohibitive when running a swarm of tiny server instances
    • ELK is definitely still part of the stack, but we're adding "beats" to the mix => BELK

Overview

Filebeat captures and ships file logs --> Logstash parses logs into documents --> Elasticsearch stores/indexes documents --> Kibana visualizes/aggregates

How?

The tough parts

Getting Filebeat and ELK set up was a breeze, but configuring Logstash to process logs correctly was more of a pain... enter grok and logstash.conf.

Logstash.conf

logstash.conf has three sections -- input / filter / output. Simple enough, right?

Input section

In this case, the "input" section of logstash.conf opens a port for Filebeat using the lumberjack protocol (any Beat type should be able to connect):

input
{
    # Accept connections from Filebeat (or any other Beat) over the lumberjack/Beats protocol
    beats
    {
        ssl => false
        port => 5043
    }
}
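
On the Filebeat side, the matching piece of filebeat.yml just points the logstash output at that port. A minimal sketch, assuming Logstash is on the same host and Filebeat 5.x-style syntax (adjust the host and version syntax for your setup):

# filebeat.yml (relevant portion) -- ship events to the Logstash beats input above
output.logstash:
  hosts: ["127.0.0.1:5043"]   # assumed host; the port must match logstash.conf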

Filter

This is where things get tricky. The "filter" section does the log parsing, primarily via grok patterns.

filter
{
    # nginx error logs: pull out the timestamp, severity, pid, and the client/server/request details
    if [type] == "nginx_error" {
        grok {
            match => { "message" => "%{DATESTAMP:timestamp} \[%{LOGLEVEL:severity}\] %{POSINT:pid}#%{NUMBER}: %{GREEDYDATA:errormessage}(?:, client: %{IPORHOST:client})(?:, server: %{IPORHOST:server})(?:, request: %{QS:request})(?:, host: %{QS:host})(?:, referrer: \"%{URI:referrer}\")" }
        }
    }

    # nginx access logs, using a custom log format that also includes the request length/duration
    # and the X-Forwarded-For http header as "end_user_ip"
    if [type] == "nginx_access" {
        grok {
            match => { "message" => "%{COMBINEDAPACHELOG}+ %{NUMBER:request_length} %{NUMBER:request_duration} (%{IPV4:end_user_ip}|-)" }
        }

        # Enrich each document with geo info looked up from the end-user IP
        geoip {
            source => "end_user_ip"
        }

        # Grok captures strings; convert the duration to a float so it can be graphed/aggregated
        mutate {
            convert => {
                "request_duration" => "float"
            }
        }
    }
}
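
The [type] conditionals above rely on Filebeat tagging each log file when it ships it. A sketch of the corresponding filebeat.yml prospectors, assuming Filebeat 5.x-style syntax and typical nginx log paths:

# filebeat.yml (relevant portion) -- document_type becomes the "type" field the filter branches on
filebeat.prospectors:
- paths:
    - /var/log/nginx/access.log   # assumed path
  document_type: nginx_access
- paths:
    - /var/log/nginx/error.log    # assumed path
  document_type: nginx_error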

Output

Pretty simple. NOTE: you can set the "beat" @metadata value via the "index" option in your Filebeat configuration, which makes things like separating dev/prod logs into separate indices easy (see the Filebeat sketch after the output block).

output
{
    elasticsearch
    {
        hosts => ["127.0.0.1:9200"]
        sniffing => true
        manage_template => true
        # One index per beat per day, e.g. filebeat-2016.10.25; the beat name comes from @metadata
        index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
        document_type => "%{[@metadata][type]}"
    }
}
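
For example, to keep a dev box's logs in their own daily indices, the Filebeat side can override the beat name via the "index" option; "filebeat-dev" below is just an assumed name for illustration:

# filebeat.yml (relevant portion) -- the "index" option ends up in [@metadata][beat]
output.logstash:
  hosts: ["127.0.0.1:5043"]
  index: "filebeat-dev"   # Logstash then writes to indices named filebeat-dev-YYYY.MM.dd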

What's next...

Multiline patterns are the way to go when capturing exception information and stack traces.
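
For example, Filebeat can glue stack-trace continuation lines onto the preceding event before shipping, so an exception lands in Elasticsearch as a single document. A rough sketch, assuming Filebeat 5.x-style multiline settings, Java-style traces, and a made-up log path:

# filebeat.yml prospector sketch -- join continuation lines onto the event before them
filebeat.prospectors:
- paths:
    - /var/log/myapp/app.log   # assumed path
  multiline.pattern: '^[[:space:]]+(at|\.{3})\b|^Caused by:'   # indented "at ..." frames and "Caused by:" lines
  multiline.negate: false
  multiline.match: after       # append matching lines to the previous (non-matching) line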

Another similar tool, Metricbeat, looks to be an awesome complement to Filebeat and an alternative to CloudWatch when it comes to system-level metrics. Personally, I'm going to dig into this next, as the granularity of the metrics available per application/system via Metricbeat's modules is pretty extensive.

@dhenson02

lol @ "BELK" 💯
