@maxkandler
Last active October 8, 2021 07:31
Grok filter for CloudFront logs, for use with Logstash and Elasticsearch
filter {
  # Split each CloudFront access log line into named fields.
  grok {
    match => ["message", "%{YEAR:year}-%{MONTHNUM:month}-%{MONTHDAY:day}[ \t]%{TIME:time}[ \t]%{DATA:x_edge_location}[ \t](?:%{NUMBER:sc_bytes}|-)[ \t]%{IP:c_ip}[ \t]%{WORD:cs_method}[ \t]%{HOSTNAME:cs_host}[ \t]%{NOTSPACE:cs_uri_stem}[ \t]%{NUMBER:sc_status}[ \t]%{GREEDYDATA:referrer}[ \t]%{NOTSPACE:user_agent}[ \t]%{GREEDYDATA:cs_uri_query}[ \t]%{NOTSPACE:cookie}[ \t]%{WORD:x_edge_result_type}[ \t]%{NOTSPACE:x_edge_request_id}[ \t]%{HOSTNAME:x_host_header}[ \t]%{URIPROTO:cs_protocol}[ \t]%{INT:cs_bytes}[ \t]%{NUMBER:time_taken}[ \t]%{NOTSPACE:x_forwarded_for}[ \t]%{NOTSPACE:ssl_protocol}[ \t]%{NOTSPACE:ssl_cipher}[ \t]%{NOTSPACE:x_edge_response_result_type}([ \t])?(%{NOTSPACE:cs_protocol_version})?"]
  }
  # Look up the location of the client IP.
  geoip {
    source => "c_ip"
  }
  mutate {
    add_field => ["listener_timestamp", "%{year}-%{month}-%{day} %{time}"]
    convert => {
      "[geoip][coordinates]" => "float"
      "sc_bytes" => "integer"
      "cs_bytes" => "integer"
      "time_taken" => "float"
    }
  }
  # Use the log line's own timestamp as the event time.
  # CloudFront logs in UTC; without timezone, the platform default is used.
  date {
    match => ["listener_timestamp", "yyyy-MM-dd HH:mm:ss"]
    timezone => "UTC"
  }
}
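
The mutate and date steps above rebuild a timestamp from the grok captures and parse it as the event time. Below is the same logic as a small Python sketch. The function names are hypothetical, strptime's `%Y-%m-%d %H:%M:%S` stands in for the Joda-style pattern `yyyy-MM-dd HH:mm:ss`, and the UTC assumption comes from CloudFront's documentation rather than from this gist.

```python
from datetime import datetime, timezone


def build_listener_timestamp(year, month, day, time):
    # Mirrors add_field => "%{year}-%{month}-%{day} %{time}".
    return f"{year}-{month}-{day} {time}"


def parse_listener_timestamp(value):
    # Rough equivalent of the date filter's "yyyy-MM-dd HH:mm:ss" match.
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    # Assumption: the log timestamp is in UTC, so attach that zone explicitly.
    return parsed.replace(tzinfo=timezone.utc)


stamp = build_listener_timestamp("2021", "10", "08", "07:31:00")
event_time = parse_listener_timestamp(stamp)
print(event_time.isoformat())
```

In Logstash the parsed value ends up in @timestamp, the date filter's default target.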
@teebu

teebu commented Nov 1, 2017

WORD:x_edge_location sometimes fails because the value can contain a dash, e.g. MIA3-C1. I switched to the DATA pattern.
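
The failure follows from grok's WORD pattern, which is defined as `\b\w+\b` and so cannot consume a dash. Here is a minimal Python sketch of the same regex behaviour. Python's `re` stands in for grok's regex engine, and `DASHED` is a hypothetical alternative, not part of the gist.

```python
import re

WORD = r"\b\w+\b"        # grok's WORD definition
DASHED = r"\b[\w-]+\b"   # hypothetical dash-tolerant alternative


def field_matches(pattern, value):
    # fullmatch models a field that must be consumed entirely
    # before the next separator in the log line.
    return re.fullmatch(pattern, value) is not None


edge_locations = ["FRA2", "MIA3-C1"]
results = {
    loc: (field_matches(WORD, loc), field_matches(DASHED, loc))
    for loc in edge_locations
}
print(results)
```

DATA (`.*?`) also accepts the dash, at the cost of accepting any other character as well.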

@maxkandler
Author

@teebu Thanks for pointing that out.

@binary111

It worked for me once I replaced [ \t] with %{SPACE}, as shown below:

%{YEAR:year}-%{MONTHNUM:month}-%{MONTHDAY:day}%{SPACE}%{TIME:time}%{SPACE}(?<x_edge_location>\b[\w-]+\b)%{SPACE}(?:%{NUMBER:sc_bytes}|-)%{SPACE}%{IPORHOST:clientip}%{SPACE}%{WORD:cs_method}%{SPACE}%{HOSTNAME:cs_host}%{SPACE}%{NOTSPACE:cs_uri_stem}%{SPACE}%{NUMBER:sc_status}%{SPACE}%{GREEDYDATA:referrer}%{SPACE}%{GREEDYDATA:agent}%{SPACE}%{GREEDYDATA:cs_uri_query}%{SPACE}%{GREEDYDATA:cookies}%{SPACE}%{WORD:x_edge_result_type}%{SPACE}%{NOTSPACE:x_edge_request_id}%{SPACE}%{HOSTNAME:x_host_header}%{SPACE}%{GREEDYDATA:cs_protocol}%{SPACE}%{INT:cs_bytes}%{SPACE}%{GREEDYDATA:time_taken}%{SPACE}%{GREEDYDATA:x_forwarded_for}%{SPACE}%{GREEDYDATA:ssl_protocol}%{SPACE}%{GREEDYDATA:ssl_cipher}%{SPACE}%{GREEDYDATA:x_edge_response_result_type}%{SPACE}%{GREEDYDATA:cs_protocol_version}

It is best to test the pattern against a sample log line at https://grokdebug.herokuapp.com/ first.
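
One thing to keep in mind with this swap: grok's SPACE is defined as `\s*`, so it matches any run of whitespace, including none at all, whereas [ \t] requires exactly one space or tab. A small Python sketch of the difference follows. The helper `split_two` and the sample strings are made up for illustration.

```python
import re

STRICT_SEP = r"[ \t]"   # separator in the original gist: exactly one space or tab
SPACE = r"\s*"          # grok's SPACE definition: zero or more whitespace chars


def split_two(sep, line):
    # Match two non-space fields separated by the given separator pattern.
    m = re.fullmatch(r"(\S+)" + sep + r"(\S+)", line)
    return m.groups() if m else None


print(split_two(STRICT_SEP, "GET\texample.com"))
print(split_two(STRICT_SEP, "GET  example.com"))  # two spaces: no match
print(split_two(SPACE, "GET  example.com"))       # tolerated by \s*
print(split_two(SPACE, "GETexample.com"))         # no separator, still matches
```

The last call shows the trade-off: because the separator may be empty, the regex engine is free to split a single token in two, which is one reason to test against real log lines before relying on the pattern.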

@stevebanik

For me, the location does not appear to be an array of coordinates, and I'm getting "No Compatible Fields: The "cloudfront-*" index pattern does not contain any of the following field types: geo_point" when I try to create a new visualization. So, it's not being correctly handled as a geo_point data type as far as I can tell. I'm starting to think my default Logstash template is missing the geoip block, or it exists but is incorrect.

[Screenshot: cloudfront_visualization]
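
One possible cause, consistent with the suspicion above: the index template that Logstash installs by default only applies to indices named logstash-*, so a cloudfront-* index gets dynamic mappings and the geoip location is not typed as geo_point. The Python sketch below builds a template body that would declare the mapping. It assumes Elasticsearch 7.x (typeless mappings, legacy `_template` API) and assumes the geoip filter writes its coordinates to `geoip.location`. The helper name is hypothetical.

```python
import json


def geo_point_template(index_pattern, field_path):
    """Build a legacy index template body that maps field_path as geo_point."""
    mapping = {"type": "geo_point"}
    # Wrap the mapping in "properties" once per path segment, innermost first.
    for segment in reversed(field_path.split(".")):
        mapping = {"properties": {segment: mapping}}
    return {"index_patterns": [index_pattern], "mappings": mapping}


template = geo_point_template("cloudfront-*", "geoip.location")
print(json.dumps(template, indent=2))
```

The printed body could be sent with PUT _template/cloudfront (the template name is made up). A template only affects indices created after it is installed, so existing cloudfront-* indices would need to be reindexed.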

@jmcazaux

Hi there, has anyone tried to update this for the new CloudFront log format?
I have been struggling for the last 2 hours, but everything I try leads to a no-match...

@hmoffatt

> Hi there, has anyone tried to update this for the new CloudFront log format?
> I have been struggling for the last 2 hours, but everything I try leads to a no-match...

Try the pattern from logstash-plugins/logstash-patterns-core#232 (comment)

@Tarasovych

%{DATE_EU:date}\t%{TIME:time}\t(?<x_edge_location>\b[\w\-]+\b)\t(?:%{NUMBER:sc_bytes:int}|-)\t%{IPORHOST:c_ip}\t%{WORD:cs_method}\t%{HOSTNAME:cs_host}\t%{NOTSPACE:cs_uri_stem}\t%{NUMBER:sc_status:int}\t%{GREEDYDATA:referrer}\t%{GREEDYDATA:User_Agent}\t%{GREEDYDATA:cs_uri_query}\t%{GREEDYDATA:cookies}\t%{WORD:x_edge_result_type}\t%{NOTSPACE:x_edge_request_id}\t%{HOSTNAME:x_host_header}\t%{URIPROTO:cs_protocol}\t%{INT:cs_bytes:int}\t%{NUMBER:time_taken}\t%{DATA:x_forwarded_for}\t%{DATA:ssl_protocol}\t%{DATA:ssl_cipher}\t%{DATA:x_edge_response_result_type}\tHTTP/%{NUMBER:cs_protocol_version}\t%{DATA:fle_status}\t%{DATA:fle_encrypted_fields}\t%{DATA:c_port}\t%{NUMBER:time_to_first_byte}\t%{DATA:x_edge_detailed_result_type}\t%{DATA:sc_content_type}\t%{DATA:sc_content_len}\t%{DATA:sc_range_start}\t%{GREEDYDATA:sc_range_end}
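
When a pattern this long produces a no-match, a quick cross-check outside grok can show whether the line has the expected number of columns. The sketch below is a plain tab split in Python, not grok. The field names follow the pattern above, with the twelfth column named cs_uri_query as in the original gist. The sample line is synthetic, and `parse_line` is a hypothetical helper.

```python
FIELDS = [
    "date", "time", "x_edge_location", "sc_bytes", "c_ip", "cs_method",
    "cs_host", "cs_uri_stem", "sc_status", "referrer", "User_Agent",
    "cs_uri_query", "cookies", "x_edge_result_type", "x_edge_request_id",
    "x_host_header", "cs_protocol", "cs_bytes", "time_taken",
    "x_forwarded_for", "ssl_protocol", "ssl_cipher",
    "x_edge_response_result_type", "cs_protocol_version", "fle_status",
    "fle_encrypted_fields", "c_port", "time_to_first_byte",
    "x_edge_detailed_result_type", "sc_content_type", "sc_content_len",
    "sc_range_start", "sc_range_end",
]


def parse_line(line):
    # Skip the "#Version" and "#Fields" header lines at the top of each file.
    if line.startswith("#"):
        return None
    values = line.rstrip("\n").split("\t")
    # A column-count mismatch is a common reason for a grok no-match.
    if len(values) != len(FIELDS):
        return None
    # CloudFront writes "-" for empty values; map those to None.
    return {name: (None if v == "-" else v) for name, v in zip(FIELDS, values)}


# Synthetic sample values, one per column (not taken from a real log).
sample = "\t".join([
    "2021-10-08", "07:31:00", "MIA3-C1", "1045", "192.0.2.10", "GET",
    "d111111abcdef8.cloudfront.net", "/index.html", "200", "-",
    "Mozilla/5.0", "-", "-", "Hit", "hypothetical-request-id",
    "example.com", "https", "23", "0.001", "-", "TLSv1.2",
    "ECDHE-RSA-AES128-GCM-SHA256", "Hit", "HTTP/2.0", "-", "-", "11040",
    "0.001", "Hit", "text/html", "78", "-", "-",
])
record = parse_line(sample)
print(len(record), record["x_edge_location"], record["referrer"])
```

If `parse_line` returns None for a real log line, the line probably has a different number of columns than the pattern expects.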
