Created September 24, 2014 17:55
Gist: tclancy/e0e71866b731b06b87bf
Storing the regular expression for parsing our Nginx log format, arrived at after a bunch of iterations.
```python
nginx_line = r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})?(?P<ip2>, \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})?-? - ?\S* \[(?P<timestamp>\d{2}\/\w{3}\/\d{4}:\d{2}:\d{2}:\d{2} (\+|\-)\d{4})\]\s+\"(?P<method>\S{3,10}) (?P<path>\S+) HTTP\/1\.\d" (?P<response_status>\d{3}) (?P<bytes>\d+) "(?P<referer>(\-)|(.+))?" "(?P<useragent>.+)'
"""
Matches request lines from the Nginx log format:
    '$remote_addr - $remote_user [$time_local] "$request" '
    '$status $body_bytes_sent "$http_referer" '
    '"$http_user_agent" "$request_time" "$upstream_response_time" "$http_x_forwarded_for" '
    '"$http_client_ip" "$http_x_real_ip"';
"""
```
It doesn't match `quit` requests (the request has no path or HTTP version), but I don't care about those:

```
- - - [12/Aug/2014:13:03:16 +0000] "quit" 400 172 "-" "-" 0.130 - .
```
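A minimal usage sketch, assuming lines are matched one at a time with `re.match`. The first log line is made up for illustration (documentation IP range, invented timestamp and timings), not real log data; the second is the `quit` line quoted above, which comes back as `None` because its request part has no path or `HTTP/1.x`. `NGINX_LINE_RE` and `parse_line` are hypothetical names.

```python
import re

# Pattern copied verbatim from the gist.
nginx_line = r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})?(?P<ip2>, \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})?-? - ?\S* \[(?P<timestamp>\d{2}\/\w{3}\/\d{4}:\d{2}:\d{2}:\d{2} (\+|\-)\d{4})\]\s+\"(?P<method>\S{3,10}) (?P<path>\S+) HTTP\/1\.\d" (?P<response_status>\d{3}) (?P<bytes>\d+) "(?P<referer>(\-)|(.+))?" "(?P<useragent>.+)'
NGINX_LINE_RE = re.compile(nginx_line)  # hypothetical name


def parse_line(line):
    """Return the named groups as a dict, or None if the line doesn't match."""
    match = NGINX_LINE_RE.match(line)
    return match.groupdict() if match else None


# Made-up line in the documented format (not real log data).
sample = ('203.0.113.7 - - [24/Sep/2014:17:55:00 +0000] '
          '"GET /index.html HTTP/1.1" 200 612 "-" '
          '"Mozilla/5.0 (X11; Linux x86_64)" "0.005" "0.004" "-" "-" "-"')
# The quit request quoted in the gist.
quit_line = '- - - [12/Aug/2014:13:03:16 +0000] "quit" 400 172 "-" "-" 0.130 - .'

fields = parse_line(sample)
print(fields["ip"], fields["method"], fields["path"], fields["response_status"])
# prints: 203.0.113.7 GET /index.html 200
print(parse_line(quit_line))
# prints: None
```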
Do note that the `useragent` group at the end actually captures everything to the end of the line, because I didn't care about the UAs and wanted to be liberal in what I matched. If you need that info, end the group at the next quote instead.
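A sketch of that stricter variant, assuming everything up to the `bytes` field stays as is: the hypothetical `strict_tail` below ends both the `referer` and `useragent` groups at the next double quote (`[^"]*`). It also sidesteps a related quirk of the original tail: with the log format in the docstring, where more quoted fields follow the user agent, the greedy `(.+)` referer alternative runs past its closing quote whenever the referer isn't `-`. The names `prefix`, `original_tail` and `strict_tail` are illustrative, and the sample line is made up.

```python
import re

# The original pattern up to and including the bytes field (copied verbatim).
prefix = r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})?(?P<ip2>, \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})?-? - ?\S* \[(?P<timestamp>\d{2}\/\w{3}\/\d{4}:\d{2}:\d{2}:\d{2} (\+|\-)\d{4})\]\s+\"(?P<method>\S{3,10}) (?P<path>\S+) HTTP\/1\.\d" (?P<response_status>\d{3}) (?P<bytes>\d+) '

# Original tail: referer is "-" or greedy .+, user agent runs to end of line.
original_tail = r'"(?P<referer>(\-)|(.+))?" "(?P<useragent>.+)'
# Hypothetical stricter tail: each quoted field stops at the next double quote.
strict_tail = r'"(?P<referer>[^"]*)" "(?P<useragent>[^"]*)"'

original = re.compile(prefix + original_tail)
strict = re.compile(prefix + strict_tail)

# Made-up line with a non-"-" referer (not real log data).
sample = ('203.0.113.7 - - [24/Sep/2014:17:55:00 +0000] '
          '"GET /index.html HTTP/1.1" 200 612 "https://example.com/" '
          '"Mozilla/5.0 (X11; Linux x86_64)" "0.005" "0.004" "-" "-" "-"')

fields = strict.match(sample).groupdict()
print(fields["referer"])    # prints: https://example.com/
print(fields["useragent"])  # prints: Mozilla/5.0 (X11; Linux x86_64)

# With the original tail, the greedy referer backtracks only as far as the
# last '" "' in the line, so it swallows the UA and most later fields.
print(original.match(sample).group("referer"))
```

The trade-off is that `[^"]*` assumes no raw double quotes inside a field, which is the price of stopping at the quote.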