@JCotton1123
Created July 15, 2014 19:04
Parse an Apache log into a pipe-delimited file
from __future__ import print_function
import sys
import re

# One regex fragment per field of the Apache "combined" log format.
parts = [
    r'(?P<host>\S+)',        # host %h
    r'\S+',                  # ident %l (unused)
    r'(?P<user>\S+)',        # user %u
    r'\[(?P<time>.+)\]',     # time %t
    r'"(?P<request>.+)"',    # request "%r"
    r'(?P<status>[0-9]+)',   # status %>s
    r'(?P<size>\S+)',        # size %b (careful, can be '-')
    r'"(?P<referer>.*)"',    # referer "%{Referer}i"
    r'"(?P<agent>.*)"',      # user agent "%{User-agent}i"
]
# Fields are separated by whitespace; allow trailing whitespace before end of line.
pattern = re.compile(r'\s+'.join(parts) + r'\s*\Z')

# The with-block closes the file on exit, so no explicit f.close() is needed.
with open(sys.argv[1]) as f:
    for line in f:
        m = pattern.match(line)
        if m is None:
            # Report unparseable lines on stderr so stdout stays clean.
            print("Unable to parse line %s" % line.rstrip("\n"), file=sys.stderr)
            continue
        res = m.groupdict()
        print("|".join([res['status'], res['request'], res['agent']]))
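
To check the pattern without a log file on hand, the same field regexes can be applied to a single line. This is a minimal sketch, not part of the original script: the helper name `to_pipe_delimited` and the sample log line are made up for illustration.

```python
import re

# Same field patterns as the script above (Apache combined log format).
parts = [
    r'(?P<host>\S+)',
    r'\S+',
    r'(?P<user>\S+)',
    r'\[(?P<time>.+)\]',
    r'"(?P<request>.+)"',
    r'(?P<status>[0-9]+)',
    r'(?P<size>\S+)',
    r'"(?P<referer>.*)"',
    r'"(?P<agent>.*)"',
]
pattern = re.compile(r'\s+'.join(parts) + r'\s*\Z')


def to_pipe_delimited(line):
    """Hypothetical helper: return 'status|request|agent', or None if no match."""
    m = pattern.match(line)
    if m is None:
        return None
    res = m.groupdict()
    return "|".join([res['status'], res['request'], res['agent']])


# Made-up sample line, not taken from a real log.
sample = ('192.0.2.10 - - [15/Jul/2014:19:04:00 +0000] '
          '"GET /index.html HTTP/1.1" 200 512 "-" "curl/7.30.0"\n')

print(to_pipe_delimited(sample))            # prints 200|GET /index.html HTTP/1.1|curl/7.30.0
print(to_pipe_delimited("not a log line"))  # prints None
```

Note that a request or user agent containing a literal `|` would pass through unescaped, so downstream consumers that split on `|` may need to account for that.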
@JCotton1123

This may have been borrowed from someone else
