
@msr-i386
Last active March 11, 2017 23:53
CloudFront log to Apache log converter
#!/usr/bin/env python
# -*- coding: utf-8 -*-
###############################################################################
#
# CloudFront log to Apache log converter
#
# Copyright (C) 2016 MSR All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
###############################################################################
from __future__ import print_function

import argparse
import datetime
import os
import sys


def main(argv):
    # Usage: cf2apache.py <in_dir>
    parser = argparse.ArgumentParser(description='Convert from CloudFront log to Apache log.')
    parser.add_argument('in_dir', help='input directory')
    args = parser.parse_args(argv[1:])
    logs_dir = args.in_dir
    if not os.path.isdir(logs_dir):
        sys.stderr.write('error: directory not found.\n')
        return 1
    files = os.listdir(logs_dir)
    files.sort()
    for name in files:
        path = os.path.join(logs_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        with open(path, 'r') as f:
            for line in f:
                if line.startswith('#'):
                    continue  # header lines (#Version, #Fields)
                # [Amazon CloudFront log format]
                # Delimiter: TAB
                # date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken
                #
                # <note>
                # + "cs-uri-stem" does not include the query string. If a query exists, it must be appended.
                # + Referer and User-Agent escape "%" as "%25". Replace "%25" with "%".
                # + Spaces in User-Agent are encoded as "%20". Replace "%20" with " ".
                cols = line.rstrip('\n').split('\t')
                if len(cols) < 12:
                    continue  # skip blank or malformed lines
                # Decode "%20" in User-Agent before "%25", so a literal "%20"
                # (logged as "%2520") is not decoded twice into a space
                cols[10] = cols[10].replace('%20', ' ')
                cols = [c.replace('%25', '%') for c in cols]
                # datetime conversion (UTC -> JST)
                dt = datetime.datetime.strptime(cols[0] + ' ' + cols[1], '%Y-%m-%d %H:%M:%S')
                dt += datetime.timedelta(hours=9)
                # [Apache combined log format]
                # Delimiter: Blank
                # ip - - time request-URI HTTP_status bytes_sent referrer user_agent
                # URI + Query concatenation
                if cols[11] != '-':
                    cols[7] += '?' + cols[11]
                # The fields listed above carry no HTTP version, so HTTP/1.0 is a fixed placeholder
                print(cols[4], '- - [' + dt.strftime('%d/%b/%Y:%H:%M:%S') + ' +0900] "' + cols[5], cols[7], 'HTTP/1.0"', cols[8], cols[3], '"' + cols[9] + '" "' + cols[10] + '"')
    return 0


if __name__ == '__main__':
    sys.exit(main(sys.argv))
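The per-line mapping that the script performs inline can be pulled out into a pure function, which makes it easy to check against a single hand-written line without preparing a directory of logs. A minimal Python 3 sketch, assuming the 19-column CloudFront layout listed in the script's comment and the same fixed JST offset; `cf_line_to_apache` is a hypothetical name, not part of cf2apache.py:

```python
import datetime


def cf_line_to_apache(line):
    """Convert one tab-separated CloudFront log line to Apache combined format.

    Hypothetical helper mirroring the script's inline logic (UTC -> JST, +0900).
    Returns None for header lines starting with '#' and for lines with fewer
    than 12 fields.
    """
    if line.startswith('#'):
        return None
    cols = line.rstrip('\n').split('\t')
    if len(cols) < 12:
        return None
    # Decode User-Agent spaces before percent signs so "%2520" becomes "%20", not " "
    cols[10] = cols[10].replace('%20', ' ')
    cols = [c.replace('%25', '%') for c in cols]
    dt = datetime.datetime.strptime(cols[0] + ' ' + cols[1], '%Y-%m-%d %H:%M:%S')
    dt += datetime.timedelta(hours=9)  # UTC -> JST
    # Append the query string (cs-uri-query) to the path (cs-uri-stem) if present
    uri = cols[7] if cols[11] == '-' else cols[7] + '?' + cols[11]
    # ip - - [time] "method uri HTTP/1.0" status bytes "referer" "user-agent"
    return '%s - - [%s +0900] "%s %s HTTP/1.0" %s %s "%s" "%s"' % (
        cols[4], dt.strftime('%d/%b/%Y:%H:%M:%S'), cols[5], uri,
        cols[8], cols[3], cols[9], cols[10])
```

The output string is built with the same spacing as the script's `print` call, so the two should produce identical lines for the same input.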
msr-i386 (author) commented Sep 26, 2016

How to use

  1. Download the CloudFront logs (gzip-compressed).
  2. Extract the CloudFront logs into <in_dir>.
  3. Run cf2apache.py with <in_dir> as its argument. The converted log is written to standard output.
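Steps 1 and 2 could be folded into the conversion by decompressing the logs on the fly instead of extracting them first. A minimal Python 3 sketch, assuming the downloaded files keep their .gz extension and contain UTF-8 text; `iter_log_lines` is a hypothetical helper, not part of cf2apache.py:

```python
import gzip
import os


def iter_log_lines(in_dir):
    """Yield text lines from every log file in in_dir, in sorted name order.

    Hypothetical helper: files ending in .gz are decompressed on the fly,
    other regular files are read as plain text, subdirectories are skipped.
    UTF-8 is an assumption; undecodable bytes are replaced rather than raising.
    """
    for name in sorted(os.listdir(in_dir)):
        path = os.path.join(in_dir, name)
        if not os.path.isfile(path):
            continue
        if name.endswith('.gz'):
            f = gzip.open(path, 'rt', encoding='utf-8', errors='replace')
        else:
            f = open(path, 'r', encoding='utf-8', errors='replace')
        with f:
            for line in f:
                yield line
```

Header lines starting with `#` are passed through unchanged, so the caller still has to skip them exactly as the script does.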

Caution

  • The converted log's timezone is JST (UTC+9).
  • Fields with no Apache counterpart (e.g. x-edge-location) are dropped.
  • Don't put this script (or any other non-log file) in <in_dir>. Every file there is read as a CloudFront log, so the conversion may not work properly.
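The JST offset is hard-coded in two places, the arithmetic (`timedelta(hours=9)`) and the literal `+0900` suffix, so changing the timezone means keeping both in sync by hand. A minimal Python 3 sketch that derives both from a single offset using timezone-aware datetimes; `apache_time` and its `offset_hours` parameter are hypothetical, and the default of 9 only mirrors the script:

```python
import datetime


def apache_time(date_str, time_str, offset_hours=9):
    """Format a CloudFront UTC date/time pair as an Apache '[%d/%b/%Y:%H:%M:%S %z]' stamp.

    offset_hours is a hypothetical parameter (fractional hours such as 5.5 are
    allowed); the original script hard-codes +9 (JST).
    """
    utc = datetime.datetime.strptime(date_str + ' ' + time_str, '%Y-%m-%d %H:%M:%S')
    utc = utc.replace(tzinfo=datetime.timezone.utc)  # CloudFront timestamps are UTC
    local = utc.astimezone(datetime.timezone(datetime.timedelta(hours=offset_hours)))
    # %z renders the fixed offset as +HHMM / -HHMM, matching Apache's format
    return local.strftime('[%d/%b/%Y:%H:%M:%S %z]')
```

For example, `apache_time('2016-09-26', '01:02:03')` gives `[26/Sep/2016:10:02:03 +0900]`, and passing `offset_hours=0` keeps the log in UTC.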
