@masayuki5160
Created July 23, 2013 00:53
Mapper for Hadoop Streaming: extracts the referer and related fields from access logs.
#!/usr/bin/ruby
# Mapper for access-log analysis
# Split each access-log line and emit it in the format passed to the reducer
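ARGF.each do |log|
  # Strip the trailing newline
  log.chomp!

  # Clunky, but this gets the action of the destination page:
  # take the text before the first "&", then whatever follows "action=" in it
  /&/ =~ log
  /action=/ =~ $`
  tmpLink = $'
  tmpLink = 'none' if tmpLink.nil?

  # Also clunky: get the action from the referer the same way,
  # starting from the text after the first "http:"
  /http:/ =~ log
  /&/ =~ $'
  /action=/ =~ $`
  tmpReferer = $'
  tmpReferer = 'none' if tmpReferer.nil?

  # Print in the tab-separated form passed to the reducer
  puts "#{tmpReferer}\t#{tmpLink}"
end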
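
The chained regex globals can be hard to follow, so here is a minimal sketch of the same extraction written with `String#partition`. The helper names `action_before_first_amp` and `extract_actions` and the sample log line are hypothetical, made up for illustration. They are not part of the original gist, and the real log format may differ.

```ruby
# Hypothetical helper (not in the original gist): returns whatever follows
# the first "action=" within the part of `text` before its first "&",
# or 'none' when either marker is missing.
def action_before_first_amp(text)
  return 'none' if text.nil?
  head, amp, _rest = text.partition('&')
  return 'none' if amp.empty?
  _before, marker, action = head.partition('action=')
  marker.empty? ? 'none' : action
end

# Mirrors the mapper: the link action is taken from the whole line,
# the referer action from the text that follows the first "http:".
def extract_actions(log)
  link = action_before_first_amp(log)
  _before, marker, after_http = log.partition('http:')
  referer = marker.empty? ? 'none' : action_before_first_amp(after_http)
  [referer, link]
end

# Made-up log line, for illustration only.
sample = 'GET /index.php?action=top&id=1 "http://example.com/index.php?action=login&id=1"'
referer, link = extract_actions(sample)
puts "#{referer}\t#{link}" # prints "login" and "top" separated by a tab
```

Returning the pair instead of printing it inside the loop makes the logic testable without piping data through Hadoop. Like the mapper, this only sees an action when an `&` follows it, so a URL whose only parameter is `action=...` yields `none`.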