Created January 28, 2011 13:48 (gist blambeau/800276)
Relational algebra on Apache logs, thanks to veritas
require 'veritas/physical/logs'

# Load access.log (in this script's directory) as a relation;
# [:apache, :combined] selects Apache's combined log format
file = File.expand_path('../access.log', __FILE__)
LOGS = Veritas::Physical::Logs.new(file, [:apache, :combined])

# How many hits per page?
(debug (summarize LOGS, :path, :count => (count '*')))

# Which pages were not found (HTTP 404)?
NOT_FOUND = (restrict LOGS, ->(t){ t[:http_status].eq(404) })
(debug (project NOT_FOUND, :path))

# How many times was each of them requested?
(debug (summarize NOT_FOUND, :path, :count => (count '*')))

# Who are the robots, i.e. user agents containing 'bot' or 'Bot'?
ROBOT_AGENTS = (project (restrict LOGS, ->(t){ t[:user_agent].match(/[Bb]ot/) }), :user_agent)
(debug ROBOT_AGENTS)

# Or should robots be the user agents that request 'robots.txt'?
# (the dot is escaped so it matches a literal '.')
ROBOT_REQUESTERS = (project (restrict LOGS, ->(t){ t[:path].match(/robots\.txt/) }), :user_agent)
(debug ROBOT_REQUESTERS)

# Which robots do not have 'bot' in their name?
(debug (minus ROBOT_REQUESTERS, ROBOT_AGENTS))

# What remains of the logs once robots are excluded? Joining LOGS with
# ROBOT_REQUESTERS (on :user_agent) selects the robot entries; subtracting
# them from LOGS leaves everything else.
INTERESTING_LOGS = (minus LOGS, (join LOGS, ROBOT_REQUESTERS))

# How many hits per page, not counting robots?
(debug (summarize INTERESTING_LOGS, :path, :count => (count '*')))
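The script leaves parsing and evaluation to veritas. As a rough, dependency-free sketch of what each operator computes, the same queries can be written in plain Ruby over a Set of Hash tuples. Everything here is an assumption for illustration only: the sample log lines are made up, the attribute names mirror the ones used above (:path, :http_status, :user_agent), and the helpers restrict, project, count_by, minus and natural_join are hypothetical stand-ins, not veritas's API.

```ruby
require 'set'

# Hypothetical sample lines in Apache's combined log format:
#   host ident user [time] "request" status bytes "referer" "user-agent"
SAMPLE = <<~LOG
  1.2.3.4 - - [28/Jan/2011:13:00:00 +0100] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"
  1.2.3.4 - - [28/Jan/2011:13:00:05 +0100] "GET /missing HTTP/1.1" 404 0 "-" "Mozilla/5.0"
  5.6.7.8 - - [28/Jan/2011:13:01:00 +0100] "GET /robots.txt HTTP/1.1" 200 24 "-" "Googlebot/2.1"
  5.6.7.8 - - [28/Jan/2011:13:01:02 +0100] "GET /index.html HTTP/1.1" 200 512 "-" "Googlebot/2.1"
  9.9.9.9 - - [28/Jan/2011:13:02:00 +0100] "GET /robots.txt HTTP/1.1" 200 24 "-" "Slurp"
  9.9.9.9 - - [28/Jan/2011:13:02:01 +0100] "GET /missing HTTP/1.1" 404 0 "-" "Slurp"
  1.2.3.4 - - [28/Jan/2011:13:03:00 +0100] "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"
LOG

# Captures host, time, request path, status and user agent; other fields are skipped.
COMBINED = /\A(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+) [^"]*" (\d{3}) \S+ "[^"]*" "([^"]*)"\z/

# Turn one log line into a tuple (a Hash), or nil if the line does not match.
def parse_line(line)
  m = COMBINED.match(line)
  return nil unless m
  { host: m[1], time: m[2], path: m[3], http_status: m[4].to_i, user_agent: m[5] }
end

# Hypothetical helpers: a relation is a Set of Hash tuples.
def restrict(rel, &pred)        # keep the tuples satisfying the predicate
  rel.select(&pred).to_set
end

def project(rel, *attrs)        # keep only the given attributes; duplicates collapse
  rel.map { |t| t.slice(*attrs) }.to_set
end

def count_by(rel, attr)         # summarize: one tuple per attr value, with a :count
  rel.group_by { |t| t[attr] }.map { |v, ts| { attr => v, count: ts.size } }.to_set
end

def minus(left, right)          # set difference
  left - right
end

def natural_join(left, right)   # combine tuples that agree on their shared attributes
  left.each_with_object(Set.new) do |l, out|
    right.each do |r|
      shared = l.keys & r.keys
      out << l.merge(r) if shared.all? { |k| l[k] == r[k] }
    end
  end
end

logs = SAMPLE.each_line.map { |line| parse_line(line.chomp) }.compact.to_set

hits_per_page    = count_by(logs, :path)
not_found        = restrict(logs) { |t| t[:http_status] == 404 }
robot_agents     = project(restrict(logs) { |t| t[:user_agent].match?(/[Bb]ot/) }, :user_agent)
robot_requesters = project(restrict(logs) { |t| t[:path].match?(/robots\.txt/) }, :user_agent)
unnamed_robots   = minus(robot_requesters, robot_agents)
interesting      = minus(logs, natural_join(logs, robot_requesters))
interesting_hits = count_by(interesting, :path)

p interesting_hits.sort_by { |t| t[:path] }
```

Using a Set rather than an Array keeps the relational semantics: projecting onto :user_agent, for example, yields each agent once no matter how many log lines it produced. The tuples keep :host and :time so that distinct log lines stay distinct.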