Created
December 6, 2013 14:16
Find aberrant traffic hit rates, defined here as hourly hit counts that are two or more standard deviations above the mean for their URI
library(plyr) # provides ddply, summarise and join
# Massage your data into a data frame that provides access by Hour and URI
traffic.df <- parse.log("access.log") # parse.log is left as an exercise for the reader
# Aggregate: hits per URI per hour, then the per-URI mean, standard deviation and total
# (.parallel = TRUE needs a registered foreach backend, e.g. doParallel::registerDoParallel())
uri.hits <- ddply(traffic.df, .(Hour, URI), summarise, Hits = length(URI), .parallel = TRUE)
uri.stats <- ddply(uri.hits, .(URI), summarise, Mean = mean(Hits), StdDev = sd(Hits), Total = sum(Hits), .parallel = TRUE)
uri.stats <- join(uri.hits, uri.stats, by = "URI")
# Find hours at least two standard deviations above the mean
# (URIs seen in only one hour have an NA StdDev; subset drops them along with zero-spread URIs)
uri.bad <- subset(uri.stats, StdDev > 0)
uri.bad$Deviations <- (uri.bad$Hits - uri.bad$Mean) / uri.bad$StdDev
uri.bad <- subset(uri.bad, Deviations >= 2)
# Sort the worst offenders first
uri.bad <- uri.bad[order(-uri.bad$Deviations), ]
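
The script above leaves parse.log undefined. The following is only a minimal sketch of one way to write it, not the author's implementation. It assumes the log is in Apache/NGINX Common or Combined Log Format, keeps the hour as a plain string bucket, and silently drops lines that do not match. The regular expression and the choice of a character Hour column are assumptions made for illustration.

```r
# Hypothetical parse.log: assumes Common/Combined Log Format lines such as
# 127.0.0.1 - - [06/Dec/2013:14:16:00 -0500] "GET /index.html HTTP/1.1" 200 2326
parse.log <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # Group 1: timestamp without the zone offset; group 2: the request path
  pattern <- '^\\S+ \\S+ \\S+ \\[([^] ]+)[^]]*\\] "\\S+ (\\S+)[^"]*" .*$'
  # Keep only lines that look like well-formed requests
  lines <- lines[grepl(pattern, lines)]
  stamp <- sub(pattern, "\\1", lines)
  uri <- sub(pattern, "\\2", lines)
  # Truncate to the hour: "06/Dec/2013:14:16:00" becomes "06/Dec/2013:14"
  hour <- substr(stamp, 1, 14)
  data.frame(Hour = hour, URI = uri, stringsAsFactors = FALSE)
}
```

The returned data frame has the Hour and URI columns that the aggregation step groups on. The URI keeps any query string, so requests that differ only in parameters count as separate URIs; stripping everything from the first "?" before returning would merge them, if that suits the analysis better.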