Counting unique 404s in Apache log files
Tl;dr:
gunzip --to-stdout logfile-*.gz | grep " 404 " | cut -d " " -f 7 | sort | uniq -c | sort --numeric-sort --reverse > unique-404s.txt
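For reference, assuming the default Apache common log format, a (made-up) log line looks roughly like this; counting space-separated fields, the request path is field 7 and the status code is field 9:
203.0.113.5 - - [22/Sep/2015:14:16:00 +0000] "GET /old-page.html HTTP/1.1" 404 209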
Starting with a bunch of gzipped log files, like "logfile-*.gz".
First, uncompress to stdout:
gunzip --to-stdout logfile-*.gz
Reduce output to just the lines with a 404 status:
| grep " 404 "
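Note that this matches the literal string " 404 " anywhere in the line, which is usually close enough. To match the status field exactly, a rough awk equivalent of this step plus the cut below, assuming the common log format shown earlier, would be:
| awk '$9 == 404 { print $7 }'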
Extract the request path:
| cut -d " " -f 7
Sort lexicographically, to group identical requests:
| sort
Reduce duplicate lines to one line, and prefix with the duplicate count:
| uniq -c
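At this point each unique path is on one line, prefixed by how many times it appeared, something like (invented values):
     12 /favicon.ico
      3 /old-page.html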
Sort again, numerically and in reverse order, to produce a descending list of unique 404s:
| sort --numeric-sort --reverse
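If you only care about the most frequent 404s, you can also cap the list here, for example the top 20:
| sort --numeric-sort --reverse | head -n 20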
And finally, redirect to a file:
> unique-404s.txt
All together then:
gunzip --to-stdout logfile-*.gz | grep " 404 " | cut -d " " -f 7 | sort | uniq -c | sort --numeric-sort --reverse > unique-404s.txt
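If zcat is available (it normally ships with gzip; on BSD/macOS the equivalent is gzcat), the first step can be shortened:
zcat logfile-*.gz | grep " 404 " | cut -d " " -f 7 | sort | uniq -c | sort --numeric-sort --reverse > unique-404s.txt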