Counting unique 404s in Apache log files
Tl;dr:
gunzip --to-stdout logfile-*.gz | grep " 404 " | cut -d " " -f 7 | sort | uniq -c | sort --numeric-sort --reverse > unique-404s.txt
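For reference, assuming the default Apache common log format, a (made-up) log line looks roughly like this; counting space-separated fields, the request path is field 7 and the status code is field 9:
203.0.113.5 - - [22/Sep/2015:14:16:00 +0000] "GET /old-page.html HTTP/1.1" 404 209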
Starting with a bunch of gzipped log files, like "logfile-*.gz".
First, uncompress to stdout:
gunzip --to-stdout logfile-*.gz
Reduce output to just the lines with a 404 status:
| grep " 404 "
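Note that this matches the literal string " 404 " anywhere in the line, which is usually close enough. To match the status field exactly, a rough awk equivalent of this step plus the cut below, assuming the common log format shown earlier, would be:
| awk '$9 == 404 { print $7 }'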
Extract the request path:
| cut -d " " -f 7
Sort lexicographically, to group identical requests:
| sort
Reduce duplicate lines to one line, and prefix with the duplicate count:
| uniq -c
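At this point each unique path is on one line, prefixed by how many times it appeared, something like (invented values):
     12 /favicon.ico
      3 /old-page.html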
Sort again, numerically and in reverse order, to produce a descending list of unique 404s:
| sort --numeric-sort --reverse
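If you only care about the most frequent 404s, you can also cap the list here, for example the top 20:
| sort --numeric-sort --reverse | head -n 20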
And finally, redirect to a file:
> unique-404s.txt
All together then:
gunzip --to-stdout logfile-*.gz | grep " 404 " | cut -d " " -f 7 | sort | uniq -c | sort --numeric-sort --reverse > unique-404s.txt
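If zcat is available (it normally ships with gzip; on BSD/macOS the equivalent is gzcat), the first step can be shortened:
zcat logfile-*.gz | grep " 404 " | cut -d " " -f 7 | sort | uniq -c | sort --numeric-sort --reverse > unique-404s.txt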