@riivo
riivo / gist:9253b3158ed0d9833491
Created November 15, 2014 10:45
finding duplicate files in shell
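# Comes from http://unix.stackexchange.com/questions/71176/howto-find-duplicate-files-on-disk
# Hash every regular file; "+" passes many files to each md5sum call instead of one.
find / -type f -exec md5sum {} + > md5sums
# Keep only the hashes that occur more than once.
awk '{print $1}' md5sums | sort | uniq -d > dupes
# Print each group of files that share a hash, separated by "---".
while read -r d; do echo "---"; grep "^$d " md5sums | cut -d ' ' -f 2-; done < dupes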
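A variant of the same idea that avoids the intermediate md5sums and dupes files and scans one directory instead of /. This is only a sketch: find_dupes and the scratch directory are hypothetical names introduced for illustration, and it assumes GNU md5sum, which prints the hash, two separator characters, and then the path.

```shell
#!/bin/sh
# find_dupes is a hypothetical helper name, not part of the original gist.
find_dupes() {
    # Hash every regular file under "$1"; sorting puts equal hashes on adjacent lines.
    find "$1" -type f -exec md5sum {} + | sort |
    awk '{
        hash = $1
        # Drop the hash and the two separator characters, leaving the path.
        path = substr($0, length(hash) + 3)
        if (hash == prev) {
            # First repeat of a hash: start a group and print the earlier file too.
            if (!ingroup) { print "---"; print prevpath; ingroup = 1 }
            print path
        } else {
            ingroup = 0
        }
        prev = hash; prevpath = path
    }'
}

# Demo on a scratch directory (made-up files, for illustration only).
demo=$(mktemp -d)
printf 'same\n'  > "$demo/a.txt"
printf 'same\n'  > "$demo/b.txt"
printf 'other\n' > "$demo/c.txt"
dupes_out=$(find_dupes "$demo")
printf '%s\n' "$dupes_out"
```

Here a.txt and b.txt are listed under one "---" group and c.txt is not listed. Paths containing newlines or backslashes are not handled, because md5sum escapes them in its output.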
@riivo
riivo / gist:5046585
Created February 27, 2013 09:22
Find out which columns have NAs in the data
sapply(DF, function(x)any(is.na(x)))
@riivo
riivo / gist:1216063
Created September 14, 2011 07:51
Shell snippet: Extract timestamp from file, convert to human readable format and count unique dates
head out1.log | awk -F';' '{print "@" $2}' | xargs -L 1 date +"%m-%Y" -d | sort | uniq -c | sort -n
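To see what each stage of that pipeline does, it can be run against a small sample file. The layout of out1.log is not shown in the gist, so the sample below is an assumption: semicolon-separated fields with a Unix epoch timestamp in the second field. It also assumes GNU date, which accepts "-d @SECONDS"; TZ=UTC is set only to make the result independent of the local time zone.

```shell
#!/bin/sh
# Made-up sample data; the real contents of out1.log are unknown.
sample=$(mktemp)
cat > "$sample" <<'EOF'
a;1293840000;x
b;1293926400;y
c;1296518400;z
EOF

# 1. awk prints "@<epoch>" for each line (field 2, split on ";").
# 2. xargs -L 1 runs date once per line, appending "@<epoch>" after -d.
# 3. sort | uniq -c counts identical month-year strings.
# 4. sort -n orders the result by count, smallest first.
counts=$(awk -F';' '{print "@" $2}' "$sample" |
    TZ=UTC xargs -L 1 date +"%m-%Y" -d |
    sort | uniq -c | sort -n)
printf '%s\n' "$counts"
```

The first two sample timestamps fall in January 2011 and the third in February 2011, so 01-2011 is counted twice and 02-2011 once. Because date is started once per line, this gets slow on large logs.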
@riivo
riivo / about.md
Created August 9, 2011 20:27 — forked from jasonrudolph/about.md
Programming Achievements: How to Level Up as a Developer
@riivo
riivo / top10 hosts from apache log
Created March 6, 2011 12:07
Find hostnames of the top 10 IP addresses in your web server log file
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head | awk '{print $2}' | xargs -L1 host
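The ranking part of that pipeline can be tried offline, without the DNS lookups. The sketch below wraps it in a function; top_ips and the sample log are hypothetical and exist only for illustration, with addresses taken from the 192.0.2.0/24 documentation range. It sorts with "sort -rn" so that the counts from uniq -c are compared as numbers.

```shell
#!/bin/sh
# top_ips is a hypothetical helper: print the N most frequent first fields
# of a log file (N defaults to 10), most frequent first.
top_ips() {
    awk '{print $1}' "$1" | sort | uniq -c | sort -rn |
        head -n "${2:-10}" | awk '{print $2}'
}

# Made-up access log, for illustration only.
log=$(mktemp)
cat > "$log" <<'EOF'
192.0.2.1 - - "GET / HTTP/1.1" 200
192.0.2.2 - - "GET /a HTTP/1.1" 200
192.0.2.1 - - "GET /b HTTP/1.1" 200
192.0.2.3 - - "GET / HTTP/1.1" 404
192.0.2.1 - - "GET /c HTTP/1.1" 200
192.0.2.2 - - "GET /d HTTP/1.1" 200
EOF

top=$(top_ips "$log" 2)
printf '%s\n' "$top"

# Resolving the names as in the gist needs network access and the host utility:
# top_ips "$log" | xargs -L1 host
```

With this sample, 192.0.2.1 (three requests) comes first and 192.0.2.2 (two requests) second.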