@eliasah
Created December 22, 2016 09:56
Find duplicate files based on MD5 signatures.
#!/bin/bash
# Usage: dupes location type
if [ "$#" -ne 2 ]; then
    echo "Usage: dupes location type (e.g. pdf, gz)" >&2
    exit 1
fi
LOCATION=$(readlink -f "$1")
TYPE=$2
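# Keep scratch files outside the working directory so a search for *.txt
# in the current directory cannot pick them up.
FILES_LIST=$(mktemp)
DUPES_LIST=$(mktemp)
TMP_FILE=$(mktemp)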
echo "Finding .$TYPE files and computing MD5 sums"
find "$LOCATION" -type f -name "*.$TYPE" -exec md5sum {} + > "$FILES_LIST"
echo "Finding dupes"
# Keep only the digests (first field) that occur more than once.
cut -d' ' -f1 "$FILES_LIST" | sort | uniq -d > "$TMP_FILE"
# Print every "digest  path" line with a duplicated digest, grouped by digest.
grep -F -f "$TMP_FILE" "$FILES_LIST" | sort > "$DUPES_LIST"
cat "$DUPES_LIST"
rm -f "$FILES_LIST" "$DUPES_LIST" "$TMP_FILE"
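
The same result can probably be had without any intermediate files. The sketch below is not from the original gist: `find_dupes` is a hypothetical helper name, and it assumes GNU coreutils (`md5sum`, plus `uniq` with the `-w` and `--all-repeated` options). It sorts the `md5sum` output so equal digests are adjacent, then keeps only the lines whose first 32 characters (the MD5 digest) repeat.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script).
# Prints "digest  path" lines for files under directory $1 with extension $2
# whose MD5 digest occurs more than once. Assumes GNU md5sum and GNU uniq.
find_dupes() {
    local location=$1 type=$2
    find "$location" -type f -name "*.$type" -exec md5sum {} + |
        sort |                                  # equal digests become adjacent
        uniq -w32 --all-repeated=separate       # compare only the 32-char digest
}
```

Calling `find_dupes ~/papers pdf` (an example path) would print each group of identical files together, with a blank line between groups. Paths containing backslashes or newlines are escaped by `md5sum` and may not group correctly in this sketch.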