@abdul
Created April 20, 2020 13:46
Find duplicate files, using imohash for faster hashing of candidate duplicates.
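imohash is fast because, for large files, it hashes only a few fixed-size samples plus the file size instead of reading the whole file. The bash sketch below imitates that idea with `cksum`, `head -c` and `tail -c` so the trade-off is visible. It is an illustration only, not imohash's hash function or output format; `sample_hash` is a hypothetical helper name, and the 16 KiB sample and 128 KiB threshold are assumed to match imohash's documented defaults.

```shell
#!/usr/bin/env bash
# Illustration of imohash-style sampling, NOT imohash itself.
# Assumption: 16 KiB samples and a 128 KiB whole-file threshold.
sample_hash() {
  local f=$1 sample=16384 threshold=131072 size mid
  size=$(wc -c < "$f" | tr -d ' ')           # BSD wc pads with spaces
  if [ "$size" -lt "$threshold" ]; then
    # Small file: hash the entire content, tagged with its size
    cksum < "$f" | awk -v s="$size" '{ print $1 "-" s }'
    return
  fi
  mid=$(( (size - sample) / 2 ))             # 0-based offset of the middle sample
  {
    head -c "$sample" "$f"                              # first 16 KiB
    tail -c "+$((mid + 1))" "$f" | head -c "$sample"    # middle 16 KiB (tail -c + is 1-based)
    tail -c "$sample" "$f"                              # last 16 KiB
  } | cksum | awk -v s="$size" '{ print $1 "-" s }'
}
```

The flip side of sampling: two 200000-byte files that differ only at byte offset 50000 get the same `sample_hash`, yet `cmp -s` reports them as different. A byte-for-byte check before deleting or moving anything guards against such false matches.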
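#!/bin/bash
# Filename: find_and_move_duplicates.sh
# Description: Find duplicate files in the current directory and move (or remove)
# the extra copies, keeping one copy of each file.
# Requires: imosum (from imohash) and the g-prefixed GNU tools used below
# (gls, guniq, gcomm, gtee, gawk), as installed by Homebrew on macOS.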
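# List files largest first so equal-size files are adjacent, then hash and
# compare each adjacent equal-size pair. With --time-style=long-iso, $8 is the
# file name, so names containing spaces are not handled.
gls -lS --time-style=long-iso | awk 'BEGIN {
    getline; getline              # skip the "total" line, read the first file
    name1 = $8; size = $5
}
{
    name2 = $8
    if (size == $5) {
        # Read imosum output into variables so the ls fields ($5, $8) are not overwritten
        cmd1 = "imosum \"" name1 "\""; cmd1 | getline out1; close(cmd1)
        cmd2 = "imosum \"" name2 "\""; cmd2 | getline out2; close(cmd2)
        split(out1, h1); split(out2, h2)
        if (h1[1] == h2[1]) {
            print name1; print name2
        }
    }
    size = $5; name1 = name2
}' | sort -u > duplicate_files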
# Hash every duplicate candidate and keep one name per hash (the first in sort order).
xargs -I {} imosum {} < duplicate_files | sort | guniq -w 32 | gawk '{ print $2 }' | sort -u > unique_files
# Alternative: delete the extra copies instead of moving them.
#echo Removing duplicates
#gcomm -23 duplicate_files unique_files | gtee /dev/stderr | xargs rm
echo "Moving duplicates to ./duplicates/"
mkdir -p duplicates
# comm -23 prints names that are in duplicate_files but not in unique_files,
# i.e. the extra copies. Works on macOS (OS X); adjust the mv call if it fails on your system.
gcomm -23 duplicate_files unique_files | gtee /dev/stderr | xargs -I '{}' mv '{}' duplicates/
echo "Moved duplicate files to the duplicates directory."
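The script compares each file only with its neighbour in the size-sorted listing, so in a group of three or more equal-size files, a match between non-adjacent entries (A and C with a different B between them) is missed, and `$8` cuts names at the first space. The bash sketch below is one alternative under stated assumptions: it groups all files of the same size, hashes only those, reports every file that shares a hash, and confirms with `cmp -s` that a copy is byte-identical to the kept file before moving it. It uses POSIX `cksum` in place of `imosum` so it runs without imohash installed. `hash_of`, `keep_repeated_keys`, `list_sizes`, `hash_candidates`, `find_duplicates` and `move_verified_duplicates` are hypothetical helper names, and file names are assumed to contain no tabs or newlines.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions: bash, file names without tabs or newlines, and
# POSIX cksum standing in for imosum. To use imohash as the gist does, make
# hash_of print:  imosum "$1" | awk '{ print $1 }'

# Content fingerprint of one file; cksum on stdin prints "CRC SIZE".
hash_of() {
  cksum < "$1" | awk '{ print $1 "-" $2 }'
}

# Keep only "KEY<TAB>VALUE" lines whose KEY appears on more than one line.
keep_repeated_keys() {
  awk -F '\t' '{ key[NR] = $1; val[NR] = $2; n[$1]++ }
    END { for (i = 1; i <= NR; i++) if (n[key[i]] > 1) print key[i] "\t" val[i] }'
}

# Print "SIZE<TAB>NAME" for each regular file in the current directory
# (directories are skipped; dotfiles are ignored, as with plain ls -l).
list_sizes() {
  local f
  for f in ./*; do
    [ -f "$f" ] || continue
    printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "${f#./}"
  done
}

# Hash only files whose size occurs more than once; print "HASH<TAB>NAME".
hash_candidates() {
  local size name
  list_sizes | keep_repeated_keys |
    while IFS=$'\t' read -r size name; do
      printf '%s\t%s\n' "$(hash_of "$name")" "$name"
    done
}

# Names of all files that share a hash with at least one other file.
find_duplicates() {
  hash_candidates | keep_repeated_keys | cut -f 2 | LC_ALL=C sort
}

# Keep the first file of each hash group and move every other member into
# DEST, but only after cmp confirms it is byte-identical to the kept file.
move_verified_duplicates() {
  local dest=$1 hash name group="" keeper=""
  mkdir -p "$dest"
  hash_candidates | keep_repeated_keys | LC_ALL=C sort |
    while IFS=$'\t' read -r hash name; do
      if [ "$hash" != "$group" ]; then
        group=$hash; keeper=$name              # first file of the group is kept
      elif cmp -s "$keeper" "$name"; then
        mv -- "$name" "$dest"/                 # confirmed identical: move the copy
      else
        printf 'hash match but contents differ, not moved: %s\n' "$name" >&2
      fi
    done
}
```

Running `find_duplicates` in the target directory works as a dry run; `move_verified_duplicates duplicates` then moves the extra copies, mirroring the script's final step.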