@zegomesjf
Forked from urubatan/remove_dups.rb
Created June 18, 2012 16:18
Ruby script to remove duplicated files. I created it when migrating my picture collection from iPhoto to Picasa: merging some independent collections created a real mess, and this gist is the result of cleaning that mess up.
require 'digest/sha1'
require 'fileutils'
# Directories to scan recursively; replace the placeholders with real paths.
directories = [
  "SOURCE DIR 1",
  "SOURCE DIR 2"
]
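# Maps each SHA1 digest to the list of file paths that have that content.
files = {}
directories.each do |dir_name|
  puts "Scanning directory: #{dir_name}"
  # "**/*" also matches files without an extension, which "**/*.*" would skip.
  Dir.glob("#{dir_name}/**/*") do |file_name|
    next unless File.file?(file_name)
    print "."
    # Digest::SHA1.file reads the file in chunks instead of loading it whole.
    dig = Digest::SHA1.file(file_name).hexdigest
    (files[dig] ||= []) << file_name
  end
  puts ""
end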
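# Count every scanned file, then pick the digests that occur more than once.
total_files = files.inject(0) { |acum, val| acum + val[1].size }
with_copies = files.select { |k, v| v.length > 1 }
puts "#{files.size} different files"
puts "#{with_copies.size} files with copies"
# Duplicates are the files beyond the first one for each digest.
puts "#{total_files - files.size} duplicates"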
FileUtils.mkdir_p "CopiesTrash"
with_copies.each do |k, v|
  # Keep the last path found as the original and move the other copies away.
  orig = v.pop
  puts "moving #{v.length} duplicate(s) of #{orig} to CopiesTrash"
  FileUtils.mv v, "CopiesTrash", :force => true
  puts ""
end
puts "Duplicated files have been removed from your directories; all of them are now in the CopiesTrash folder"
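
One thing to watch in the script above: `FileUtils.mv` with a directory as destination keeps each file's base name, so two duplicates that share a name (say, `IMG_0001.jpg` from two different folders) would overwrite each other inside CopiesTrash. The following is only a minimal sketch of one way around that, not part of the original gist. The helper names `unique_target` and `move_to_trash` are hypothetical, and the `-1`, `-2` suffix scheme is an assumption made for illustration.

```ruby
require 'fileutils'

# Hypothetical helper: returns a path inside trash_dir that does not exist yet,
# adding "-1", "-2", ... before the extension when the plain name is taken.
def unique_target(trash_dir, file_name)
  ext  = File.extname(file_name)
  base = File.basename(file_name, ext)
  candidate = File.join(trash_dir, "#{base}#{ext}")
  counter = 0
  while File.exist?(candidate)
    counter += 1
    candidate = File.join(trash_dir, "#{base}-#{counter}#{ext}")
  end
  candidate
end

# Hypothetical helper: moves every path into trash_dir without overwriting
# anything already there, and returns the list of new locations.
def move_to_trash(paths, trash_dir = "CopiesTrash")
  FileUtils.mkdir_p(trash_dir)
  paths.map do |path|
    target = unique_target(trash_dir, path)
    FileUtils.mv(path, target)
    target
  end
end
```

With these helpers defined, the `FileUtils.mv v, "CopiesTrash", :force => true` line could be swapped for `move_to_trash(v)`. The check-then-move sequence is not atomic, which should be acceptable for a one-off cleanup run by a single process.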