Gist urubatan/2781970, created May 24, 2012 14:42
Ruby script to remove duplicate files. I wrote it while migrating my picture collection from iPhoto to Picasa and merging several independent collections. That created a real mess, and this gist is the result of cleaning it up.
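The core idea of the script is that files with byte-for-byte identical contents have the same SHA-1 digest, so grouping paths by digest exposes the duplicates (like the script, this assumes that different contents never collide). A minimal sketch of just that step, using hypothetical helper names `group_by_digest` and `duplicate_groups` that do not appear in the gist:

```ruby
require 'digest/sha1'

# Hypothetical helper (not part of the gist): group regular files by the
# SHA-1 digest of their contents. Directories are skipped because they
# cannot be hashed as files.
def group_by_digest(paths)
  paths.select { |path| File.file?(path) }
       .group_by { |path| Digest::SHA1.file(path).hexdigest }
end

# Hypothetical helper: keep only the digests shared by more than one path,
# i.e. the groups that contain duplicates.
def duplicate_groups(paths)
  group_by_digest(paths).select { |_digest, group| group.size > 1 }
end
```

Passing the result of a `Dir.glob` call, as the script does, would return a hash from digest to the list of identical files.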
require 'digest/sha1'
require 'fileutils'

# Directories to scan for duplicates; replace with your own paths.
directories = [
  "SOURCE DIR 1",
  "SOURCE DIR 2"
]

# Maps each SHA-1 digest to the list of paths whose contents have that digest.
files = Hash.new { |hash, key| hash[key] = [] }

directories.each do |dir_name|
  puts "Scanning Directory: #{dir_name}"
  # "*.*" only matches names that contain a dot, so files without an extension are skipped.
  Dir.glob("#{dir_name}/**/*.*") do |file_name|
    next if File.directory?(file_name)
    print "."
    # Digest::SHA1.file reads the file in chunks instead of loading it into memory at once.
    dig = Digest::SHA1.file(file_name).hexdigest
    files[dig] << file_name
  end
  puts ""
end

total_files = files.values.inject(0) { |sum, paths| sum + paths.size }
with_copies = files.select { |_digest, paths| paths.length > 1 }
puts "#{files.size} different files"
puts "#{with_copies.size} files with copies"
# Every path beyond the first one per digest is a duplicate.
puts "#{total_files - files.size} duplicates"

FileUtils.mkdir_p "CopiesTrash"
with_copies.each do |_digest, paths|
  # Keep the last path found as the original; the remaining paths are copies.
  orig = paths.pop
  puts "moving #{paths.length} #{paths.length == 1 ? 'copy' : 'copies'} of #{orig} to CopiesTrash"
  # Copies that share a basename overwrite each other inside CopiesTrash.
  FileUtils.mv paths, "CopiesTrash", :force => true
  puts ""
end
puts "Your directories have been cleaned of duplicate files; all the trash is in the CopiesTrash folder"
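The script hashes every file it finds. A common refinement, which is not part of the original gist, is to group files by size first and hash only files whose size is shared with at least one other file, since files of different sizes cannot be identical. A minimal sketch, where `find_duplicates` is a hypothetical name:

```ruby
require 'digest/sha1'

# Hypothetical variant of the scanning step (not in the original gist):
# group regular files by size, then hash only the size groups that have
# more than one member. A file with a unique size cannot have a duplicate.
def find_duplicates(paths)
  by_size = paths.select { |p| File.file?(p) }.group_by { |p| File.size(p) }
  duplicates = {}
  by_size.each_value do |same_size|
    next if same_size.size < 2
    by_digest = same_size.group_by { |p| Digest::SHA1.file(p).hexdigest }
    by_digest.each do |digest, group|
      # Same size but different contents yields separate single-member groups.
      duplicates[digest] = group if group.size > 1
    end
  end
  duplicates
end
```

The returned hash has the same shape as the script's `with_copies`, so the moving loop could consume it unchanged; on a photo collection most files tend to have distinct sizes, which is where the saving would come from.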
I think the right code in line 25 might be: puts "#{total_files - files.size} duplicates"
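That correction follows from how `files` is built: each key is one distinct content digest and each value lists every path with that content, so the number of redundant copies is the total number of paths minus the number of keys. A quick check with made-up data (the digests and paths below are hypothetical):

```ruby
# Hypothetical stand-in for the script's `files` hash: digest => paths.
files = {
  "digest-a" => ["one/a.jpg", "two/a.jpg", "three/a.jpg"],
  "digest-b" => ["one/b.jpg"],
  "digest-c" => ["one/c.jpg", "two/c.jpg"],
  "digest-d" => ["one/d.jpg"]
}

total_files = files.values.inject(0) { |sum, paths| sum + paths.size } # 7 paths in total
duplicates  = total_files - files.size                                  # 7 - 4 distinct contents = 3

puts "#{files.size} different files" # prints "4 different files"
puts "#{duplicates} duplicates"      # prints "3 duplicates"
```

With the original `#{total_files = files.size}`, this data would print 4, the count of distinct files rather than the 3 redundant copies.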
Very useful code. It helped me a lot. Thanks!