kakra/410905, created May 23, 2010 12:44
Simple proof of concept of file deduplication through hard links: generates a script of ln command lines to run in bash.
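The core idea can be checked in isolation. The following is a minimal sketch, not part of the original script, assuming a POSIX filesystem that supports hard links. Two files with identical content are detected by MD5, and then the duplicate is replaced by a hard link so that both names point at one inode. The helpers replace_with_hardlink and same_inode? are hypothetical names introduced for illustration.

```ruby
require 'digest/md5'
require 'tmpdir'

# Hypothetical helper (not in the original gist): replace dst with a hard link
# to src. Linking under a temporary name and renaming over dst leaves the same
# end state as "ln -f src dst".
def replace_with_hardlink(src, dst)
  tmp = "#{dst}.tmp-#{Process.pid}"
  File.link(src, tmp)
  File.rename(tmp, dst)
end

# True when both paths refer to the same inode on the same device.
def same_inode?(a, b)
  sa, sb = File.stat(a), File.stat(b)
  sa.dev == sb.dev && sa.ino == sb.ino
end

Dir.mktmpdir do |dir|
  a = File.join(dir, "a.txt")
  b = File.join(dir, "b.txt")
  File.write(a, "same bytes\n")
  File.write(b, "same bytes\n")

  # Step 1: equal MD5 digests flag b as a duplicate of a.
  if Digest::MD5.file(a).hexdigest == Digest::MD5.file(b).hexdigest
    puts "before: shared inode? #{same_inode?(a, b)}"  # false: two copies on disk
    replace_with_hardlink(a, b)                         # Step 2: deduplicate
    puts "after:  shared inode? #{same_inode?(a, b)}"  # true: one copy, two names
    puts "link count: #{File.stat(a).nlink}"           # 2
  end
end
```

After the rename, the old inode behind b.txt has no names left, so its data blocks can be reclaimed, while both paths still read the same content.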
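#!/usr/bin/env ruby
#
# This does something similar to
#
#   $ find <path> -type f -print0 | xargs -0 md5sum | sort | uniq --all-repeated=separate -w32
#
# but dumps a collection of ln command lines to execute instead.
#
# Usage: pass one or more directories as arguments, redirect the output to a
# file, review it, then run it with bash. Files are matched by MD5 alone (no
# byte-by-byte comparison), and hard links only work within one filesystem.
require 'digest/md5'
require 'shellwords'

# Returns [[path, md5_hex], ...] for every regular file below dir, skipping symlinks.
def checksums(dir)
  files = Dir[File.join(dir, "**", "*")].select {|name| !File.symlink?(name) && File.file?(name) }
  files.map {|fname| [fname, Digest::MD5.file(fname).hexdigest] }
end

# Merges any number of checksum lists into { md5_hex => [path, ...] }.
# The block passed to merge! unions the path lists when a checksum repeats.
def merger(*checksums)
  checksums.inject({}) do |all_checksums, file_checksums|
    file_checksums.each do |fname, sum|
      all_checksums.merge!({ sum => [fname] }) {|_sum, old_paths, new_paths| old_paths | new_paths }
    end
    all_checksums
  end
end

# Prints one ln command per duplicate, linking it to the first file seen with
# the same checksum. Paths are shell-escaped so quotes and spaces survive.
def hardlinker(merged_checksums)
  merged_checksums.each do |_sum, files|
    next if files.count < 2
    src = files.shift
    files.each {|dst| puts "ln -fn -- #{Shellwords.escape(src)} #{Shellwords.escape(dst)}" }
  end
end

# Entry point: checksum each directory given on the command line, merge, print.
hardlinker(merger(*ARGV.map {|dir| checksums(dir) })) if __FILE__ == $0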
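The script treats matching MD5 sums as proof that two files are identical, and it emits ln lines that fail when the files sit on different filesystems. A more cautious variant might confirm each candidate byte-for-byte and skip cross-device pairs before printing anything. The sketch below does that with Ruby's standard library (Digest::MD5.file, FileUtils.compare_file, Shellwords.escape). verified_duplicates and ln_commands are hypothetical helper names, and Enumerable#filter_map assumes Ruby 2.7 or later.

```ruby
require 'digest/md5'
require 'fileutils'
require 'shellwords'

# Hypothetical variant (not in the original gist): group regular files by MD5,
# then keep only files that are byte-for-byte equal to the group's first file
# and live on the same device, since hard links cannot span filesystems.
def verified_duplicates(paths)
  files = paths.uniq.select {|p| File.file?(p) && !File.symlink?(p) }
  files.group_by {|p| Digest::MD5.file(p).hexdigest }.values.filter_map do |group|
    src, *rest = group
    dev = File.stat(src).dev
    dups = rest.select {|p| File.stat(p).dev == dev && FileUtils.compare_file(src, p) }
    [src, dups] unless dups.empty?
  end
end

# Builds ln lines in the same shape as the original script, with escaped paths.
def ln_commands(pairs)
  pairs.flat_map do |src, dups|
    dups.map {|dst| "ln -fn -- #{Shellwords.escape(src)} #{Shellwords.escape(dst)}" }
  end
end

if __FILE__ == $0 && !ARGV.empty?
  paths = ARGV.flat_map {|dir| Dir[File.join(dir, "**", "*")] }
  puts ln_commands(verified_duplicates(paths))
end
```

Usage mirrors the original: pass directories as arguments and review the printed lines before running them with bash. The extra comparison costs a second read of each candidate file, but only for files whose checksums already collide.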