
@mattgaidica
Created June 7, 2012 22:11
Comparing two files via MD5 hash on Amazon S3 using Ruby
require 'digest/md5'
require 'aws/s3'

# Set your AWS credentials
AWS::S3::Base.establish_connection!(
  :access_key_id     => 'XXX',
  :secret_access_key => 'XXX'
)

# Get the S3 file (object)
object = AWS::S3::S3Object.find('02185773dcb5a468df6b.pdf', 'your_bucket')

# Read the ETag from the object's metadata and strip the surrounding double quotes
etag = object.about['etag'].gsub('"', '')

# Hash the local file (read in binary mode so no newline translation can alter the bytes)
f = '/Users/matt/Desktop/02185773dcb5a468df6b.pdf'
digest = Digest::MD5.hexdigest(File.binread(f))

# Print both values side by side
puts digest + ' vs ' + etag

# A plain string comparison finishes it off
if digest.eql? etag
  puts 'same file!'
else
  puts 'different files.'
end
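A minimal sketch of the same comparison factored into a reusable helper, assuming you already have the ETag string (for example from `object.about['etag']` in the script above). The names `normalize_etag`, `plain_md5_etag?`, and `same_as_etag?` are hypothetical and not part of the `aws/s3` gem. The helper declines to compare ETags that are not a bare 32-character hex string, because multipart uploads produce ETags with a `-N` suffix that are not a whole-file MD5.

```ruby
require 'digest/md5'

# Strip the double quotes S3 wraps around ETag values.
def normalize_etag(etag)
  etag.to_s.delete('"')
end

# True when the ETag looks like a plain MD5 (exactly 32 hex characters).
# Multipart ETags such as "abc...-3" fail this check.
def plain_md5_etag?(etag)
  normalize_etag(etag).match?(/\A\h{32}\z/)
end

# Compare a local file against an S3 ETag string.
# Returns true/false, or nil when the ETag is not a plain MD5 and
# therefore cannot be compared to a whole-file digest.
def same_as_etag?(path, etag)
  return nil unless plain_md5_etag?(etag)
  # Digest::MD5.file reads the file in chunks rather than all at once
  Digest::MD5.file(path).hexdigest == normalize_etag(etag).downcase
end
```

With the original script, `same_as_etag?(f, object.about['etag'])` would replace the manual `gsub`/`eql?` steps; a `nil` result is a hint to fall back to another check rather than to report "different files".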
@nictrix

nictrix commented Oct 17, 2013

Thanks, this was helpful; I didn't know where the ETag value was located in the object.

However, I ran into a problem with memory usage. If it's a large file, you may want to use this instead:

digest = Digest::MD5.file(f).to_s

This reads the file in chunks instead of loading it all into a single string, so Ruby's memory usage no longer grows with the size of the file.
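To make the streaming idea explicit, here is a sketch that feeds the file to the digest one chunk at a time with `Digest::MD5#update`. The method name `streaming_md5` and the 1 MiB default chunk size are illustrative assumptions; `Digest::MD5.file` already does something equivalent, so this is mainly useful when you need control over the chunk size or want to hash an IO you already have open.

```ruby
require 'digest/md5'

# Hash a file in fixed-size chunks so only one chunk is held in memory
# at a time. chunk_size is an arbitrary illustrative default (1 MiB).
def streaming_md5(path, chunk_size = 1024 * 1024)
  digest = Digest::MD5.new
  File.open(path, 'rb') do |io|
    # IO#read(n) returns nil at end of file, which ends the loop
    while (chunk = io.read(chunk_size))
      digest.update(chunk)
    end
  end
  digest.hexdigest
end
```

The result is the same hex string as `Digest::MD5.file(f).to_s`, so either can be compared directly against the quote-stripped ETag.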

@Dan2552

Dan2552 commented Feb 3, 2015

The ETag doesn't always appear to be the MD5:

- (String) etag

Returns the object's ETag.

Generally the ETag is the MD5 of the object. If the object was uploaded using multipart upload, then it is the MD5 of all of the upload-part MD5s.
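A sketch of how such a multipart ETag can be reproduced locally, assuming the commonly observed scheme: the hex MD5 of the concatenated raw (16-byte) MD5 digests of each part, followed by `-` and the number of parts. The method name `multipart_etag` is hypothetical, and the result can only match if `part_size` equals the part size the uploader actually used; the 8 MiB default below is an assumption, since different upload tools choose different part sizes.

```ruby
require 'digest/md5'

# Rebuild a multipart-upload style ETag for a local file.
# Assumes every part except the last is exactly part_size bytes.
def multipart_etag(path, part_size = 8 * 1024 * 1024)
  part_digests = []
  File.open(path, 'rb') do |io|
    while (part = io.read(part_size))
      # Raw 16-byte digest of each part, not the hex string
      part_digests << Digest::MD5.digest(part)
    end
  end
  # MD5 over the concatenated raw part digests, plus "-<part count>"
  "#{Digest::MD5.hexdigest(part_digests.join)}-#{part_digests.size}"
end
```

When the quote-stripped ETag contains a `-`, comparing it against `multipart_etag(f, part_size)` instead of a whole-file MD5 avoids reporting identical files as "different files."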
