Comparing two files via MD5 hash on Amazon S3 using Ruby
mattgaidica/2891945, created June 7, 2012 22:11
```ruby
require 'digest/md5'
require 'aws/s3' # the legacy aws-s3 gem

# Set your AWS credentials
AWS::S3::Base.establish_connection!(
  :access_key_id     => 'XXX',
  :secret_access_key => 'XXX'
)

# Get the S3 file (object)
object = AWS::S3::S3Object.find('02185773dcb5a468df6b.pdf', 'your_bucket')

# Extract the ETag header and strip its surrounding double quotes
etag = object.about['etag'].gsub('"', '')

# Hash the local file; binread avoids newline translation on text-mode platforms
f = '/Users/matt/Desktop/02185773dcb5a468df6b.pdf'
digest = Digest::MD5.hexdigest(File.binread(f))

# Let's see them both
puts "#{digest} vs #{etag}"

# A string comparison to finish it off
if digest == etag
  puts 'same file!'
else
  puts 'different files.'
end
```
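A minimal sketch of the comparison step as a reusable helper that makes no S3 calls. It assumes the raw ETag string has already been fetched. `etag_matches_file?` is a hypothetical name. The helper strips the quotes and returns `nil` when the ETag contains a hyphen, because that form is commonly seen for multipart uploads and is not a plain MD5 of the content. It also hashes with `Digest::MD5.file`, which reads the file in chunks rather than all at once.

```ruby
require 'digest/md5'

# Hypothetical helper: compare a local file against an S3 ETag string.
# Returns true/false for plain-MD5 ETags, and nil when the ETag looks like a
# multipart ETag ("<hex>-<part count>"), which a plain MD5 cannot match.
def etag_matches_file?(path, raw_etag)
  etag = raw_etag.delete('"')
  return nil if etag.include?('-')

  Digest::MD5.file(path).hexdigest == etag
end
```

With the gist's variables, this would be called as `etag_matches_file?(f, object.about['etag'])`.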
`etag` doesn't always appear to be the MD5. The documentation for the `etag` method says:

(String) etag
Returns the object's ETag.
Generally the ETag is the MD5 of the object. If the object was uploaded using multipart upload, then this is the MD5 of all of the upload-part MD5s.
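The multipart case can be reproduced locally if you know how the file was split. The sketch below follows a commonly observed scheme that, as far as I know, is not a formal specification. Each part is hashed with MD5, the raw 16-byte digests are concatenated and hashed again, and a hyphen plus the part count is appended. `multipart_etag` is a hypothetical name. The 8 MiB default `part_size` is an assumption (the AWS CLI's default chunk size); other upload tools use different sizes, and the result only matches when `part_size` equals the uploader's setting.

```ruby
require 'digest/md5'

# Sketch of the commonly observed multipart ETag scheme (an assumption, not
# an official spec): MD5 each part, MD5 the concatenated raw digests, then
# append "-<number of parts>". part_size must match the uploader's setting.
def multipart_etag(path, part_size = 8 * 1024 * 1024)
  part_digests = []
  File.open(path, 'rb') do |io|
    # io.read returns nil at end of file, which ends the loop
    while (chunk = io.read(part_size))
      part_digests << Digest::MD5.digest(chunk) # raw 16-byte digest
    end
  end
  "#{Digest::MD5.hexdigest(part_digests.join)}-#{part_digests.size}"
end
```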
Thanks, this was helpful; I didn't know where the ETag value was located in the object.
However, I ran into a problem with memory usage. If it's a large file, you may want to use this instead:
Keeps your Ruby memory usage from growing in proportion to the file size.
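A minimal sketch of hashing a large file incrementally, so only one chunk is in memory at a time instead of the whole file as with `File.read`. This is not necessarily what the commenter used. `streaming_md5` and the 1 MiB chunk size are illustrative choices. Ruby's standard library also offers `Digest::MD5.file(path).hexdigest`, which reads the file in chunks itself.

```ruby
require 'digest/md5'

# Hash a file in fixed-size chunks so memory use stays bounded by chunk_size
# instead of growing with the file size.
def streaming_md5(path, chunk_size = 1024 * 1024)
  md5 = Digest::MD5.new
  File.open(path, 'rb') do |io|
    buffer = String.new
    # io.read fills buffer in place and returns nil at end of file
    md5.update(buffer) while io.read(chunk_size, buffer)
  end
  md5.hexdigest
end
```

In the gist's code, `digest = streaming_md5(f)` would replace the `Digest::MD5.hexdigest(...)` line.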