@aerostitch
Last active August 29, 2015 14:07
This gist computes the MD5-based checksum that AWS S3 exposes as the etag property of an AWS::S3::S3Object, for both multipart and non-multipart uploads.
#!/usr/bin/env ruby
require 'digest/md5'
##
# Computes the MD5-based etag that AWS S3 reports for a file.
# Pass a non-zero chunk size (in bytes) matching the chunk size used for the
# multipart upload, or omit it if the file was not uploaded as multipart.
#
# For an example of how to get an etag from AWS, see:
# https://gist.github.com/aerostitch/22992e88315215f100b8
#
# To guess the chunk size automatically, see the following gist:
# https://gist.github.com/aerostitch/702c9d01b293a9482e1d
#
# To check whether a local file has the right checksum without knowing the
# chunk size, see this function: https://gist.github.com/aerostitch/a27654b1082dd2a110bf
#
# == Parameters:
# file_name::
#   Full local path of the file you want the etag of.
#
# chunk_size::
#   Size in bytes of the chunks used for the multipart upload.
#   Omit it, or pass 0, if the file was not uploaded as multipart.
#
# == Returns:
#   The value of the "etag" property of the corresponding
#   AWS::S3::S3Object: a plain MD5 hex digest for a simple upload, or
#   "<md5 hex digest>-<number of chunks>" for a multipart upload.
#
# == Examples:
# - Getting the etag of a multipart-uploaded file (with a 5 MiB chunk size):
#     puts get_etag('/tmp/my_file.tar.gz', 5*1024*1024)
#
# - Getting the etag of a simple-uploaded file:
#     puts get_etag('/tmp/my_file.tar.gz')
#
# Author:: Joseph Herlant ([email protected])
# Copyright:: Copyright (c) 2014 Joseph Herlant
# License:: Distributed under the terms of the Apache 2 license
#
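def get_etag(file_name, chunk_size = 0)
  # No chunk size: the etag is the plain MD5 hex digest of the whole file.
  return Digest::MD5.file(file_name).hexdigest if chunk_size == 0

  # Multipart: MD5 over the concatenated binary MD5 digests of each chunk,
  # suffixed with the number of chunks.
  md5 = Digest::MD5.new
  chunk_slices = 0
  File.open(file_name, 'rb') do |f|
    until f.eof?
      md5 << Digest::MD5.digest(f.read(chunk_size))
      chunk_slices += 1
    end
  end
  "#{md5.hexdigest}-#{chunk_slices}"
end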
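
To see the multipart scheme on a small input, here is a minimal sketch that applies the same computation to an in-memory string instead of a file. The helper name `multipart_etag_of` is hypothetical and not part of the gist. It assumes, as the function above does, that the etag is the MD5 of the concatenated binary MD5 digests of each chunk, followed by a dash and the chunk count.

```ruby
require 'digest/md5'

# Hypothetical helper (not part of the gist): mirrors the chunked branch of
# get_etag, but works on a string already held in memory.
# chunk_size must be a positive number of bytes.
def multipart_etag_of(data, chunk_size)
  bytes = data.b # binary copy, so slicing is done per byte, not per character

  # One binary MD5 digest per chunk; the last chunk may be shorter.
  part_digests = (0...bytes.bytesize).step(chunk_size).map do |offset|
    Digest::MD5.digest(bytes.byteslice(offset, chunk_size))
  end

  # MD5 of the concatenated digests, then "-<number of chunks>".
  "#{Digest::MD5.hexdigest(part_digests.join)}-#{part_digests.size}"
end

# 10 bytes in chunks of 4 gives chunks of 4, 4 and 2 bytes: the suffix is "-3".
puts multipart_etag_of('a' * 10, 4)

# Without chunking, the value is just the MD5 hex digest of the content.
puts Digest::MD5.hexdigest('abc')
```

Because the chunk count and the chunk boundaries both feed into the result, the same content uploaded with a different chunk size yields a different etag. That is why `get_etag` needs the exact chunk size used for the upload.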