Last active May 19, 2021 00:43
#!/usr/bin/python
import boto

# Use boto to copy an object greater than 5 GB using the S3 Multipart Upload API.
# Probably could be made more Pythonic; based directly on the AWS Java example
# "Copy an Object [greater than 5 GB] Using the AWS SDK for Java [S3] Multipart Upload API":
# http://docs.aws.amazon.com/AmazonS3/latest/dev/CopyingObjctsUsingLLJavaMPUapi.html

# Copy within the same bucket as a simple test.
bucket_name = 'btest1234'
source_bucket = bucket_name
destination_bucket = bucket_name
orig_key_name = 'foo.gz'
dest_key_name = 'copy' + orig_key_name

s3 = boto.connect_s3(debug=1)
sb = s3.get_bucket(source_bucket)
ky = sb.lookup(orig_key_name)
objectSize = ky.size
print("found objectSize of %d" % objectSize)

b = s3.get_bucket(destination_bucket)
mp = b.initiate_multipart_upload(dest_key_name, reduced_redundancy=True)

psize = 50 * 2 ** 20  # part size: 50 MiB (2^20 bytes = 1 MiB)
bytePosition = 0
i = 1  # S3 part numbers start at 1
while bytePosition < objectSize:
    lastbyte = bytePosition + psize - 1
    # >= rather than >: lastbyte == objectSize is already one byte past the end
    if lastbyte >= objectSize:
        lastbyte = objectSize - 1
    print("mp.copy_part_from_key part %d (%d %d)" % (i, bytePosition, lastbyte))
    mp.copy_part_from_key(source_bucket, orig_key_name, i, bytePosition, lastbyte)
    i = i + 1
    bytePosition += psize
mp.complete_upload()
print("done")
get_key is now preferred over lookup. I'm given to understand that get_key uses a HEAD request, which is a bit cheaper and faster.
Old post, I know, but AWS supports simple copy (S3.Client.copy) with built-in multipart when necessary. See http://boto3.readthedocs.io/en/latest/reference/services/s3.html#S3.Client.copy
Thanks for your original share here!
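For anyone landing here later, a minimal sketch of what that boto3 route might look like. The helper name copy_object_managed and the bucket/key names in the usage note are made up for illustration; the only real API used is S3.Client.copy with its CopySource, Bucket and Key arguments. The client is passed in rather than created inside the function, so the function can be exercised without AWS credentials.

```python
def copy_object_managed(client, src_bucket, src_key, dst_bucket, dst_key):
    """Copy one S3 object using boto3's managed copy.

    `client` is expected to be a boto3 S3 client. Its copy() method is a
    managed transfer that can switch to a multipart copy for large objects,
    so no part ranges are computed here.
    """
    # CopySource is the dict form boto3 accepts for naming the source object.
    copy_source = {"Bucket": src_bucket, "Key": src_key}
    client.copy(CopySource=copy_source, Bucket=dst_bucket, Key=dst_key)
    return copy_source
```

Assuming boto3 is installed and credentials are configured, usage would be roughly: import boto3, then call copy_object_managed(boto3.client("s3"), "btest1234", "foo.gz", "btest1234", "copyfoo.gz").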
I found this useful, but shouldn't the test on line 32 (if lastbyte > objectSize) use ">=" instead of just ">"?
For psize = 50 and objectSize = 99, the current logic would transfer bytes 0-49 as the first part and bytes 50-99 as the second part, but there is no byte 99; the last valid byte is 98.
I haven't read the code for copy_part_from_key; I expect it defends against this case.
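To make the boundary case concrete, here is a small sketch of the range calculation pulled out into a pure function, so it can be checked without touching S3. The name part_ranges is hypothetical and not part of boto; it only mirrors the loop in the script, with the end of each range clamped to the last valid byte.

```python
def part_ranges(object_size, part_size):
    """Yield (part_number, first_byte, last_byte) for a multipart copy.

    Byte offsets are inclusive, and part numbers start at 1 as S3 expects.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    part_number = 1
    first = 0
    while first < object_size:
        # Clamp the exclusive end to object_size, then convert to inclusive.
        last = min(first + part_size, object_size) - 1
        yield part_number, first, last
        part_number += 1
        first += part_size


print(list(part_ranges(99, 50)))   # [(1, 0, 49), (2, 50, 98)]
print(list(part_ranges(100, 50)))  # [(1, 0, 49), (2, 50, 99)]
```

Each tuple could then be passed to mp.copy_part_from_key(source_bucket, orig_key_name, part_number, first, last) in place of the hand-rolled while loop.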