@gleicon
Last active October 11, 2023 14:27
How to use boto3 with Google Cloud Storage and Python to emulate S3 access.
from boto3.session import Session
from botocore.client import Config
from botocore.handlers import set_list_objects_encoding_type_url
import boto3

ACCESS_KEY = "xx"
SECRET_KEY = "yy"

# Dump every request/response to stderr; useful when debugging signatures.
boto3.set_stream_logger('')

session = Session(aws_access_key_id=ACCESS_KEY,
                  aws_secret_access_key=SECRET_KEY,
                  region_name="US-CENTRAL1")

# Google Cloud Storage rejects the "encoding=url" query string that boto
# appends to ListObjects requests, so unregister the handler that adds it.
session.events.unregister('before-parameter-build.s3.ListObjects',
                          set_list_objects_encoding_type_url)

s3 = session.resource('s3', endpoint_url='https://storage.googleapis.com',
                      config=Config(signature_version='s3v4'))

bucket = s3.Bucket('yourbucket')
for f in bucket.objects.all():
    print(f.key)
@gleicon
Author

gleicon commented Dec 18, 2017

Google Cloud implements the latest S3 protocol. Most errors I hit while trying to make it work looked like the following:

botocore.exceptions.ClientError: An error occurred (InvalidArgument) when calling the ListObjects operation: Invalid argument.

In my case this meant (in this order): I had forgotten to set the region, mistyped Google's region name, had not set the proper signature version, and boto was appending "encoding=url" as a query string, which Google Cloud Storage won't accept. The last one was tricky to unregister.

Enabling the boto logger helped me track these down, since it let me read the header dump (sometimes the service would complain about an amz-sha256 header). Thanks to all the GitHub issues and pieces of code scattered around, I managed to make it work. I put it all together in this gist so other people can get past this and get work done.
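For reference, `boto3.set_stream_logger('')` is a thin convenience wrapper over the standard `logging` module. A minimal stdlib-only sketch of an equivalent setup (assuming the logger names `boto3` and `botocore`, which are the ones boto logs under):

```python
import logging

# Attach a DEBUG-level stream handler to boto's loggers so every
# request and response (including signed headers) is printed to stderr.
handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter('%(asctime)s %(name)s [%(levelname)s] %(message)s'))

for name in ('boto3', 'botocore'):
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
```

With this in place, the full header dump mentioned above shows up on stderr, which makes signature mismatches much easier to spot.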

@ruurtjan

Thanks for sharing! This also works from AWS Lambdas, where boto3 is pre-installed :)

Took me a while to figure out what access key and secret to use, but here's what you need to do to get them: https://cloud.google.com/storage/docs/migrating#keys

@FISMAL

FISMAL commented Jan 17, 2018

@gleicon have you got a working version of this? I am getting Invalid argument all the time.

@FISMAL

FISMAL commented Jan 17, 2018

Never mind, it works now.

@gleicon
Author

gleicon commented Feb 19, 2018

Sorry folks, GitHub doesn't make it easy to track gist comments. This is working for me to move an AWS project to GCP. I should have mentioned that you need to create the storage instance in "interoperability mode" to get the AWS-like credentials. I had to scour the docs and boto code to figure that out, so I'm glad this helped.

@pgillet

pgillet commented Feb 22, 2018

Thank you, that helped a lot.
As for generating pre-signed URLs, just replace the 'AWSAccessKeyId' query param in the generated URL with 'GoogleAccessId' to make it work.
Neither boto3 nor botocore mentions the literal 'GoogleAccessId' in its code, so you have to replace it by hand, as follows:

url = s3.meta.client.generate_presigned_url(
    ClientMethod='get_object',
    Params={
        'Bucket': 'yourbucket',
        'Key': 'object.txt'
    }
)

url = url.replace('AWSAccessKeyId', 'GoogleAccessId')

@mcint

mcint commented Jan 15, 2019

Thanks for a minimum viable solution to the gcp interop issue. (Linking because I didn't understand your solution until reading the issue thread).

@killua8p

If you are wondering how to generate the ACCESS KEY and SECRET KEY: https://cloud.google.com/storage/docs/authentication/managing-hmackeys

@stoyanK7

stoyanK7 commented Oct 10, 2023

Has anyone managed to get this working with a ResponseContentDisposition header? I'm receiving SignatureDoesNotMatch errors.

Edit: Check my comment below (https://gist.github.com/gleicon/2b8acb9f9c0f22753eaac227ff997b34?permalink_comment_id=4721658#gistcomment-4721658) for the solution.

@gleicon
Author

gleicon commented Oct 11, 2023

Try enabling boto3 logging. I'm not sure Google Object Storage supports all headers or the same syntax (see the AWSAccessKeyId issue above). Also check whether the signature version is still right for the type of storage you've created.

@stoyanK7

I managed to get it working. The comment under https://stackoverflow.com/a/21028609/9553927 helped:

The signed urls worked for me. Although, I've tried to call generate_url() with the parameter response_headers and the value response-content-disposition but I got malformed signed urls. So my solution has been to concatenate '&response-content-disposition=attachment%3B%20filename%3D"{}"'.format(file_name) to the signed url and it worked.

params = {
    "Bucket": "xyz",
    "Key": blob_name,
}
ten_minutes = 600  # seconds
url = self.s3_client.generate_presigned_url(
    "get_object", Params=params, ExpiresIn=ten_minutes
).replace("AWSAccessKeyId", "GoogleAccessId")
url += '&response-content-disposition=attachment;filename="newFileName"'
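One caveat with the concatenation above: the filename goes into the URL raw, so a name containing spaces, quotes, or non-ASCII characters can break the header (or the signature) again. A stdlib-only sketch that percent-encodes the value first; the helper name is hypothetical, not a boto3 API:

```python
from urllib.parse import quote


def with_attachment_disposition(signed_url: str, file_name: str) -> str:
    """Append a percent-encoded response-content-disposition parameter
    to an already-signed URL (hypothetical helper, not a boto3 API)."""
    disposition = quote('attachment;filename="{}"'.format(file_name))
    return signed_url + "&response-content-disposition=" + disposition


url = with_attachment_disposition(
    "https://storage.googleapis.com/xyz/blob?sig=abc", "report 2023.pdf")
print(url)
```

This yields the same `%3B`/`%20`-style encoding that the quoted Stack Overflow comment describes.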

@gleicon
Author

gleicon commented Oct 11, 2023

Awesome, thanks for sharing!
