@prog893
Last active October 22, 2024 11:57
S3 versioned bucket empty tool

For those times when you need to delete a versioned S3 bucket on AWS that holds millions of objects and versions. (It should work on non-versioned buckets as well.)

Usage

The following options are available:

  • --bucket (required): name of the bucket to empty
  • --profile: profile name in ~/.aws/config to use when sending requests to AWS API (e.g. foo for [profile foo] in ~/.aws/config file). Default is default.
  • --batch: number of objects per delete request. When specified, enables batch processing (turned off by default). The S3 DeleteObjects API accepts at most 1000 keys per request, so do not go above 1000.
  • --delete: delete target bucket if specified. Default: do not delete bucket.
  • --prefix: Path prefix to use when deleting objects. Useful when you only want to purge a single directory or run this tool in parallel (be careful not to hit rate limits).
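The batching behind --batch can be sketched in plain Python. This is only an illustration, not the tool's own code: the helper name batched, the MAX_KEYS_PER_REQUEST constant and the fake input list are all made up for the example.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List

# S3 DeleteObjects accepts at most 1000 keys per call.
MAX_KEYS_PER_REQUEST = 1000


def batched(items: Iterable[Dict[str, str]], size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield consecutive lists of at most `size` items from `items`."""
    if not 1 <= size <= MAX_KEYS_PER_REQUEST:
        raise ValueError("batch size must be between 1 and 1000")
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Hypothetical input: 2500 fake key/version pairs.
fake_versions = [{"Key": f"obj-{i}", "VersionId": "v1"} for i in range(2500)]
batch_sizes = [len(b) for b in batched(fake_versions, 1000)]
print(batch_sizes)  # [1000, 1000, 500]
```

Because the helper consumes any iterable lazily, it never needs to hold more than one batch in memory, which matters when a bucket has millions of versions.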

Example

$ s3-empty-bucket --profile brooklyn --bucket brooklyn-backup --batch 1000 --prefix some/prefix

Requirements

  • python3
  • latest boto3 installed (pip3 install boto3)

Why

Running aws s3 rm, or even aws s3 sync against an empty directory, on a versioned S3 bucket only places a delete marker on each object. In the AWS console the object looks deleted, but its previous versions still exist internally. These versions prevent the bucket from being deleted.
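To see what is left behind, you can walk the pages returned by the S3 ListObjectVersions call, which reports both Versions and DeleteMarkers. The sketch below is an assumption-laden illustration: iter_delete_targets and list_pages are hypothetical helper names, and the sample page is made up rather than real API output.

```python
from typing import Dict, Iterable, Iterator


def iter_delete_targets(pages: Iterable[dict]) -> Iterator[Dict[str, str]]:
    """Yield {'Key', 'VersionId'} for every version and delete marker in the pages."""
    for page in pages:
        # Either list may be absent from a page, so default to empty.
        for entry in page.get("Versions", []) + page.get("DeleteMarkers", []):
            yield {"Key": entry["Key"], "VersionId": entry["VersionId"]}


def list_pages(bucket_name: str, prefix: str = "") -> Iterable[dict]:
    """Fetch real pages from S3 (needs boto3 and credentials; not called below)."""
    import boto3  # imported lazily so the demo runs without boto3 installed

    paginator = boto3.client("s3").get_paginator("list_object_versions")
    return paginator.paginate(Bucket=bucket_name, Prefix=prefix)


# Made-up page: one object that was "deleted" with aws s3 rm.
sample_page = {
    "Versions": [{"Key": "a.txt", "VersionId": "v1", "IsLatest": False}],
    "DeleteMarkers": [{"Key": "a.txt", "VersionId": "v2", "IsLatest": True}],
}
targets = list(iter_delete_targets([sample_page]))
print(targets)
# [{'Key': 'a.txt', 'VersionId': 'v1'}, {'Key': 'a.txt', 'VersionId': 'v2'}]
```

Both entries have to be removed by version ID before the bucket counts as empty: the old version and the delete marker that hides it.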

Deleting objects from the console does remove them completely, but deleting millions of objects from the web console is not practical. The recommended, better and faster way to empty a bucket is to use bucket lifecycle rules for object/version expiration.
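For reference, a lifecycle-based purge might look roughly like the sketch below. It is not part of this tool: the function names, rule IDs and the one-day default are assumptions chosen for illustration, and the configuration is only a starting point that you should check against the S3 lifecycle documentation before applying it to a real bucket.

```python
from typing import Any, Dict


def build_purge_lifecycle(days: int = 1) -> Dict[str, Any]:
    """Build a lifecycle configuration meant to expire everything in a bucket."""
    return {
        "Rules": [
            {
                "ID": "expire-all-versions",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix matches every key
                "Expiration": {"Days": days},
                "NoncurrentVersionExpiration": {"NoncurrentDays": days},
            },
            {
                "ID": "clean-up-markers-and-uploads",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},
                # Kept in a separate rule: the S3 API documents that
                # ExpiredObjectDeleteMarker cannot be combined with Days.
                "Expiration": {"ExpiredObjectDeleteMarker": True},
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": days},
            },
        ]
    }


def apply_purge_lifecycle(bucket_name: str, days: int = 1) -> None:
    """Send the configuration to S3 (needs boto3 and credentials; not called below)."""
    import boto3  # imported lazily so the demo runs without boto3 installed

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=build_purge_lifecycle(days),
    )


config = build_purge_lifecycle(days=1)
print([rule["ID"] for rule in config["Rules"]])
# ['expire-all-versions', 'clean-up-markers-and-uploads']
```

Note that put_bucket_lifecycle_configuration replaces any lifecycle configuration already on the bucket, and that S3 applies the rules asynchronously.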

However, I wanted to track deletion progress, and lifecycle rules currently do not provide any status reports: you run them and pray for them to finish. That is why I made this tool, which lists all versions of all objects and deletes them through the API.

Disclaimer

Most of the boto3 errors are not handled. Use at your own risk. Any comments/suggestions are welcome!
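One gap worth knowing about: a DeleteObjects call can succeed as a request while still reporting per-key failures in the Errors list of its response. A minimal sketch of checking for that is shown below; summarize_delete_response is a hypothetical helper, not something the script does today, and the response dict is made up to mirror the documented response shape.

```python
from typing import Any, Dict, List, Tuple


def summarize_delete_response(response: Dict[str, Any]) -> Tuple[int, List[str]]:
    """Return the number of deleted entries and a readable line per failure."""
    deleted = len(response.get("Deleted", []))
    failures = [
        "{} (version {}): {} {}".format(
            err.get("Key"), err.get("VersionId"), err.get("Code"), err.get("Message")
        )
        for err in response.get("Errors", [])
    ]
    return deleted, failures


# Made-up response shaped like the documented DeleteObjects output.
fake_response = {
    "Deleted": [{"Key": "a.txt", "VersionId": "v1"}],
    "Errors": [
        {"Key": "b.txt", "VersionId": "v7", "Code": "AccessDenied", "Message": "Access Denied"}
    ],
}
count, failures = summarize_delete_response(fake_response)
print(count, failures)
# 1 ['b.txt (version v7): AccessDenied Access Denied']
```

Passing the return value of bucket.delete_objects(...) through a helper like this would also give a running count of deleted versions, which is the progress information lifecycle rules do not offer.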

License

MIT License

Copyright (c) 2018 Torgayev Tamirlan

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

#!/usr/bin/env python3
import argparse
import logging

import boto3
from boto3.session import Session


def delete_objects(bucket_struct, batch):
    """Delete one batch of object versions with a single DeleteObjects request."""
    print("Trying to delete batch of {} objects".format(len(batch)))
    bucket_struct.delete_objects(
        Delete={
            'Objects': batch
        }
    )


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Empty S3 bucket to prepare for deletion')
    parser.add_argument('--bucket', type=str, required=True, help="Bucket to empty")
    parser.add_argument('--profile', type=str, help="AWS profile to use")
    parser.add_argument('--batch', type=int, help="Batch size")
    parser.add_argument('--delete', action="store_true", help="Delete emptied bucket")
    parser.add_argument('--prefix', type=str, help="Path prefix to purge")
    args = parser.parse_args()

    # Use the named profile if one was given, otherwise boto3's default credentials.
    if args.profile:
        print("Using profile " + args.profile)
        session = Session(profile_name=args.profile)
        s3 = session.resource('s3')
    else:
        boto3.set_stream_logger('boto3.resources', logging.INFO)
        s3 = boto3.resource('s3')

    bucket = s3.Bucket(args.bucket)

    # object_versions covers both object versions and delete markers.
    if args.prefix:
        object_versions = bucket.object_versions.filter(Prefix=args.prefix)
    else:
        object_versions = bucket.object_versions.all()

    if args.batch:
        # Batch mode: collect key/version pairs and delete them in groups.
        versions_batch = []
        for version in object_versions:
            print("Adding to batch: s3://{}/{} version ID: {}".format(
                version.bucket_name, version.object_key, version.id))
            versions_batch.append({
                'Key': version.object_key,
                'VersionId': version.id
            })
            if len(versions_batch) == args.batch:
                delete_objects(bucket, versions_batch)
                versions_batch = []
        # Flush whatever is left after the last full batch.
        if len(versions_batch) > 0:
            delete_objects(bucket, versions_batch)
            versions_batch = []
    else:
        # Default mode: one delete request per object version.
        for version in object_versions:
            print("Deleting: s3://{}/{} version ID: {}".format(
                version.bucket_name, version.object_key, version.id))
            version.delete()

    if args.delete:
        bucket.delete()