Created September 22, 2020 07:25
Delete thousands or millions of objects in S3
# Hint: If you are stuck with tens of millions of files under an S3 prefix, perhaps
# the easiest option is to set the prefix's Expiration to one day in the Lifecycle Management
# pane of the bucket in the Web UI; Amazon will then take care of the object deletion for you.
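The same lifecycle rule can also be set from the CLI. This is a sketch only: the bucket name, rule ID, and prefix below are placeholders you would replace with your own.

```shell
# Hypothetical bucket/prefix; expires everything under tmp/sandbox/ after one day.
aws s3api put-bucket-lifecycle-configuration --bucket <<BUCKET_NAME>> \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "expire-tmp-sandbox",
      "Filter": {"Prefix": "tmp/sandbox/"},
      "Status": "Enabled",
      "Expiration": {"Days": 1}
    }]
  }'
```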
# A good resource, and where I got these scripts, is this answer:
# https://serverfault.com/questions/679989/most-efficient-way-to-batch-delete-s3-files#comment1200074_917740
# List all objects under the prefix into `to-delete.keys`
aws s3api list-objects --output text --bucket <<BUCKET_NAME>> --query 'Contents[].[Key]' --prefix <<prefix, like tmp/sandbox>> | pv -l > to-delete.keys
# DELETE OBJECTS (you can start running this as soon as keys begin streaming into `to-delete.keys`)
# Note: the grep filters out keys containing a single quote; those must be deleted separately.
tail -n+0 to-delete.keys | pv -l | grep -v -e "'" | tr '\n' '\0' | xargs -0 -P1 -n1000 bash -c 'aws s3api delete-objects --bucket <<BUCKET_NAME>> --delete "Objects=[$(printf "{Key=%q}," "$@")],Quiet=true"' _
# Even though each HTTP request deletes 1000 objects, this can still take many hours if we are talking hundreds of millions of objects.
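For reference, here is a hedged boto3 sketch of the same list-then-delete loop. The bucket and prefix are placeholders, and `batches` is a helper introduced here (not from the original gist) to honor the 1000-key limit of `delete_objects`.

```python
# Sketch: bulk-delete all objects under a prefix with boto3.
try:
    import boto3  # not in the standard library; install with `pip install boto3`
except ImportError:
    boto3 = None

def batches(items, size=1000):
    """Yield successive chunks of at most `size` items.

    S3 DeleteObjects accepts at most 1000 keys per request."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def delete_prefix(bucket, prefix):
    """List every key under `prefix`, then delete in batches of 1000."""
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    for chunk in batches(keys):
        s3.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": k} for k in chunk], "Quiet": True},
        )
```

Like the shell pipeline, this is still one HTTP round trip per 1000 objects, so the caveat about runtime applies equally here.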
I have found that this use of xargs skips the first element in the keys file on every iteration: `bash -c` assigns the first argument after the script string to `$0`, which `"$@"` does not include, so lines 1, 1001, 2001, etc. of the file never make it into the Key list. The fix is to pass a placeholder argument after the closing quote, i.e. end the command with `"$@")],Quiet=true"' _`.
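The pitfall is easy to reproduce in isolation. This minimal sketch (made up for illustration, using plain `echo` instead of the AWS CLI) shows how the first argument disappears without a placeholder:

```shell
# bash -c assigns the first argument after the script string to $0,
# which "$@" does not include, so "a" is silently dropped here:
printf 'a\0b\0c\0' | xargs -0 bash -c 'echo "$@"'
# With a placeholder taking the $0 slot, all three values survive:
printf 'a\0b\0c\0' | xargs -0 bash -c 'echo "$@"' _
```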