Follow these steps to remove all archives from an AWS Glacier vault. Once the vault is empty, you can delete the vault itself through the AWS console in your browser.
The following command creates an inventory-retrieval job that collects the required information about the vault.
$ aws glacier initiate-job --job-parameters '{"Type": "inventory-retrieval"}' --account-id YOUR_ACCOUNT_ID --region YOUR_REGION --vault-name YOUR_VAULT_NAME
This can take hours or even days, depending on the size of the vault. Use the following command to check if it is ready:
$ aws glacier list-jobs --account-id YOUR_ACCOUNT_ID --region YOUR_REGION --vault-name YOUR_VAULT_NAME
Copy the JobId (including the quotes) for the next step.
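Rather than eyeballing the JSON for the JobId, you can let jq pull it out for you. Here is a sketch against a canned response - the JobId and file path are illustrative stand-ins, so substitute real aws glacier list-jobs output:

```shell
# canned list-jobs response, for illustration only; a real JobId is a long
# opaque string
cat > /tmp/list_jobs.json <<'EOF'
{"JobList":[{"JobId":"EXAMPLE-JOB-ID","Action":"InventoryRetrieval","Completed":true,"StatusCode":"Succeeded"}]}
EOF

# select the first completed job and strip the JSON quotes with -r
job_id=$(jq -r '.JobList[] | select(.Completed) | .JobId' /tmp/list_jobs.json)
echo "${job_id}"   # prints EXAMPLE-JOB-ID
```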
The following command writes a file listing all archive IDs, which the deletion script in the next step requires.
$ aws glacier get-job-output --account-id YOUR_ACCOUNT_ID --region YOUR_REGION --vault-name YOUR_VAULT_NAME --job-id YOUR_JOB_ID ./output.json
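Before deleting anything, it's worth sanity-checking what the inventory actually contains. A sketch against a canned output.json (the contents here are illustrative stand-ins for a real inventory):

```shell
# canned inventory file; the real output.json has the same shape with far
# more entries
cat > /tmp/output.json <<'EOF'
{"VaultARN":"arn:aws:glacier:example","ArchiveList":[{"ArchiveId":"id-1","Size":1024},{"ArchiveId":"id-2","Size":2048}]}
EOF

# how many archives are about to be deleted?
jq '.ArchiveList | length' /tmp/output.json   # prints 2
```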
Set the following parameters through environment variables:
export AWS_ACCOUNT_ID=YOUR_ACCOUNT_ID
export AWS_REGION=YOUR_REGION
export AWS_VAULT_NAME=YOUR_VAULT_NAME
Create a file with the following content and run it:
#!/bin/bash
file='./output.json'

if [[ -z ${AWS_ACCOUNT_ID} ]] || [[ -z ${AWS_REGION} ]] || [[ -z ${AWS_VAULT_NAME} ]]; then
    echo "Please set the following environment variables: "
    echo "AWS_ACCOUNT_ID"
    echo "AWS_REGION"
    echo "AWS_VAULT_NAME"
    exit 1
fi

# jq -r strips the JSON quotes so the raw IDs are passed to the CLI
archive_ids=$(jq -r '.ArchiveList[].ArchiveId' < "${file}")

for archive_id in ${archive_ids}; do
    echo "Deleting archive: ${archive_id}"
    # --archive-id= (with the equals sign) protects against IDs that start
    # with a dash
    aws glacier delete-archive --archive-id="${archive_id}" --vault-name "${AWS_VAULT_NAME}" --account-id "${AWS_ACCOUNT_ID}" --region "${AWS_REGION}"
done

echo "Finished deleting archives"
This tutorial is based on this one: https://gist.github.com/Remiii/507f500b5c4e801e4ddc
Unfortunately, the aws CLI client is inefficient because of the overhead of client creation. After experimenting with the aws CLI client and the completely awesome GNU Parallel, I switched to Python. Yes, yes, I could refactor this in Go and deploy it to a fleet of Kubernetes-orchestrated services, but sometimes a hack is just enough.
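For reference, the parallel fan-out experiment can be sketched like this. xargs -P stands in for GNU Parallel here, and echo stands in for the real aws glacier delete-archive call, so this dry run deletes nothing:

```shell
# three fake archive ids; in practice, pipe in
#   jq -r '.ArchiveList[].ArchiveId' < output.json
printf '%s\n' id-1 id-2 id-3 |
  xargs -P 4 -I{} echo aws glacier delete-archive --archive-id {}
```

Drop the echo (and add the real account, region and vault flags) to actually issue the deletes, four at a time.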
In CLI examples, I tend to follow the O'Reilly Style Guide in case you're unsure about \ to break lines and a leading > for $PS2. I'll have to assume you know your way around the *nix command line and the vagaries of AWS, Python, pip and module installation. If you don't, stop here and RTFM before you shoot yourself and your colleagues in the foot, face and backside.

A couple of notes on client configuration:

- max_concurrent_requests = 2 because I use t3.nano worker instances to delete archives; max_concurrent_requests ensures the s3 cp succeeds - YMMV
- I set this in ~/.aws/config rather than setting client config in the script

If you want to better understand max_attempts and retry_mode, then the AWS documentation is reasonable. I needed to change the default behaviour for reasons too. You may decide this is not needed, but I am managing hundreds of vaults - each with millions of archives and some interesting retention requirements.

You'll need to use pip to install boto3, and jq (as above) to stream the JSON inventory blob to the script - it reads the archive IDs from STDIN.

The script accepts two arguments:

Here's an example of how to create a new ~/.aws/config:
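Here is a sketch of the kind of ~/.aws/config described above - the values are illustrative assumptions, not the author's actual file:

```ini
[default]
region = us-east-1

# S3 transfer tuning: keep concurrency low for t3.nano workers
s3 =
    max_concurrent_requests = 2

# retry behaviour for the SDK/CLI (see the AWS docs for details)
max_attempts = 10
retry_mode = standard
```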
You're responsible for your own AWS MFA, auth, role, keys, etc.
Anyhow, try this to call the script:
The log's date string is generated from %s because logging things per epoch second allows you to do trivial mathematics if you want to calculate runtime (the difference between the first deleted archive message and the last deleted archive message) and so on. As any fule kno, %s is seconds since 1970-01-01 00:00:00 UTC.

If you're really bored, you can create a histogram showing deletions per second - this helps visualise API behaviour.
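Assuming log lines shaped like "<epoch> deleted archive <id>" (an assumption - the actual format depends on the script), both the runtime and the histogram fall out of standard tools:

```shell
# canned log with hypothetical "<epoch> deleted archive <id>" lines
cat > /tmp/delete.log <<'EOF'
1700000000 deleted archive id-1
1700000000 deleted archive id-2
1700000001 deleted archive id-3
EOF

# deletions per second: count repeated timestamps
cut -d' ' -f1 /tmp/delete.log | sort | uniq -c

# runtime: last epoch minus first epoch
first=$(head -n1 /tmp/delete.log | cut -d' ' -f1)
last=$(tail -n1 /tmp/delete.log | cut -d' ' -f1)
echo "runtime: $((last - first))s"   # prints runtime: 1s
```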
Here's the script:
Hopefully: