Last active
March 22, 2024 07:30
List of files in a specific AWS S3 location in a shell script.
#!/bin/bash
# set up the AWS CLI first
# http://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-set-up.html
# configure the AWS CLI: replace the placeholders below,
# or omit the key variables when an IAM role provides S3 access
export AWS_DEFAULT_REGION=us-east-1
export AWS_ACCESS_KEY_ID=IDHERE
export AWS_SECRET_ACCESS_KEY=KeyHere
# s3 ls command
# http://docs.aws.amazon.com/cli/latest/reference/s3/ls.html
# raw listing as a single string (each line also contains date, time and size)
flist=$(aws s3 ls s3://bucket.name/directory/path/)
# file list as an array (field 4 is the file name; names containing spaces get split)
flist=($(aws s3 ls s3://bucket.name/directory/path/ | awk '{print $4}'))
# first element ($flist without an index is the same as ${flist[0]})
echo "$flist"
# NOTE: indexing starts with 0
echo "${flist[0]}"
# all elements
# http://stackoverflow.com/questions/15224535/bash-put-list-files-into-a-variable-and-but-size-of-array-is-1
echo "${flist[@]}"
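The `awk '{print $4}'` approach above splits file names that contain spaces and emits nothing useful for sub-prefix ("PRE") lines. A minimal sketch of a more tolerant parser follows. It assumes the usual `aws s3 ls` line layout (date, time, size, name, with prefixes shown as `PRE name/`); the function name `parse_s3_ls` is made up for illustration, and the sample listing is invented data.

```shell
#!/bin/bash
# parse_s3_ls (hypothetical helper): reads `aws s3 ls` output on stdin
# and prints one object name per line.
parse_s3_ls() {
  local date time size name
  # "read" puts the remainder of the line into the last variable,
  # so a name with spaces stays in one piece
  while read -r date time size name || [ -n "$date" ]; do
    # sub-prefix lines look like "PRE some/prefix/"; skip them
    [ "$date" = "PRE" ] && continue
    [ -n "$name" ] && printf '%s\n' "$name"
  done
  return 0
}

# Demo on invented sample output instead of a real bucket
printf '%s\n' \
  '                           PRE archive/' \
  '2013-09-02 21:37:53         10 a.txt' \
  '2013-09-02 21:32:57         23 my file.txt' | parse_s3_ls
# prints:
# a.txt
# my file.txt
```

With bash 4 or later, the result can be loaded into an array with something like `mapfile -t flist < <(aws s3 ls s3://bucket.name/directory/path/ | parse_s3_ls)`. Names that begin with a space would still lose their leading whitespace.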
If you want to iterate over files, try the following:
flist=$(aws s3 ls s3://bucket.name/directory/path/ | awk '{print $4}')
# $flist is left unquoted on purpose so that it splits into one word per file name
for i in $flist
do
    # perform what you want for each file "$i"
    # for example, to copy a file and tar it, you could do the following
    aws s3 cp s3://bucket.name/directory/path/"$i" /local/path/
    tar cvf /local/path/"${i%.*}".tar /local/path/"$i"
    # copy back to a new S3 path
    aws s3 cp /local/path/"${i%.*}".tar s3://bucket.name/directory/new_path/
    # remove local files
    rm /local/path/"$i"
    rm /local/path/"${i%.*}".tar
done
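The loop above keeps going when a copy fails and leaves files behind in `/local/path/`. One possible variation, sketched here under assumptions, wraps the per-file steps in a function that works in a temporary directory and reports failure to the caller. The name `archive_s3_file` and its argument order are hypothetical; only `aws s3 cp`, `tar`, `mktemp` and `rm` are real commands.

```shell
#!/bin/bash
# archive_s3_file NAME SRC_PREFIX DST_PREFIX   (hypothetical helper)
# Downloads one object, wraps it in a tar archive, uploads the archive,
# and always removes the temporary working directory.
archive_s3_file() {
  local name src dst workdir base status
  name=$1
  src=$2    # e.g. s3://bucket.name/directory/path/      (trailing slash expected)
  dst=$3    # e.g. s3://bucket.name/directory/new_path/  (trailing slash expected)
  workdir=$(mktemp -d) || return 1
  base=${name%.*}
  # -C stores the file under its bare name instead of the full local path
  aws s3 cp "${src}${name}" "${workdir}/" &&
    tar -cf "${workdir}/${base}.tar" -C "$workdir" "$name" &&
    aws s3 cp "${workdir}/${base}.tar" "$dst"
  status=$?
  rm -rf "$workdir"
  return "$status"
}
```

Inside the loop it might be called as `archive_s3_file "$i" s3://bucket.name/directory/path/ s3://bucket.name/directory/new_path/ || break`, so that the first failed file stops the run.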
I need your help.
I have files like this:
/bucketname/subjectarea1/yyyymm01
/bucketname/subjectarea1/yyyymm02
/bucketname/subjectarea1/yyyymm03
and
/bucketname/subjectarea2/yyyymm01
/bucketname/subjectarea2/yyyymm02
/bucketname/subjectarea2/yyyymm03
I want to process one folder (subject/yyyymmdd) at a time.
First, I want output that lists each file with an indicator:
/bucketname/subjectarea1/yyyymm01|N
/bucketname/subjectarea1/yyyymm02|N
/bucketname/subjectarea1/yyyymm03|N
After I process the first file, it should become:
/bucketname/subjectarea1/yyyymm01|Y
/bucketname/subjectarea1/yyyymm02|N
/bucketname/subjectarea1/yyyymm03|N
I should get the entries in sequence and be able to process them one at a time.
Is there a script available for this?
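One way to approach this, sketched under assumptions rather than as a finished answer, is to keep a small manifest file of `path|N` lines and flip each entry to `Y` once it has been processed. All function names below (`init_manifest`, `next_pending`, `mark_done`, `process_all`) are made up for illustration, and how the list of paths is produced is left open.

```shell
#!/bin/bash
# init_manifest FILE: reads paths on stdin, writes sorted "path|N" lines to FILE
init_manifest() {
  sort | awk 'NF { print $0 "|N" }' > "$1"
}

# next_pending FILE: prints the first path still marked N (nothing if none is left)
next_pending() {
  awk -F'|' '$2 == "N" { print $1; exit }' "$1"
}

# mark_done FILE PATH: rewrites FILE with PATH's indicator set to Y
mark_done() {
  local tmp
  tmp=$(mktemp) || return 1
  awk -F'|' -v OFS='|' -v t="$2" '$1 == t { $2 = "Y" } { print }' "$1" > "$tmp" &&
    mv "$tmp" "$1"
}

# process_all FILE COMMAND...: runs COMMAND on each pending path in sequence;
# stops at the first failure, leaving that entry as N for a later rerun
process_all() {
  local manifest item
  manifest=$1
  shift
  while item=$(next_pending "$manifest") && [ -n "$item" ]; do
    "$@" "$item" || return 1
    mark_done "$manifest" "$item"
  done
}

# Demo with the paths from the question
demo=$(mktemp)
printf '%s\n' \
  /bucketname/subjectarea1/yyyymm01 \
  /bucketname/subjectarea1/yyyymm02 \
  /bucketname/subjectarea1/yyyymm03 | init_manifest "$demo"
mark_done "$demo" "$(next_pending "$demo")"
cat "$demo"
# prints:
# /bucketname/subjectarea1/yyyymm01|Y
# /bucketname/subjectarea1/yyyymm02|N
# /bucketname/subjectarea1/yyyymm03|N
rm -f "$demo"
```

A call such as `process_all manifest.txt my_processing_command` would then work through the remaining entries in order, where `my_processing_command` stands for whatever per-folder step is needed. The sketch assumes paths contain no `|` or backslash characters.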