Bash script for syncing portions of an S3 bucket based on dated directory structures.
#!/bin/bash
##################################################
#
# s3-s3cmd-sync-dated-dirs.sh
#
# Easily download content in dated directory
# structures from s3.
#
#
# https://gist.github.com/imperialWicket
#
# version 1.0.0
#
# http://mit-license.org/
#
###################################################
usage()
{
cat << EOF
This script breaks large s3cmd sync jobs into smaller pieces
based on the presence of date values in the object path.
USAGE:
./s3-s3cmd-sync-dated-dirs.sh [-b bucket][-c][-d local_dir][-s start_date]
[-f finish_date][-m date_format][-p prefix][-e excludes_file][-i includes_file]
[-l log_file]
OPTIONS:
-h Show this usage information.
-b Bucket name (required); the script will prompt for it
interactively if it is not provided.
-d Local storage dir (defaults to pwd)
-s Start date (YYYY-mm-dd)
-f Finish date (YYYY-mm-dd)
-m Date format (limit granularity to month or day, switch between / - . separators, etc.);
default is '%Y/%m/%d'; accepts `man date` format specifiers.
-p Non-date prefix
-e Excludes file (s3cmd format)
-i Includes file (s3cmd format)
-c Check-md5 (default uses s3cmd --no-check-md5)
-l Log file
EXAMPLES:
./s3-s3cmd-sync-dated-dirs.sh \
-b my-bucket \
-p sub-folder \
-m '%Y%m%d' \
-s 2010-01-01 \
-f 2011-07-01
Sync everything from s3://my-bucket/sub-folder/20100101/ through
s3://my-bucket/sub-folder/20110701/ to ./sub-folder/%Y%m%d/[key],
with no output and no logging.
./s3-s3cmd-sync-dated-dirs.sh \
-b my-bucket \
-s 2011-01-01 \
-e excludes_file \
-d /home/user \
-l /home/user/log.log \
-c
Sync everything since 2011-01-01, starting with s3://my-bucket/2011/01/01/,
to /home/user/2011/01/01/[key], excluding patterns listed in the
excludes file; use check-md5 for file validation and log everything
to /home/user/log.log.
EOF
}
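# Defaults: sync into the current directory, start from 2006-03-14,
# skip md5 checksum verification, and lay out directories as YYYY/MM/DD.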
LOCAL_DIR='./'
START='2006-03-14'
CHECKSUM="--no-check-md5"
DATE_FORMAT="%Y/%m/%d"
while getopts ":hb:d:s:f:p:m:e:i:cl:" flag
do
case "$flag" in
h)
usage
exit 0
;;
b)
BUCKET=$OPTARG
;;
d)
LOCAL_DIR="$OPTARG/"
;;
s)
START=$OPTARG
;;
f)
FINISH=$OPTARG
;;
p)
PREFIX="$OPTARG/"
;;
e)
EXCLUDES="--exclude-from $OPTARG"
;;
i)
INCLUDES="--include-from $OPTARG"
;;
l)
LOG=$OPTARG
;;
m)
DATE_FORMAT="$OPTARG"
;;
c)
CHECKSUM=""
;;
?)
usage
exit 1
;;
esac
done
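# A bucket is required; prompt for it once if -b was not provided.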
if [ -z "$BUCKET" ]; then
read -p "Please provide a bucket [Enter to exit]: " BUCKET
if [ -z "$BUCKET" ]; then
echo "Exiting, no bucket provided."
exit 1
fi
fi
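# Convert the start and finish dates into "days ago" offsets from today;
# the loop below walks those offsets from the oldest date to the newest.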
BEGIN=$(( ($(date +%s) - $(date -d "$START" +%s)) / (24*60*60) ))
if [ -z "$FINISH" ]; then
END=0
else
END=$(( ($(date +%s) - $(date -d "$FINISH" +%s)) / (24*60*60) ))
fi
for n in $(seq $END $BEGIN | tac); do
DATE_DIR=$(date --date="$n days ago" +"$DATE_FORMAT")
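# Coarser date formats (e.g. '%Y/%m') map several offsets to the same
# directory, so skip a date dir that was already synced.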
if [[ "$LAST_DATE_DIR" == "$DATE_DIR" ]]; then
continue
else
LAST_DATE_DIR=$DATE_DIR
fi
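# Build the S3 prefix and the matching local destination for this date.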
S3_PATH="${PREFIX}${DATE_DIR}/"
LOCAL_PATH="${LOCAL_DIR}${PREFIX}${DATE_DIR}/"
mkdir -p "$LOCAL_PATH"
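# If a log file was requested, make sure it exists and is writable before syncing.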
if [ -n "$LOG" ]; then
touch "$LOG" > /dev/null 2>&1
fi
if [ -n "$LOG" ] && [ -w "$LOG" ]; then
echo "$(date): $BUCKET/$S3_PATH" >> "$LOG"
s3cmd sync $CHECKSUM $EXCLUDES $INCLUDES \
s3://$BUCKET/$S3_PATH \
"$LOCAL_PATH" \
>> "$LOG" 2>&1
elif [ -n "$LOG" ]; then
echo "Exiting, log file $LOG is not writable."
exit 1
else
s3cmd sync $CHECKSUM $EXCLUDES $INCLUDES \
s3://$BUCKET/$S3_PATH \
"$LOCAL_PATH" \
> /dev/null 2>&1
fi
done
exit 0