@belenaj
Last active September 9, 2020 14:53
[Recreate partitions] #s3 #bash
-- add_partition.sql: registers one partition; year, month and day are supplied through --hiveconf
ALTER TABLE schema_name.table_name
ADD IF NOT EXISTS PARTITION (YEAR=${hiveconf:year}, MONTH=${hiveconf:month}, DAY=${hiveconf:day})
LOCATION 's3://mybucket/some/prefix/${hiveconf:year}/${hiveconf:month}/${hiveconf:day}/';
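# generate_partitions.sh: adds one partition; usage: sh generate_partitions.sh <year> <month> <day>
year=$1
month=$2
day=$3
hive -v -f add_partition.sql --hiveconf year="$year" --hiveconf month="$month" --hiveconf day="$day"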
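To check what Hive would run for a given date before touching the metastore, the `${hiveconf:...}` placeholders can be expanded by hand. The sketch below is not part of the gist: `render_partition_sql` is a hypothetical helper that imitates the substitution with sed, assuming add_partition.sql contains exactly the statement shown above.

```shell
#!/bin/sh
# Hypothetical helper (not in the original gist): prints the statement from
# add_partition.sql with the hiveconf placeholders replaced by the given values.
# It only imitates the textual substitution; Hive itself is never called.
render_partition_sql() {
  year=$1 month=$2 day=$3
  sed -e "s|\${hiveconf:year}|$year|g" \
      -e "s|\${hiveconf:month}|$month|g" \
      -e "s|\${hiveconf:day}|$day|g" <<'EOF'
ALTER TABLE schema_name.table_name
ADD IF NOT EXISTS PARTITION (YEAR=${hiveconf:year}, MONTH=${hiveconf:month}, DAY=${hiveconf:day})
LOCATION 's3://mybucket/some/prefix/${hiveconf:year}/${hiveconf:month}/${hiveconf:day}/';
EOF
}

# Preview the statement for 2020-09-09.
render_partition_sql 2020 09 09
```

The values are inserted as plain text, so a zero-padded argument such as `09` ends up both in the partition spec and in the S3 location exactly as typed.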
############################################################
# 1. lists all objects in the bucket whose key starts with the prefix (only keys containing a '.')
# 2. removes the prefix from each object key
# 3. drops the file name (the last '/'-separated field in awk)
# 4. keeps each folder once (a folder usually holds more than one file)
# 5. extracts the partition values (which fields to pick depends on the table layout)
# 6. sorts the partitions from oldest to newest (year/month/day in this case)
# 7. calls the Hive wrapper script once per partition, passing the partition values as arguments
############################################################
prefix="some/prefix/"
bucket_name="mybucket"
aws s3api list-objects \
--bucket "$bucket_name" \
--output text \
--prefix "$prefix" \
--query "Contents[?contains(Key, '.')].{Key : Key}" \
| sed -e "s|^$prefix||" \
| awk -F/ '{$NF=""; print $0}' \
| awk '!a[$0]++' \
| awk '{ print $1,$2,$3 }' \
| sort -k1,1n -k2,2n -k3,3n \
| xargs -L 1 sh generate_partitions.sh
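The text-processing stages (steps 2 to 6) can be tried without AWS or Hive by feeding them a few object keys by hand. The following is a minimal sketch under stated assumptions: the keys are made up, `list_partitions` is a hypothetical name, and the layout is assumed to be `some/prefix/<year>/<month>/<day>/<file>`, so that year, month and day are the first three fields once the prefix is removed. The sort uses one explicit numeric key per field.

```shell
#!/bin/sh
# Dry run of steps 2-6 on made-up object keys (illustrative only).
# No AWS or Hive calls are made.
prefix="some/prefix/"

# Hypothetical helper: reads object keys on stdin, prints "year month day"
# once per folder, oldest first.
list_partitions() {
  sed -e "s|^$prefix||" \
    | awk -F/ '{$NF=""; print $0}' \
    | awk '!a[$0]++' \
    | awk '{ print $1,$2,$3 }' \
    | sort -k1,1n -k2,2n -k3,3n
}

partitions=$(printf '%s\n' \
  "some/prefix/2020/09/09/part-0000.csv" \
  "some/prefix/2020/09/09/part-0001.csv" \
  "some/prefix/2019/12/31/part-0000.csv" \
  "some/prefix/2020/01/15/part-0000.csv" \
  | list_partitions)

printf '%s\n' "$partitions"
```

With these four keys the output is three lines, `2019 12 31`, `2020 01 15` and `2020 09 09`: the two files under 2020/09/09 collapse into one partition. Piping the result into `xargs -L 1 echo sh generate_partitions.sh` prints the wrapper calls instead of executing them.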