Create external Impala table partitioned by a certain column and recover old partitions of the same table
# The following example is for a job that produces daily Parquet files partitioned by the 'day' column
DB="database"
BASE_DIR="/output/parquets"
TBL="${DB}.table" # name of the impala table
PQT="${BASE_DIR}/day" # name of the parent directory containing output-subdirectories (in the format 'day=ddMMyyy')
# impala_args is not set in this script; it is assumed to hold any extra impala-shell options and may be empty
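# (Re)create the external Impala table on top of the existing Parquet directory.
# Dropping an external table removes only its metadata; the data files stay in place.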
impala-shell -q "drop table if exists $TBL;
create external table $TBL (
account_id string,
vehicle_id string,
start_time string)
partitioned by (day int)
stored as parquet
location '$PQT';" ${impala_args}
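# Register every 'day=...' subdirectory under the table location that is not yet a known partition
impala-shell -q "ALTER TABLE $TBL RECOVER PARTITIONS;" ${impala_args}
# Reload the table's metadata so that subsequent queries see the recovered partitions
impala-shell -q "INVALIDATE METADATA $TBL;" ${impala_args}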
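
When the job has written only one new day, registering that single partition may be enough, with no need to rescan the whole directory. The following is a minimal sketch, not part of the original script: `add_partition_sql` is a hypothetical helper that only builds the statement, and the day value `31012020` is an illustrative example in the 'ddMMyyy' directory format mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original script): print the statement
# that registers a single 'day' partition for the table given as $1.
add_partition_sql() {
  local tbl="$1" day="$2"
  # The partition column is declared as int, so accept digits only.
  case "$day" in
    ''|*[!0-9]*) echo "day must be an integer, got '$day'" >&2; return 1 ;;
  esac
  printf 'ALTER TABLE %s ADD IF NOT EXISTS PARTITION (day=%s);\n' "$tbl" "$day"
}

# Build the statement for one day (illustrative value).
add_partition_sql "database.table" 31012020

# To run it against Impala, pass the output to impala-shell, e.g.:
#   impala-shell -q "$(add_partition_sql "$TBL" 31012020)" ${impala_args}
```

Because of `IF NOT EXISTS`, the statement can be re-run safely if the job is retried for the same day.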