Skip to content

Instantly share code, notes, and snippets.

@wwwslinger
Last active August 29, 2015 14:09
Show Gist options
  • Save wwwslinger/61cfc3f6864e7193afbb to your computer and use it in GitHub Desktop.
Save wwwslinger/61cfc3f6864e7193afbb to your computer and use it in GitHub Desktop.
Parallelize a large MySQL import (in part from https://gist.github.com/wardbekker/964146)
# Large (multi-GB), gzipped mysqldump file: dump.sql.gz
# Split into a file per table, but appending any header optimizations with SET and USE statements
zcat dump.sql.gz | awk '/DROP TABLE IF EXISTS/{n++}{print >"out" n " .sql" }'
# Creates n output files, for example 34, but the first "out.sql" with no number
# has all the header SET and USE statements. We need to prepend out.sql to every
# other outn.sql. We have to do this with a temporary outn.txt file and GNU Parallel
# (http://www.gnu.org/software/parallel/):
mv out.sql out.txt
seq 1 33 | parallel "cat out.txt out{}.sql >> out{}.txt && mv out{}.txt out{}.sql"
# Now load them all in in parallel. Replace "USER", "PASSWORD", "DATABASE" with
# the real values
ls -rS *.sql | parallel --joblog dump_import.log mysql -u USER -pPASSWORD DATABASE "<"
@ole-tange
Copy link

Avoid doubling temporary files with

zcat dump.sql.gz | awk '/DROP TABLE IF EXISTS/{n++}{print >"out" n " .sql" }'
parallel --joblog dump_import.log "cat out.sql {} | sql mysql://user:pass@/db" ::: out[0-9]*.sql

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment