@gaphex
Created May 9, 2019 16:08
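# Build an xargs pipeline that runs bert/create_pretraining_data.py once per
# shard listed in ./shards/, with up to PROCESSES jobs in parallel.
XARGS_CMD = ("ls ./shards/ | "
             "xargs -n 1 -P {} -I{} "
             "python3 bert/create_pretraining_data.py "
             "--input_file=./shards/{} "
             "--output_file={}/{}.tfrecord "
             "--vocab_file={} "
             "--do_lower_case={} "
             "--max_predictions_per_seq={} "
             "--max_seq_length={} "
             "--masked_lm_prob={} "
             "--random_seed=34 "
             "--dupe_factor=5")

# The literal '{}' arguments pass through str.format unchanged, so xargs still
# sees its own {} placeholder and substitutes each shard filename for it.
XARGS_CMD = XARGS_CMD.format(PROCESSES, '{}', '{}', PRETRAINING_DIR, '{}',
                             VOC_FNAME, DO_LOWER_CASE,
                             MAX_PREDICTIONS, MAX_SEQ_LENGTH, MASKED_LM_PROB)

# Create the output directory, then run the command via the IPython shell escape.
tf.gfile.MkDir(PRETRAINING_DIR)
!$XARGS_CMD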
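
The positional `'{}'` arguments above exist only to keep the xargs placeholder intact after formatting. A minimal sketch of an equivalent template is shown below, using named fields and doubled braces (`{{}}`), which `str.format` emits as a literal `{}`. The values assigned to `PROCESSES`, `PRETRAINING_DIR`, `VOC_FNAME` and the other settings are hypothetical stand-ins, not the ones the notebook uses, and the field names (`processes`, `out_dir`, and so on) are made up for this illustration.

```python
# Sketch only: every value below is a hypothetical stand-in for a variable
# that the notebook defines elsewhere.
PROCESSES = 2
PRETRAINING_DIR = "pretraining_data"
VOC_FNAME = "vocab.txt"
DO_LOWER_CASE = True
MAX_PREDICTIONS = 20
MAX_SEQ_LENGTH = 128
MASKED_LM_PROB = 0.15

# Named fields replace the positional ones. Doubled braces {{}} are the
# str.format escape for a literal {}, which xargs -I{} then replaces with
# each shard filename.
TEMPLATE = ("ls ./shards/ | "
            "xargs -n 1 -P {processes} -I{{}} "
            "python3 bert/create_pretraining_data.py "
            "--input_file=./shards/{{}} "
            "--output_file={out_dir}/{{}}.tfrecord "
            "--vocab_file={vocab} "
            "--do_lower_case={lower} "
            "--max_predictions_per_seq={max_pred} "
            "--max_seq_length={max_len} "
            "--masked_lm_prob={mlm_prob} "
            "--random_seed=34 "
            "--dupe_factor=5")

cmd = TEMPLATE.format(processes=PROCESSES,
                      out_dir=PRETRAINING_DIR,
                      vocab=VOC_FNAME,
                      lower=DO_LOWER_CASE,
                      max_pred=MAX_PREDICTIONS,
                      max_len=MAX_SEQ_LENGTH,
                      mlm_prob=MASKED_LM_PROB)

# Inspect the final shell command before running it.
print(cmd)
```

Printing the command first makes it easy to check that the three `{}` placeholders survived formatting. Outside a notebook, the string could be run with `subprocess.run(cmd, shell=True, check=True)` in place of the `!$XARGS_CMD` shell escape.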