@noisychannel
Created April 23, 2015 21:58
Moses: Build phrase table
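#!/usr/bin/env bash
# Build a Moses phrase table by running train-model.perl steps 1-6 in stages.
# Stop at the first failing step instead of continuing with missing inputs.
set -euo pipefail

# Change these variables
# Root of your mosesdecoder checkout (set it in the environment or edit here)
MOSES=${MOSES:?set MOSES to the root of your mosesdecoder checkout}
ROOT_DIR=/export/a04/gkumar/experiments/scale-2015/1
# Directory holding the word-alignment binaries (GIZA++, snt2cooc.out, mkcls)
EXTERNAL_BIN_DIR=/export/a04/gkumar/code/mosesdecoder/tools
# Foreign (source) and English (target) file extensions
F_EXT=pa
E_EXT=en
# Longest phrase, in words, kept during phrase extraction
MAX_PHRASE_LENGTH=10
# Corpus stem: expects ${CORPUS}.${F_EXT} and ${CORPUS}.${E_EXT}, sentence-aligned
CORPUS=/export/a04/gkumar/experiments/scale-2015/data/trans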
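# Step 1: prepare the corpus (vocabularies, word classes and numberized
# training files for GIZA++, written to -corpus-dir)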
${MOSES}/scripts/training/train-model.perl \
-root-dir ${ROOT_DIR} \
-first-step 1 -last-step 1 -external-bin-dir ${EXTERNAL_BIN_DIR} \
-f ${F_EXT} -e ${E_EXT} -alignment grow-diag-final-and \
-max-phrase-length ${MAX_PHRASE_LENGTH} -reordering msd-bidirectional-fe -score-options '--GoodTuring' \
-corpus ${CORPUS} \
-corpus-dir ${ROOT_DIR}/training/prepared.1
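# Step 2: run GIZA++ word alignment in both directions, one direction per call.
# The first call (-direction 2) writes to -giza-e2f, the second (-direction 1)
# to -giza-f2e; the two calls are independent and can run in parallel.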
${MOSES}/scripts/training/train-model.perl \
-root-dir ${ROOT_DIR} \
-first-step 2 -last-step 2 -external-bin-dir ${EXTERNAL_BIN_DIR} \
-f ${F_EXT} -e ${E_EXT} -alignment grow-diag-final-and \
-max-phrase-length ${MAX_PHRASE_LENGTH} -reordering msd-bidirectional-fe -score-options '--GoodTuring' \
-corpus ${CORPUS} \
-corpus-dir ${ROOT_DIR}/training/prepared.1 \
-giza-e2f ${ROOT_DIR}/training/giza.1 -direction 2
${MOSES}/scripts/training/train-model.perl \
-root-dir ${ROOT_DIR} \
-first-step 2 -last-step 2 -external-bin-dir ${EXTERNAL_BIN_DIR} \
-f ${F_EXT} -e ${E_EXT} -alignment grow-diag-final-and \
-max-phrase-length ${MAX_PHRASE_LENGTH} -reordering msd-bidirectional-fe -score-options '--GoodTuring' \
-corpus ${CORPUS} \
-corpus-dir ${ROOT_DIR}/training/prepared.1 \
-giza-f2e ${ROOT_DIR}/training/giza.inverse.1 -direction 1
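# Steps 3-6: symmetrize the two alignments (grow-diag-final-and), build the
# lexical translation table, extract phrases of up to MAX_PHRASE_LENGTH words
# and score them into the phrase table. Steps 7-9 (reordering model,
# generation model, moses.ini) are not run.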
${MOSES}/scripts/training/train-model.perl \
-root-dir ${ROOT_DIR} \
-first-step 3 -last-step 6 -external-bin-dir ${EXTERNAL_BIN_DIR} \
-f ${F_EXT} -e ${E_EXT} -alignment grow-diag-final-and \
-max-phrase-length ${MAX_PHRASE_LENGTH} -reordering msd-bidirectional-fe -score-options '--GoodTuring' \
-corpus ${CORPUS} \
-corpus-dir ${ROOT_DIR}/training/prepared.1 \
-giza-e2f ${ROOT_DIR}/training/giza.1 \
-giza-f2e ${ROOT_DIR}/training/giza.inverse.1 \
-alignment-file ${ROOT_DIR}/model/aligned.1 \
-alignment-stem ${ROOT_DIR}/model/aligned.1 \
-lexical-file ${ROOT_DIR}/model/lex.1 \
-extract-file ${ROOT_DIR}/model/extract.1 \
-phrase-translation-table ${ROOT_DIR}/model/phrase-table.1
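
After step 6 the scored phrase table is written to the `-phrase-translation-table` path (depending on the Moses version it may be gzip-compressed with a `.gz` suffix). One quick sanity check is to look up the best-scoring translations of a single source phrase. The helper below is a hypothetical sketch, not part of Moses: `top_translations` is an illustrative name, and it assumes the usual `source ||| target ||| scores ||| alignment ||| counts` line layout with the direct phrase translation probability p(e|f) as the third score.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Moses): print the N best target phrases
# for one source phrase, ranked by the third score, assumed here to be the
# direct phrase translation probability p(e|f).
# Assumed line layout: source ||| target ||| scores ||| alignment ||| counts
top_translations() {
  local table=$1 src=$2 n=${3:-5}
  # gzip -dcf decompresses .gz input and passes plain text through unchanged
  gzip -dcf "$table" \
    | awk -F ' [|][|][|] ' -v src="$src" '
        # keep rows whose source side matches exactly; emit "score<TAB>target"
        $1 == src { split($3, s, " "); printf "%s\t%s\n", s[3], $2 }' \
    | sort -t $'\t' -k1,1gr \
    | head -n "$n"
}
```

For example, `top_translations "${ROOT_DIR}/model/phrase-table.1.gz" "<source phrase>" 10` would list up to ten candidates, highest score first. Sorting with `-g` rather than `-n` keeps scores in scientific notation (such as `2.5e-05`) in the right order.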