@sshleifer
Last active June 21, 2020 19:06
Broken Script to translate with cc25
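# Language codes the cc25 checkpoint was pretrained on (passed to --langs).
export langs=ar_AR,cs_CZ,de_DE,en_XX,es_XX,et_EE,fi_FI,fr_XX,gu_IN,hi_IN,it_IT,ja_XX,kk_KZ,ko_KR,lt_LT,lv_LV,my_MM,ne_NP,nl_XX,ro_RO,ru_RU,si_LK,tr_TR,vi_VN,zh_CN
export CC25=/Users/shleifer/cc25_pretrain
export outfile=pred_en_ro.txt
export PRETRAIN=$CC25/model.pt
# Note: -s is the source and -t the target language, so this decodes ro_RO -> en_XX,
# although the output file name suggests en -> ro.
fairseq-generate tmp/ --path "$PRETRAIN" \
  --task translation_from_pretrained_bart -t en_XX -s ro_RO --bpe 'sentencepiece' \
  --sentencepiece-vocab "$CC25/sentence.bpe.model" --sacrebleu --remove-bpe 'sentencepiece' \
  --max-sentences 32 --langs "$langs" --beam 5 > "$outfile"
# H-lines are "H-<id><TAB><score><TAB><hypothesis>"; T-lines are "T-<id><TAB><reference>".
grep ^H "$outfile" | cut -f3- > tgt.txt
grep ^T "$outfile" | cut -f2- > ref.txt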
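
The two grep/cut lines keep hypotheses and references in the order fairseq-generate printed them, which may differ from the input order because sentences are typically batched by length. If input order matters, a minimal sketch along these lines could sort by sample id first. It assumes the usual tab-separated `H-<id>` / `T-<id>` output lines; `extract_hyps` and `extract_refs` are hypothetical helper names, not part of fairseq.

```shell
# Hypothetical helpers: restore input order before stripping ids and scores.
# Assumes lines look like "H-<id><TAB><score><TAB><text>" and "T-<id><TAB><text>".
TAB=$(printf '\t')

extract_hyps() {
  # $1: file holding fairseq-generate output
  grep '^H-' "$1" | sed 's/^H-//' | sort -t "$TAB" -n -k1,1 | cut -f3-
}

extract_refs() {
  # $1: file holding fairseq-generate output
  grep '^T-' "$1" | sed 's/^T-//' | sort -t "$TAB" -n -k1,1 | cut -f2-
}
```

With these defined, `extract_hyps "$outfile" > tgt.txt` and `extract_refs "$outfile" > ref.txt` would stand in for the two grep lines.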
@sshleifer
Author

tmp/ contains:

dict.ro_RO.txt
dict.en_XX.txt
data.spm.ro_RO
data.spm.en_XX
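
The listing above shows dictionaries and sentencepiece-encoded text, but no binarized files. The gist does not say why the script is broken, so this is only one thing worth checking: a small sketch that reports which files are missing under the `<split>.<src>-<tgt>.<lang>.{bin,idx}` naming that fairseq-preprocess normally produces. `missing_fairseq_files` is a hypothetical helper name, and the expected layout is an assumption about the setup.

```shell
# Hypothetical helper: print the files that a binarized fairseq data directory
# would usually contain but that are absent from "$1".
# Assumes the "<split>.<src>-<tgt>.<lang>.bin/.idx" naming; adjust if needed.
missing_fairseq_files() {
  dir=$1; src=$2; tgt=$3; split=${4:-test}
  for lang in "$src" "$tgt"; do
    for f in "dict.$lang.txt" \
             "$split.$src-$tgt.$lang.bin" \
             "$split.$src-$tgt.$lang.idx"; do
      # Print the name only when the file does not exist.
      [ -e "$dir/$f" ] || printf '%s\n' "$f"
    done
  done
}
```

For the directory in this gist, the call would be `missing_fairseq_files tmp ro_RO en_XX`. If it prints file names, running fairseq-preprocess on the `data.spm.*` files is one possible next step.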
