@SnowMasaya
Last active August 29, 2015 14:19
A Japanese-language walkthrough of Kaldi's processing steps (Data Preparation), Part 2 ref: http://qiita.com/GushiSnow/items/a24cad7231de341738ee
# Output the lexicon as FST text, using the silence probabilities in silprob.txt
utils/make_lexicon_fst_silprob.pl $tmpdir/lexiconp_silprob_disambig.txt $srcdir/silprob.txt $silphone '#'$ndisambig | \
# Replacement: turn every disambiguation symbol (#1, #2, ...) into <eps>
sed 's=\#[0-9][0-9]*=<eps>=g' | \
# Compile a weighted finite-state transducer with phones as input and words as output
fstcompile --isymbols=$dir/phones.txt --osymbols=$dir/words.txt \
--keep_isymbols=false --keep_osymbols=false | \
# 14: Sort the transducer's arcs by output label (an example of inspecting the result is shown below)
fstarcsort --sort_type=olabel > $dir/L.fst || exit 1
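The `sed` step in the L.fst pipeline rewrites every disambiguation symbol (`#1`, `#2`, ...) to `<eps>` before `fstcompile` reads the text. A minimal sketch of that substitution on its own, assuming a few made-up FST text lines (source state, destination state, input phone, output word, optional cost) rather than real `make_lexicon_fst_silprob.pl` output; `replace_disambig` is a hypothetical helper name, and the pattern is written without the backslash before `#`, since `#` is not special in a sed regular expression.

```shell
#!/usr/bin/env bash
# Sketch: replace Kaldi-style disambiguation symbols (#1, #2, ...) with <eps>.
# The input lines are hypothetical examples, not real lexicon FST output.

replace_disambig() {
  # Same substitution as in the pipeline; '=' is the delimiter of the s command.
  sed 's=#[0-9][0-9]*=<eps>=g'
}

printf '%s\n' \
  '0 1 a_B hello 0.693' \
  '1 2 #1 <eps>' \
  '2 0 #2 <eps> 0.5' | replace_disambig
```

Lines without a `#N` symbol pass through unchanged; only the disambiguation labels become `<eps>`.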
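# Print L.fst as text (arguments: phone symbol table, word symbol table, input FST file, output text file)
fstprint --isymbols=./data/lang/phones.txt --osymbols=./data/lang/words.txt ../../../test_japanese/data/lang_test_tg/L.fst test.txt
# Draw the same FST in Graphviz format, then render it as JPEG and view it
fstdraw --isymbols=./data/lang/phones.txt --osymbols=./data/lang/words.txt ../../../test_japanese/data/lang_test_tg/L.fst test.dot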
dot -Tjpg test.dot > test.jpg
xli test.jpg
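# Build G.fst from the ARPA language model:
#  - drop ARPA lines containing "<s> <s>", "</s> <s>" or "</s> </s>"
#  - convert the ARPA file to an FST (arpa2fst) and print it as text (fstprint)
#  - remove arcs carrying the words listed in oovs.txt (remove_oovs.pl)
#  - relabel <eps> input labels as #0 (eps2disambig.pl), map <s> and </s> to <eps> (s2eps.pl)
#  - compile with words.txt, remove epsilons, and sort arcs by input label
cat $lmdir/lm.arpa | \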
grep -v '<s> <s>' | \
grep -v '</s> <s>' | \
grep -v '</s> </s>' | \
arpa2fst - | fstprint | \
utils/remove_oovs.pl $tmpdir/oovs.txt | \
utils/eps2disambig.pl | utils/s2eps.pl | fstcompile --isymbols=$test/words.txt \
--osymbols=$test/words.txt --keep_isymbols=false --keep_osymbols=false | \
fstrmepsilon | fstarcsort --sort_type=ilabel > $test/G.fst
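The text-level stages of the G.fst pipeline can be tried without Kaldi or OpenFst. The sketch below reuses the three `grep -v` filters on made-up ARPA lines, and then imitates `utils/eps2disambig.pl` followed by `utils/s2eps.pl` on made-up `fstprint`-style arc lines (src dst ilabel olabel [weight]). The `awk` function `disambig_and_s2eps` and the helper `filter_arpa` are hypothetical stand-ins written for illustration; they approximate the behavior of the Perl helpers and are not their actual code.

```shell
#!/usr/bin/env bash
# Sketch of the text-level steps around arpa2fst, on hypothetical input.

# 1) The three grep -v filters from the pipeline: drop ARPA lines containing these pairs.
filter_arpa() {
  grep -v '<s> <s>' | grep -v '</s> <s>' | grep -v '</s> </s>'
}

# 2) Hypothetical awk stand-in for eps2disambig.pl followed by s2eps.pl.
#    Arc lines have at least 4 fields; final-state lines have fewer and pass through.
disambig_and_s2eps() {
  awk 'NF >= 4 {
         if ($3 == "<eps>") $3 = "#0"                   # epsilon input label -> #0
         if ($3 == "<s>" || $3 == "</s>") $3 = "<eps>"  # sentence markers -> <eps>
         if ($4 == "<s>" || $4 == "</s>") $4 = "<eps>"
       }
       { print }'
}

printf '%s\n' \
  '-1.0 <s> hello -0.3' \
  '-2.0 </s> <s> -0.1' \
  '-1.5 hello </s>' | filter_arpa

printf '%s\n' \
  '0 1 <s> <s>' \
  '1 2 <eps> <eps> 0.5' \
  '2' | disambig_and_s2eps
```

The order matters: `#0` is assigned only to the original `<eps>` input labels, so the sentence markers that `s2eps.pl` maps to `<eps>` afterwards do not become `#0`.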
# Diagnostic: check whether G.fst is stochastic (the result is not checked here)
fstisstochastic $test/G.fst
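An FST is stochastic when, at every state, the probabilities of the outgoing arcs plus the final probability sum to one, with each weight read as a negative log probability. Its exit status is not checked in this script, so the line serves only as a diagnostic. A minimal `awk` sketch of that per-state sum, assuming a hypothetical FST printed in transducer text format with weights given as -ln(p) and a missing weight meaning 0; `state_mass` is a hypothetical helper and this is not how Kaldi's `fstisstochastic` is implemented.

```shell
#!/usr/bin/env bash
# Sketch: per-state probability mass of an FST in fstprint transducer text format.
# Arc lines: src dst ilabel olabel [weight]; final lines: state [weight].
# Weights are read as -ln(probability); a missing weight means 0 (probability 1).
state_mass() {
  awk '
    NF >= 4 { w = (NF >= 5) ? $5 : 0; mass[$1] += exp(-w); next }
    NF >= 1 { w = (NF >= 2) ? $2 : 0; mass[$1] += exp(-w) }
    END { for (s in mass) printf "%s %.3f\n", s, mass[s] }
  ' | sort -n
}

# Hypothetical FST: state 0 splits 0.5/0.5 (-ln 0.5 = 0.693147); state 1 is final.
printf '%s\n' \
  '0 1 a a 0.693147' \
  '0 1 b b 0.693147' \
  '1' | state_mass
```

Each state that prints `1.000` satisfies the stochastic condition; a value noticeably above or below one marks a state where the probabilities do not sum to one.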
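# Diagnostic: make sure G.fst has no cycles made only of empty words
# (lexicon entries without a pronunciation). Write an FST text that loops on
# each such word and on #0, compose it with G.fst, and fail if the result is cyclic.
mkdir -p $tmpdir/g
awk '{if(NF==1){ printf("0 0 %s %s\n", $1,$1); }} END{print "0 0 #0 #0"; print "0";}' \
  < "$lexicon" >$tmpdir/g/select_empty.fst.txt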
fstcompile --isymbols=$test/words.txt --osymbols=$test/words.txt \
$tmpdir/g/select_empty.fst.txt | \
fstarcsort --sort_type=olabel | fstcompose - $test/G.fst > $tmpdir/g/empty_words.fst
fstinfo $tmpdir/g/empty_words.fst | grep cyclic | grep -w 'y' &&
echo "Language model has cycles with empty words" && exit 1
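The `awk` program above keeps only lexicon lines with a single field and turns each into a self-loop on state 0, then adds a `#0` self-loop and marks state 0 as final. Running the same program on a small hypothetical lexicon shows the FST text that gets compiled and composed with `G.fst`; `make_select_empty` is a hypothetical wrapper name, and the lexicon entries are made up.

```shell
#!/usr/bin/env bash
# Run the select_empty awk program from the script on a hypothetical lexicon.
# A line with one field (a word without a pronunciation) becomes a self-loop on state 0.
make_select_empty() {
  awk '{if(NF==1){ printf("0 0 %s %s\n", $1,$1); }} END{print "0 0 #0 #0"; print "0";}'
}

printf '%s\n' \
  'hello h e l o' \
  '<noise>' \
  'world w o r l d' | make_select_empty
```

Only `<noise>` produces an arc here, so after composition `G.fst` is restricted to paths that use `<noise>` and `#0`; if those paths form a cycle, `fstinfo` reports the result as cyclic and the script exits.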
rm -rf $tmpdir