Merlin Source Code Analysis
This article walks through parts of the Merlin source code as a basis for studying Merlin in more depth.
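Code path: https://github.com/CSTR-Edinburgh/merlin/blob/master/misc/scripts/frontend/utils/genScmFile.py
This script converts text files into a standard format made up of three kinds of output: utt files, a scheme file, and an scp file. The utt directory starts out empty and is filled in by a later step; the scheme file maps each text to the utt file that will be generated for it; the scp file is a list of file names (without extensions).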
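<in_txt_dir/in_txt_file> directory containing the original text (one file per utterance, each ending in .txt), or a single text file
<out_utt_dir> path where the utt files will later be generated
<out_scm_file> the generated scm file
<out_file_id_list> the generated scp file
<out_utt_dir> is usually an empty directory.
The generated scm file looks like this: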
(utt.save (utt.synth (Utterance Text "Hello world." )) "D:\Python_Programs\Merlin_Toolkit\egs_database\utt\test_001.utt")
(utt.save (utt.synth (Utterance Text "Hi, this is a demo voice from Merlin." )) "D:\Python_Programs\Merlin_Toolkit\egs_database\utt\test_002.utt")
(utt.save (utt.synth (Utterance Text "Hope you guys enjoy free open-source voices from Merlin." )) "D:\Python_Programs\Merlin_Toolkit\egs_database\utt\test_003.utt")
(utt.save (utt.synth (Utterance Text "I love you China." )) "D:\Python_Programs\Merlin_Toolkit\egs_database\utt\test_004.utt")
(utt.save (utt.synth (Utterance Text "Are you OK?" )) "D:\Python_Programs\Merlin_Toolkit\egs_database\utt\test_005.utt")
(utt.save (utt.synth (Utterance Text "I am comming from China." )) "D:\Python_Programs\Merlin_Toolkit\egs_database\utt\test_006.utt")
The generated scp file (the file ID list) looks like this:
test_001
test_002
test_003
test_004
test_005
test_006
<in_txt_dir/in_txt_file> can be either a directory of text files or a single file. A single file uses the following format:
( arctic_a0001 "Author of the danger trail, Philip Steels, etc." )
( arctic_a0002 "Not at this particular case, Tom, apologized Whittemore." )
( arctic_a0003 "For the twentieth time that evening the two men shook hands." )
( arctic_a0004 "Lord, but I'm glad to see you again, Phil." )
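The conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of genScmFile.py: the function names (parse_prompt_line, scheme_line, build_outputs) are hypothetical, and the scheme line layout is copied from the scm sample shown earlier. It does not escape double quotes inside the text.

```python
import os
import re

# Matches one line of the single-file input format: ( file_id "text" )
LINE_RE = re.compile(r'^\(\s*(\S+)\s+"(.*)"\s*\)\s*$')


def parse_prompt_line(line):
    """Return (file_id, text) for a line such as ( arctic_a0001 "..." )."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized line: %r" % line)
    return match.group(1), match.group(2)


def scheme_line(file_id, text, out_utt_dir):
    """Build one Festival command in the layout of the scm sample."""
    utt_path = os.path.join(out_utt_dir, file_id + ".utt")
    return '(utt.save (utt.synth (Utterance Text "%s" )) "%s")' % (text, utt_path)


def build_outputs(prompt_lines, out_utt_dir):
    """Return (scm_lines, file_ids) for a list of prompt lines."""
    scm_lines, file_ids = [], []
    for line in prompt_lines:
        if not line.strip():
            continue  # skip blank lines
        file_id, text = parse_prompt_line(line)
        scm_lines.append(scheme_line(file_id, text, out_utt_dir))
        file_ids.append(file_id)
    return scm_lines, file_ids
```

Writing scm_lines to <out_scm_file> and file_ids to <out_file_id_list>, one entry per line, would give the two output files described above.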
festival -b <scheme_file>
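Purpose: invoke Festival to process the text in batch mode (no interaction). <scheme_file> is the file produced in the previous step.
Result: utt files are generated at the paths specified inside <scheme_file>.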
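When this step is driven from Python rather than the shell, the call can be wrapped as below. This is a small sketch under the assumption that the festival executable is on PATH; the helper names are hypothetical, and only the -b flag shown above is used.

```python
import shutil
import subprocess


def festival_batch_command(scheme_file, festival_bin="festival"):
    """Return the argument list equivalent to: festival -b <scheme_file>."""
    return [festival_bin, "-b", scheme_file]


def run_festival(scheme_file, festival_bin="festival"):
    """Run Festival in batch mode; raise if the binary cannot be found."""
    if shutil.which(festival_bin) is None:
        raise FileNotFoundError("festival executable not found: %s" % festival_bin)
    # check=True raises CalledProcessError if Festival exits with an error.
    subprocess.run(festival_batch_command(scheme_file, festival_bin), check=True)
```

Calling run_festival("scm.scm") would then write one utt file per line of the scheme file.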
Festival, the front-end tool, performs the text analysis. For example, for the text Hello world. the resulting utt file is:
EST_File utterance
DataType ascii
version 2
EST_Header_End
Features max_id 44 ; type Text ; iform "\"Hello world.\"" ;
Stream_Items
1 id _1 ; name Hello ; whitespace "" ; prepunctuation "" ;
2 id _2 ; name world ; punc . ; whitespace " " ; prepunctuation "" ;
............ n lines omitted ............
End_of_Relation
Relation US_map ; ()
1 43 0 0 0 0
End_of_Relation
Relation Wave ; ()
1 44 0 0 0 0
End_of_Relation
End_of_Relations
End_of_Utterance
Code path: https://github.com/CSTR-Edinburgh/merlin/blob/master/misc/scripts/frontend/festival_utt_to_lab/make_labels
This script extracts monophone labels and full-context labels from the utt files.
make_labels <labels_dir> <utts_dir> <dumpfeats> <scripts>
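<labels_dir> ## directory where the newly generated labels are written
<utts_dir> ## directory containing the utt files
<dumpfeats> ## path to Festival's dumpfeats script; with Festival installed it is usually {FESTDIR}/examples/dumpfeats
<scripts> ## directory containing the following scripts: extra_feats.scm label.feats label-full.awk label-mono.awk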
Create two subdirectories, mono and full, inside <labels_dir>.
For each utt file in <utts_dir>:
Obtain the base name with basename $utt .utt
Call dumpfeats to extract the features:
dumpfeats -eval $scripts/extra_feats.scm \
-relation Segment \
-feats $scripts/label.feats \
-output $labels/tmp \
$utt
Write the results into the full and mono directories:
gawk -f $scripts/label-full.awk $labels/tmp > $labels/full/$base.lab; \
gawk -f $scripts/label-mono.awk $labels/tmp > $labels/mono/$base.lab; \
Clean up the temporary file: rm -f tmp
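The same loop can be expressed in Python, which may make the data flow easier to follow. This is a sketch of the steps listed above, not a port of the make_labels script: the function names are hypothetical, and it assumes dumpfeats and gawk accept exactly the arguments shown in the shell commands.

```python
import glob
import os
import subprocess


def label_commands(utt, labels, dumpfeats, scripts):
    """Return the dumpfeats command plus (gawk command, output path) pairs."""
    base = os.path.splitext(os.path.basename(utt))[0]
    tmp = os.path.join(labels, "tmp")
    dump = [dumpfeats,
            "-eval", os.path.join(scripts, "extra_feats.scm"),
            "-relation", "Segment",
            "-feats", os.path.join(scripts, "label.feats"),
            "-output", tmp,
            utt]
    full = (["gawk", "-f", os.path.join(scripts, "label-full.awk"), tmp],
            os.path.join(labels, "full", base + ".lab"))
    mono = (["gawk", "-f", os.path.join(scripts, "label-mono.awk"), tmp],
            os.path.join(labels, "mono", base + ".lab"))
    return dump, full, mono


def make_labels(labels, utts, dumpfeats, scripts):
    """Create mono/ and full/ labels for every utt file in the utts directory."""
    for sub in ("mono", "full"):
        os.makedirs(os.path.join(labels, sub), exist_ok=True)
    for utt in sorted(glob.glob(os.path.join(utts, "*.utt"))):
        dump, full, mono = label_commands(utt, labels, dumpfeats, scripts)
        subprocess.run(dump, check=True)
        for cmd, out_path in (full, mono):
            # Redirect gawk's stdout into the label file, like "> file" in the shell.
            with open(out_path, "w") as out:
                subprocess.run(cmd, stdout=out, check=True)
    tmp = os.path.join(labels, "tmp")
    if os.path.exists(tmp):
        os.remove(tmp)  # remove the temporary feature dump
```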
dumpfeats is a tool shipped with Festival for extracting features from utt files. Its usage is as follows:
Usage: dumpfeats [options] <utt_file_0> <utt_file_1> ...
Dump features from a set of utterances
Options
-relation <string>
Relation from which the features have to be dumped from
-output <string>
If output parameter contains a %s its treated as a skeleton
e.g feats/%s.feats and multiple files will be created one
each utterance. If output doesn't contain %s the output
is treated as a single file and all features and dumped in it.
-feats <string>
If argument starts with a "(" it is treated as a list of
features to dump, otherwise it is treated as a filename whose
contents contain a set of features (without parenetheses).
-eval <ifile>
A scheme file to be loaded before dumping. This may contain
dump specific features etc. If filename starts with a left
parenthis it it evaluated as lisp.
-from_file <ifile>
A file with a list of utterance files names (used when there
are a very large number of files.
gawk is a text-processing command that is more powerful than sed. The -f option specifies the program file:
-f file Specifies a filename to read the program from
The programs themselves are in $scripts/label-full.awk and $scripts/label-mono.awk.
Using the utt file generated from the text Hello world. as an example, the following shows what make_labels produces.
Current path: /root/workspace/merlin_projects/step_by_step. It contains the following files:
root@de-3879-ng-2-123705-3173223045-0f7q9:~/workspace/merlin_projects/step_by_step# tree ./
|-- scm.scm
|-- test_001.utt
|-- test_002.utt
|-- test_003.utt
|-- test_004.utt
|-- test_005.utt
`-- test_006.utt
dumpfeats
dumpfeats=/root/workspace/Python_Programs/merlin/tools/festival/examples/dumpfeats
scripts=/root/workspace/Python_Programs/merlin/misc/scripts/frontend/festival_utt_to_lab
labels=.
utt=test_001.utt
$dumpfeats -eval $scripts/extra_feats.scm \
-relation Segment \
-feats $scripts/label.feats \
-output $labels/tmp \
$utt
After this command runs, a new file named tmp is created with the following content:
0 pau hh 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 2 1 0 0 0 0 0 3 0 content 0 2 0 3 0 2 0 ax 0 0.22
pau hh ax 0 0 0 1 0 0 1 0 3 1 0 0 2 0 2 0 2 0 1 0 1 ax 0 content content 0 2 1 0 2 0 1 0 1 0 3 0 0 2 0 0 L-L% 3 2 1 0 0 0 0 0 3 0 content 0 2 0 3 0 2 0 l 0.22 0.27795401
hh ax l 1 0 0 1 0 0 1 0 3 1 0 0 2 0 2 0 2 0 1 0 1 ax 0 content content 0 2 1 0 2 0 1 0 1 0 3 0 0 2 0 0 L-L% 3 2 1 0 0 0 0 3 3 content content 2 2 3 3 2 2 pau ow 0.27795401 0.32017601
ax l ow 2 0 0 1 0 0 1 0 3 1 0 0 2 0 2 0 2 0 1 0 1 ax 0 content content 0 2 1 0 2 0 1 0 1 0 3 0 0 2 0 0 L-L% 3 2 1 0 1 0 1 3 1 content content 2 2 3 3 2 2 hh w 0.32017601 0.39965901
l ow w 0 0 1 1 0 1 1 3 1 4 1 1 1 0 1 0 1 0 1 0 1 ow 0 content content 0 2 1 0 2 0 1 0 1 0 3 0 0 2 0 0 L-L% 3 2 1 0 1 0 1 3 4 content content 2 1 3 3 2 2 ax er 0.39965901 0.55004603
ow w er 0 1 1 0 1 1 0 1 4 0 0 2 0 1 0 1 0 1 0 1 0 er content content 0 2 1 0 1 1 1 0 1 0 0 3 0 0 2 0 0 L-L% 3 2 1 1 1 1 1 1 4 content content 2 1 3 3 2 2 l l 0.55004603 0.62555099
w er l 1 1 1 0 1 1 0 1 4 0 0 2 0 1 0 1 0 1 0 1 0 er content content 0 2 1 0 1 1 1 0 1 0 0 3 0 0 2 0 0 L-L% 3 2 1 1 1 1 1 4 4 content content 1 1 3 3 2 2 ow d 0.62555099 0.72588098
er l d 2 1 1 0 1 1 0 1 4 0 0 2 0 1 0 1 0 1 0 1 0 er content content 0 2 1 0 1 1 1 0 1 0 0 3 0 0 2 0 0 L-L% 3 2 1 1 1 1 1 4 4 content content 1 1 3 3 2 2 w pau 0.72588098 0.81338102
l d pau 3 1 1 0 1 1 0 1 4 0 0 2 0 1 0 1 0 1 0 1 0 er content content 0 2 1 0 1 1 1 0 1 0 0 3 0 0 2 0 0 L-L% 3 2 1 1 0 1 0 4 0 content 0 1 0 3 0 2 0 er 0 0.81338102 0.88916397
d pau 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 2 1 1 0 1 0 4 0 content 0 1 0 3 0 2 0 l 0 0.88916397 1.33796
gawk
mkdir full
mkdir mono
base=test_001
gawk -f $scripts/label-full.awk $labels/tmp > $labels/full/$base.lab; \
gawk -f $scripts/label-mono.awk $labels/tmp > $labels/mono/$base.lab;
Directory structure after these commands run:
|-- full
| `-- test_001.lab
|-- mono
| `-- test_001.lab
|-- scm.scm
|-- test_001.utt
|-- test_002.utt
|-- test_003.utt
|-- test_004.utt
|-- test_005.utt
|-- test_006.utt
`-- tmp
Content of full/test_001.lab:
0 2200000 x^x-pau+hh=ax@x_x/A:0_0_0/B:x-x-x@x-x&x-x#x-x$x-x!x-x;x-x|x/C:0+0+3/D:0_0/E:x+x@x+x&x+x#x+x/F:content_2/G:0_0/H:x=x@1=1|0/I:3=2/J:3+2-1
2200000 2779540 x^pau-hh+ax=l@1_3/A:0_0_0/B:0-0-3@1-2&1-3#1-3$1-3!0-1;0-1|ax/C:1+1+1/D:0_0/E:content+2@1+2&1+1#0+1/F:content_1/G:0_0/H:3=2@1=1|L-L%/I:0=0/J:3+2-1
2779540 3201760 pau^hh-ax+l=ow@2_2/A:0_0_0/B:0-0-3@1-2&1-3#1-3$1-3!0-1;0-1|ax/C:1+1+1/D:0_0/E:content+2@1+2&1+1#0+1/F:content_1/G:0_0/H:3=2@1=1|L-L%/I:0=0/J:3+2-1
3201760 3996590 hh^ax-l+ow=w@3_1/A:0_0_0/B:0-0-3@1-2&1-3#1-3$1-3!0-1;0-1|ax/C:1+1+1/D:0_0/E:content+2@1+2&1+1#0+1/F:content_1/G:0_0/H:3=2@1=1|L-L%/I:0=0/J:3+2-1
3996590 5500460 ax^l-ow+w=er@1_1/A:0_0_3/B:1-1-1@2-1&2-2#1-2$1-2!0-1;0-1|ow/C:1+1+4/D:0_0/E:content+2@1+2&1+1#0+1/F:content_1/G:0_0/H:3=2@1=1|L-L%/I:0=0/J:3+2-1
5500460 6255510 l^ow-w+er=l@1_4/A:1_1_1/B:1-1-4@1-1&3-1#2-1$2-1!1-0;1-0|er/C:0+0+0/D:content_2/E:content+1@2+1&2+0#1+0/F:0_0/G:0_0/H:3=2@1=1|L-L%/I:0=0/J:3+2-1
6255510 7258810 ow^w-er+l=d@2_3/A:1_1_1/B:1-1-4@1-1&3-1#2-1$2-1!1-0;1-0|er/C:0+0+0/D:content_2/E:content+1@2+1&2+0#1+0/F:0_0/G:0_0/H:3=2@1=1|L-L%/I:0=0/J:3+2-1
7258810 8133810 w^er-l+d=pau@3_2/A:1_1_1/B:1-1-4@1-1&3-1#2-1$2-1!1-0;1-0|er/C:0+0+0/D:content_2/E:content+1@2+1&2+0#1+0/F:0_0/G:0_0/H:3=2@1=1|L-L%/I:0=0/J:3+2-1
8133810 8891640 er^l-d+pau=x@4_1/A:1_1_1/B:1-1-4@1-1&3-1#2-1$2-1!1-0;1-0|er/C:0+0+0/D:content_2/E:content+1@2+1&2+0#1+0/F:0_0/G:0_0/H:3=2@1=1|L-L%/I:0=0/J:3+2-1
8891640 13379600 l^d-pau+x=x@x_x/A:1_1_4/B:x-x-x@x-x&x-x#x-x$x-x!x-x;x-x|x/C:0+0+0/D:content_1/E:x+x@x+x&x+x#x+x/F:0_0/G:3_2/H:x=x@1=1|0/I:0=0/J:3+2-1
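Each full-context label line starts with a start time, an end time, and then the context string, whose first part has the form pp^p-c+n=nn@ (the two previous phones, the current phone, and the two following phones). The sketch below pulls those five phones out of a line. The field names pp, p, c, n, nn and the function name are labels chosen here for illustration; the pattern is inferred from the sample lines above rather than taken from label-full.awk.

```python
import re

# Leading quinphone part of a full-context label: pp^p-c+n=nn@...
QUINPHONE_RE = re.compile(
    r"^(?P<pp>[^\^]+)\^(?P<p>[^-]+)-(?P<c>[^+]+)\+(?P<n>[^=]+)=(?P<nn>[^@]+)@"
)


def parse_full_label_line(line):
    """Return (start, end, phones) where phones maps pp/p/c/n/nn to phone names."""
    start, end, label = line.split(None, 2)
    match = QUINPHONE_RE.match(label)
    if match is None:
        raise ValueError("unexpected label format: %r" % label)
    return int(start), int(end), match.groupdict()
```

For the second line of the sample, the current phone (key c) is hh, with pau before it and ax after it.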
Content of mono/test_001.lab:
0 2200000 pau
2200000 2779540 hh
2779540 3201760 ax
3201760 3996590 l
3996590 5500460 ow
5500460 6255510 w
6255510 7258810 er
7258810 8133810 l
8133810 8891640 d
8891640 13379600 pau
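Comparing the tmp file with mono/test_001.lab suggests how the mono labels are derived: the second field of each tmp line is the current phone, and the last two fields are the start and end times in seconds, which appear in the label file multiplied by 10^7 (100 ns units) and rounded. The sketch below reproduces that mapping in Python. It is inferred from the sample data only and is not a translation of label-mono.awk; the function name is hypothetical.

```python
HTK_UNITS_PER_SECOND = 10000000  # label times appear to be in 100 ns units


def tmp_line_to_mono(line):
    """Convert one dumpfeats output line into a 'start end phone' label line."""
    fields = line.split()
    phone = fields[1]  # assumption: second field is the current phone
    # Assumption: the last two fields are start and end times in seconds.
    start = round(float(fields[-2]) * HTK_UNITS_PER_SECOND)
    end = round(float(fields[-1]) * HTK_UNITS_PER_SECOND)
    return "%d %d %s" % (start, end, phone)
```

Rounding rather than truncating matters here: 0.62555099 s becomes 6255510, as in the sample, whereas truncation would give 6255509.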
normalize_lab_for_merlin.py
Path: https://github.com/CSTR-Edinburgh/merlin/blob/master/misc/scripts/frontend/utils/normalize_lab_for_merlin.py
This script normalizes the mono and full labels produced in the previous step so that Merlin can use them.
According to CSTR-Edinburgh/merlin#156, the script mainly does the following three things:
Normalize duration to nearest divisible number by 5. Say 1.413 -> 1.415
Merge consecutive silence phones or pause phones to one.
Get rid of timestamps if required -- input format for HTK alignment
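In other words:
Round each timestamp to the nearest multiple of 5 ms. In the label files, where times appear to be in 100 ns units, this means multiples of 50000 (for example, 2779540 becomes 2800000).
Merge consecutive silence or pause phones into one.
Drop the timestamps if required.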
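The three operations can be sketched on simple (start, end, phone) tuples as follows. This is an illustration of the idea, not the implementation in normalize_lab_for_merlin.py: the 50000-unit step is inferred from the sample output, the set of silence symbols is an assumption, and the function names are hypothetical.

```python
FRAME = 50000              # 5 ms in 100 ns units (inferred from the sample output)
SILENCE = {"sil", "pau"}   # assumed silence and pause symbols


def round_to_frame(t, frame=FRAME):
    """Round an integer timestamp to the nearest multiple of frame."""
    return (t + frame // 2) // frame * frame


def normalize(entries, write_time_stamps=True):
    """Round times, merge consecutive silences, optionally drop timestamps."""
    merged = []
    for start, end, phone in entries:
        start, end = round_to_frame(start), round_to_frame(end)
        if merged and phone in SILENCE and merged[-1][2] in SILENCE:
            # Extend the previous silence instead of adding a new entry.
            merged[-1] = (merged[-1][0], end, merged[-1][2])
        else:
            merged.append((start, end, phone))
    if write_time_stamps:
        return ["%d %d %s" % entry for entry in merged]
    return [entry[2] for entry in merged]
```

With write_time_stamps=False only the phone column is returned, which corresponds to the timestamp-free form mentioned for HTK alignment.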
Usage: python normalize_lab_for_merlin.py <input_lab_dir> <output_lab_dir> <label_style> <file_id_list_scp> <optional: write_time_stamps (1/0)>
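<input_lab_dir> directory containing the full labels
<output_lab_dir> directory where the normalized labels are saved
<label_style> alignment style to use; phone_align and state_align are supported
<file_id_list_scp> path to the list of label files
<optional: write_time_stamps (1/0)> whether to write timestamps (may be omitted; defaults to 1)
Note: this step does not currently use the mono labels.
Note: for training data, set <label_style>=phone_align and <write_time_stamps>=0. For test data there is no such requirement (recommended: <label_style>=state_align, <write_time_stamps>=1).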
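To make the argument order and the default concrete, here is a small sketch of how the command line in the usage string could be interpreted. It mirrors only the usage text and the argument descriptions above. It is not the argument handling of normalize_lab_for_merlin.py itself, and the function name and dictionary keys are hypothetical.

```python
USAGE = ("Usage: python normalize_lab_for_merlin.py <input_lab_dir> "
         "<output_lab_dir> <label_style> <file_id_list_scp> "
         "<optional: write_time_stamps (1/0)>")


def parse_normalize_args(argv):
    """Map an argv-style list onto named options, defaulting write_time_stamps to 1."""
    if len(argv) < 5:
        raise SystemExit(USAGE)
    options = {
        "input_lab_dir": argv[1],
        "output_lab_dir": argv[2],
        "label_style": argv[3],
        "file_id_list_scp": argv[4],
        "write_time_stamps": True,  # default when the fifth argument is omitted
    }
    if options["label_style"] not in ("phone_align", "state_align"):
        raise ValueError("unsupported label_style: %s" % options["label_style"])
    if len(argv) > 5:
        options["write_time_stamps"] = int(argv[5]) != 0
    return options
```

For example, the training configuration from the note above corresponds to passing phone_align as the third argument and 0 as the fifth.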
The normalized labels are saved in <output_lab_dir> under the same file names as the originals. The content looks like this:
0 2200000 x^x-sil+hh=ax@x_x/A:0_0_0/B:x-x-x@x-x&x-x#x-x$x-x!x-x;x-x|x/C:0+0+3/D:0_0/E:x+x@x+x&x+x#x+x/F:content_2/G:0_0/H:x=x@1=1|0/I:3=2/J:3+2-1
2200000 2800000 x^sil-hh+ax=l@1_3/A:0_0_0/B:0-0-3@1-2&1-3#1-3$1-3!0-1;0-1|ax/C:1+1+1/D:0_0/E:content+2@1+2&1+1#0+1/F:content_1/G:0_0/H:3=2@1=1|L-L%/I:0=0/J:3+2-1
2800000 3200000 sil^hh-ax+l=ow@2_2/A:0_0_0/B:0-0-3@1-2&1-3#1-3$1-3!0-1;0-1|ax/C:1+1+1/D:0_0/E:content+2@1+2&1+1#0+1/F:content_1/G:0_0/H:3=2@1=1|L-L%/I:0=0/J:3+2-1
3200000 4000000 hh^ax-l+ow=w@3_1/A:0_0_0/B:0-0-3@1-2&1-3#1-3$1-3!0-1;0-1|ax/C:1+1+1/D:0_0/E:content+2@1+2&1+1#0+1/F:content_1/G:0_0/H:3=2@1=1|L-L%/I:0=0/J:3+2-1
4000000 5500000 ax^l-ow+w=er@1_1/A:0_0_3/B:1-1-1@2-1&2-2#1-2$1-2!0-1;0-1|ow/C:1+1+4/D:0_0/E:content+2@1+2&1+1#0+1/F:content_1/G:0_0/H:3=2@1=1|L-L%/I:0=0/J:3+2-1
5500000 6250000 l^ow-w+er=l@1_4/A:1_1_1/B:1-1-4@1-1&3-1#2-1$2-1!1-0;1-0|er/C:0+0+0/D:content_2/E:content+1@2+1&2+0#1+0/F:0_0/G:0_0/H:3=2@1=1|L-L%/I:0=0/J:3+2-1
6250000 7250000 ow^w-er+l=d@2_3/A:1_1_1/B:1-1-4@1-1&3-1#2-1$2-1!1-0;1-0|er/C:0+0+0/D:content_2/E:content+1@2+1&2+0#1+0/F:0_0/G:0_0/H:3=2@1=1|L-L%/I:0=0/J:3+2-1
7250000 8150000 w^er-l+d=sil@3_2/A:1_1_1/B:1-1-4@1-1&3-1#2-1$2-1!1-0;1-0|er/C:0+0+0/D:content_2/E:content+1@2+1&2+0#1+0/F:0_0/G:0_0/H:3=2@1=1|L-L%/I:0=0/J:3+2-1
8150000 8900000 er^l-d+sil=x@4_1/A:1_1_1/B:1-1-4@1-1&3-1#2-1$2-1!1-0;1-0|er/C:0+0+0/D:content_2/E:content+1@2+1&2+0#1+0/F:0_0/G:0_0/H:3=2@1=1|L-L%/I:0=0/J:3+2-1
8900000 13400000 l^d-sil+x=x@x_x/A:1_1_4/B:x-x-x@x-x&x-x#x-x$x-x!x-x;x-x|x/C:0+0+0/D:content_1/E:x+x@x+x&x+x#x+x/F:0_0/G:3_2/H:x=x@1=1|0/I:0=0/J:3+2-1
The normalized labels are then passed to forced_alignment.py to perform the alignment. How the alignment works is covered later.