This is a brief explanation of how to build a new model for ilive tts.
The input to the model generation process is a set of directories containing the required files, listed as follows:
- `text` dir: contains a set of diacritized Arabic text sentences, each in a separate file.
- `wav` dir: contains a set of wav files, each representing the Arabic pronunciation of the corresponding sentence in the `text` dir.
- `lab` dir: contains the Arabic pronunciation time spans of each speech segment for each corresponding text and wav file.
- `language` dir: a directory we create to hold the pre-processing outputs.
- `phonemizer` dir: contains the set of pronunciation rules for Arabic language phonemes.
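Before pre-processing, it can help to confirm that every sentence has its wav and lab counterpart. The following is a minimal sketch, not part of the original workflow. It assumes the directories are named `text`, `wav` and `lab`, that corresponding files share a base name, and that the extensions are `.txt`, `.wav` and `.lab` (the `.lab` extension in particular is an assumption). `check_inputs` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report sentences that lack a matching wav or lab file.
# Assumptions: directories text/, wav/, lab/ under one root, shared base names,
# and the extensions .txt, .wav, .lab.
check_inputs() {
    root=$1
    for txt in "$root"/text/*.txt; do
        [ -e "$txt" ] || continue              # glob matched nothing
        base=$(basename "$txt" .txt)
        [ -f "$root/wav/$base.wav" ] || echo "missing wav: $base"
        [ -f "$root/lab/$base.lab" ] || echo "missing lab: $base"
    done
}

# Check the directory given as the first argument, or the current one.
check_inputs "${1:-.}"
```

An empty output means every text file has both counterparts under these assumptions.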
The input acquired from the linguistics team needs some pre-processing to generate the intermediate files needed while generating the model.
- Gather all unique words from all sentences into one file and put it in the `language` folder.

      # enter text directory
      cd text
      # put all sentences in one file in language directory
      awk 'FNR==1{print ""}1' *.txt > ../language/text_ar.txt
      # enter language directory
      cd ../language
      # get unique words from all sentences and put them in one file
      tr -s '[:space:]' '\n' < text_ar.txt | sort | uniq > unique_words.txt
      # then manually remove the space ` ` and `,` entries if they exist in the unique words file

- In the `lab` directory, in all files, replace all `SIL` and `SSIL` with `_`.

      # enter lab directory
      cd lab
      # replace all SSIL with _ in all files
      sed -i -- 's/SSIL/_/g' *
      # replace all SIL with _ in all files
      sed -i -- 's/SIL/_/g' *

- In project `phonemizer-continuous` resources [`src/main/resources/com/univox`], replace [`allophones.ar_SA.xml`, `ArabicPhonemesMap`, `ArabicScript`] with the files from the `phonemizer` directory, and do the same in both the jar file and the rar file inside it.
- In project `marytts`: in `marytts-runtime/src/main/resources` and `marytts-runtime/src/main/java/marytts/com/univox`, replace [`ArabicPhonemesMap`, `ArabicScript`, `allophones.ar_SA.xml`] with the files from the previous step.
- Put the unique words file in the project `phonemizer-continuous` base folder.
- In project `phonemizer-continuous`: in `src/main/java/com/univox/PhonemizerMain.java`, make sure that the file name used is the same as your unique words file name in the base folder.

      String filename="unique_words.txt";

- Run project `phonemizer-continuous` as a Java application (this will take a few minutes).
- Move the output of the `phonemizer-continuous` project, named `unique_words.ph`, to the `language` directory and rename it to `ar.txt`.
- In the `language` folder, in the `ar.txt` file, replace all `__` with ` functional` (yes: with a leading space), then remove all the remaining `_` from the file.

      # enter language folder
      cd language
      # replace "__" with " functional"
      sed -i -- 's/__/ functional/g' ar.txt
      # remove remaining '_' in the file
      sed -i -- 's/_//g' ar.txt

- In the `marytts` project, delete the `target` folder, then rebuild `marytts`.

      # enter marytts directory
      cd marytts
      mvn -Dmaven.test.skip=true install

- In the `language` folder, run `transcription.sh`, which is produced by the `marytts` build in the previous step.

      cd language
      transcription.sh

  This will open a GUI tool that requires a few steps:
  - It asks for the `allophones.ar_SA.xml` file; select it from its location.
  - Then, from the `file` menu, select `open` and then select the `ar.txt` file from the `language` directory.
  - Check all words: none should be in red; a word in red indicates an error.
  - Click the `train and predict` button.
  - From the `file` menu, select `save`. This saves a few files in the `language` directory.
  - Close the GUI tool.
- In the `marytts` project, delete the `target` folder.
- In the `marytts` project, in directory `marytts-languages/marytts-lang-ar`, delete the `target` folder.
- In the `marytts` project, in directory `marytts-languages/marytts-lang-ar/src/main/resources/marytts/language/ar/lexicon`, replace the files in it with the output files from the `transcription.sh` tool.
- In the directory from the previous step, rename `allophones.ar_SA.xml` to `allophones.ar.xml` and also remove the `_SA` from the `lang` tag inside the file.
- In project `marytts`: in dir `marytts-languages/marytts-lang-ar/lib/modules/ar/lexicon`, replace the two files [`allophones.ar.xml`, `ar`] with the modified `allophones.ar.xml` file from the previous step and the `ar.txt` file from the `language` directory after renaming it to `ar`.
- In project `marytts`: in dir `marytts-languages/marytts-lang-ar`, test that everything is okay.

      mvn test

- In project `marytts`: in dir `user-dictionaries`, replace `userdict-ar.txt` with the output of the `phonemizer-continuous` project, which is `unique_words.ph`, after renaming it to `userdict-ar.txt`.
- Edit the file moved to `user-dictionaries`: remove `__` and replace each `_` with `| ` (a pipe followed by a space).

      sed -i -- 's/__//g' userdict-ar.txt
      sed -i -- 's/_/| /g' userdict-ar.txt

- In the `marytts` project, delete the `target` folder, then build `marytts`.

      # enter marytts directory
      cd marytts
      mvn -Dmaven.test.skip=true install

- Open the `marytts` project (univox workspace) in Eclipse and add build configurations for the server and client as follows:
  - Run server: run configuration with `~/git/marytts` as mary base.
  - Run DatabaseImportMain: run configuration with the db directory as the database base folder.
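The text rewrites in the steps above (cleaning the unique words list, producing `ar.txt`, producing `userdict-ar.txt`) can be tried on sample input before touching the real files. The sketch below wraps them as stream filters. The function names are hypothetical, and the sample entry `w _p q__` is made up purely to show the mechanics; the real `unique_words.ph` format may differ.

```shell
#!/bin/sh
# Sketch only: helper names and the sample entry are invented for illustration.

# Drop blank lines and lines holding only a comma from a word list
# (a scripted stand-in for the manual cleanup of unique_words.txt).
clean_word_list() {
    grep -v -E '^[[:space:]]*,?[[:space:]]*$'
}

# ar.txt rewrite: "__" becomes " functional", then every remaining "_" is removed.
to_lexicon() {
    sed -e 's/__/ functional/g' -e 's/_//g'
}

# userdict-ar.txt rewrite: "__" is removed, then every remaining "_" becomes "| ".
to_userdict() {
    sed -e 's/__//g' -e 's/_/| /g'
}

# Made-up entry; the real phonemizer output format may differ.
sample='w _p q__'
printf '%s\n' "$sample" | to_lexicon     # prints: w p q functional
printf '%s\n' "$sample" | to_userdict    # prints: w | p q
printf '\n,\nw\n' | clean_word_list      # prints: w
```

In both rewrites the order matters: `__` has to be handled before the single `_`, otherwise the first substitution would consume the doubled underscores.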