A large amount of this was taken from this guide.
Someone has made a Homebrew formula for Open JTalk, so installation was fairly painless.
$ brew install open-jtalk
This also includes the NAIST Japanese Dictionary, which is much better than any of the MeCab dictionaries available in Homebrew.
Followed link to here to get voices:
Downloaded everything. Unzipped everything. Manually fixed directory names because they didn't use UTF-8 for the filenames, and restructured all the directories to look similar for later convenience.
Followed link to here to get the conversion program:
Downloaded htsvconv002.zip. The archive has no top-level directory and spills its files straight into the current directory, so unzip it into a new directory.
$ mkdir htsvconv
$ cd htsvconv
$ unzip ../htsvconv002.zip
$ brew install mono
$ mcs htsvconv.cs
Thinking back, running mcs may not actually be necessary, since surely the .exe was built cross-platform, but I'm not sure.
Running the conversion for all voices...
$ for f in 'Voice TYPE-α' 'Voice TYPE-β' 'Voice TYPE-γ ver1' 'Voice TYPE-δ ver1'; do \
(cd "$f" && mono ../htsvconv/htsvconv.exe Voice) ; \
done
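A quick check that the conversion actually produced something is handy before moving on. This is only a sketch: `check_voices` is a hypothetical helper, and it assumes each voice directory ends up with a non-empty `Voice.htsvoice`, which is the file name used in the test command in these notes.

```shell
# Hypothetical helper: report which voice directories lack a converted file.
# Assumes the converter writes Voice.htsvoice into each directory.
check_voices() {
  missing=0
  for dir in "$@"; do
    if [ -s "$dir/Voice.htsvoice" ]; then
      echo "ok: $dir"
    else
      echo "missing: $dir"
      missing=1
    fi
  done
  # Non-zero exit status if any directory had no converted voice.
  return "$missing"
}

check_voices 'Voice TYPE-α' 'Voice TYPE-β' 'Voice TYPE-γ ver1' 'Voice TYPE-δ ver1' \
  || echo 'some voices were not converted'
```

The `-s` test treats an empty output file as a failure too, which seems safer than only checking that the file exists.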
$ echo 'よろしくお願いします' | \
open_jtalk \
-x /usr/local/Cellar/open-jtalk/1.10_1/dic \
-m 'Voice TYPE-α/Voice.htsvoice' \
-ow temp.wav && play -q temp.wav
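The Cellar path above has the version number baked in, so it will break on the next upgrade. A possible workaround, sketched here with assumptions, is to ask Homebrew where the formula lives. `jtalk_dic_dir`, `speak` and the `OPEN_JTALK_DIC` override are hypothetical names of my own, and the sketch assumes the formula is called `open-jtalk` (as the Cellar path suggests) and keeps the dictionary in a `dic` directory under its prefix.

```shell
# Hypothetical helper: print the dictionary directory.
# OPEN_JTALK_DIC is an optional override (an assumption, not an open_jtalk feature);
# otherwise fall back to the Homebrew prefix of the open-jtalk formula.
jtalk_dic_dir() {
  printf '%s\n' "${OPEN_JTALK_DIC:-$(brew --prefix open-jtalk)/dic}"
}

# Hypothetical wrapper around the command from the notes.
# Usage: speak VOICE_FILE TEXT
speak() {
  voice=$1
  text=$2
  printf '%s\n' "$text" |
    open_jtalk -x "$(jtalk_dic_dir)" -m "$voice" -ow temp.wav &&
    play -q temp.wav
}
```

With that in place, the test above would become `speak 'Voice TYPE-α/Voice.htsvoice' 'よろしくお願いします'`.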
Useful options:
- -r sets the rate. For the α voice, 1.1 feels about right. For the β voice, 1.0 feels about right. To me anyway.
- -fm adjusts the intonation pitch a little.
- -s is probably useful to select the quality of the resulting WAV file, which might avoid a conversion later on.
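To avoid retyping those per-voice rates, something like the following could work. It is a sketch under assumptions: `rate_for_voice`, `speak_tuned`, `DIC` and `FM` are hypothetical names, the 1.1 and 1.0 values are just the subjective picks above, and the fallback of 1.0 for other voices is my own guess at a sensible default.

```shell
# Hypothetical helper: suggest a -r value based on the voice path.
# 1.1 (α) and 1.0 (β) are subjective picks; the 1.0 fallback is an assumption.
rate_for_voice() {
  case "$1" in
    *'TYPE-α'*) echo 1.1 ;;
    *'TYPE-β'*) echo 1.0 ;;
    *)          echo 1.0 ;;
  esac
}

# Hypothetical wrapper. DIC must point at the dictionary directory;
# FM optionally sets the -fm pitch shift (0 if unset).
# Usage: speak_tuned VOICE_FILE TEXT
speak_tuned() {
  voice=$1
  text=$2
  printf '%s\n' "$text" |
    open_jtalk -x "${DIC:?set DIC to the dictionary directory}" \
      -m "$voice" \
      -r "$(rate_for_voice "$voice")" \
      -fm "${FM:-0}" \
      -ow temp.wav &&
    play -q temp.wav
}
```

For example, `DIC=/usr/local/Cellar/open-jtalk/1.10_1/dic` followed by `speak_tuned 'Voice TYPE-α/Voice.htsvoice' 'よろしくお願いします'` would run the α voice at 1.1. The -s option is left out on purpose, since I have not tried it.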
Caveats:
- The dictionary has some English words, but far from all. It seems like I would just have to add more entries to the dictionary.
- As with any TTS, the intonation is slightly off.
Perhaps also interesting: MaryTTS can use HMM-based voices too, and if you compare a MaryTTS voice with the voice data used as input here, some of the files overlap.
A MaryTTS voice:
Present in Voice TYPE-α: