@hakanai
Created April 1, 2017 12:50
Open-JTalk setup with Miku voice samples


A large amount of this was taken from this guide.

Open-JTalk install

Someone has made a Homebrew formula for this, so it was fairly painless.

$ brew install open-jtalk

This also includes the NAIST Japanese Dictionary, which is much better than any of the mecab ones available in Homebrew.

Miku voice install

Followed link to here to get voices:

Downloaded everything. Unzipped everything. Manually fixed the directory names, because the archives didn't use UTF-8 for the filenames, and restructured all the directories to look alike for later convenience.

Followed link to here to get the conversion program:

Downloaded htsvconv002.zip. It has no top-level directory, so its files spill into wherever you extract it; unzip it into a new directory.

$ mkdir htsvconv
$ cd htsvconv
$ unzip ../htsvconv002.zip

$ brew install mono
$ mcs htsvconv.cs

In hindsight, running mcs may not be necessary: the .exe was presumably built cross-platform and should run under Mono as-is, but I haven't verified that.

Running the conversion for all voices...

$ for f in 'Voice TYPE-α' 'Voice TYPE-β' 'Voice TYPE-γ ver1' 'Voice TYPE-δ ver1'; do \
    (cd "$f" && mono ../htsvconv/htsvconv.exe Voice) ; \
  done
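
A quick sanity check after the loop can catch a voice that failed to convert. This is only a sketch: `check_voices` is a hypothetical helper name, and it assumes each run leaves a `Voice.htsvoice` file inside the voice directory, which is what the `open_jtalk` command in these notes expects.

```shell
# Hypothetical helper: report voice directories that lack a converted file.
# Assumes htsvconv writes Voice.htsvoice into each directory it is run in.
check_voices() {
  missing=0
  for dir in "$@"; do
    if [ -f "$dir/Voice.htsvoice" ]; then
      echo "ok: $dir"
    else
      echo "missing: $dir/Voice.htsvoice" >&2
      missing=1
    fi
  done
  # Non-zero exit status if at least one voice file was not found.
  return "$missing"
}
```

It would be called with the same directory names as the loop, for example `check_voices 'Voice TYPE-α' 'Voice TYPE-β'`.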

Running it

echo 'よろしくお願いします' | \
  open_jtalk \
    -x /usr/local/Cellar/open-jtalk/1.10_1/dic \
    -m 'Voice TYPE-α/Voice.htsvoice' \
    -ow temp.wav && play -q temp.wav
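
The dictionary path above is tied to one installed version (1.10_1), so it will break after an upgrade. One way around that, sketched here, is to ask Homebrew for the formula's prefix. `jtalk_say` is a hypothetical wrapper name; the sketch assumes the formula is called `open-jtalk` (as the Cellar path suggests), that the `dic` directory sits directly under the prefix, and that `play` from SoX is installed.

```shell
# Hypothetical wrapper: speak text read from stdin using the given voice file.
# Assumes `brew --prefix open-jtalk` points at the install that contains dic/.
jtalk_say() {
  voice=$1
  dic="$(brew --prefix open-jtalk)/dic"
  workdir=$(mktemp -d) || return 1
  # Synthesize into a scratch file, then play it only if synthesis succeeded.
  open_jtalk -x "$dic" -m "$voice" -ow "$workdir/out.wav" &&
    play -q "$workdir/out.wav"
  status=$?
  rm -rf "$workdir"
  return "$status"
}
```

Usage would look like `echo 'よろしくお願いします' | jtalk_say 'Voice TYPE-α/Voice.htsvoice'`.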

Useful options:

  • -r sets the speech rate. For the α voice, 1.1 feels about right; for the β voice, 1.0. To me, anyway.
  • -fm shifts the pitch by the given number of half-tones (the default is 0.0).
  • -s sets the sampling frequency of the resulting WAV file, which might avoid a conversion later on.
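
To pick a rate by ear, it can help to render the same sentence at a few `-r` values and compare the files. A minimal sketch: `render_rates` is a hypothetical helper, and `DIC` and `VOICE` are assumed to be set by the caller to the dictionary directory and the `.htsvoice` file.

```shell
# Hypothetical helper: render one sentence at several speech rates.
# Usage: render_rates TEXT RATE [RATE...]
# Writes rate-<RATE>.wav into the current directory for each rate.
render_rates() {
  text=$1
  shift
  for rate in "$@"; do
    printf '%s\n' "$text" |
      open_jtalk -x "$DIC" -m "$VOICE" -r "$rate" -ow "rate-$rate.wav" || return 1
  done
}
```

For example, after setting `DIC` and `VOICE`, `render_rates 'よろしくお願いします' 0.9 1.0 1.1` should leave three WAV files to compare. The same pattern would work for trying out `-fm` values.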

Caveats:

  • The dictionary has some English words, but far from all. It seems like I would just have to add more entries to the dictionary.
  • Like with any TTS, the intonation is slightly off.

hakanai commented Apr 22, 2017

Possibly also of interest: MaryTTS can use HMM-based voices too, and if you compare a MaryTTS voice with the voice data used as input here, there is some overlap in the files.

A MaryTTS voice:

  • cmu_us_arctic_slt_b0487.pfeats
  • dur.pdf
  • gv-lf0.pdf
  • gv-mgc.pdf
  • gv-str.pdf
  • lf0.pdf
  • mgc.pdf
  • mix_excitation_filters.txt
  • str.pdf
  • tree-dur.inf
  • tree-lf0.inf
  • tree-mgc.inf
  • tree-str.inf
  • trickyPhones.txt
  • voice.config

Present in Voice TYPE-α:

  • dur.pdf
  • gv-lf0.pdf
  • gv-mgc.pdf
  • gv-switch.inf
  • lf0.pdf
  • lf0.win1
  • lf0.win2
  • lf0.win3
  • lpf.pdf
  • lpf.win1
  • mgc.pdf
  • mgc.win1
  • mgc.win2
  • mgc.win3
  • tree-dur.inf
  • tree-gv-lf0.inf
  • tree-gv-mgc.inf
  • tree-lf0.inf
  • tree-lpf.inf
  • tree-mgc.inf
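
The overlap between two such listings can be extracted mechanically. A sketch, assuming each voice's files sit in their own directory; `common_files` is a hypothetical helper name.

```shell
# Hypothetical helper: print the file names present in both directories.
# comm needs sorted input, so both listings are sorted in the C locale.
common_files() {
  list_a=$(mktemp) || return 1
  list_b=$(mktemp) || return 1
  ls -1 "$1" | LC_ALL=C sort > "$list_a"
  ls -1 "$2" | LC_ALL=C sort > "$list_b"
  # -12 suppresses the lines unique to either side, leaving only shared names.
  LC_ALL=C comm -12 "$list_a" "$list_b"
  rm -f "$list_a" "$list_b"
}
```

Applied to the two listings above, this would print the eight shared names: dur.pdf, gv-lf0.pdf, gv-mgc.pdf, lf0.pdf, mgc.pdf, tree-dur.inf, tree-lf0.inf and tree-mgc.inf.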
