Skip to content

Instantly share code, notes, and snippets.

@milljm
Last active April 11, 2025 02:05
Show Gist options
  • Save milljm/52830a77b9017159b29dd52427836277 to your computer and use it in GitHub Desktop.
Save milljm/52830a77b9017159b29dd52427836277 to your computer and use it in GitHub Desktop.
Kokoro txt --> speech
#!/usr/bin/env python3
""" a quick text to speech tool """
import os
import sys
import argparse
import sounddevice as sd
import warnings
warnings.filterwarnings('ignore')
try:
from IPython.display import display, Audio
from kokoro import KPipeline
import soundfile as sf
except ImportError:
print('There were errors importing the necessary libraries.\nBe sure to have the following '
'libraries installed\n(all available via Conda and PIP):\n\n\tkokoro, IPython,'
'soundfile, and sounddevice\n\n')
sys.exit(1)
def verify_args(args):
if not args.input and not args.args:
print('You need to specify an input file I will read, or just throw what you want said as '
'an argument')
sys.exit(1)
return args
def parse_args(argv, piped_input):
""" parses arguments """
parser = argparse.ArgumentParser(description='Using kokoro to generate text to speech')
parser.add_argument('-i', '--input', nargs='?',
help='treated as what to speak')
parser.add_argument('-a', '--accent', nargs='?', default='b',
help='Accent. Choose from: a (american), b (british). Default: b')
parser.add_argument('--voice', nargs='?', default='af_heart',
help='Voice to use. Default: af_heart. '
'Combined voices with: af_heart,af_bella')
parser.add_argument('-s', '--speed', nargs='?', type=float, default=1.0,
help='speed at which to speek')
parser.add_argument('--stream', action='store_const', const=True, default=False,
help='Stream audio directly')
parser.add_argument('args', nargs=argparse.REMAINDER)
args = parser.parse_args(argv)
if piped_input:
args.args = [piped_input]
return verify_args(args)
def txt2voice(args):
pipeline = KPipeline(lang_code=args.accent, repo_id='hexgrad/Kokoro-82M')
if args.input and os.path.exists(args.input):
with open(args.input, 'r', encoding="utf-8") as f:
text = f.read()
elif args.input and not os.path.exists(args.input):
print('path to input file not found or readable')
sys.exit(1)
else:
text = ' '.join(args.args)
generator = pipeline(text, voice=args.voice, speed=args.speed, split_pattern=r'\n+')
for i, (gs, ps, audio) in enumerate(generator):
print(f'Voice:\t{args.voice}\nGraphemes/Text:\n{gs}\n\nPhonemes:\n{ps}\n')
if args.stream:
sd.play(audio, 24000)
sd.wait()
else:
print(f'writing file:\t{i}.wav')
sf.write(f'{i}.wav', audio, 24000)
if __name__ == '__main__':
piped_input = None
if len(sys.argv) > 1:
arguments = sys.argv[1:]
if not sys.stdin.isatty():
piped_input = sys.stdin.read()
args = parse_args(sys.argv[1:], piped_input)
txt2voice(args)
@milljm
Copy link
Author

milljm commented Mar 16, 2025

Follow setup instructions here: https://huggingface.co/hexgrad/Kokoro-82M

Basically, create a new Conda environment, pip install some things, and off we go.

conda create -n txt2sp python=3.12 pip ipython
conda activate txt2sp
pip install kokoro>=0.8.2 soundfile sounddevice
./speak.py --stream "hello handsome"

@milljm
Copy link
Author

milljm commented Mar 16, 2025

❯ cat text.txt | ./speak.py --speed .9 --voice af_heart,af_bella

Voice: af_heart,af_bella
Graphemes/Text:
An Orb Weaver from the Araneidae family. Araneus Marmoreus. Harmless. You've left it alone since discovering it. Countless times have you spent watching it create the most "beautiful" patterns. As luck would happen, sometimes you've witnessed the early morning sun strike the dew ridden web casting a cacophony of colors: You question how such a beautiful thing can be created by such a creature so, commonly, feared.

Phonemes:
ɐn ˈɔːb wˈiːvə fɹɒm ði ˈaɹənˌɪdiː fˈamɪli. ˈaɹənˌiəs mɑːmˈɔːɹɪəs. hˈɑːmləs. jˌuːv lˈɛft ɪt əlˈQn sˈɪns dɪskˈʌvəɹɪŋ ɪt. kˈWntləs tˈImz hav juː spˈɛnt wˈɒʧɪŋ ɪt kɹɪˈAt ðə mˈQst “bjˈuːtɪfʊl” pˈatᵊnz. ˌaz lˈʌk wʊd hˈapᵊn, sˈʌmtImz juːv wˈɪtnɪst ði ˈɜːli mˈɔːnɪŋ sˈʌn stɹˈIk ðə djˈuː ɹˈɪdᵊn wˈɛb kˈɑːstɪŋ ɐ kəkˈɒfəni ɒv kˈʌləz: jˌuː kwˈɛsʧᵊn hˌW sˈʌʧ ɐ bjˈuːtɪfʊl θˈɪŋ kan biː kɹɪˈAtɪd bI sˈʌʧ ɐ kɹˈiːʧə sˌQ, kˈɒmənli, fˈɪəd.

orb_weaver.mp4

@milljm
Copy link
Author

milljm commented Mar 16, 2025

@WilkAndy
haha me taking advantage of AI text to speech for in game dialog.

@milljm
Copy link
Author

milljm commented Mar 16, 2025

Hah, a "cacophony" of colors? No... was looking for a positive word.

❯ cat orb_weaver.txt | ./speak.py --voice af_heart,af_bella --speed .9

Voice: af_heart,af_bella
Graphemes/Text:
An Orb Weaver from the Araneidae family. Araneus Marmoreus. Harmless. You've left it alone since discovering it. Countless times have you spent watching it create the most "beautiful" patterns. As luck would happen, sometimes you've witnessed the early morning sun strike the dew ridden web casting a symphony of colors: You question how such a beautiful thing can be created by such a creature so, commonly, feared.

Phonemes:
ɐn ˈɔːb wˈiːvə fɹɒm ði ˈaɹənˌɪdiː fˈamɪli. ˈaɹənˌiəs mɑːmˈɔːɹɪəs. hˈɑːmləs. jˌuːv lˈɛft ɪt əlˈQn sˈɪns dɪskˈʌvəɹɪŋ ɪt. kˈWntləs tˈImz hav juː spˈɛnt wˈɒʧɪŋ ɪt kɹɪˈAt ðə mˈQst “bjˈuːtɪfʊl” pˈatᵊnz. ˌaz lˈʌk wʊd hˈapᵊn, sˈʌmtImz juːv wˈɪtnɪst ði ˈɜːli mˈɔːnɪŋ sˈʌn stɹˈIk ðə djˈuː ɹˈɪdᵊn wˈɛb kˈɑːstɪŋ ɐ sˈɪmfəni ɒv kˈʌləz: jˌuː kwˈɛsʧᵊn hˌW sˈʌʧ ɐ bjˈuːtɪfʊl θˈɪŋ kan biː kɹɪˈAtɪd bI sˈʌʧ ɐ kɹˈiːʧə sˌQ, kˈɒmənli, fˈɪəd.

orb_weaver-symphony.mp4

@WilkAndy
Copy link

Thanks @milljm - you just gave me an hour of fun!

@WilkAndy
Copy link

I think this is really cool. One failing is that the intonation is not super accurate though, eg "Is this a question" sounds the same as "Is this a question?" (with question mark). Is there a way of marking the text with ? @milljm

@milljm
Copy link
Author

milljm commented Mar 17, 2025

Oh you're right... From everything I've already read it supposed to be simply ending with ?. If I find out how to do it, I'll post again and tag ya.

Edit, pasting what I could find about directly using Phonemes, so I don't have to search for it again:

Customize pronunciation with Markdown link syntax and /slashes/ like [Kokoro](/kˈOkəɹO/)
To adjust intonation, try punctuation ;:,.!?—…"()“” or stress ˈ and ˌ
Lower stress [1 level](-1) or [2 levels](-2)
Raise stress 1 level [or](+2) 2 levels (only works on less stressed, usually short words)

ref: https://huggingface.co/spaces/hexgrad/Kokoro-TTS

 echo '[Kokoro](/kˈOkəɹO/)' | ./speak.py --stream

@milljm
Copy link
Author

milljm commented Mar 17, 2025

@WilkAndy
Yeah that was a bit interesting. I had to begin understanding how to use Phonemes directly.

echo '[How are you?](/hˌW ɑː jˈːːːuˇ?/)' | ./speak.py --stream --speed .8
                               ^^^ ^ upward pitch (or supposedly does, its hard to tell)
                                  \ holding "like shhhh!"

vs what the system does by default:

echo '[How are you?](/hˌW ɑː juː?/)' | ./speak.py --stream --speed .8

ref: https://en.wikipedia.org/wiki/International_Phonetic_Alphabet

@WilkAndy
Copy link

Hmmm, i can see i could spend a long time having fun with the intonation!

echo '[How are you?  How are you?  How are you?](/hˌW ɑː ju?  hˌW ɑː jˈːːːuˇ?  hˌW ɑː jˈːːːu↗︎?/)' | ./speak.py --stream

@milljm
Copy link
Author

milljm commented Mar 17, 2025

Yeah, me too. I think I am going to try and reproduce Amelia Tyler's "Authority" line in BG3 😄

@milljm
Copy link
Author

milljm commented Mar 17, 2025

You should also try mixing some voices. I found --voice af_nicole,af_heart --accent b --speed .8 pretty darn good. Oh, and be sure to try af_nicole by itself! It is a very distinctive voice.

@WilkAndy
Copy link

Yeah, me too. I think I am going to try and reproduce Amelia Tyler's "Authority" line in BG3 😄

That's going to be super hard! She's got such perfect intonation :-)

@milljm
Copy link
Author

milljm commented Mar 17, 2025

hahaha just posting to possibly get ahead of this, and because I don't want to strike a nerve with anyone:

I only mention 'Amelia Tyler' because she is perfect. And I only want to learn the phonetics/intonation process. Might as well pick from the best!

@WilkAndy
Copy link

OK, after 34 minutes i admit defeat. My computer will never turn into Amelia Tyler, boohoo.

@milljm
Copy link
Author

milljm commented Mar 17, 2025

hahah same. The closest I got:

ɔˈθːːˈɒɹɪtˌiː

@WilkAndy
Copy link

Wow, well done! That is actually really good !! :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment