My random collection of notes on AI voice cloning services/models/techniques/etc. Just because something is listed here doesn't necessarily mean I have tried it or endorse it. Use this as a starting point for your own further research.
- https://elevenlabs.io/
- https://elevenlabs.io/speech-synthesis
  - The First Generative Speech Synthesis Platform: Generate lifelike speech in any language and voice with the most powerful Text to Speech and Voice Cloning software.
- https://elevenlabs.io/voice-lab
  - Generative Voice AI: Clone your voice or create entirely new synthetic voices using the most advanced Generative AI technology ever.
- https://elevenlabs.io/voice-library
  - Discover AI Voices Crafted by the Community: Get access to an ever-growing library of high quality AI voices and discover characters that perfectly fit your needs.
- https://elevenlabs.io/professional-voice-cloning
  - Professional Voice Cloning: Create the perfect digital replica of your voice using the most advanced voice cloning AI. We create AI models on your voice from the ground up to offer the most realistic voice cloning experience ever.
- https://elevenlabs.io/projects
  - Your Audiobook Workshop: Generate, edit, and customize long-form spoken audio with precision, all within a streamlined workflow.
- https://create.musicfy.lol/
- https://lalals.com/voices/
- https://www.uberduck.ai/
- https://voice.ai/
- https://www.myvocal.ai/
- https://app.kits.ai/
- https://play.ht/voice-cloning/
- https://speechify.com/voice-cloning/
- RVC v2 AI Cover Guide (by kalomaze)
- Training RVC v2 models Guide (by kalomaze)
- RVC Models Archive Sheet
- https://github.com/myshell-ai/OpenVoice
  - Instant voice cloning by MyShell
- https://research.myshell.ai/open-voice
  - OpenVoice: Versatile Instant Voice Cloning
- https://arxiv.org/abs/2312.01479
  - OpenVoice: Versatile Instant Voice Cloning
  - We introduce OpenVoice, a versatile voice cloning approach that requires only a short audio clip from the reference speaker to replicate their voice and generate speech in multiple languages. OpenVoice represents a significant advancement in addressing the following open challenges in the field: 1) Flexible Voice Style Control. OpenVoice enables granular control over voice styles, including emotion, accent, rhythm, pauses, and intonation, in addition to replicating the tone color of the reference speaker. The voice styles are not directly copied from and constrained by the style of the reference speaker. Previous approaches lacked the ability to flexibly manipulate voice styles after cloning. 2) Zero-Shot Cross-Lingual Voice Cloning. OpenVoice achieves zero-shot cross-lingual voice cloning for languages not included in the massive-speaker training set. Unlike previous approaches, which typically require extensive massive-speaker multi-lingual (MSML) dataset for all languages, OpenVoice can clone voices into a new language without any massive-speaker training data for that language. OpenVoice is also computationally efficient, costing tens of times less than commercially available APIs that offer even inferior performance. To foster further research in the field, we have made the source code and trained model publicly accessible. We also provide qualitative results in our demo website. Prior to its public release, our internal version of OpenVoice was used tens of millions of times by users worldwide between May and October 2023, serving as the backend of MyShell.
- https://github.com/camenduru/OpenVoice-colab
  - OpenVoice-colab
- https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI
  - Retrieval-based-Voice-Conversion-WebUI
- https://github.com/SociallyIneptWeeb/AICoverGen
  - A WebUI to create song covers with any RVC v2 trained AI voice from YouTube videos or audio files.
- https://github.com/w-okada/voice-changer
  - Realtime Voice Changer
- https://github.com/yxlllc/DDSP-SVC
  - Real-time end-to-end singing voice conversion system based on DDSP (Differentiable Digital Signal Processing)
- https://github.com/isletennos/MMVC_Trainer
  - Real-time voice changer using AI (Trainer)
- https://github.com/svc-develop-team/so-vits-svc
  - SoftVC VITS Singing Voice Conversion [Archived]
- https://github.com/voicepaw/so-vits-svc-fork
  - so-vits-svc fork with realtime support, improved interface and more features
- https://github.com/PlayVoice/so-vits-svc-5.0
  - so-vits-svc-5.0
  - Core Engine of Singing Voice Conversion & Singing Voice Clone
  - Variational Inference with adversarial learning for end-to-end Singing Voice Conversion based on VITS
  - https://github.com/PlayVoice/so-vits-svc-5.0#data-set
  - https://github.com/PlayVoice/so-vits-svc-5.0#code-sources-and-references
- https://github.com/PlayVoice/lora-svc
  - lora-svc
  - Singing voice change based on whisper, and lora for singing voice clone
  - Singing Voice Conversion based on Whisper & neural source-filter BigVGAN
  - LoRA is not fully implemented in this project, but it can be found here: LoRA TTS & paper
- https://github.com/PlayVoice/VI-SVS
  - VI-SVS
  - Singing Voice Synthesis based on VITS, different from VISinger
  - Variational Inference with adversarial learning for end-to-end Singing Voice Synthesis
  - Different from VISinger, it is just VITS without MAS and DurationPredictor.
- https://github.com/quickvc/QuickVC-VoiceConversion
  - QuickVC: Any-to-many Voice Conversion Using Inverse Short-time Fourier Transform for Faster Conversion
- https://github.com/auspicious3000/contentvec
  - ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers
  - https://arxiv.org/abs/2204.09224
- https://github.com/uberduck-ai/uberduck-ml-dev
  - Uberduck Synthetic Speech
  - ML models for Uberduck
- Bowser (Jack Black) (Super Mario Bros.) (RVC V2) 950 Epochs
  - 40k, crepe: https://huggingface.co/yeey5/BowserRVCV2/resolve/main/Bowser.zip
  - 48k, mangio-crepe: https://huggingface.co/yeey5/BowserRVCV2/resolve/main/Bowser48k.zip
  - (Ref)
- Princess Peach (Anya Taylor-Joy) RVC v2 300 Epochs
  - I decided to create a voice model of her because there's never been a voice model of Anya Taylor-Joy
  - https://app.kits.ai/convert/shared/anya-taylor-joy-1
  - (Ref)
- Ellie TLOU RVC V2 300 Epochs
  - Ellie from The Last of Us Part 1 (RVC V2), 300 epochs, trained with a 10 minute dataset of voice lines from the game
  - https://huggingface.co/TJKAI/EllieLOU/resolve/main/Ellie.rar
  - (Ref)
- Ellie (The Last of Us Part 1) (RVC v2, 119e/8806s)
  - This model is based on the younger version of Ellie from The Last of Us (video game). This model only exists because I failed to appreciate how different Ellie sounds at different parts of Part 2, but I thought I'd share it anyway because it sounds pretty good. There are other young Ellie voices that I think are better though. You should probably use this one by <@554576184101306368> instead: https://discord.com/channels/1089076875999072296/1128416105229189180/1128416105229189180 Here's my model of the older Ellie from Part 2: https://discord.com/channels/1089076875999072296/1136823620618964992/1136823620618964992
  - https://huggingface.co/Grimoire-VC/Voices/resolve/main/EllieP1.zip
  - (Ref)
- Ellie (The Last of Us Part 2) (RVC v2, 203e/14210s)
  - This model is based on the older version of Ellie from the Last of Us Part 2. I'm pretty happy with it. Ellie is usually pretty breathy and I wasn't able to capture that, but I do think I got her tone of voice about right. I don't think there's another Part 2 Ellie on here yet. I also made a Part 1 Ellie that I'm uploading at the same time here: https://discord.com/channels/1089076875999072296/1136823365605261372/1136823365605261372
  - https://huggingface.co/Grimoire-VC/Voices/resolve/main/EllieP2.zip
  - (Ref)
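
Several of the model links above are Hugging Face `resolve` URLs, which follow the layout `https://huggingface.co/<owner>/<repo>/resolve/<revision>/<file>`. As a rough sketch (not something from the original notes), the helper below splits such a URL into its parts. The function name `parse_hf_resolve_url` is made up for illustration, and it assumes plain model repos (no `datasets/` or `spaces/` prefix in the path).

```python
from urllib.parse import unquote, urlparse


def parse_hf_resolve_url(url: str) -> tuple[str, str, str]:
    """Split a Hugging Face 'resolve' URL into (repo_id, revision, filename).

    Hypothetical helper for illustration; assumes a model repo URL of the
    form https://huggingface.co/<owner>/<repo>/resolve/<revision>/<file>.
    """
    # Break the URL path into its '/'-separated, percent-decoded segments.
    parts = [unquote(p) for p in urlparse(url).path.strip("/").split("/")]
    # Expected segments: owner, repo, "resolve", revision, then the file path.
    if len(parts) < 5 or parts[2] != "resolve":
        raise ValueError(f"not a Hugging Face resolve URL: {url}")
    repo_id = "/".join(parts[:2])
    revision = parts[3]
    filename = "/".join(parts[4:])  # files may live in subfolders
    return repo_id, revision, filename


if __name__ == "__main__":
    urls = [
        "https://huggingface.co/yeey5/BowserRVCV2/resolve/main/Bowser48k.zip",
        "https://huggingface.co/Grimoire-VC/Voices/resolve/main/EllieP1.zip",
    ]
    for url in urls:
        print(parse_hf_resolve_url(url))
```

The three values line up with the `repo_id`, `revision`, and `filename` arguments of `hf_hub_download` in the `huggingface_hub` package, if you would rather fetch the archives from a script than through the browser.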
Voice samples here: https://leagueoflegends.fandom.com/wiki/Zoe/LoL/Audio
(Ref)
Voice samples here: https://leagueoflegends.fandom.com/wiki/Annie/LoL/Audio
(Ref)
A few sources of sound clips I found from a quick google:
- https://www.soundboard.com/sb/VanellopeVonSchweetz
- https://www.sounds-resource.com/pc_computer/disneyinfinity/sound/12086/
- https://www.sounds-resource.com/ds_dsi/wreckitralph/sound/3249/
- https://www.youtube.com/watch?v=jnp2ObnVt9M
(Ref)
- Music APIs and DBs (0xdevalias' gist)
- Singing Voice Synthesizers (eg. Vocaloid, etc) (0xdevalias' gist)
- Audio Pitch Correction (eg. autotune, melodyne, etc) (0xdevalias' gist)
- Automated Audio Transcription (AAT) / Automated Music Transcription (AMT) (aka: converting audio to midi) (0xdevalias' gist)
- Generating Synth Patches with AI (0xdevalias' gist)
- Compare/Diff Audio Files (0xdevalias' gist)
- Working Around FLStudio Trial Limitations (0xdevalias' gist)