Speech to Text
The original Whisper model from OpenAI is a good speech-to-text transcription model that is widely used: https://huggingface.co/openai/whisper-large-v3
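As a rough illustration of how the Whisper checkpoint linked above might be used, here is a minimal sketch built on the Hugging Face transformers pipeline API. It assumes transformers, a PyTorch backend, and ffmpeg are installed; the function names transcribe and format_chunks are hypothetical helpers introduced for this example, and sample.wav is a placeholder file name.

```python
def _fmt(seconds):
    """Format a timestamp in seconds, or '?' when the pipeline left it unset."""
    return "?" if seconds is None else f"{seconds:.2f}"


def format_chunks(chunks: list[dict]) -> str:
    """Render timestamped chunks as one '[start-end] text' line each."""
    lines = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        lines.append(f"[{_fmt(start)}-{_fmt(end)}] {chunk['text'].strip()}")
    return "\n".join(lines)


def transcribe(audio_path: str, model_id: str = "openai/whisper-large-v3") -> dict:
    """Transcribe an audio file and return the pipeline's result dict."""
    # Imported lazily so format_chunks works without transformers installed.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_id)
    # return_timestamps=True adds a "chunks" list next to the full "text".
    return asr(audio_path, return_timestamps=True)
```

Calling result = transcribe("sample.wav") should give the full transcript in result["text"], and format_chunks(result["chunks"]) turns the timestamped segments into readable lines. The large-v3 checkpoint is a multi-gigabyte download, so a smaller Whisper checkpoint can be passed as model_id for a quick trial.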
Text to Speech
WhisperSpeech is a good text-to-speech model with voice cloning: https://huggingface.co/WhisperSpeech/WhisperSpeech. It is released under an MIT license (unlike coqui and suno). It isn't the "best" model, but it is very good for its size.
Another alternative is a purely ONNX-driven model sponsored by txtai: https://huggingface.co/NeuML/ljspeech-jets-onnx
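For the ONNX model, a minimal sketch using txtai's TextToSpeech pipeline might look like the following. It assumes txtai (with its pipeline extras) and soundfile are installed. The helper names unpack_speech and synthesize are hypothetical, and the 22050 Hz fallback is an assumption based on the LJSpeech sample rate: depending on the txtai version, the pipeline may return either bare samples or a (samples, rate) pair, so the sketch accepts both.

```python
def unpack_speech(result, default_rate: int = 22050):
    """Return (samples, sample_rate) for either pipeline return shape."""
    # Some txtai versions return (samples, rate); others return samples only.
    if isinstance(result, tuple) and len(result) == 2:
        return result
    # Assumption: fall back to the LJSpeech rate when none is reported.
    return result, default_rate


def synthesize(text: str, out_path: str = "out.wav",
               model_id: str = "NeuML/ljspeech-jets-onnx") -> str:
    """Generate speech for text and write it to a WAV file."""
    # Imported lazily so unpack_speech works without these packages installed.
    import soundfile as sf
    from txtai.pipeline import TextToSpeech

    tts = TextToSpeech(model_id)
    samples, rate = unpack_speech(tts(text))
    sf.write(out_path, samples, rate)
    return out_path
```

Calling synthesize("Say something here") should write out.wav in the current directory. Since inference runs through ONNX, this path does not need PyTorch at generation time, which is the main appeal of this model over heavier alternatives.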