This method outlines a scalable and efficient approach to generating fully offline, real-time text-to-speech (TTS) voices, particularly in low-resource languages or dialects, using no real recordings of the target speaker. It combines donor TTS, voice conversion, and lightweight TTS model training to synthesize high-quality, personalized voices that can run on edge devices like the Raspberry Pi 4.
This proposal introduces an Elo-style rating system to benchmark OVOS intent pipeline configurations using real user utterances and human-in-the-loop feedback.
Instead of traditional dataset validation, users are presented with predictions from two different intent pipeline configurations and asked to judge which one is more accurate. This approach allows us to:
- Benchmark pipelines using real-world utterances.
- Collect high-quality, user-validated labeled data.
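A pairwise judgment of this kind can be turned into ratings with the standard Elo update. The snippet below is a minimal sketch, not the proposal's actual implementation: the pipeline names, the starting rating of 1000, the K-factor of 32, and the 400-point scale are conventional Elo defaults assumed here for illustration.

```python
def expected_score(rating_a, rating_b):
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if the user preferred A, 0.0 if they preferred B,
    and 0.5 if they judged both predictions equally good.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Hypothetical pipeline configurations, both starting at an assumed 1000.
ratings = {"pipeline_a": 1000.0, "pipeline_b": 1000.0}

# A user judged pipeline_a's prediction as the more accurate one.
ratings["pipeline_a"], ratings["pipeline_b"] = update_elo(
    ratings["pipeline_a"], ratings["pipeline_b"], score_a=1.0
)
```

Because the update is zero-sum, the total rating across configurations stays constant, and an upset win over a higher-rated pipeline moves the ratings more than an expected win does.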
skill: weather
intent: current_weather
Handle weather requests for right now
Examples:
"What's the weather like?"
Simple, lightweight baseline implementations for various tasks. Includes some toy examples.
- https://github.com/TigreGotico/anon_requests - anonymous python requests (proxies and tor)
- https://huggingface.co/datasets/Jarbas/locallingua_pt - recordings from Portugal scraped from https://localingual.com
- https://huggingface.co/datasets/Jarbas/pt_basics - phonetically diverse standalone words, letters, diphthongs and basic greetings, scraped from https://www.learningportuguese.co.uk/guide/compare-accents
- https://huggingface.co/datasets/Jarbas/compare-accents-pt - small dataset of multiple Portuguese speakers from various dialects speaking the same sentence, scraped from https://www.learningportuguese.co.uk/guide/compare-accents
- https://huggingface.co/datasets/Jarbas/VocativesEuropeanPortuguese - mirror of the dataset from https://www.clul.ulisboa.pt/en/recurso/vocatives-european-portuguese
- https://huggingface.co/datasets/Jarbas/InstitutoCamoes - mirror dataset of https://www.instituto-camoes.pt
- https://huggingface.co/datasets/Jarbas/SpokenPortugueseGeographicalSocialVarieties - mirror dataset of https://www.clul.ulisboa.pt/en/recurso/spoken-portuguese-geographical-and-socia
import requests
TRIPLE_VALIDATION_PROMPT = """
You are a triple validator for a personal knowledge graph.
Given an utterance that a user spoke to a voice assistant and a candidate triple, your task is to validate the triple
Utterances about the user usually have the form of "I am ...." or "My ..."
Emoji | Task Type | Flow Example |
---|---|---|
🔊 TTS | Text-to-Speech | text (EN) → audio (EN) |
🌐 T2TT / 🌐 MT | Text-to-Text-Translation / Machine Translation | text (FR) → text (EN) |
🎤 T2ST | Text-to-Speech-Translation | text (DE) → audio (EN) |
🗣️ STT / 🗣️ ASR | Speech-to-Text / Automatic-speech-recognition | audio (PT) → text (PT) |
🗣️📝 S2TT | Speech-to-Text-Translation | audio (FR) → text (EN) |
🗣️🔄 S2ST | Speech-to-Speech-Translation | audio (ES) → audio (EN) |
Welcome to the quick-start guide for installing Open Voice OS (OVOS) using the official ovos-installer! This guide is suitable for Raspberry Pi and desktop/server Linux environments. Whether you’re running this on a headless Raspberry Pi or your everyday laptop, the steps are mostly the same; only the way you connect to the device differs.
⚠️ Note: Some “exotic” hardware (like ReSpeaker microphones or certain audio HATs) may require extra configuration. The installer aims for wide compatibility, but specialized setups might need some manual intervention.
Looking for a pre-built Raspberry Pi image instead? Check out raspOVOS and the companion tutorial.
import requests
import argparse
from datetime import datetime
# Function to get the PyPI release data
def get_versions(package_name):
    url = f'https://pypi.org/pypi/{package_name}/json'
    response = requests.get(url)
    if response.status_code == 200:
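The excerpt above is cut off before it handles the response. As a hedged illustration of what could be done with the decoded JSON, the sketch below extracts release dates from a payload shaped like the PyPI JSON API response, where `releases` maps each version to a list of uploaded files carrying an `upload_time` field. The helper name `parse_release_dates` and the sample payload are hypothetical and not taken from the original script.

```python
from datetime import datetime


def parse_release_dates(payload):
    """Return (version, first upload datetime) pairs, oldest first."""
    releases = []
    for version, files in payload.get("releases", {}).items():
        if not files:  # skip versions that have no uploaded files
            continue
        first_upload = min(
            datetime.fromisoformat(f["upload_time"]) for f in files
        )
        releases.append((version, first_upload))
    return sorted(releases, key=lambda item: item[1])


# Made-up payload mimicking the relevant part of the API response.
sample = {
    "releases": {
        "0.2.0": [
            {"upload_time": "2023-03-01T09:31:00"},
            {"upload_time": "2023-03-01T09:30:00"},
        ],
        "0.1.0": [{"upload_time": "2023-01-05T10:00:00"}],
        "0.0.1": [],
    }
}

for version, uploaded in parse_release_dates(sample):
    print(version, uploaded.date())
```

Keeping the parsing separate from the HTTP call means it can be exercised on a saved payload, without network access.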
> what are the STT options
There are several Speech-To-Text (STT) options you can consider for OpenVoiceOS. Some popular STT plugins include:
1. FasterWhisper - A fast option that can be run locally on a machine with sufficient resources.
2. VOSK - Works well for offline needs.
3. Google Translate STT - A cloud-based solution with good accuracy.
4. Deepgram - Another cloud-based option that provides a variety of features.