JarbasAI JarbasAl

Synthetic Voice Bootstrapping for Offline TTS via Donor Speech and Voice Conversion

Overview

This method outlines a scalable and efficient approach to generating fully offline, real-time text-to-speech (TTS) voices, particularly in low-resource languages or dialects, using no real recordings of the target speaker. It combines donor TTS, voice conversion, and lightweight TTS model training to synthesize high-quality, personalized voices that can run on edge devices like the Raspberry Pi 4.
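The data-generation half of this idea can be sketched in a few lines. This is only an illustration under stated assumptions: `build_synthetic_dataset`, `fake_donor_tts` and `fake_voice_conversion` are hypothetical names, the stubs stand in for a real donor TTS engine and a real voice conversion model, and the final step (training a lightweight TTS model on the resulting pairs) is not shown.

```python
from typing import Callable, Iterable, List, Tuple

Audio = bytes  # placeholder type; a real pipeline would use waveforms or file paths


def build_synthetic_dataset(
    sentences: Iterable[str],
    donor_tts: Callable[[str], Audio],
    convert_voice: Callable[[Audio], Audio],
) -> List[Tuple[str, Audio]]:
    """Pair each sentence with donor speech converted to the target voice."""
    dataset = []
    for text in sentences:
        donor_audio = donor_tts(text)              # step 1: synthesize with the donor voice
        target_audio = convert_voice(donor_audio)  # step 2: convert to the target voice
        dataset.append((text, target_audio))       # step 3: keep the (text, audio) pair
    return dataset


# Hypothetical stand-ins so the sketch runs without any TTS or VC model installed.
def fake_donor_tts(text: str) -> Audio:
    return text.encode("utf-8")


def fake_voice_conversion(audio: Audio) -> Audio:
    return b"target:" + audio


dataset = build_synthetic_dataset(
    ["olá", "bom dia"], fake_donor_tts, fake_voice_conversion
)
```

The resulting list of text and audio pairs is what a lightweight TTS model would then be trained on, so no recording of the target speaker enters the training set.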

Motivation

Intent Pipeline Elo Benchmarking System

Overview

This proposal introduces an Elo-style rating system to benchmark OVOS intent pipeline configurations using real user utterances and human-in-the-loop feedback.

Instead of traditional dataset validation, users are presented with predictions from two different intent pipeline configurations and asked to judge which one is more accurate. This approach allows us to:

Benchmark pipelines using real-world utterances.
Collect high-quality, user-validated labeled data.
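As a rough illustration of how a single human judgment could move two pipeline ratings, here is a minimal sketch of the standard Elo update. The K-factor of 32, the starting rating of 1000, and the names `expected_score`, `update_elo`, `pipeline_a` and `pipeline_b` are assumptions made for the example, not part of the proposal.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(
    rating_a: float, rating_b: float, score_a: float, k: float = 32.0
) -> Tuple[float, float]:
    """Return updated ratings after one comparison.

    score_a is 1.0 if the user judged A more accurate, 0.0 if B, 0.5 for a tie.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's actual and expected scores are the complements of A's.
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Two hypothetical pipeline configurations start at the same rating.
ratings = {"pipeline_a": 1000.0, "pipeline_b": 1000.0}

# A user judged pipeline_a's prediction to be the more accurate one.
ratings["pipeline_a"], ratings["pipeline_b"] = update_elo(
    ratings["pipeline_a"], ratings["pipeline_b"], score_a=1.0
)
```

Because the update is zero-sum, the total rating across the two configurations is unchanged; only their relative order shifts as judgments accumulate.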

skill: weather

intent: current_weather

Handle weather requests for right now

Examples:

"What's the weather like?"

Simple, lightweight baseline implementations for various tasks. Includes some toy examples.

Web

https://github.com/TigreGotico/anon_requests - anonymous Python requests (proxies and Tor)

Database

https://github.com/TigreGotico/json_database

Interactive Fiction

Audio

https://huggingface.co/datasets/Jarbas/locallingua_pt - recordings from Portugal scraped from https://localingual.com
https://huggingface.co/datasets/Jarbas/pt_basics - phonetically diverse standalone words, letters, diphthongs and basic greetings, scraped from https://www.learningportuguese.co.uk/guide/compare-accents
https://huggingface.co/datasets/Jarbas/compare-accents-pt - small dataset of multiple Portuguese speakers from various dialects speaking the same sentence, scraped from https://www.learningportuguese.co.uk/guide/compare-accents
https://huggingface.co/datasets/Jarbas/VocativesEuropeanPortuguese - mirror of the dataset at https://www.clul.ulisboa.pt/en/recurso/vocatives-european-portuguese
https://huggingface.co/datasets/Jarbas/InstitutoCamoes - mirror dataset of https://www.instituto-camoes.pt
https://huggingface.co/datasets/Jarbas/SpokenPortugueseGeographicalSocialVarieties - mirror dataset of https://www.clul.ulisboa.pt/en/recurso/spoken-portuguese-geographical-and-socia

Tasks

Emoji	Task Type	Flow Example
🔊 TTS	Text-to-Speech	text (EN) → audio (EN)
🌐 T2TT / 🌐 MT	Text-to-Text-Translation / Machine Translation	text (FR) → text (EN)
🎤 T2ST	Text-to-Speech-Translation	text (DE) → audio (EN)
🗣️ STT / 🗣️ ASR	Speech-to-Text / Automatic Speech Recognition	audio (PT) → text (PT)
🗣️📝 S2TT	Speech-to-Text-Translation	audio (FR) → text (EN)
🗣️🔄 S2ST	Speech-to-Speech-Translation	audio (ES) → audio (EN)

How to Install Open Voice OS with the `ovos-installer`

Welcome to the quick-start guide for installing Open Voice OS (OVOS) using the official ovos-installer! This guide is suitable for Raspberry Pi and desktop/server Linux environments. Whether you’re running this on a headless Raspberry Pi or your everyday laptop, the steps are mostly the same—only the way you connect to the device differs.

⚠️ Note: Some “exotic” hardware (like ReSpeaker microphones or certain audio HATs) may require extra configuration. The installer aims for wide compatibility, but specialized setups might need some manual intervention.

Looking for a pre-built Raspberry Pi image instead? Check out raspOVOS and the companion tutorial.

> what are the STT options

There are several Speech-To-Text (STT) options you can consider for OpenVoiceOS. Some popular STT plugins include:

1. FasterWhisper - A fast option that can be run locally on a machine with sufficient resources.
2. VOSK - Works well for offline needs.
3. Google Translate STT - A cloud-based solution with good accuracy.
4. Deepgram - Another cloud-based option that provides a variety of features.

	import requests


	TRIPLE_VALIDATION_PROMPT = """
	You are a triple validator for a personal knowledge graph.

	Given an utterance that a user spoke to a voice assistant and a candidate triple, your task is to validate the triple

	Utterances about the user usually have the form of "I am ...." or "My ..."

	import requests
	import argparse
	from datetime import datetime


	# Function to get the PyPI release data
	def get_versions(package_name):
	    url = f'https://pypi.org/pypi/{package_name}/json'
	    response = requests.get(url)
	    if response.status_code == 200:
