tasks.md

Tasks

Emoji	Task Type	Flow Example
🔊 TTS	Text-to-Speech	text (EN) → audio (EN)
🌐 T2TT / 🌐 MT	Text-to-Text-Translation / Machine Translation	text (FR) → text (EN)
🎤 T2ST	Text-to-Speech-Translation	text (DE) → audio (EN)
🗣️ STT / 🗣️ ASR	Speech-to-Text / Automatic-Speech-Recognition	audio (PT) → text (PT)
🗣️📝 S2TT	Speech-to-Text-Translation	audio (FR) → text (EN)
🗣️🔄 S2ST	Speech-to-Speech-Translation	audio (ES) → audio (EN)
🧠 SLI	Spoken-Language-Identification	audio (unknown) → lang: "pt"
🧾 WLI	Written-Language-Identification	text (unknown) → lang: "en"
🎯 WW	Wake-Word Detection	passive audio → hotword → trigger
🎚️ VAD	Voice-Activity-Detection	audio stream → speech segmenting
🤖 QA	Question-Answering	text (prompt) → text (generated)
✍️ DT	Dialog-Transformer	text (generated) → text (modified)
✍️ UT	Utterance-Transformer	text (prompt) → text (modified)

Task	Input Type	Output Type	Input Language == Output Language
🤖 Question Answering	text	text	yes
✍️ Dialog Transformer	text	text	yes
🌐 Text-to-text-translation (MT)	text	text	no
🔊 Text-to-speech (TTS)	text	audio	yes
🎤 Text-to-speech-translation (T2ST)	text	audio	no
🗣️ Speech-to-text (STT)	audio	text	yes
🗣️🔄 Speech-to-speech-translation (S2ST)	audio	audio	no
🗣️📝 Speech-to-text-translation (S2TT)	audio	text	no
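
The table above can also be read as a small lookup structure. The following is a minimal sketch, not an OVOS API: `Modality`, `TaskSpec`, `TASKS`, and `find_tasks` are hypothetical names introduced only to show how a task is identified by its input type, output type, and whether the language changes.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    TEXT = "text"
    AUDIO = "audio"


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical record mirroring one row of the table above."""
    name: str
    input_type: Modality
    output_type: Modality
    same_language: bool  # True when input language == output language


TEXT, AUDIO = Modality.TEXT, Modality.AUDIO

# One entry per table row, keyed by the task abbreviation.
TASKS = {
    "QA": TaskSpec("Question Answering", TEXT, TEXT, True),
    "DT": TaskSpec("Dialog Transformer", TEXT, TEXT, True),
    "MT": TaskSpec("Text-to-text-translation", TEXT, TEXT, False),
    "TTS": TaskSpec("Text-to-speech", TEXT, AUDIO, True),
    "T2ST": TaskSpec("Text-to-speech-translation", TEXT, AUDIO, False),
    "STT": TaskSpec("Speech-to-text", AUDIO, TEXT, True),
    "S2ST": TaskSpec("Speech-to-speech-translation", AUDIO, AUDIO, False),
    "S2TT": TaskSpec("Speech-to-text-translation", AUDIO, TEXT, False),
}


def find_tasks(input_type, output_type, same_language):
    """Return the sorted abbreviations of all tasks matching the signature."""
    return sorted(
        abbr
        for abbr, spec in TASKS.items()
        if spec.input_type is input_type
        and spec.output_type is output_type
        and spec.same_language == same_language
    )
```

For example, `find_tasks(Modality.AUDIO, Modality.TEXT, False)` selects the only audio-to-text task that changes language, S2TT.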

Plugin combinations

Combined Plugins	Task Description	Emoji Task
🌐 MT + 🔊 TTS	Text-to-Speech Translation	🎤 T2ST
🗣️ STT + 🌐 MT	Speech-to-Text Translation	🗣️📝 S2TT
🗣️ STT + 🌐 MT + 🔊 TTS	Speech-to-Speech Translation	🗣️🔄 S2ST

🎤 Text → Speech Translation (T2ST)

text (DE)
→ 🧾 detect written lang (DE)
→ 🌐 translate (DE → EN)
→ 🔊 TTS (EN)
= audio (EN)

🗣️📝 Speech → Text Translation (S2TT)

audio (FR)
→ 🧠 detect spoken lang (FR)
→ 🗣️ STT (FR)
→ 🌐 translate (FR → EN)
= text (EN)

🗣️🔄 Speech → Speech Translation (S2ST)

audio (ES)
→ 🧠 detect spoken lang (ES)
→ 🗣️ STT (ES)
→ 🌐 translate (ES → EN)
→ 🔊 TTS (EN)
= audio (EN)
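
The three flows above are plain function composition: each plugin consumes the previous plugin's output. A minimal sketch under stated assumptions follows. The `stt`, `make_mt`, and `tts` functions are hypothetical stubs that pass dictionaries around instead of real audio, and `compose` is an illustrative helper, not part of any OVOS package.

```python
from functools import reduce


def compose(*stages):
    """Chain stages left to right: the output of one feeds the next."""
    def pipeline(data):
        return reduce(lambda acc, stage: stage(acc), stages, data)
    return pipeline


# Hypothetical stubs. Real plugins would process audio bytes and call models.
def stt(audio):
    # Pretend transcription: the fake "audio" already carries its transcript.
    return {"kind": "text", "lang": audio["lang"], "content": audio["content"]}


def make_mt(target_lang):
    def mt(text):
        # Pretend translation: tag the content with the language pair.
        tagged = f"[{text['lang']}->{target_lang}] {text['content']}"
        return {"kind": "text", "lang": target_lang, "content": tagged}
    return mt


def tts(text):
    # Pretend synthesis: wrap the text as fake audio in the same language.
    return {"kind": "audio", "lang": text["lang"], "content": text["content"]}


# The combinations from the table above.
t2st = compose(make_mt("en"), tts)        # MT + TTS
s2tt = compose(stt, make_mt("en"))        # STT + MT
s2st = compose(stt, make_mt("en"), tts)   # STT + MT + TTS
```

Calling `s2st({"kind": "audio", "lang": "es", "content": "hola"})` yields fake English audio, which matches the audio (ES) → audio (EN) flow.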

Language Detection

If the input language is not known before inference, it can be detected with the SLI and WLI plugins. This enables dynamic-language, multi-user, and multilingual setups.

Combined Plugins	Task Description	Emoji Task
🧾 WLI + 🔊 TTS	Text-to-Speech	🔊 TTS (multilingual)
🧾 WLI + 🌐 MT	Text Translation	🌐 MT (multilingual)
🧾 WLI + 🌐 MT + 🔊 TTS	Text-to-Speech Translation	🎤 T2ST (multilingual)
🧠 SLI + 🗣️ STT	Speech-to-Text	🗣️ STT (multilingual)
🧠 SLI + 🗣️ STT + 🌐 MT	Speech-to-Text Translation	🗣️📝 S2TT (multilingual)
🧠 SLI + 🗣️ STT + 🌐 MT + 🔊 TTS	Speech-to-Speech Translation	🗣️🔄 S2ST (multilingual)
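
As an illustration of the WLI + MT row, the detector only runs when the caller does not already know the language. This is a toy sketch: `detect_written_language` is a hypothetical stand-in that counts marker words (a real WLI plugin would use a trained model), and `fake_translate` and `translate_multilingual` are assumed names, not OVOS functions.

```python
def detect_written_language(text):
    """Toy WLI stand-in: pick the language whose marker words overlap most."""
    markers = {
        "en": {"the", "is", "what", "and"},
        "pt": {"o", "que", "é", "e"},
        "de": {"der", "die", "das", "ist", "und"},
    }
    words = set(text.lower().split())
    return max(markers, key=lambda lang: len(words & markers[lang]))


def fake_translate(text, source_lang, target_lang):
    """Hypothetical MT stub: tags the text instead of translating it."""
    return f"[{source_lang}->{target_lang}] {text}"


def translate_multilingual(text, target_lang, source_lang=None,
                           translate=fake_translate):
    # Detect only when the source language was not supplied by the caller.
    lang = source_lang or detect_written_language(text)
    if lang == target_lang:
        return text  # nothing to translate
    return translate(text, lang, target_lang)
```

The same pattern applies to the SLI rows, with a spoken-language detector running on audio before STT.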

Generative AI / Persona

Plugin	Purpose	Position
✍️ UT	Normalize / rewrite user input (e.g. “can u pls tell me?” → “please tell me”)	Before 🤖 QA
🤖 QA	Core NLU + response generation (LLM / skill selection / intent)	Middle
✍️ DT	Rewrite generated response (e.g. dry → humorous, formal → friendly)	After 🤖 QA

For OVOS purposes, consider 🤖 QA to be equivalent to ovos-core. In OVOS, this step uses intent classification to select a skill, which is responsible for executing an action and generating a dialog.

✍️ DT is used after ovos-core has generated an answer, to rewrite it and give it a personality.
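
The UT → QA → DT ordering can be sketched with three stub functions. Everything here is hypothetical and for illustration only: the slang table, the canned answer, and the "friendly" rewrite are toy stand-ins, not the behavior of ovos-core or of any real transformer plugin.

```python
def utterance_transformer(utterance):
    """UT stub: normalize informal input before it reaches QA."""
    replacements = {"u": "you", "pls": "please"}
    words = utterance.lower().split()
    return " ".join(replacements.get(word, word) for word in words)


def question_answering(utterance):
    """QA stub: stands in for intent matching plus response generation."""
    if "time" in utterance.split():
        return "It is noon."
    return "I do not know."


def dialog_transformer(answer):
    """DT stub: rewrite the generated answer in a friendlier tone."""
    return f"Well, {answer[0].lower()}{answer[1:]} Happy to help!"


def respond(utterance):
    # UT runs before QA, DT runs after QA, as in the table above.
    normalized = utterance_transformer(utterance)
    answer = question_answering(normalized)
    return dialog_transformer(answer)
```

Keeping the three steps as separate functions means the persona (DT) can be swapped without touching input cleanup (UT) or the answering step.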

🎯 Voice Pipeline: STT + Generation + TTS (Variants)

Pipeline	Description
audio → 🎯 WW + 🎚️ VAD + 🗣️ STT → 🤖 QA → 🔊 TTS → audio	Direct spoken Q&A
audio → 🎯 WW + 🎚️ VAD + 🗣️ STT → ✍️ UT → 🤖 QA → 🔊 TTS → audio	Input cleanup for better NLU
audio → 🎯 WW + 🎚️ VAD + 🗣️ STT → ✍️ UT → 🤖 QA → ✍️ DT → 🔊 TTS → audio	Voice assistant with emotion/tone control
audio → 🎯 WW + 🎚️ VAD + 🧠 SLI + 🗣️ STT → 🤖 QA → 🔊 TTS → audio	Multilingual support
audio → 🎯 WW + 🎚️ VAD + 🧠 SLI + 🗣️ STT + 🌐 MT + ✍️ UT → 🤖 QA → ✍️ DT + 🌐 MT + 🔊 TTS → audio	Fully featured multilingual, polyglot (bidirectional translation), personalized voice assistant

🤖 Generative AI Task Flow Examples

Voice Input → Answer (Monolingual)

audio (EN)
→ 🎯 WW + 🎚️ VAD
→ 🗣️ STT (EN)
→ 🤖 QA (EN → EN)
→ 🔊 TTS (EN)
= audio (EN)

Voice Input → Persona Reply (Multilingual)

audio (PT)
→ 🎯 WW + 🎚️ VAD
→ 🧠 SLI: lang="pt"
→ 🗣️ STT (PT)
→ ✍️ UT (normalize)
→ 🌐 MT (PT → EN)
→ 🤖 QA (EN → EN)
→ ✍️ DT (personality/style)
→ 🌐 MT (EN → PT)
→ 🔊 TTS (PT)
= audio (PT)
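
The pipeline variants differ only in which optional stages are switched on. The sketch below builds the stage list from four flags. `build_pipeline` and its parameter names are assumptions made for illustration, and the stage order follows the multilingual flow above (UT before MT on the way in, DT before MT on the way out).

```python
def build_pipeline(multilingual=False, translate=False,
                   normalize=False, persona=False):
    """Return the ordered stage abbreviations for one pipeline variant."""
    stages = ["WW", "VAD"]
    if multilingual:
        stages.append("SLI")   # detect the spoken language before STT
    stages.append("STT")
    if normalize:
        stages.append("UT")    # clean up the transcript
    if translate:
        stages.append("MT")    # user language -> QA language
    stages.append("QA")
    if persona:
        stages.append("DT")    # restyle the generated answer
    if translate:
        stages.append("MT")    # QA language -> user language
    stages.append("TTS")
    return stages


# Monolingual flow: WW, VAD, STT, QA, TTS
basic = build_pipeline()

# Multilingual persona flow with bidirectional translation
full = build_pipeline(multilingual=True, translate=True,
                      normalize=True, persona=True)
```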
