This document outlines the requirements for the Playback API, which provides a unified interface for controlling text-to-speech (TTS) playback.
- Start, pause, resume, and stop playback
- Handle both individual and batched text/SSML input
- Report current playback state (playing, paused, stopped)
- Accept plain text and SSML input
- Support multiple utterances
- Select from available voices
- Configure voice parameters (rate, pitch, volume)
- Emit events for state changes
- Provide word/sentence boundary information
- Report errors and warnings
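
To make the requirements above more concrete, here is a minimal TypeScript sketch of how such an interface could be shaped. It is an illustration under assumptions, not a proposed design: every name in it (`PlaybackSketch`, `Utterance`, `VoiceParams`, `PlaybackEvent`, `configure`, and so on) is hypothetical, and no speech engine is attached, so it only models the state transitions and the events that report them. Voice selection is left out, and boundary events are declared but never emitted.

```typescript
// Illustrative sketch only: every name here is hypothetical, not an existing API.
type PlaybackState = "playing" | "paused" | "stopped";

interface Utterance {
  content: string;
  format: "text" | "ssml"; // plain text or SSML input
}

interface VoiceParams {
  rate: number;
  pitch: number;
  volume: number;
}

type PlaybackEvent =
  | { type: "stateChange"; state: PlaybackState }
  // A real engine would emit boundary events; this sketch only declares them.
  | { type: "boundary"; unit: "word" | "sentence"; charIndex: number }
  | { type: "error"; message: string }
  | { type: "warning"; message: string };

type PlaybackListener = (event: PlaybackEvent) => void;

class PlaybackSketch {
  private state: PlaybackState = "stopped";
  private queue: Utterance[] = [];
  private listeners: PlaybackListener[] = [];
  private params: VoiceParams = { rate: 1, pitch: 1, volume: 1 };

  on(listener: PlaybackListener): void {
    this.listeners.push(listener);
  }

  getState(): PlaybackState {
    return this.state;
  }

  getVoiceParams(): VoiceParams {
    return { ...this.params };
  }

  // Merge partial voice settings into the current ones.
  configure(params: Partial<VoiceParams>): void {
    this.params = { ...this.params, ...params };
  }

  // Accepts a single utterance or a batch.
  start(input: Utterance | Utterance[]): void {
    this.queue = Array.isArray(input) ? [...input] : [input];
    if (this.queue.length === 0) {
      this.emit({ type: "error", message: "start() called with no utterances" });
      return;
    }
    this.setState("playing");
  }

  pause(): void {
    if (this.state !== "playing") {
      this.emit({ type: "warning", message: "pause() ignored: not playing" });
      return;
    }
    this.setState("paused");
  }

  resume(): void {
    if (this.state !== "paused") {
      this.emit({ type: "warning", message: "resume() ignored: not paused" });
      return;
    }
    this.setState("playing");
  }

  stop(): void {
    this.queue = [];
    if (this.state !== "stopped") {
      this.setState("stopped");
    }
  }

  private setState(next: PlaybackState): void {
    this.state = next;
    this.emit({ type: "stateChange", state: next });
  }

  private emit(event: PlaybackEvent): void {
    for (const listener of this.listeners) {
      listener(event);
    }
  }
}
```

In this sketch the utterance queue is private and callers only observe state through `getState()` and events. Invalid transitions, such as pausing while stopped, surface as warnings rather than exceptions, which is one possible reading of the "report errors and warnings" requirement.
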
I'm not sure these APIs are useful to the caller. The list of utterances is useful for preloading and context purposes, but in my opinion the source of truth for the currently playing utterance should be the navigator.
On mobile, I think we'll use a Preferences API for these settings, so that the navigator can expose them.