This document outlines the requirements for the Playback API, which provides a unified interface for controlling text-to-speech (TTS) playback.
- Start, pause, resume, and stop playback
- Handle both individual and batched text/SSML input
- Report current playback state (playing, paused, stopped)
- Accept plain text and SSML input
- Support multiple utterances
- Select from available voices
- Configure voice parameters (rate, pitch, volume)
- Emit events for state changes
- Provide word/sentence boundary information
- Report errors and warnings
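The transport controls, state reporting, and state-change events listed above can be illustrated with a minimal TypeScript state machine. This is a sketch under stated assumptions. The names PlaybackStateMachine, StateChangeEvent, and onStateChange are hypothetical and not part of any existing Readium API. Throwing on an invalid transition, such as pausing while stopped, is also an assumption; the real API might emit an error event instead. A real implementation would also forward each transition to the underlying TTS engine.

```typescript
// Hypothetical sketch; the names below are illustrative, not an existing API.
type PlaybackState = "playing" | "paused" | "stopped";

// Payload emitted on every state change.
interface StateChangeEvent {
  previous: PlaybackState;
  current: PlaybackState;
}

type StateListener = (event: StateChangeEvent) => void;

class PlaybackStateMachine {
  private state: PlaybackState = "stopped";
  private listeners: StateListener[] = [];

  // Report the current playback state.
  getState(): PlaybackState {
    return this.state;
  }

  // Subscribe to state changes; returns a function that unsubscribes.
  onStateChange(listener: StateListener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  start(): void {
    this.transition(["stopped"], "playing");
  }

  pause(): void {
    this.transition(["playing"], "paused");
  }

  resume(): void {
    this.transition(["paused"], "playing");
  }

  stop(): void {
    this.transition(["playing", "paused"], "stopped");
  }

  // Move to `next` only from an allowed state; otherwise throw (an
  // assumption: a real API might emit an error event instead).
  private transition(allowedFrom: PlaybackState[], next: PlaybackState): void {
    if (allowedFrom.indexOf(this.state) === -1) {
      throw new Error(`cannot move from "${this.state}" to "${next}"`);
    }
    const previous = this.state;
    this.state = next;
    // Iterate over a copy so listeners may unsubscribe while being notified.
    for (const listener of this.listeners.slice()) {
      listener({ previous, current: next });
    }
  }
}

// Usage: drive a full start/pause/resume/stop cycle and log each event.
const player = new PlaybackStateMachine();
const log: string[] = [];
player.onStateChange((e) => log.push(`${e.previous}->${e.current}`));
player.start();
player.pause();
player.resume();
player.stop();
console.log(log.join(", "));
// prints "stopped->playing, playing->paused, paused->playing, playing->stopped"
```

Keeping the allowed transitions in one private method gives a single place to decide how invalid commands are reported. It also guarantees that every valid change emits exactly one event.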
Note, as a “hidden” requirement, that the API must also work as a standalone module for web consumers, who will not rely on a Navigator or Preferences API.
In Readium Speech, we will probably use an init() call to which you pass your engines; after that, you can call getVoices(). However, as we discussed this morning, pitch, rate, volume, and similar parameters should go into loadUtterances, because some TTS engines require them to be set for each utterance and do not work well otherwise.
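The flow described above (init with engines, then getVoices, with voice parameters carried by loadUtterances) can be sketched in TypeScript. This is a sketch under stated assumptions, not the planned implementation. Only init, getVoices, and loadUtterances come from the notes. SpeechPlayer, TtsEngine, Voice, Utterance, ResolvedUtterance, the optional defaults argument, and the fallback value of 1 are all hypothetical. The engine contract is a plain interface with no Navigator or Preferences dependency, so a web consumer could use the module on its own.

```typescript
// Hypothetical types for illustration; not an existing Readium Speech API.

// A voice as reported by an engine.
interface Voice {
  id: string;
  name: string;
  language: string; // BCP 47 tag, e.g. "en-US"
}

// Voice parameters that may be set on each utterance.
interface VoiceParameters {
  rate?: number;
  pitch?: number;
  volume?: number;
}

// Input utterance: plain text or SSML, with optional per-utterance parameters.
interface Utterance extends VoiceParameters {
  text: string;
  ssml?: boolean; // true when `text` contains SSML markup
  voiceId?: string;
}

// Utterance with every parameter resolved to an explicit value.
interface ResolvedUtterance {
  text: string;
  ssml: boolean;
  voiceId?: string;
  rate: number;
  pitch: number;
  volume: number;
}

// Engine contract with no Navigator or Preferences dependency,
// so web consumers can pass engines in directly.
interface TtsEngine {
  name: string;
  getVoices(): Promise<Voice[]>;
}

// Assumed fallback when neither the utterance nor the defaults set a value.
const NEUTRAL = 1;

class SpeechPlayer {
  private engines: TtsEngine[] = [];
  private queue: ResolvedUtterance[] = [];

  // Engines are registered once, up front.
  init(engines: TtsEngine[]): void {
    if (engines.length === 0) {
      throw new Error("init() needs at least one engine");
    }
    this.engines = [...engines];
  }

  // Voices from all engines, available only after init().
  async getVoices(): Promise<Voice[]> {
    if (this.engines.length === 0) {
      throw new Error("call init() before getVoices()");
    }
    const perEngine = await Promise.all(this.engines.map((e) => e.getVoices()));
    return ([] as Voice[]).concat(...perEngine);
  }

  // Rate, pitch, and volume are resolved per utterance, because some
  // engines only apply them reliably when set on each utterance.
  loadUtterances(utterances: Utterance[], defaults: VoiceParameters = {}): ResolvedUtterance[] {
    this.queue = utterances.map((u) => ({
      text: u.text,
      ssml: u.ssml ?? false,
      voiceId: u.voiceId,
      rate: u.rate ?? defaults.rate ?? NEUTRAL,
      pitch: u.pitch ?? defaults.pitch ?? NEUTRAL,
      volume: u.volume ?? defaults.volume ?? NEUTRAL,
    }));
    return this.queue;
  }
}
```

Resolving the parameters inside loadUtterances, rather than storing them once on the player, means every utterance handed to an engine carries explicit rate, pitch, and volume values. Player-wide defaults can still be offered, but only as a fallback merged into each utterance.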