@JayPanoz
Created August 22, 2025 11:14
Readium – Playback API

export interface IPlaybackAPI {
  // Playback Controls
  play(utteranceIndex?: number): void;
  pause(): void;
  resume(): void;
  stop(): void;

  // Queue Management
  // GuidedNavigationText is not defined in this gist; it is assumed to be the
  // Readium guided-navigation text item that "playbackStart" also reports.
  loadUtterances(utterances: GuidedNavigationText[]): void;
  getCurrentUtteranceIndex(): number;
  getUtteranceCount(): number;

  // State
  getState(): "stopped" | "playing" | "paused";

  // Voice Configuration
  getVoices(): Promise<SpeechSynthesisVoice[]>;
  setVoice(voiceURI: string): void;
  setRate(rate: number): void;
  setPitch(pitch: number): void;
  setVolume(volume: number): void;

  // Events
  on(event: "playbackStart", callback: (index: number, utterance: GuidedNavigationText) => void): void;
  on(event: "playbackPause", callback: () => void): void;
  on(event: "playbackStop", callback: () => void): void;
  on(event: "playbackEnd", callback: () => void): void;
  on(event: "error", callback: (error: Error) => void): void;
  on(event: "boundary", callback: (charIndex: number, charLength: number) => void): void;
  on(event: "voiceschanged", callback: () => void): void;

  // Cleanup
  clear(): void;
}

Playback API - Requirements

This document outlines the requirements for the Playback API, which provides a unified interface for controlling text-to-speech (TTS) playback.

1.1 Playback Control

  • Start, pause, resume, and stop playback
  • Handle both individual and batched text/SSML input
  • Report current playback state (playing, paused, stopped)
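A minimal sketch of how the three states could respond to the four controls. The `nextState` helper and `PlaybackAction` type are hypothetical, and the transition rules are assumptions: `play` is allowed from any state (since it can restart at a given index), while `pause` and `resume` are rejected (returning `null`) unless the player is currently playing or paused, respectively.

```typescript
// Playback states as listed in the requirements (and returned by getState()).
type PlaybackState = "stopped" | "playing" | "paused";

// Hypothetical: the control actions a player can receive.
type PlaybackAction = "play" | "pause" | "resume" | "stop";

// Returns the next state, or null when the action is not valid from `state`.
// These rules are an assumption, not taken from the interface.
function nextState(state: PlaybackState, action: PlaybackAction): PlaybackState | null {
  switch (action) {
    case "play":
      // play() (re)starts playback from any state.
      return "playing";
    case "pause":
      // Only a playing session can be paused.
      return state === "playing" ? "paused" : null;
    case "resume":
      // Only a paused session can be resumed.
      return state === "paused" ? "playing" : null;
    case "stop":
      // Stopping is always allowed and idempotent.
      return "stopped";
  }
}
```

Rejecting invalid transitions with `null` (rather than throwing) leaves the caller free to decide whether to ignore the call or emit an `error` event.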

1.2 Text Processing

  • Accept plain text and SSML input
  • Support queuing multiple utterances
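One way to accept both individual and batched input, as plain text or SSML, is to normalize everything into a single queue shape first. This is a sketch under assumptions: `UtteranceInput`, `isSsml` and `toUtterances` are hypothetical names, and detecting SSML by a leading `<speak>` root element (optionally preceded by an XML declaration) is a heuristic, not part of the requirements.

```typescript
// Hypothetical queue item: the raw text plus whether it should be sent as SSML.
interface UtteranceInput {
  text: string;
  ssml: boolean;
}

// Heuristic (assumption): treat input as SSML when its root element is <speak>,
// optionally preceded by an XML declaration such as <?xml version="1.0"?>.
function isSsml(input: string): boolean {
  return /^(<\?xml[^>]*\?>\s*)?<speak[\s>]/i.test(input.trimStart());
}

// Accepts a single string or a batch and returns a uniform queue,
// dropping entries that are empty or whitespace-only.
function toUtterances(input: string | string[]): UtteranceInput[] {
  const items = Array.isArray(input) ? input : [input];
  return items
    .filter((item) => item.trim().length > 0)
    .map((item) => ({ text: item, ssml: isSsml(item) }));
}
```

Normalizing up front means the rest of the player only ever deals with one shape, whatever the caller passed in.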

1.3 Voice Management

  • Select from available voices
  • Configure voice parameters (rate, pitch, volume)
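Rate, pitch and volume need bounds before they reach an engine. The sketch below borrows the ranges and defaults of the Web Speech API's `SpeechSynthesisUtterance` (rate 0.1 to 10, pitch 0 to 2, volume 0 to 1, each defaulting to 1); other TTS engines may use different ranges, and `normalizeVoiceParams` is a hypothetical helper, not part of the interface above.

```typescript
interface VoiceParams {
  rate: number;
  pitch: number;
  volume: number;
}

// Ranges taken from the Web Speech API's SpeechSynthesisUtterance;
// other engines may accept different ranges.
const RANGES: Record<keyof VoiceParams, [number, number]> = {
  rate: [0.1, 10],
  pitch: [0, 2],
  volume: [0, 1],
};

// Defaults (1 for each) also match SpeechSynthesisUtterance.
const DEFAULTS: VoiceParams = { rate: 1, pitch: 1, volume: 1 };

function clamp(value: number, [min, max]: [number, number]): number {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: merges partial settings over the defaults and clamps
// each value into its range; missing or NaN values keep the default.
function normalizeVoiceParams(input: Partial<VoiceParams>): VoiceParams {
  const result = { ...DEFAULTS };
  for (const key of Object.keys(RANGES) as (keyof VoiceParams)[]) {
    const value = input[key];
    if (typeof value === "number" && !Number.isNaN(value)) {
      result[key] = clamp(value, RANGES[key]);
    }
  }
  return result;
}
```

Clamping silently (instead of throwing) is a design choice; an implementation could equally emit an `error` event for out-of-range values.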

1.4 Event System

  • Emit events for state changes
  • Provide word/sentence boundary information
  • Report errors and warnings
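The `on(...)` overloads in the interface can be backed by a typed event map, so that each event name fixes its callback's arguments. A minimal sketch, assuming a hypothetical `PlaybackEmitter` class with an `emit` method (the interface above only exposes `on` and `clear`); the `playbackStart` payload is simplified to a string because `GuidedNavigationText` is not defined in this gist.

```typescript
// Hypothetical event map mirroring the on(...) overloads in IPlaybackAPI.
interface PlaybackEvents {
  playbackStart: [index: number, utterance: string];
  playbackPause: [];
  playbackStop: [];
  playbackEnd: [];
  error: [error: Error];
  boundary: [charIndex: number, charLength: number];
  voiceschanged: [];
}

class PlaybackEmitter {
  // Listeners are stored untyped internally; on() and emit() keep the public API typed.
  private listeners = new Map<keyof PlaybackEvents, Array<(...args: any[]) => any>>();

  on<K extends keyof PlaybackEvents>(event: K, callback: (...args: PlaybackEvents[K]) => void): void {
    const list = this.listeners.get(event) ?? [];
    list.push(callback);
    this.listeners.set(event, list);
  }

  emit<K extends keyof PlaybackEvents>(event: K, ...args: PlaybackEvents[K]): void {
    // Iterate over a copy so listeners added during emit are not called this round.
    for (const callback of [...(this.listeners.get(event) ?? [])]) {
      callback(...args);
    }
  }

  // Mirrors clear(): drops every registered listener.
  clear(): void {
    this.listeners.clear();
  }
}
```

With this map, a call such as `emitter.on("boundary", (charIndex, charLength) => ...)` gets both parameters typed as `number` without repeating the overloads.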
@JayPanoz (Author)

Note, as a “hidden” requirement, that it must also work as a standalone module for web consumers, who will not rely on the Navigator and Preferences APIs.

In Readium Speech, we will probably use an init step where you pass your engines, after which you can call getVoices(). As for pitch, rate, volume, etc., we discussed this morning that they should somehow go into loadUtterances(), since some TTS engines require these to be set for each utterance; otherwise they do not work well.
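A sketch of what carrying the voice settings with each utterance could look like. `VoiceSettings`, `UtteranceRequest`, `ResolvedUtterance` and `prepareUtterances` are hypothetical names; the idea is only that a `loadUtterances()` implementation could resolve defaults plus per-utterance overrides up front, so an engine that needs rate, pitch and volume on every utterance always receives all three.

```typescript
// Hypothetical shapes: voice settings travel with each utterance
// instead of being set once on the player.
interface VoiceSettings {
  rate: number;
  pitch: number;
  volume: number;
}

interface UtteranceRequest {
  text: string;
  settings?: Partial<VoiceSettings>; // optional per-utterance overrides
}

interface ResolvedUtterance {
  text: string;
  settings: VoiceSettings; // always complete, ready to hand to an engine
}

// Hypothetical helper a loadUtterances() implementation could call:
// each queued item gets its own full copy of the settings.
function prepareUtterances(utterances: UtteranceRequest[], defaults: VoiceSettings): ResolvedUtterance[] {
  return utterances.map(({ text, settings }) => ({
    text,
    settings: { ...defaults, ...settings },
  }));
}
```

Resolving settings at load time keeps engines that want per-utterance values working, while setRate() and similar setters could still update the defaults used for the next load.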
