@danielrosehill
Last active December 7, 2025

Chatterbox TTS Generation Benchmark

Date: 2024-12-07
Model: ResembleAI/chatterbox

Test Results Comparison

Local (AMD ROCm)

| Metric | Value |
| --- | --- |
| Hardware | AMD Radeon RX 7700 XT (12 GB VRAM, gfx1101) |
| Test audio length | 2.5 seconds |
| Generation time | ~28 seconds |
| Real-time factor (RTF) | 11.2x |

Modal (Cloud GPU - NVIDIA A10G)

| Metric | Value |
| --- | --- |
| Cold start | ~43 seconds (model loading) |
| Test audio length | 5.3 seconds |
| Generation time (warm) | 5.2 seconds |
| Real-time factor (RTF) | ~1x (real-time!) |

Extrapolations for 30-Minute Episode

| Platform | RTF | Estimated Time |
| --- | --- | --- |
| Local (AMD ROCm) | 11.2x | ~5.6 hours |
| Modal (warm) | ~1x | ~30-35 minutes |

Modal is ~11x faster than local AMD ROCm.
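The extrapolation is straightforward arithmetic; a quick sketch using the measured numbers from the tables above:

```python
# Real-time factor (RTF) = generation time / audio duration
# (higher = slower than real time). Numbers are from the benchmark tables.
local_rtf = 28 / 2.5    # 11.2x: 28 s to generate 2.5 s of audio
modal_rtf = 5.2 / 5.3   # ~0.98x: roughly real-time

episode_minutes = 30
local_hours = episode_minutes * local_rtf / 60   # ~5.6 hours
modal_minutes = episode_minutes * modal_rtf      # ~29 minutes (plus overhead)

print(f"Local: {local_hours:.1f} h, Modal: {modal_minutes:.0f} min")
```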

Observations

Local (ROCm)

  • ROCm/HIP is working but with workspace memory warnings
  • Suboptimal execution paths due to memory constraints
  • ROCm is typically 30-50% slower than comparable NVIDIA CUDA hardware

Modal

  • NVIDIA A10G provides much faster inference
  • Memory snapshot feature reduces cold starts
  • ~$0.76/hr for A10G (pay per second)
  • Scales to 10 concurrent requests per container

Cost Estimate (Modal)

For a 30-minute episode requiring ~35 minutes of compute:

  • A10G cost: ~$0.76/hr
  • Estimated cost per episode: ~$0.45
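With per-second billing, the estimate reduces to compute time times the hourly rate:

```python
# Cost = compute time (hours) x hourly rate, billed per second on Modal.
a10g_rate = 0.76        # $/hour for A10G (rate listed above)
compute_minutes = 35    # ~30 min of audio at ~1x RTF, plus overhead

cost = compute_minutes / 60 * a10g_rate
print(f"~${cost:.2f} per episode")
```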

Comparison with Other TTS Options

| Option | Speed (RTF) | Quality | Cost/Episode (30 min) |
| --- | --- | --- | --- |
| Chatterbox (Local ROCm) | ~11x | Excellent | Free (~5.6 hrs of time) |
| Chatterbox (Modal) | ~1x | Excellent | ~$0.45 |
| Edge-TTS | <0.1x | Good | Free |
| OpenAI TTS | ~0.2x | Excellent | ~$15-20 |
| ElevenLabs | ~0.3x | Best | ~$5-22/mo |

Conclusion

Modal is the clear winner for Chatterbox TTS:

  • 11x faster than local AMD ROCm
  • Reasonable cost (~$0.45/episode)
  • Serverless (no infrastructure to maintain)
  • Scales automatically

Local ROCm is only viable for:

  • Very short clips
  • Overnight batch processing
  • Zero-cost requirements

Modal Deployment

```shell
# Deploy
modal deploy chatterbox_tts.py
```

Endpoints:

  • POST /generate - Single TTS segment
  • POST /episode - Full episode with multiple segments
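Once deployed, the endpoints are plain HTTP. A hedged sketch of calling /generate; the deployment URL and the payload fields (`text`, `voice`) are assumptions here, not documented parameters of chatterbox_tts.py, so adjust them to match the actual endpoint signature:

```python
import json
from urllib import request

# Hypothetical Modal deployment URL -- replace with the real one.
url = "https://example--chatterbox-tts-generate.modal.run"

# Assumed payload shape for a single TTS segment.
payload = {"text": "Hello from Chatterbox.", "voice": "default"}

req = request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# resp = request.urlopen(req)  # response body would be audio bytes (assumed)
```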

Environment Details

Local

Container: chatterbox-tts
Image: rocm/pytorch:latest
GPU Override: HSA_OVERRIDE_GFX_VERSION=11.0.1
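The local settings above could be expressed as a compose file; a sketch, where the `/dev/kfd` and `/dev/dri` device mappings are standard ROCm container requirements assumed here rather than taken from the original setup:

```yaml
# docker-compose sketch for the local ROCm container (device wiring assumed)
services:
  chatterbox-tts:
    image: rocm/pytorch:latest
    environment:
      # Treat gfx1101 as a supported target, per the override above
      - HSA_OVERRIDE_GFX_VERSION=11.0.1
    devices:
      - /dev/kfd   # ROCm compute interface
      - /dev/dri   # GPU render nodes
```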

Modal

GPU: a10g
Concurrency: 10 requests/container
Scaledown: 5 minutes
Memory Snapshot: enabled
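As a sketch only, these settings map onto Modal's Python SDK roughly as follows; decorator and parameter names vary between Modal SDK versions, so treat every name here as an assumption to check against the Modal docs:

```python
import modal  # assumes the Modal SDK is installed

app = modal.App("chatterbox-tts")

@app.function(
    gpu="a10g",                   # GPU: a10g
    scaledown_window=300,         # Scaledown: 5 minutes
    enable_memory_snapshot=True,  # Memory Snapshot: enabled
)
@modal.concurrent(max_inputs=10)  # Concurrency: 10 requests/container
def generate(text: str) -> bytes:
    ...  # load Chatterbox and synthesize audio
```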