@danielrosehill
Last active December 7, 2025

Chatterbox TTS Generation Benchmark

Date: 2024-12-07
Model: ResembleAI/chatterbox

Test Results Comparison

Local (AMD ROCm)

| Metric | Value |
| --- | --- |
| Hardware | AMD Radeon RX 7700 XT (12 GB VRAM, gfx1101) |
| Test audio length | 2.5 seconds |
| Generation time | ~28 seconds |
| Real-time factor (RTF) | 11.2x |

Modal (Cloud GPU - NVIDIA A10G)

| Metric | Value |
| --- | --- |
| Cold start | ~43 seconds (model loading) |
| Test audio length | 5.3 seconds |
| Generation time (warm) | 5.2 seconds |
| Real-time factor (RTF) | ~1x (real-time!) |

Extrapolations for 30-Minute Episode

| Platform | RTF | Estimated Time |
| --- | --- | --- |
| Local (AMD ROCm) | 11.2x | ~5.6 hours |
| Modal (warm) | ~1x | ~30-35 minutes |

Modal is ~11x faster than local AMD ROCm.
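The extrapolation is straightforward arithmetic; a quick sketch using the measured numbers from the tables above:

```python
# Real-time factor (RTF) = generation time / audio duration
# (higher = slower than real time). Numbers are from the benchmark tables.
local_rtf = 28 / 2.5    # 11.2x: 28 s to generate 2.5 s of audio
modal_rtf = 5.2 / 5.3   # ~0.98x: roughly real-time

episode_minutes = 30
local_hours = episode_minutes * local_rtf / 60   # ~5.6 hours
modal_minutes = episode_minutes * modal_rtf      # ~29 minutes (plus overhead)

print(f"Local: {local_hours:.1f} h, Modal: {modal_minutes:.0f} min")
```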

Observations

Local (ROCm)

  • ROCm/HIP is working but with workspace memory warnings
  • Suboptimal execution paths due to memory constraints
  • ROCm is typically 30-50% slower than comparable NVIDIA CUDA hardware

Modal

  • NVIDIA A10G provides much faster inference
  • Memory snapshot feature reduces cold starts
  • ~$0.76/hr for A10G (pay per second)
  • Scales to 10 concurrent requests per container

Cost Estimate (Modal)

For a 30-minute episode requiring ~35 minutes of compute:

  • A10G cost: ~$0.76/hr
  • Estimated cost per episode: ~$0.45
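With per-second billing, the estimate reduces to compute time times the hourly rate:

```python
# Cost = compute time (hours) x hourly rate, billed per second on Modal.
a10g_rate = 0.76        # $/hour for A10G (rate listed above)
compute_minutes = 35    # ~30 min of audio at ~1x RTF, plus overhead

cost = compute_minutes / 60 * a10g_rate
print(f"~${cost:.2f} per episode")
```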

Comparison with Other TTS Options

| Option | Speed (RTF) | Quality | Cost/Episode (30 min) |
| --- | --- | --- | --- |
| Chatterbox (Local ROCm) | ~11x | Excellent | Free (~5.6 hrs of time) |
| Chatterbox (Modal) | ~1x | Excellent | ~$0.45 |
| Edge-TTS | <0.1x | Good | Free |
| OpenAI TTS | ~0.2x | Excellent | ~$15-20 |
| ElevenLabs | ~0.3x | Best | ~$5-22/mo |

Conclusion

Modal is the clear winner for Chatterbox TTS:

  • 11x faster than local AMD ROCm
  • Reasonable cost (~$0.45/episode)
  • Serverless (no infrastructure to maintain)
  • Scales automatically

Local ROCm is only viable for:

  • Very short clips
  • Overnight batch processing
  • Zero-cost requirements

Modal Deployment

```shell
# Deploy
modal deploy chatterbox_tts.py
```

Endpoints:

  • POST /generate - Single TTS segment
  • POST /episode - Full episode with multiple segments
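Once deployed, the endpoints are plain HTTP. A hedged sketch of calling /generate; the deployment URL and the payload fields (`text`, `voice`) are assumptions here, not documented parameters of chatterbox_tts.py, so adjust them to match the actual endpoint signature:

```python
import json
from urllib import request

# Hypothetical Modal deployment URL -- replace with the real one.
url = "https://example--chatterbox-tts-generate.modal.run"

# Assumed payload shape for a single TTS segment.
payload = {"text": "Hello from Chatterbox.", "voice": "default"}

req = request.Request(
    url,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# resp = request.urlopen(req)  # response body would be audio bytes (assumed)
```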

Environment Details

Local

Container: chatterbox-tts
Image: rocm/pytorch:latest
GPU Override: HSA_OVERRIDE_GFX_VERSION=11.0.1
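The local settings above could be expressed as a compose file; a sketch, where the `/dev/kfd` and `/dev/dri` device mappings are standard ROCm container requirements assumed here rather than taken from the original setup:

```yaml
# docker-compose sketch for the local ROCm container (device wiring assumed)
services:
  chatterbox-tts:
    image: rocm/pytorch:latest
    environment:
      # Treat gfx1101 as a supported target, per the override above
      - HSA_OVERRIDE_GFX_VERSION=11.0.1
    devices:
      - /dev/kfd   # ROCm compute interface
      - /dev/dri   # GPU render nodes
```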

Modal

GPU: a10g
Concurrency: 10 requests/container
Scaledown: 5 minutes
Memory Snapshot: enabled
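As a sketch only, these settings map onto Modal's Python SDK roughly as follows; decorator and parameter names vary between Modal SDK versions, so treat every name here as an assumption to check against the Modal docs:

```python
import modal  # assumes the Modal SDK is installed

app = modal.App("chatterbox-tts")

@app.function(
    gpu="a10g",                   # GPU: a10g
    scaledown_window=300,         # Scaledown: 5 minutes
    enable_memory_snapshot=True,  # Memory Snapshot: enabled
)
@modal.concurrent(max_inputs=10)  # Concurrency: 10 requests/container
def generate(text: str) -> bytes:
    ...  # load Chatterbox and synthesize audio
```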