Speak to Claude Code and get responses read aloud. Fully local STT (Whisper) and TTS (Kokoro), no API keys needed.
You speak → Whisper transcribes locally → Claude responds → (optional) Kokoro speaks response
Latency: ~0.3s for transcription + normal Claude response time.
- macOS (Apple Silicon recommended — Metal acceleration for Whisper)
- Claude Code CLI installed
- Homebrew
VoiceMode runs via uvx. Install uv, which provides it:
brew install uv
Verify: uvx --version
claude mcp add --scope user voicemode -- uvx --refresh --with webrtcvad --with "setuptools<71" voice-mode
This adds the MCP server config to ~/.claude.json.
Why the extra dependencies? VoiceMode uses webrtcvad for Voice Activity Detection (silence detection that stops recording when you stop talking). The webrtcvad package depends on pkg_resources, which was removed in setuptools>=71. Without both packages, VAD fails silently and the mic records for the full duration instead of cutting off when you stop speaking.
If you already ran the basic claude mcp add without these flags, edit ~/.claude.json directly:
"voicemode": {
"type": "stdio",
"command": "uvx",
"args": ["--refresh", "--with", "webrtcvad", "--with", "setuptools<71", "voice-mode"],
"env": {}
}
Then restart Claude Code.
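After editing the config by hand, it can help to confirm that both dependencies actually made it into the file. The following is a minimal sketch, not part of VoiceMode: `check_voicemode_deps` is a hypothetical helper name, and it only does plain substring matching rather than parsing the JSON, so it cannot tell which server entry the strings belong to.

```shell
# Hypothetical helper (not a VoiceMode command): report whether a Claude
# config file mentions both packages needed for VAD.
# Assumption: plain substring matching is good enough; the JSON is not parsed.
check_voicemode_deps() {
    config_file="${1:-$HOME/.claude.json}"
    if [ ! -f "$config_file" ]; then
        echo "missing config: $config_file"
        return 2
    fi
    missing=""
    # -F treats the pattern as a fixed string, so "<" has no special meaning.
    grep -qF '"webrtcvad"' "$config_file" || missing="$missing webrtcvad"
    grep -qF '"setuptools<71"' "$config_file" || missing="$missing setuptools<71"
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    echo "ok"
}
```

Called with no argument, the function checks ~/.claude.json; pass a path to check a different file.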
uvx voice-mode service install whisper
This will:
- Clone and build whisper.cpp locally
- Download the base model (~141MB) with CoreML support
- Install to ~/.voicemode/services/whisper/
Skip the Kokoro install if you only want speech input (recommended for lowest latency).
uvx voice-mode service install kokoro
This installs the Kokoro neural TTS engine to ~/.voicemode/services/kokoro/.
uvx voice-mode service start whisper
uvx voice-mode service start kokoro # skip if you didn't install it
Verify everything is running:
uvx voice-mode status
You should see:
- Whisper: running/healthy on port 2022
- Kokoro: running on port 8880 (if installed)
Edit ~/.voicemode/voicemode.env and uncomment/set these values:
# Make silence detection more aggressive (stops recording faster)
VOICEMODE_VAD_AGGRESSIVENESS=3
# Stop recording after 1.5s of silence
VOICEMODE_SILENCE_THRESHOLD_MS=1500
# Safety cap — never record more than 30s
VOICEMODE_DEFAULT_LISTEN_DURATION=30.0
# RECOMMENDED: Disable TTS for much faster responses (text-only output)
VOICEMODE_SKIP_TTS=true
Start (or restart) Claude Code, then:
❯ voice mode
Claude will call the converse MCP tool and start listening to your microphone.
This is the most common issue. The mic records for the full listen_duration_max instead of stopping when you stop talking.
Diagnosis: Check the debug log:
grep VAD_AVAILABLE ~/voicemode_debug.log
If you see VAD_AVAILABLE=False, the webrtcvad library isn't loading.
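Because the debug log can hold entries from several sessions, the most recent entry is the one that matters. A small sketch of that check follows. `vad_status` is a hypothetical helper name; the log path and the `VAD_AVAILABLE=False` string come from the diagnosis step here, while the `VAD_AVAILABLE=True` counterpart is an assumption about the log format.

```shell
# Hypothetical helper (not a VoiceMode command): summarize the latest
# VAD_AVAILABLE entry in the debug log.
# Assumption: a working setup logs "VAD_AVAILABLE=True".
vad_status() {
    log_file="${1:-$HOME/voicemode_debug.log}"
    [ -f "$log_file" ] || { echo "no-log"; return 2; }
    # Only the last matching line reflects the most recent session.
    last=$(grep 'VAD_AVAILABLE' "$log_file" | tail -n 1)
    case "$last" in
        *VAD_AVAILABLE=True*)  echo "available" ;;
        *VAD_AVAILABLE=False*) echo "unavailable"; return 1 ;;
        *)                     echo "unknown"; return 3 ;;
    esac
}
```

A result of `unavailable` points at the dependency problem; `unknown` means the log has no VAD entry yet, for example because no recording has been attempted.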
Fix: Ensure your MCP config in ~/.claude.json includes BOTH dependencies:
"args": ["--refresh", "--with", "webrtcvad", "--with", "setuptools<71", "voice-mode"]
The webrtcvad package requires pkg_resources from setuptools, but setuptools>=71 removed it. You need the older version pinned. Restart Claude Code after changing.
Make sure uvx is installed (brew install uv) and restart Claude Code.
- Check no other process is using the mic:
  ps aux | grep -E 'rec |sox ' | grep -v grep
- Kill any stray recording processes:
  kill -9 <PID>
- Increase VAD aggressiveness in ~/.voicemode/voicemode.env:
  VOICEMODE_VAD_AGGRESSIVENESS=3
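Settings in ~/.voicemode/voicemode.env can be changed by hand, but a small helper makes the change repeatable. The sketch below is illustrative only: `set_env_value` is a hypothetical function, not a VoiceMode command, and it assumes the file uses simple `KEY=value` lines with `#` for commented-out entries.

```shell
# Hypothetical helper (not a VoiceMode command): set KEY=VALUE in an env file.
# Replaces an existing or commented-out line for KEY, otherwise appends one.
# Assumption: values are simple strings without backslashes or newlines.
set_env_value() {
    env_file="$1"; key="$2"; value="$3"
    tmp_file=$(mktemp) || return 1
    [ -f "$env_file" ] || : > "$env_file"
    awk -v key="$key" -v value="$value" '
        # Match "KEY=..." or "# KEY=..." with optional leading whitespace.
        $0 ~ ("^[ \t]*#?[ \t]*" key "=") {
            if (!done) { print key "=" value; done = 1 }
            next
        }
        { print }
        END { if (!done) print key "=" value }
    ' "$env_file" > "$tmp_file" && cat "$tmp_file" > "$env_file"
    status=$?
    rm -f "$tmp_file"
    return $status
}
```

For example, `set_env_value ~/.voicemode/voicemode.env VOICEMODE_VAD_AGGRESSIVENESS 3` uncomments or updates that one setting and leaves the other lines untouched. Writing back with `cat` rather than `mv` keeps the original file's permissions.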
macOS needs to grant mic access to your terminal app. Go to System Settings > Privacy & Security > Microphone and enable your terminal (Terminal.app, iTerm2, Warp, etc.).
You can test mic access with:
brew install sox
rec /tmp/test.wav trim 0 2
If prompted, click "Allow".
- Disable TTS: set VOICEMODE_SKIP_TTS=true in ~/.voicemode/voicemode.env
- Check Whisper latency directly (should be under 0.5s on Apple Silicon):
  time curl -s -X POST http://127.0.0.1:2022/v1/audio/transcriptions \
    -F "file=@$(brew --prefix whisper-cpp)/share/whisper-cpp/jfk.wav" \
    -F "model=base"
uvx voice-mode service logs whisper
uvx voice-mode service logs kokoro
Set VOICEMODE_SKIP_TTS=false in ~/.voicemode/voicemode.env. No restart is needed; the change takes effect on the next converse call.
uvx voice-mode config get voices
Default voice is af_sky. Kokoro supports many voices; see the Kokoro docs.
| Command | What it does |
|---|---|
| uvx voice-mode status | Show all service status |
| uvx voice-mode service start whisper | Start STT |
| uvx voice-mode service start kokoro | Start TTS |
| uvx voice-mode service stop whisper | Stop STT |
| uvx voice-mode service health whisper | Check if Whisper is responding |
| uvx voice-mode service health kokoro | Check if Kokoro is responding |
┌───────────────┐     ┌──────────────────┐     ┌─────────────┐     ┌─────────────┐
│ Microphone    │────▶│ Whisper (local)  │────▶│ Claude Code │────▶│ Kokoro TTS  │
│ via VoiceMode │     │ port 2022        │     │ via MCP     │     │ port 8880   │
│ VAD recording │     │ ~0.3s latency    │     │             │     │ (optional)  │
└───────────────┘     └──────────────────┘     └─────────────┘     └─────────────┘
Everything runs locally. No audio leaves your machine.