@jlmalone
Last active February 11, 2026 21:55
Voice Mode for Claude Code — Complete Setup Guide (macOS). Local Whisper STT + Kokoro TTS via VoiceMode MCP.

Voice Mode for Claude Code — Complete Setup Guide (macOS)

Speak to Claude Code and get responses read aloud. Fully local STT (Whisper) and TTS (Kokoro), no API keys needed.

What You Get

You speak → Whisper transcribes locally → Claude responds → (optional) Kokoro speaks response

Latency: about 0.3s for local transcription on Apple Silicon, plus Claude's normal response time.

Prerequisites

  • macOS (Apple Silicon recommended — Metal acceleration for Whisper)
  • Claude Code CLI installed
  • Homebrew

Step 1: Install uv (Python package runner)

VoiceMode runs through uvx, which is installed as part of uv:

brew install uv

Verify: uvx --version
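Before moving on, it can help to confirm that all three command-line tools this guide relies on (brew, claude, uvx) are on your PATH. The following is a minimal sketch; the helper names have and check_prereqs are made up for illustration and are not part of VoiceMode.

```shell
#!/bin/sh
# Succeed if the named command can be found on PATH.
have() {
  command -v "$1" >/dev/null 2>&1
}

# Print one line per tool and return non-zero if any tool is missing.
check_prereqs() {
  missing=0
  for tool in "$@"; do
    if have "$tool"; then
      echo "found: $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# The three commands used throughout this guide.
check_prereqs brew claude uvx || echo "Install the missing tools before continuing."
```

If uvx shows as missing right after brew install uv, opening a new terminal window usually refreshes PATH.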

Step 2: Add VoiceMode MCP Server to Claude Code

claude mcp add --scope user voicemode -- uvx --refresh --with webrtcvad --with "setuptools<71" voice-mode

This adds the MCP server config to ~/.claude.json.

Why the extra dependencies? VoiceMode uses webrtcvad for Voice Activity Detection (VAD), the silence detection that stops recording when you stop talking. The webrtcvad package depends on pkg_resources, which was removed in setuptools>=71, so setuptools has to be pinned below 71. Without both packages, VAD fails silently and the mic records for the full listen duration instead of cutting off when you stop speaking.
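To make the role of VAD concrete, here is a rough sketch of a silence-based cutoff. It is not VoiceMode's actual implementation: the function name stop_time_ms, the rule that recording stops only after some speech has been heard, and the input format (one character per audio frame) are all assumptions for illustration. The defaults of 1500 ms of silence and a 30 s cap mirror the values set in Step 6, and 30 ms is one of the frame lengths webrtcvad accepts.

```shell
#!/bin/sh
# Print the time (in ms) at which recording would stop.
# $1: one character per audio frame, 1 = speech detected, 0 = silence
# $2: frame length in ms, $3: silence needed to stop, $4: hard cap in ms
stop_time_ms() {
  printf '%s\n' "$1" | awk -v frame="${2:-30}" -v quiet_limit="${3:-1500}" -v cap="${4:-30000}" '{
    heard = 0; quiet = 0; n = length($0)
    for (i = 1; i <= n; i++) {
      if (substr($0, i, 1) == "1") { heard = 1; quiet = 0 }
      else { quiet += frame }
      elapsed = i * frame
      # Stop once speech has been heard and the pause is long enough,
      # or when the hard cap is reached regardless of the VAD result.
      if ((heard && quiet >= quiet_limit) || elapsed >= cap) { print elapsed; exit }
    }
    print n * frame   # input ended before either condition was met
  }'
}

stop_time_ms 1100010000 500 1500 30000   # prints 2500
stop_time_ms 1111111111 500 1500 3000    # prints 3000
```

The second call shows what a broken VAD looks like from the outside: if silence is never detected, only the hard cap ends the recording.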

If you already ran the basic claude mcp add without these flags, edit the voicemode entry under "mcpServers" in ~/.claude.json directly:

"voicemode": {
  "type": "stdio",
  "command": "uvx",
  "args": ["--refresh", "--with", "webrtcvad", "--with", "setuptools<71", "voice-mode"],
  "env": {}
}

Then restart Claude Code.
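A quick way to confirm the config actually contains both extra packages is a plain text search. This sketch assumes the default config location; has_vad_deps and the CLAUDE_CONFIG override are hypothetical names, and because it matches text rather than parsing JSON it should be read as a hint, not proof.

```shell
#!/bin/sh
# Succeed if FILE mentions both extra packages from the claude mcp add command.
has_vad_deps() {
  grep -q '"webrtcvad"' "$1" 2>/dev/null &&
    grep -q '"setuptools<71"' "$1" 2>/dev/null
}

# CLAUDE_CONFIG is a hypothetical override; the guide uses ~/.claude.json.
config=${CLAUDE_CONFIG:-$HOME/.claude.json}
if has_vad_deps "$config"; then
  echo "VAD dependencies found in $config"
else
  echo "VAD dependencies missing from $config (or file not found)"
fi
```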

Step 3: Install Whisper (Speech-to-Text)

uvx voice-mode service install whisper

This will:

  • Clone and build whisper.cpp locally
  • Download the base model (~141MB) with CoreML support
  • Install to ~/.voicemode/services/whisper/

Step 4: Install Kokoro (Text-to-Speech) — Optional

Skip this if you only want speech input (recommended for lowest latency).

uvx voice-mode service install kokoro

This installs the Kokoro neural TTS engine to ~/.voicemode/services/kokoro/.

Step 5: Start Services

uvx voice-mode service start whisper
uvx voice-mode service start kokoro  # skip if you didn't install it

Verify everything is running:

uvx voice-mode status

You should see:

  • Whisper: running / healthy on port 2022
  • Kokoro: running on port 8880 (if installed)
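If the status output is unclear, you can probe the two ports directly. This is a sketch under the assumption that both services speak HTTP on the ports listed above; answers_http is a made-up helper name, and any HTTP response (including an error status) counts as "answering".

```shell
#!/bin/sh
# Succeed if anything answers HTTP on the given local port within 2 seconds.
answers_http() {
  curl -s -o /dev/null --max-time 2 "http://127.0.0.1:$1/"
}

for entry in whisper:2022 kokoro:8880; do
  name=${entry%%:*}   # text before the colon
  port=${entry##*:}   # text after the colon
  if answers_http "$port"; then
    echo "$name: answering on port $port"
  else
    echo "$name: nothing answering on port $port"
  fi
done
```

A "nothing answering" line for kokoro is expected if you skipped Step 4.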

Step 6: Tune Settings for Responsiveness

Edit ~/.voicemode/voicemode.env and uncomment/set these values:

# Make silence detection more aggressive (stops recording faster)
VOICEMODE_VAD_AGGRESSIVENESS=3

# Stop recording after 1.5s of silence
VOICEMODE_SILENCE_THRESHOLD_MS=1500

# Safety cap — never record more than 30s
VOICEMODE_DEFAULT_LISTEN_DURATION=30.0

# RECOMMENDED: Disable TTS for much faster responses (text-only output)
VOICEMODE_SKIP_TTS=true
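If you prefer to script these changes instead of editing by hand, the sketch below sets a key in an env file, replacing an existing or commented-out line for that key or appending it if absent. set_env_var is a hypothetical helper, not a VoiceMode command, and the demo runs against a scratch file rather than your real config.

```shell
#!/bin/sh
# Set KEY=VALUE in FILE. Replaces the first line for KEY (commented or not),
# drops later duplicates, and appends the setting if KEY is not mentioned.
set_env_var() {
  file=$1 key=$2 value=$3
  [ -f "$file" ] || : > "$file"
  tmp=$(mktemp) || return 1
  awk -v k="$key" -v v="$value" '
    $0 ~ ("^[ \t]*#?[ \t]*" k "=") {
      if (!seen) { print k "=" v; seen = 1 }
      next
    }
    { print }
    END { if (!seen) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Demo on a scratch file that imitates a freshly generated config.
demo=$(mktemp)
printf '%s\n' '# VOICEMODE_SKIP_TTS=false' 'VOICEMODE_VAD_AGGRESSIVENESS=2' > "$demo"
set_env_var "$demo" VOICEMODE_SKIP_TTS true
set_env_var "$demo" VOICEMODE_VAD_AGGRESSIVENESS 3
set_env_var "$demo" VOICEMODE_SILENCE_THRESHOLD_MS 1500
cat "$demo"
rm -f "$demo"
```

To apply it for real, pass ~/.voicemode/voicemode.env as the first argument. Keeping a backup copy of the file first is a sensible precaution.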

Step 7: Use It

Start (or restart) Claude Code, then:

❯ voice mode

Claude will call the converse MCP tool and start listening to your microphone.

Troubleshooting

Silence detection not working (mic keeps recording after you stop talking)

This is the most common issue: the mic records for the full listen_duration_max instead of stopping when you stop talking.

Diagnosis: Check the debug log:

grep VAD_AVAILABLE ~/voicemode_debug.log

If you see VAD_AVAILABLE=False, the webrtcvad library isn't loading.

Fix: Ensure your MCP config in ~/.claude.json includes BOTH dependencies:

"args": ["--refresh", "--with", "webrtcvad", "--with", "setuptools<71", "voice-mode"]

The webrtcvad package requires pkg_resources from setuptools, but setuptools>=71 removed it, so setuptools has to stay pinned below 71. Restart Claude Code after changing the config.
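The log check can be wrapped in a small function that looks only at the most recent VAD_AVAILABLE entry, which is useful after you have changed the config and restarted. This is a sketch: vad_status is a made-up name, and the VAD_AVAILABLE=True form is an assumption based on the False form shown in the diagnosis step.

```shell
#!/bin/sh
# Print available, unavailable, or unknown based on the last VAD_AVAILABLE
# line in the given log file.
vad_status() {
  log=$1
  [ -r "$log" ] || { echo "unknown"; return 2; }
  last=$(grep 'VAD_AVAILABLE' "$log" | tail -n 1) || true
  case $last in
    *VAD_AVAILABLE=False*) echo "unavailable"; return 1 ;;
    *VAD_AVAILABLE=True*)  echo "available";   return 0 ;;  # assumed log format
    *)                     echo "unknown";     return 2 ;;
  esac
}

# Same log path as the grep command in the diagnosis step.
vad_status "$HOME/voicemode_debug.log" || true
```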

"Voicemode is not available"

Make sure uvx is installed (brew install uv) and restart Claude Code.

Hangs forever while listening

  1. Check that no other process is using the mic:

    ps aux | grep -E 'rec |sox ' | grep -v grep

    Kill any stray recording processes: kill -9 <PID>

  2. Increase VAD aggressiveness in ~/.voicemode/voicemode.env:

    VOICEMODE_VAD_AGGRESSIVENESS=3
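As a gentler alternative to kill -9, the sketch below asks a process to exit with TERM and only falls back to KILL if it is still alive after a few seconds. stop_gracefully is a hypothetical helper, and the pgrep process names (rec, sox) are taken from the ps filter above.

```shell
#!/bin/sh
# Send TERM to PID, wait up to TRIES seconds (default 5), then send KILL.
stop_gracefully() {
  pid=$1
  tries=${2:-5}
  kill -TERM "$pid" 2>/dev/null || return 0   # already gone
  while [ "$tries" -gt 0 ]; do
    kill -0 "$pid" 2>/dev/null || return 0    # exited after TERM
    sleep 1
    tries=$((tries - 1))
  done
  kill -KILL "$pid" 2>/dev/null               # last resort
  return 0
}

# Stop any stray recorder processes by exact name.
for pid in $(pgrep -x rec 2>/dev/null; pgrep -x sox 2>/dev/null); do
  echo "Stopping recording process $pid"
  stop_gracefully "$pid"
done
```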
    

Microphone not working

macOS needs to grant mic access to your terminal app. Go to System Settings > Privacy & Security > Microphone and enable your terminal (Terminal.app, iTerm2, Warp, etc.).

You can test mic access with:

brew install sox
rec /tmp/test.wav trim 0 2

If prompted, click "Allow".
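A recording can succeed and still contain nothing but silence, which is one common symptom of a missing microphone permission (treat that interpretation as an assumption). The sketch below reads the report from sox's stat effect and checks whether the peak amplitude is exactly zero; max_amplitude and is_silent are made-up helper names.

```shell
#!/bin/sh
# Print the "Maximum amplitude" value from sox stat output on stdin.
max_amplitude() {
  awk -F: '/^Maximum amplitude/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Succeed if the stat report on stdin shows a peak amplitude of exactly zero.
is_silent() {
  peak=$(max_amplitude)
  [ -n "$peak" ] && awk -v p="$peak" 'BEGIN { exit !(p + 0 == 0) }'
}

wav=/tmp/test.wav
if command -v sox >/dev/null 2>&1 && [ -f "$wav" ]; then
  # sox writes the stat report to stderr, so redirect it into the pipe.
  if sox "$wav" -n stat 2>&1 | is_silent; then
    echo "Recording is silent: check the microphone permission."
  else
    echo "Recording contains audio."
  fi
fi
```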

Very slow responses

  • Disable TTS: set VOICEMODE_SKIP_TTS=true in ~/.voicemode/voicemode.env
  • Check Whisper latency directly:
    time curl -s -X POST http://127.0.0.1:2022/v1/audio/transcriptions \
      -F "file=@$(brew --prefix whisper-cpp)/share/whisper-cpp/jfk.wav" \
      -F "model=base"
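    Should be under 0.5s on Apple Silicon. The jfk.wav path assumes the Homebrew whisper-cpp formula is installed; if the file is missing, point file=@ at a short WAV recording of your own.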
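A single timing can be noisy, so averaging a few requests gives a steadier number. This sketch uses curl's own time_total measurement against the same endpoint as above; mean, time_request, and the WAV override are hypothetical names, and the default sample path is just the file from the mic test.

```shell
#!/bin/sh
# Average the numbers on stdin (one per line), printed to three decimals.
mean() {
  awk '{ sum += $1; n++ } END { if (n) printf "%.3f\n", sum / n }'
}

# Time one transcription request in seconds; prints nothing if it fails.
time_request() {
  t=$(curl -sf -o /dev/null -w '%{time_total}' \
        -X POST http://127.0.0.1:2022/v1/audio/transcriptions \
        -F "file=@$1" -F "model=base") && echo "$t"
}

# WAV is a hypothetical override for the sample file path.
wav=${WAV:-/tmp/test.wav}
if [ -f "$wav" ]; then
  for i in 1 2 3 4 5; do time_request "$wav"; done | mean
fi
```

The first request after starting the service may be slower than the rest, so a high average with one outlier is not necessarily a problem.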

Services not starting

uvx voice-mode service logs whisper
uvx voice-mode service logs kokoro

Re-enable TTS later

Set VOICEMODE_SKIP_TTS=false in ~/.voicemode/voicemode.env. No restart needed — takes effect on next converse call.

Available Voices (when TTS enabled)

uvx voice-mode config get voices

Default voice is af_sky. Kokoro supports many voices — see Kokoro docs.

Quick Reference

Command                                 What it does
uvx voice-mode status                   Show all service status
uvx voice-mode service start whisper    Start STT
uvx voice-mode service start kokoro     Start TTS
uvx voice-mode service stop whisper     Stop STT
uvx voice-mode service health whisper   Check if Whisper is responding
uvx voice-mode service health kokoro    Check if Kokoro is responding
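The two health commands can be combined into one check. This is a sketch that assumes the health subcommand exits with a non-zero status when a service is not responding, which this guide does not confirm; health_all is a made-up helper name.

```shell
#!/bin/sh
# Run the health check for each named service and summarize the results.
# Assumes "service health" exits non-zero when the service is unhealthy.
health_all() {
  failed=0
  for svc in "$@"; do
    if uvx voice-mode service health "$svc" >/dev/null 2>&1; then
      echo "$svc: responding"
    else
      echo "$svc: not responding"
      failed=1
    fi
  done
  return "$failed"
}

health_all whisper kokoro || true
```

Drop kokoro from the argument list if you skipped the TTS install.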

Architecture

┌───────────────┐     ┌─────────────────┐     ┌─────────────┐     ┌─────────────┐
│ Microphone    │────▶│ Whisper (local) │────▶│ Claude Code │────▶│ Kokoro TTS  │
│ via VoiceMode │     │ port 2022       │     │ via MCP     │     │ port 8880   │
│ VAD recording │     │ ~0.3s latency   │     │             │     │ (optional)  │
└───────────────┘     └─────────────────┘     └─────────────┘     └─────────────┘

Everything runs locally. No audio leaves your machine.
