
@jmandel
Created March 18, 2026 15:21

useDictation

A React hook for browser speech-to-text that behaves like typing.

The problem

The Web Speech API (SpeechRecognition) gives you a stream of results, but wiring it into a text input is surprisingly fiddly:

  • Interim results need to stream in so the user sees text appear as they speak — but they also need to be replaced when the recognizer refines them or finalizes them.
  • Manual edits must be respected. If the user deletes text or types something, speech results shouldn't overwrite their changes.
  • Browsers kill recognition silently. Chrome stops listening after ~60s of silence. The user still sees the mic "on" but nothing happens. You need auto-restart.
  • Session restarts must not replay old text. Each new SpeechRecognition instance starts with a fresh result list. If you're tracking the wrong state, old text reappears or current text vanishes.

How it works

The hook tracks one number: interimLenRef — how many characters at the end of the text string are interim (not yet finalized).

"I was saying that maybe we should "
                                     ← interimLenRef = 0 (no interim)

"I was saying that maybe we should consider the"
                                    ^^^^^^^^^^^^^^
                                    interimLenRef = 14 (interim suffix)

"I was saying that maybe we should consider the options"
                                    ^^^^^^^^^^^^^^^^^^^^
                                    interimLenRef = 20 (interim updated)

"I was saying that maybe we should consider the options "
                                                         ← interim finalized,
                                                           interimLenRef = 0

When a new speech result arrives, applyUpdate slices off the old interim suffix and appends the new final + interim text:

const base = prev.slice(0, prev.length - interimLenRef.current);
interimLenRef.current = interim.length;
return base + final + interim;

When the user manually edits (types, deletes, pastes), onManualEdit() sets interimLenRef to 0. The next speech result just appends — it never touches what the user edited.
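This merge is easy to exercise outside React. A minimal sketch, with a plain variable standing in for `interimLenRef` (the hook itself keeps this in a ref):

```typescript
// Stand-in for the hook's interimLenRef: length of the interim suffix
// currently at the end of the text.
let interimLen = 0;

// Same pure transform the hook returns: drop the old interim suffix,
// then append the newly finalized text plus the new interim text.
function applyUpdate(prev: string, final: string, interim: string): string {
  const base = prev.slice(0, prev.length - interimLen);
  interimLen = interim.length;
  return base + final + interim;
}

// Manual edits reset the counter so user text is never sliced off.
function onManualEdit(): void {
  interimLen = 0;
}

let text = "I was saying that maybe we should ";
text = applyUpdate(text, "", "consider the");          // interim appears
text = applyUpdate(text, "", "consider the options");  // interim refined in place
text = applyUpdate(text, "consider the options ", ""); // interim finalized
// text === "I was saying that maybe we should consider the options "
```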

Auto-restart

Chrome/WebKit will silently end recognition after a silence timeout. The hook detects this via onend and restarts automatically if the user hasn't toggled off. Each new recognition session tracks its own processedCount so it only emits its own results.
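The per-session bookkeeping can be sketched as a pure function over the recognizer's cumulative results list (result shapes simplified here; the real handler reads `e.results`, where each entry has `isFinal` and a best-alternative `transcript`):

```typescript
interface FakeResult { isFinal: boolean; transcript: string }

// Walk the results list from where this session left off. Finalized
// entries advance processedCount and are never re-emitted; interim
// entries are re-read on every event because the recognizer keeps
// revising them. A restarted session begins at processedCount = 0
// against a fresh, empty list, so old text is never replayed.
function consumeResults(
  results: FakeResult[],
  processedCount: number
): { final: string; interim: string; processedCount: number } {
  let final = "";
  let interim = "";
  for (let i = processedCount; i < results.length; i++) {
    if (results[i].isFinal) {
      final += results[i].transcript;
      processedCount = i + 1;
    } else {
      interim += results[i].transcript;
    }
  }
  return { final, interim, processedCount };
}
```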

Usage

import { useState, useCallback } from 'react';
import { useDictation } from './use-dictation';

function ChatInput() {
  const [text, setText] = useState('');
  const { recording, toggle, applyUpdate, onManualEdit, supported } = useDictation();

  const handleUpdate = useCallback((final: string, interim: string) => {
    setText(prev => applyUpdate(prev, final, interim));
  }, [applyUpdate]);

  return (
    <div>
      <textarea
        value={text}
        onChange={e => { onManualEdit(); setText(e.target.value); }}
      />
      {supported && (
        <button onClick={() => toggle(handleUpdate)}>
          {recording ? 'Stop' : 'Mic'}
        </button>
      )}
    </div>
  );
}

Options

const { ... } = useDictation({ lang: 'es-ES' }); // default: 'en-US'

API

| Return value | Type | Description |
| --- | --- | --- |
| `recording` | `boolean` | Whether recognition is active |
| `toggle` | `(onUpdate) => void` | Start/stop recognition; pass your update callback |
| `applyUpdate` | `(prev, final, interim) => string` | Pure string transform that merges speech into text |
| `onManualEdit` | `() => void` | Call from `onChange` to reset interim tracking |
| `supported` | `boolean` | Whether `SpeechRecognition` exists in this browser |

Design decisions

Why not own the text state? The hook could manage text internally and return [text, setText]. But that makes it hard to compose with other things that touch the same input — paste handlers, autocomplete, form libraries. Instead, applyUpdate is a pure string transform. You own the state; the hook just knows how to merge speech into it.

Why not take a ref? Coupling to a DOM element would limit where you can use it (what about contenteditable? Zustand? A non-React renderer?). The suffix-tracking approach is DOM-independent.

Why no cursor-position insertion? Tracking cursor position across speech results, user edits, and React re-renders is complex and fragile. The suffix approach covers the common case (chat input, search box, any append-oriented input) without that complexity. If you need mid-text insertion, you'd want a different approach entirely (probably execCommand or InputEvent-based).

Why toggle(onUpdate) instead of a stable callback? The update handler often closes over setText, which may change. Passing it at toggle-time means you don't need to worry about stale closures — the latest handler is captured in a ref internally.
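That ref-capture works the same way outside React. A minimal sketch of the pattern (a mutable box, so long-lived event handlers always dispatch to the newest callback without re-subscribing):

```typescript
type UpdateHandler = (final: string, interim: string) => void;

// Mutable box holding the latest handler. Recognition event handlers
// close over the box once, but always call through .current, so
// swapping the handler never requires tearing anything down.
function makeLatestRef() {
  const ref: { current: UpdateHandler | null } = { current: null };
  return {
    set(handler: UpdateHandler) { ref.current = handler; },
    emit(final: string, interim: string) { ref.current?.(final, interim); },
  };
}
```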

Platform notes

  • Desktop Chrome (Mac/Windows/Linux): Works well. Uses Google's cloud speech service.
  • iOS (any browser): All iOS browsers use WebKit. continuous mode is less reliable — recognition stops more aggressively on silence. The auto-restart logic handles this, but users may notice brief gaps.
  • Firefox: SpeechRecognition is not supported. supported will be false.
  • Safari (macOS): Supported since Safari 14.1. Uses Apple's on-device recognition.

License

MIT

import { useRef, useState, useCallback } from 'react';

/**
 * useDictation — a React hook for browser speech-to-text that behaves like typing.
 *
 * Speech text streams into your textarea/input as the user speaks. Interim
 * (in-progress) text appears immediately and gets replaced as recognition
 * refines it. Finalized text becomes permanent. Manual edits are never
 * clobbered.
 *
 * Usage:
 *   const { recording, toggle, applyUpdate, onManualEdit, supported } = useDictation();
 *   const [text, setText] = useState('');
 *
 *   const handleUpdate = useCallback((final: string, interim: string) => {
 *     setText(prev => applyUpdate(prev, final, interim));
 *   }, [applyUpdate]);
 *
 *   <textarea
 *     value={text}
 *     onChange={e => { onManualEdit(); setText(e.target.value); }}
 *   />
 *   {supported && (
 *     <button onClick={() => toggle(handleUpdate)}>
 *       {recording ? 'Stop' : 'Mic'}
 *     </button>
 *   )}
 */

interface UseDictationOptions {
  /** BCP 47 language tag (default: 'en-US') */
  lang?: string;
}

interface UseDictationReturn {
  /** Whether recognition is currently active */
  recording: boolean;
  /** Toggle recording on/off. Pass your update handler. */
  toggle: (onUpdate: (final: string, interim: string) => void) => void;
  /** Pure string transform: merges final + interim text into the previous string */
  applyUpdate: (prev: string, final: string, interim: string) => string;
  /** Call this from your input's onChange to reset interim tracking */
  onManualEdit: () => void;
  /** Whether SpeechRecognition is available in this browser */
  supported: boolean;
}

export function useDictation(options?: UseDictationOptions): UseDictationReturn {
  const lang = options?.lang ?? 'en-US';
  const [recording, setRecording] = useState(false);
  const recRef = useRef<any>(null);
  const wantRef = useRef(false);
  const interimLenRef = useRef(0);
  const onUpdateRef = useRef<((f: string, i: string) => void) | null>(null);

  const SR = typeof window !== 'undefined'
    ? (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition
    : null;

  const startRec = useCallback(() => {
    if (!SR) return;
    const rec = new SR();
    rec.continuous = true;
    rec.interimResults = true;
    rec.lang = lang;

    // Each session tracks its own processedCount so it only emits its own results.
    let processedCount = 0;

    rec.onresult = (e: any) => {
      let newFinal = '';
      let interim = '';
      for (let i = processedCount; i < e.results.length; i++) {
        if (e.results[i].isFinal) {
          newFinal += e.results[i][0].transcript;
          processedCount = i + 1;
        } else {
          interim += e.results[i][0].transcript;
        }
      }
      onUpdateRef.current?.(newFinal, interim);
    };

    rec.onend = () => {
      recRef.current = null;
      if (wantRef.current) {
        // Browser killed recognition (silence timeout, etc.) — auto-restart
        try {
          startRec();
        } catch {
          wantRef.current = false;
          setRecording(false);
        }
      } else {
        setRecording(false);
      }
    };

    rec.onerror = (e: any) => {
      if (e.error === 'not-allowed' || e.error === 'service-not-allowed') {
        wantRef.current = false;
        setRecording(false);
        recRef.current = null;
      }
      // Other errors (network, no-speech) — let onend handle restart
    };

    rec.start();
    recRef.current = rec;
  }, [SR, lang]);

  const toggle = useCallback((onUpdate: (final: string, interim: string) => void) => {
    if (recording) {
      wantRef.current = false;
      recRef.current?.stop();
      recRef.current = null;
      setRecording(false);
    } else {
      onUpdateRef.current = onUpdate;
      wantRef.current = true;
      startRec();
      setRecording(true);
    }
  }, [recording, startRec]);

  const applyUpdate = useCallback((prev: string, final: string, interim: string): string => {
    const base = prev.slice(0, prev.length - interimLenRef.current);
    interimLenRef.current = interim.length;
    return base + final + interim;
  }, []);

  const onManualEdit = useCallback(() => {
    interimLenRef.current = 0;
  }, []);

  return { recording, toggle, applyUpdate, onManualEdit, supported: !!SR };
}