@debugdynamocoder
Last active March 4, 2026 17:09
My Script for STT

Script Purpose:

This script transcribes audio into text using either the remote Groq API or a local Whisper.cpp model, depending on the REMOTE setting inside the script. It is triggered via keybindings and supports multiple languages.

Keybindings:

  • whpr → Recognize audio in English.
  • whpr fr → Recognize audio in French.
  • whpr es → Recognize audio in Spanish.
  • whpr <lang> → Recognize audio in any other supported language, where <lang> is its language code.

Workflow:

  1. Recording:

    • Starts recording audio using rec and saves the file to /dev/shm for faster RAM-based processing.
    • Stops recording when triggered again.
  2. Processing:

    • If using the Groq API (default), it sends the audio for transcription and retries once if no text comes back.
    • If using Whisper.cpp locally, it processes the audio using the specified model.
  3. Output:

    • The transcribed text is passed to the keyboard using xdotool.
    • Notifications (notify-send) provide status updates.
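
The start/stop toggle from step 1 can also be tracked with a PID file rather than by searching the process list. The following is a minimal sketch of that alternative, not the method the script itself uses: `toggle`, `pidfile`, and the demo file name are hypothetical names chosen for illustration.

```shell
#!/bin/bash
# Hypothetical PID-file location; any writable path works.
pidfile="${TMPDIR:-/tmp}/whpr-toggle-demo.pid"

# toggle CMD [ARGS...]: start CMD in the background on the first call,
# stop it on the next call. Prints "started" or "stopped".
toggle() {
    local pid
    if [ -f "$pidfile" ] && pid=$(cat "$pidfile") && kill -0 "$pid" 2>/dev/null; then
        # A live process is recorded: stop it and forget the PID.
        kill "$pid"
        rm -f "$pidfile"
        echo "stopped"
    else
        # Nothing is running: launch the command detached from our output.
        "$@" >/dev/null 2>&1 &
        echo "$!" > "$pidfile"
        echo "started"
    fi
}
```

With this helper, the recorder would be started and stopped by calling `toggle rec -q -t wav /dev/shm/sfile.wav rate 16k channels 1` twice. A PID file avoids accidentally matching unrelated processes whose command line happens to contain the file name.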

Configuration:

  • Groq API key: Place it in ~/groq.token.txt.
  • Local Setup:
    • Set REMOTE=false to use Whisper.cpp.
    • Install Whisper.cpp and download the models:
      • Multilingual: /home/user/tmp/whisper.cpp/models/ggml-small.bin
      • English-only: /home/user/tmp/whisper.cpp/models/ggml-small.en.bin
    • Adjust paths as needed.
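
Before binding the script to a key, it may help to verify that the file required by the chosen mode is actually in place. The helper below is a sketch under assumptions: `check_config` and its three positional parameters are hypothetical and not part of the script.

```shell
#!/bin/bash
# check_config REMOTE TOKEN_FILE MODEL_FILE
# Remote mode needs a non-empty token file; local mode needs a readable model.
# Prints "ok" and returns 0 on success, prints a reason and returns 1 otherwise.
check_config() {
    local remote="$1" token_file="$2" model_file="$3"
    if [ "$remote" = true ]; then
        if [ ! -s "$token_file" ]; then
            echo "missing or empty token file: $token_file"
            return 1
        fi
    else
        if [ ! -r "$model_file" ]; then
            echo "model not readable: $model_file"
            return 1
        fi
    fi
    echo "ok"
}
```

For the defaults listed above, a call could look like `check_config true ~/groq.token.txt /home/user/tmp/whisper.cpp/models/ggml-small.bin`. Since the token file holds a secret, restricting it with `chmod 600 ~/groq.token.txt` is a reasonable extra step.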

Dependencies:

  • rec (part of SoX), xdotool, notify-send (common in Ubuntu/Debian-based systems).
  • ffprobe (part of FFmpeg), curl, and jq, which the script also calls.
  • Groq API or Whisper.cpp for transcription.
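
A quick way to see which of these tools are missing is to probe each one with `command -v`. This is a small sketch; `missing_tools` is a hypothetical helper name, and the list is taken from the commands the script invokes.

```shell
#!/bin/bash
# Print the name of every given command that is not found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Commands the script calls directly.
missing=$(missing_tools rec ffprobe curl jq xdotool notify-send)
if [ -n "$missing" ]; then
    # Unquoted on purpose so the names print on one line.
    echo "Missing tools:" $missing
fi
```

On Debian and Ubuntu these commands are typically provided by the sox, ffmpeg, curl, jq, xdotool, and libnotify-bin packages.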

Notes:

  • Tested on Xubuntu. It should work almost out of the box on Ubuntu- and Debian-based distributions; compatibility may vary elsewhere.
  • Make sure xdotool (keyboard input) and notify-send (notifications) are installed.
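#!/bin/bash
# Recording file; /dev/shm is RAM-backed, so no disk I/O is involved.
ramf="/dev/shm/sfile.wav"
# Default language; overridden by the first argument below.
language="en"
# this is not taken into account for X11
GROQ_API_KEY=$(cat ~/groq.token.txt)
# true: transcribe with the Groq API; false: use local Whisper.cpp.
REMOTE=true
# Use the language code passed as the first argument, if any.
if [ -n "$1" ]; then
    language="$1"
fi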
# A process whose command line contains $ramf means a recording is running:
# stop it and transcribe the result.
if pgrep -f "$ramf" > /dev/null; then
    # Length of the recording so far, rounded to one decimal place.
    duration=$(ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$ramf" | awk '{printf "%.1f\n", $1}')
    notify-send "Whisper process" "Processing $duration seconds"
    # Stop the recorder.
    pkill -f "$ramf"
    start_time=$(date +%s)
    if [ "$REMOTE" = true ]; then
        # Brief pause, presumably so rec can finish writing the file.
        sleep 0.05
        # Send the audio to Groq; try a second time if no text comes back.
        for attempt in 1 2; do
            str=$(curl -s https://api.groq.com/openai/v1/audio/transcriptions \
                -H "Authorization: Bearer $GROQ_API_KEY" \
                -H "Content-Type: multipart/form-data" \
                -F file="@$ramf" \
                -F language="$language" \
                -F model="whisper-large-v3" | jq -r '.text // empty')
            [ -n "$str" ] && break
        done
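    else
        # Local Whisper.cpp: multilingual model for non-English input, English-only model otherwise.
        # -nt: no timestamps, -fa: flash attention, -t 6: six threads.
        if [ "$language" != "en" ]; then
            str="$(/home/user/tmp/whisper.cpp/build/bin/whisper-cli -m /home/user/tmp/whisper.cpp/models/ggml-small.bin -l "$language" -nt -fa -t 6 -f "$ramf" 2>/dev/null)"
        else
            str="$(/home/user/tmp/whisper.cpp/build/bin/whisper-cli -m /home/user/tmp/whisper.cpp/models/ggml-small.en.bin -nt -fa -t 6 -f "$ramf" 2>/dev/null)"
        fi
    fi
    end_time=$(date +%s)
    lapse_time=$((end_time - start_time))
    # Report the elapsed time for local runs, once it has been computed.
    if [ "$REMOTE" != true ]; then
        notify-send "Processed in $lapse_time seconds" ""
    fi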
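    # Join the result into one line and trim leading and trailing whitespace.
    str2=$(printf '%s' "$str" | tr -d '\n' | sed 's/^[ \t]*//;s/[ \t]*$//')
    # Type the text into the focused window.
    printf '%s' "$str2" | xdotool type --clearmodifiers --delay 0 --file -
    rm "$ramf"
else
    notify-send "Starting Record ($language)" ""
    # Start recording in the background: 16 kHz, mono, WAV.
    nohup rec -q -t wav "$ramf" rate 16k channels 1 2>/dev/null &
fi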