@kanzure
Created September 17, 2025 14:00
Shell script to transcribe a YouTube video using the speech-to-text tool Whisper on Groq, with kimi-k2 for postprocessing.
#!/bin/sh
# originally encouraged by L29ah https://gnusha.org/logs/2025-09-17.log
# parameter is the url of the youtube video
url="$1"
temp=$(mktemp -d)
echo "Made temporary directory $temp"
cd "$temp" || exit 1
echo "Downloading youtube video $url"
# with --extract-audio --audio-format wav the final file gets a .wav extension,
# so name the output via %(ext)s and read output.wav below
yt-dlp --extract-audio --audio-format wav -o "output.%(ext)s" "$url"
# optionally convert output.wav into 16 kHz mono, because the Groq Whisper API converts to 16 kHz mono anyway
#ffmpeg -i output.wav -ar 16000 -ac 1 output_16k_mono.wav
#voice output_16k_mono.wav > transcript.txt
# https://gist.github.com/kanzure/27d165dcaba026600304355e02d46e97
voice output.wav > transcript.txt
echo "--- START RAW TRANSCRIPT ---"
cat transcript.txt
echo "--- END RAW TRANSCRIPT ---"
echo "--- START MODIFIED TRANSCRIPT ---"
# uv tool install llm; llm install llm-groq; llm keys set groq
cat transcript.txt | llm -m groq/moonshotai/kimi-k2-instruct-0905 --system "Never mention encountered paid commercials, promo codes, viewer engagement stuff and so on. Summarize the provided video transcript in a few sentences. Afterwards, format the provided video transcript as a well structured text. At the end of the transcript, please identify any of the difficulties that you encountered, including words that are probably incorrect or abbreviations or your best guesses as to what might have been intended in the transcript because this was originally done by speech-to-text." -
echo "--- END MODIFIED TRANSCRIPT ---"
# cleanup
echo "Not cleaning up $temp"
#rm -rf "$temp"
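
The script above assumes that yt-dlp, voice and llm are all installed, and it always leaves the temporary directory behind. A minimal sketch of two optional guards follows: a check for missing commands before any work starts, and an opt-out cleanup step. The names missing_cmds, cleanup_temp and KEEP_TEMP are hypothetical and are not part of the original script.

```shell
#!/bin/sh
# Sketch only: missing_cmds, cleanup_temp and KEEP_TEMP are hypothetical
# names introduced for illustration, not part of the original script.

# Print the names of any commands from the argument list that are not on PATH.
missing_cmds() {
    missing=""
    for cmd in "$@"; do
        # command -v is the POSIX way to test whether a command resolves
        if ! command -v "$cmd" >/dev/null 2>&1; then
            missing="$missing $cmd"
        fi
    done
    # strip the single leading space left by the accumulation above
    echo "${missing# }"
}

# Remove the given directory unless KEEP_TEMP=1 is set in the environment.
cleanup_temp() {
    if [ "${KEEP_TEMP:-0}" = "1" ]; then
        echo "Not cleaning up $1"
    else
        rm -rf "$1"
    fi
}

# How the helpers could be wired into the script (left commented out so that
# this block only defines functions when run on its own):
#   m=$(missing_cmds yt-dlp voice llm)
#   [ -z "$m" ] || { echo "Missing commands: $m" >&2; exit 1; }
#   temp=$(mktemp -d) || exit 1
#   trap 'cleanup_temp "$temp"' EXIT
```

With this wiring, running the script as KEEP_TEMP=1 ./script.sh URL would keep the downloaded audio and transcript for inspection, which matches the current behaviour; without it the directory would be removed on exit.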