Created
September 17, 2025 14:00
Shell script that transcribes a YouTube video using the Whisper speech-to-text model on Groq, with kimi-k2 for postprocessing.
| #!/bin/sh | |
| # originally encouraged by L29ah https://gnusha.org/logs/2025-09-17.log | |
| # parameter is the url of the youtube video | |
| url="$1" | |
| temp=$(mktemp -d) | |
| echo "Made temporary directory $temp" | |
| cd "$temp" || exit 1 | |
| echo "Downloading youtube video $url" | |
| # the %(ext)s output template lets yt-dlp name the extracted audio output.wav | |
| yt-dlp --extract-audio --audio-format wav -o "output.%(ext)s" "$url" | |
| # convert output.wav into 16 kHz mono because Groq Whisper API converts to 16 kHz mono anyway | |
| #ffmpeg -i output.wav -ar 16000 -ac 1 output_16k_mono.wav | |
| #voice output_16k_mono.wav > transcript.txt | |
| # https://gist.github.com/kanzure/27d165dcaba026600304355e02d46e97 | |
| voice output.wav > transcript.txt | |
| echo "--- START RAW TRANSCRIPT ---" | |
| cat transcript.txt | |
| echo "--- END RAW TRANSCRIPT ---" | |
| echo "--- START MODIFIED TRANSCRIPT ---" | |
| # uv tool install llm; llm install llm-groq; llm keys set groq | |
| cat transcript.txt | llm -m groq/moonshotai/kimi-k2-instruct-0905 --system "Never mention encountered paid commercials, promo codes, viewer engagement stuff and so on. Summarize the provided video transcript in a few sentences. Afterwards, format the provided video transcript as a well structured text. At the end of the transcript, please identify any of the difficulties that you encountered, including words that are probably incorrect or abbreviations or your best guesses as to what might have been intended in the transcript because this was originally done by speech-to-text." - | |
| echo "--- END MODIFIED TRANSCRIPT ---" | |
| # cleanup | |
| echo "Not cleaning up $temp" | |
| #rm -rf "$temp" |
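The script above takes the URL as its only parameter and leaves the temporary directory in place. A minimal sketch of how the argument check and the cleanup could be made explicit is shown here. The helper names `require_url` and `cleanup_temp` and the `KEEP_TEMP` environment variable are assumptions made for illustration; none of them appear in the original script.

```shell
#!/bin/sh
# Sketch only: require_url, cleanup_temp and KEEP_TEMP are hypothetical
# names introduced for illustration, not part of the original script.

# Print a usage message to stderr and fail when no URL was supplied.
require_url() {
    if [ -z "${1:-}" ]; then
        echo "usage: $0 <youtube-url>" >&2
        return 1
    fi
}

# Remove the directory given as $1 unless KEEP_TEMP=1 is set,
# in which case report that it is being kept (as the script does today).
cleanup_temp() {
    if [ "${KEEP_TEMP:-0}" = "1" ]; then
        echo "Not cleaning up $1"
    else
        rm -rf "$1"
    fi
}
```

One way to wire these in would be `require_url "$1" || exit 1` before `url="$1"`, and `trap 'cleanup_temp "$temp"' EXIT` right after the `mktemp -d` line, so the directory is removed on exit unless `KEEP_TEMP=1` is exported.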