Apply Whisper to a folder of video files: cut silence from the videos with FFmpeg, convert them to audio, then run Whisper.

whisper audio transcription - batched on directory

This gist is a super basic helper script for OpenAI's Whisper model and associated CLI to transcribe a directory of video files to text.

install

For Linux/Ubuntu you can run the helper script:

bash linux_setup.sh
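Once the setup script finishes (and assuming the pip install above completed without errors), a quick sanity check is to print the CLI's help text:

whisper --help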

For other operating systems, installing ffmpeg may require slightly more work; refer to the official repo for details.
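For example, on macOS ffmpeg is commonly installed with Homebrew, and conda users can pull it from conda-forge (both are assumptions about your setup, not requirements of this gist):

brew install ffmpeg
# or, inside a conda environment:
conda install -c conda-forge ffmpeg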

options

The default model in the .sh script is the medium.en model, which is quite large and slow if you do not have a GPU. Other models you can swap in:

  • use base.en or tiny.en if transcribing English audio to English text
  • use base or small if working with other languages (you will need to modify the line whisper "$f" --model $MODEL_ID --output_dir whisper-out; refer to the OpenAI repo for additional arguments to pass — see the example below)

So to use tiny.en you would update MODEL_ID="medium.en" to MODEL_ID="tiny.en".
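For non-English audio, a minimal sketch of such a modified call (the --language and --task flags are part of the Whisper CLI; the file name here is just a placeholder):

whisper "interview_de.mp3" --model small --language German --task transcribe --output_dir whisper-out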

Running the script

Update the two variables below to values relevant for your use case:

DIR_PATH="/path/to/videos"
MODEL_ID="medium.en"

Run the script:

bash run_whisper.sh
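If the run is long, you may want to keep a log of the console output; this is just standard shell redirection, not something the script requires:

bash run_whisper.sh 2>&1 | tee whisper_run.log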

Results

The output directory DIR_PATH/transcribed-audio/whisper-out will contain several files for each video, including a .txt file with the raw transcription.
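As a rough sketch of what to expect (exact output formats depend on your Whisper version; lecture1 is a placeholder name):

transcribed-audio/
  lecture1.mp3
  whisper-out/
    lecture1.txt   # plain-text transcription
    lecture1.srt   # subtitles with timestamps
    lecture1.vtt   # WebVTT subtitles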

Some ideas for making these even more useful: turn them into PDFs with paragraph segmentation, or summarize them. For instance:
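A minimal PDF-conversion sketch using pandoc (assuming pandoc and a LaTeX engine are installed; this is not part of the gist's scripts):

pandoc whisper-out/lecture1.txt -o lecture1.pdf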

linux_setup.sh

# THIS IS JUST A HELPER SCRIPT. If you already have whisper installed, disregard it.
# See https://github.com/openai/whisper for all questions and details (and installation on other systems).
# CREATE A VIRTUAL ENV FIRST
# ffmpeg (used to strip silence and extract audio from the videos)
sudo apt-get install ffmpeg -y
# standard pip upgrade & install
pip install -U pip
pip install setuptools-rust
pip install git+https://github.com/openai/whisper.git
run_whisper.sh

DIR_PATH="/path/to/videos"
MODEL_ID="medium.en"
cd "$DIR_PATH"
mkdir -p transcribed-audio
# extract audio to mp3, cutting out silences of 3 seconds or longer below -50 dB
for f in *.mp4; do ffmpeg -i "$f" -c:a libmp3lame -af silenceremove=stop_periods=-1:stop_duration=3:stop_threshold=-50dB "transcribed-audio/${f%.mp4}.mp3"; done
cd transcribed-audio
mkdir -p whisper-out
# transcribe every mp3 with the chosen whisper model
for f in *.mp3; do whisper "$f" --model $MODEL_ID --output_dir whisper-out; done
# print the output location
pwd