This gist contains a super basic helper script that uses OpenAI's Whisper model and its CLI to transcribe a directory of video files to text.
For Linux/Ubuntu you can run the helper script:

```shell
bash linux_setup.sh
```
For other OSes, installing ffmpeg might require slightly more work; refer to the official repo for details.
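Whichever route you take, a quick sanity check (not part of the gist itself) is to confirm both dependencies ended up on your `PATH` before running the transcription script:

```shell
# Verify the two things Whisper needs: the ffmpeg binary (used for
# audio extraction) and the `whisper` CLI installed via pip.
command -v ffmpeg >/dev/null && echo "ffmpeg OK" || echo "ffmpeg missing"
command -v whisper >/dev/null && echo "whisper CLI OK" || echo "whisper missing"
```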
The default model in the `.sh` script is the `medium.en` model, which is quite large/slow if you do not have a GPU. There are other models that you can replace this with:

- use `base.en` or `tiny.en` if transcribing English-to-English text
- use `base` or `small` if doing things that involve other languages (you will need to modify `whisper "$f" --model $MODEL_ID --output_dir whisper-out`; refer to the OpenAI repo for additional arguments to pass)

So to use `tiny.en` you would update `MODEL_ID="medium.en"` to `MODEL_ID="tiny.en"`.
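To see where that command and `MODEL_ID` fit together, the transcription loop looks roughly like this (a sketch of the assumed structure; the actual `run_whisper.sh` in the gist may differ slightly):

```shell
# Sketch of the transcription loop (assumed structure, not the
# exact script from the gist).
DIR_PATH="/path/to/videos"   # directory containing your video files
MODEL_ID="medium.en"         # swap in tiny.en, base.en, base, or small

for f in "$DIR_PATH"/*.mp4; do   # assumes .mp4 files for illustration
  [ -e "$f" ] || continue        # skip if the glob matched nothing
  whisper "$f" --model "$MODEL_ID" --output_dir whisper-out
done
```

The `[ -e "$f" ] || continue` guard keeps the loop from passing the literal glob pattern to `whisper` when the directory contains no matching files.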
Update the below two vars to values relevant for your use case:

```shell
DIR_PATH="/path/to/videos"
MODEL_ID="medium.en"
```
Run the script:

```shell
bash run_whisper.sh
```
The output directory `DIR_PATH/transcribed-audio-whisper-out` will contain several files for each video, including a `.txt` with the raw transcription.
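As a hypothetical post-processing step (not part of the gist), you could stitch all the `.txt` transcripts into a single file, with a header naming each source video; the `OUT_DIR` path here is an assumption you should match to your own setup:

```shell
# Concatenate every .txt transcript into one file, prefixing each
# with a header derived from the source video's filename.
OUT_DIR="whisper-out"   # assumed output directory; adjust to yours
for t in "$OUT_DIR"/*.txt; do
  [ -e "$t" ] || continue   # skip if no transcripts exist yet
  printf '\n===== %s =====\n' "$(basename "$t" .txt)"
  cat "$t"
done > all-transcripts.txt
```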
Some ideas on how to make these transcripts even more useful: generate PDFs with paragraph segmentation, or summarize them.