Transcribes audio using OpenAI's Whisper speech recognition model.

Make sure you have Docker installed.
```sh
./run.sh podcast_file.mp3 --model tiny --language English > transcript.txt
```
This will transcribe the podcast file `podcast_file.mp3` using the `tiny` model and the `English` language. The transcript will be saved in `transcript.txt`.
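For reference, a minimal sketch of what a wrapper like `run.sh` could look like, assuming a Docker image with the `whisper` CLI installed (the image name `whisper-image` is a placeholder, and the actual script may differ):

```sh
#!/usr/bin/env bash
# Hypothetical sketch: mount the current directory into the container and
# forward all arguments (audio file, --model, --language) to the whisper CLI.
set -euo pipefail

docker run --rm \
  -v "$PWD:/data" -w /data \
  whisper-image \
  whisper "$@"
```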
You can tail the transcript file (`tail -f transcript.txt`) in another terminal to see the text as it is being generated:
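```sh
tail -f transcript.txt
```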
Use one of the following models based on your requirements, as explained in the Whisper docs:
| Size   | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|--------|------------|--------------------|--------------------|---------------|----------------|
| tiny   | 39 M       | tiny.en            | tiny               | ~1 GB         | ~10x           |
| base   | 74 M       | base.en            | base               | ~1 GB         | ~7x            |
| small  | 244 M      | small.en           | small              | ~2 GB         | ~4x            |
| medium | 769 M      | medium.en          | medium             | ~5 GB         | ~2x            |
| large  | 1550 M     | N/A                | large              | ~10 GB        | 1x             |
| turbo  | 809 M      | N/A                | turbo              | ~6 GB         | ~8x            |
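For example, to trade speed for accuracy, run the same command with the `medium` model (note the higher VRAM requirement):

```sh
./run.sh podcast_file.mp3 --model medium --language English > transcript.txt
```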