@mustakimali
Created January 4, 2025 13:30
Transcribe podcasts using OpenAI's free, open-source Whisper speech recognition model

Podcast transcriber

Transcribes podcasts with OpenAI's Whisper speech recognition model, running inside a Docker container.

Usage

Make sure you have Docker installed.
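A quick way to confirm that requirement before running the script is sketched below. It is a hypothetical helper, not part of the gist: check_docker looks for the docker CLI on PATH and then calls docker info, which fails if the Docker daemon is stopped or the current user is not allowed to talk to it.

```shell
# Hypothetical pre-flight check (not part of run.sh): is the docker CLI
# installed, and can it reach a running Docker daemon?
check_docker() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker is not installed or not on PATH" >&2
        return 1
    fi
    # `docker info` fails when the daemon is down or access is denied.
    if ! docker info >/dev/null 2>&1; then
        echo "docker is installed, but the Docker daemon is not reachable" >&2
        return 1
    fi
    echo "docker is ready"
}
```

For example: check_docker && ./run.sh podcast_file.mp3 --model tiny --language English > transcript.txt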

./run.sh podcast_file.mp3 --model tiny --language English > transcript.txt

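This transcribes podcast_file.mp3 with the tiny model, treating the audio as English, and saves the transcript in transcript.txt. Every argument after the file name is passed straight through to the whisper command-line tool, so other whisper options can be added the same way. Each line of the transcript is one segment in the form [mm:ss.mmm --> mm:ss.mmm] followed by the spoken text.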
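If you want plain prose without timestamps, a minimal post-processing sketch is below. strip_timestamps is a hypothetical name; it assumes every timestamped line starts with a bracketed time range such as [00:00.000 --> 00:04.000], and it also removes the carriage returns that docker run -t (used by run.sh) can add to each line.

```shell
# Hypothetical post-processing helper: drop the leading "[start --> end]"
# range (and the spaces after it) from whisper's console output, and delete
# carriage returns added by the container's pseudo-terminal.
strip_timestamps() {
    tr -d '\r' | sed -E 's/^\[[^]]*][[:space:]]*//'
}
```

For example: strip_timestamps < transcript.txt > transcript-plain.txt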

To watch the text as it is being generated, run this in another terminal:

tail -f transcript.txt
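To transcribe a whole folder of episodes, you can call run.sh in a loop. The sketch below is hypothetical (transcribe_dir is not part of the gist); it assumes run.sh is in the current directory, writes episode.txt next to each episode.mp3, and passes any extra arguments on to whisper.

```shell
# Hypothetical batch helper: transcribe every .mp3 in a directory.
# Usage: transcribe_dir <dir> [whisper args...]
transcribe_dir() {
    local dir="$1"
    shift
    local f
    for f in "$dir"/*.mp3; do
        [ -e "$f" ] || continue   # no .mp3 files: the pattern stays unexpanded
        echo "Transcribing $f" >&2
        ./run.sh "$f" "$@" > "${f%.mp3}.txt"
    done
}
```

For example: transcribe_dir episodes --model small --language English. run.sh calls docker build on every run, but after the first build Docker reuses its cached layers, so each rebuild should be quick.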

Better accuracy

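Larger models are generally more accurate, but slower and more memory-hungry. Pass one of the following to --model, as explained in the Whisper docs. The English-only .en models tend to perform better on English audio, especially tiny.en and base.en. The relative speeds (compared with large) come from Whisper's own GPU benchmarks; run.sh starts the container without GPU access (docker run is called without --gpus), so Whisper runs on the CPU here and every model will be noticeably slower.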

Size     Parameters   English-only model   Multilingual model   Required VRAM   Relative speed
tiny     39 M         tiny.en              tiny                 ~1 GB           ~10x
base     74 M         base.en              base                 ~1 GB           ~7x
small    244 M        small.en             small                ~2 GB           ~4x
medium   769 M        medium.en            medium               ~5 GB           ~2x
large    1550 M       N/A                  large                ~10 GB          1x
turbo    809 M        N/A                  turbo                ~6 GB           ~8x
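The English-only column above can be turned into a small helper that picks the .en variant for English audio and falls back to the multilingual name otherwise (large and turbo have no .en version). pick_model is a hypothetical name; its output is meant to be passed to --model.

```shell
# Hypothetical helper: map a size from the table to a --model value,
# preferring the English-only (.en) variant for English audio.
pick_model() {
    local size="$1" language="$2"
    case "$size" in
        tiny|base|small|medium)
            case "$language" in
                English|english|en) echo "${size}.en" ;;
                *) echo "$size" ;;
            esac
            ;;
        large|turbo) echo "$size" ;;   # no English-only variant exists
        *)
            echo "unknown model size: $size" >&2
            return 1
            ;;
    esac
}
```

For example: ./run.sh podcast_file.mp3 --model "$(pick_model base English)" --language English > transcript.txt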
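Dockerfile

# Pinned: python3.12-venv ships with Ubuntu 24.04; ubuntu:latest will eventually move on.
FROM ubuntu:24.04
WORKDIR /app
# source: https://pypi.org/project/openai-whisper/
# Update and install in one layer so a stale cached package index is never reused.
RUN apt-get update \
    && apt-get install -y curl git ffmpeg python3-pip python3.12-venv \
    && rm -rf /var/lib/apt/lists/*
RUN python3 -m venv .venv
# Per the Whisper README, setuptools-rust may be needed if tiktoken has no
# pre-built wheel; install it first, since installing it afterwards cannot help.
RUN .venv/bin/pip install setuptools-rust
RUN .venv/bin/pip install -U openai-whisper
# Replace the PyPI release with the latest commit from GitHub, keeping its dependencies.
RUN .venv/bin/pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
ENV PATH="/app/.venv/bin:${PATH}"
CMD [ "/bin/bash" ]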
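run.sh

#!/bin/bash
set -euo pipefail

if [ "$#" -lt 1 ] || [ ! -f "$1" ]; then
    echo "Podcast Transcriber" >&2
    echo "Usage: $0 <podcast file> [extra_args...]" >&2
    exit 1
fi

file="$1"
shift # the remaining arguments ("$@") are passed to whisper unchanged

# Build from the directory holding this script and the Dockerfile; send build
# output to stderr so it never ends up in a redirected transcript.
script_dir="$(cd "$(dirname "$0")" && pwd)"
docker build -t podcast-transcriber "$script_dir" >&2

# Stage the audio in a fresh temporary directory instead of ./data, so an
# existing ./data is never overwritten or deleted; remove it when the script exits.
data_dir="$(mktemp -d "$PWD/transcriber-data.XXXXXX")"
trap 'rm -rf "$data_dir"' EXIT
cp "$file" "$data_dir/"
filename="$(basename "$file")"

# -t gives whisper a terminal, so Python flushes each line as it is printed.
docker run -it --rm -v "$data_dir:/app/data" podcast-transcriber \
    whisper "data/$filename" "$@"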