Whisper Transcription Steps

These are the steps used to transcribe 27.aac with OpenAI Whisper.

1. Install ffmpeg

Whisper uses the ffmpeg command-line tool to decode audio files such as .aac into the 16 kHz mono audio its models take as input.

brew install ffmpeg

The audio file was checked with:

ffprobe -hide_banner 27.aac

This showed that 27.aac was an AAC audio file with a duration of about 00:38:47.
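To get only the duration, for example inside a script, ffprobe can print the container duration in seconds with no other output. The sketch below wraps this in two shell functions. The names format_duration and audio_duration are illustrative and are not part of ffmpeg. The conversion to HH:MM:SS simply drops the fractional seconds.

```shell
# Hypothetical helper: convert seconds (e.g. "2327.41") to HH:MM:SS,
# dropping any fractional part.
format_duration() {
  secs=${1%.*}
  printf '%02d:%02d:%02d\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

# Hypothetical helper: print an audio file's duration as HH:MM:SS.
# -v error hides the banner and informational logs; noprint_wrappers=1
# and nokey=1 make ffprobe print only the number of seconds.
audio_duration() {
  seconds=$(ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1") || return 1
  format_duration "$seconds"
}

# Example: audio_duration 27.aac
```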

2. Create a Python virtual environment

Python 3.12 was used because it is compatible with the Whisper package and its dependencies.

python3.12 -m venv .venv-whisper
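If python3.12 is not on the PATH, the command above fails with a "command not found" error. A small guard like the one below makes that failure explicit before anything is created. The guard is a hypothetical helper and was not part of the original steps. The same check also works for ffmpeg and ffprobe from step 1. On Homebrew, Python 3.12 is available as the python@3.12 formula.

```shell
# Hypothetical helper: fail with a clear message if a command is missing.
require_command() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "error: '$1' was not found on PATH" >&2
    return 1
  fi
}

# Example:
# require_command python3.12 && python3.12 -m venv .venv-whisper
```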

3. Install OpenAI Whisper

Whisper was installed inside the local virtual environment:

.venv-whisper/bin/python -m pip install -U pip openai-whisper

This upgraded pip and installed Whisper together with its required Python packages, including PyTorch.
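One way to confirm the installation is to import the package with the environment's own interpreter. whisper.available_models() returns the names of the models the installed package can download, so small.en should appear in its output. The function name below is illustrative, and its default directory assumes the .venv-whisper environment from step 2.

```shell
# Hypothetical helper: confirm that the venv's Python runs and can import
# whisper. Defaults to the .venv-whisper directory created in step 2.
check_whisper_install() {
  venv=${1:-.venv-whisper}
  "$venv/bin/python" --version || return 1
  "$venv/bin/python" -c 'import whisper; print(whisper.available_models())'
}

# Example: check_whisper_install
```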

4. Run Whisper on the audio file

The audio was transcribed with the English-specific small.en model:

.venv-whisper/bin/whisper 27.aac \
  --model small.en \
  --language English \
  --task transcribe \
  --output_dir transcript_27 \
  --output_format all \
  --model_dir whisper-models \
  --fp16 False

Notes:

--model small.en was chosen as a good balance of accuracy and speed for English audio with some noise.
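--language English set the transcription language explicitly. Because small.en is an English-only model, Whisper would not run language auto-detection with it anyway, so this flag mainly documents the intent.
--fp16 False was used because the transcription ran on CPU, where FP16 is not supported. Without this flag, Whisper falls back to FP32 and prints a warning.
--model_dir whisper-models stored the downloaded Whisper model in this project's whisper-models directory instead of Whisper's default cache (~/.cache/whisper).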
--output_format all generated plain text, subtitles, timestamp data, and JSON output.
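To repeat the same run for another file, the command can be wrapped in a shell function. This is only a sketch. output_dir_for and transcribe_english are hypothetical names, and the transcript_<name> directory pattern copies the transcript_27 naming used here. Whisper itself names its output files after the input file minus the extension, so 27.aac produces 27.txt, 27.srt, and so on.

```shell
# Hypothetical helper: map an input path to an output directory name,
# e.g. 27.aac -> transcript_27, audio/28.aac -> transcript_28.
output_dir_for() {
  stem=$(basename "$1")
  printf 'transcript_%s\n' "${stem%.*}"
}

# Hypothetical wrapper around the step 4 command for any input file.
transcribe_english() {
  .venv-whisper/bin/whisper "$1" \
    --model small.en \
    --language English \
    --task transcribe \
    --output_dir "$(output_dir_for "$1")" \
    --output_format all \
    --model_dir whisper-models \
    --fp16 False
}

# Example: transcribe_english 27.aac
```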

5. Check the generated transcript files

The output directory was checked with:

ls -lh transcript_27

Whisper generated these files:

transcript_27/27.txt - plain text transcript
transcript_27/27.srt - subtitle file
transcript_27/27.vtt - web subtitle file
transcript_27/27.tsv - timestamped tab-separated output
transcript_27/27.json - detailed JSON output
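The .tsv file is the easiest one to process with standard tools. The sketch below reports the segment count and the end time of the last segment. It assumes the layout written by recent Whisper versions: a start, end, text header row, followed by one row per segment with start and end in integer milliseconds. The tsv_summary name is hypothetical.

```shell
# Hypothetical helper: summarize a Whisper .tsv file.
# Assumes a "start<TAB>end<TAB>text" header and millisecond timestamps.
tsv_summary() {
  awk -F '\t' '
    NR == 1 { next }              # skip the header row
    { n++; last_end = $2 }        # remember the latest end time (ms)
    END {
      s = int(last_end / 1000)    # milliseconds -> whole seconds
      printf "%d segments, last ends at %02d:%02d:%02d\n", n, int(s / 3600), int((s % 3600) / 60), s % 60
    }' "$1"
}

# Example: tsv_summary transcript_27/27.tsv
```

If the last segment ends well before the file's duration, part of the audio may not have been transcribed.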

The word count was checked with:

wc -w transcript_27/27.txt

The final transcript contained 4,636 words as counted by wc -w.
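As a rough completeness check, the word count can be compared with the audio length. The hypothetical helper below divides the word count by the duration in minutes. With the figures above, 4,636 words over 00:38:47 (2,327 seconds), it gives about 119.5 words per minute. A rate far below what the speaker would plausibly produce could hint at skipped audio. Where to draw that line is a judgment call, not a fixed rule.

```shell
# Hypothetical helper: speaking rate from a word count and a duration in
# seconds. awk does the floating-point division that sh arithmetic cannot.
words_per_minute() {
  awk -v words="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", words * 60 / secs }'
}

# Example (reading from stdin makes wc print only the number):
# words_per_minute "$(wc -w < transcript_27/27.txt)" 2327
```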

alex-pythonista/whisper_transcription_steps.md
