These are the steps used to transcribe 27.aac with OpenAI Whisper.
Whisper needs ffmpeg to read and decode audio files such as .aac. It was installed with Homebrew:

```shell
brew install ffmpeg
```

The audio file was checked with:
```shell
ffprobe -hide_banner 27.aac
```

This showed that 27.aac was an AAC audio file with a duration of about 00:38:47 (38 minutes 47 seconds).
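Before a long transcription, a script can fail early if the ffmpeg tools are missing and can read the duration as a plain number of seconds. The following is a minimal POSIX sh sketch. The function names `require_cmd`, `audio_duration_seconds`, and `hms_to_seconds` are hypothetical helpers and are not part of ffmpeg or Whisper. The ffprobe options it uses (`-v error`, `-show_entries format=duration`, `-of default=noprint_wrappers=1:nokey=1`) are standard ffprobe flags.

```shell
# Hypothetical helpers (not part of ffmpeg or Whisper).

# Return 0 if a command is on PATH; otherwise print a message and return 1.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "required command not found: $1" >&2
  return 1
}

# Print the container duration of an audio file in seconds, using ffprobe's
# machine-readable output instead of the human-readable banner.
audio_duration_seconds() {
  require_cmd ffprobe || return 1
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Convert an HH:MM:SS[.fraction] string to whole seconds (fraction truncated).
hms_to_seconds() {
  echo "$1" | awk -F: '{ printf "%d\n", $1 * 3600 + $2 * 60 + $3 }'
}
```

For example, `require_cmd ffmpeg && audio_duration_seconds 27.aac` would print the file's duration as a decimal number of seconds, and `hms_to_seconds 00:38:47` prints `2327`.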
Python 3.12 was used because it is compatible with the Whisper package and its dependencies. A virtual environment was created with it:

```shell
python3.12 -m venv .venv-whisper
```

Whisper was installed inside the local virtual environment:
```shell
.venv-whisper/bin/python -m pip install -U pip openai-whisper
```

This upgraded pip and installed Whisper (the `openai-whisper` package) together with its required Python packages, including PyTorch.
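Before starting a job that runs for nearly 40 minutes of audio, you can quickly confirm the installation by importing Whisper and PyTorch with the virtual environment's own interpreter. This is a sketch. `check_whisper_install` is a hypothetical helper name. The check relies only on `whisper.available_models()` and `torch.__version__`, which the openai-whisper and PyTorch packages provide.

```shell
# Hypothetical helper: verify that a virtual environment can import Whisper.
# Usage: check_whisper_install [venv_dir]   (defaults to .venv-whisper)
check_whisper_install() {
  venv_dir=${1:-.venv-whisper}
  venv_py="$venv_dir/bin/python"

  # Fail early if the venv (or its interpreter) does not exist.
  if [ ! -x "$venv_py" ]; then
    echo "no Python interpreter at $venv_py" >&2
    return 1
  fi

  # Print the Python version, the PyTorch version, and the model names
  # Whisper knows about; a non-zero exit status means an import failed.
  "$venv_py" -c '
import sys
import torch
import whisper
print("Python %d.%d" % sys.version_info[:2])
print("torch", torch.__version__)
print("models:", ", ".join(whisper.available_models()))
'
}
```

Running `check_whisper_install` from the project directory should list `small.en` among the available models if the install succeeded.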
The audio was transcribed with the English-specific small.en model:
```shell
.venv-whisper/bin/whisper 27.aac \
  --model small.en \
  --language English \
  --task transcribe \
  --output_dir transcript_27 \
  --output_format all \
  --model_dir whisper-models \
  --fp16 False
```

Notes:
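- `--model small.en` was chosen as a good balance of accuracy and speed for English audio with some background noise.
- `--language English` forced English transcription instead of language auto-detection. With an English-only `.en` model, Whisper uses English regardless, so the flag mainly makes the choice explicit.
- `--fp16 False` was used because the transcription ran on CPU, where half-precision (FP16) inference is not supported. With the default `--fp16 True`, Whisper would print a warning and fall back to FP32.
- `--model_dir whisper-models` stored the downloaded Whisper model locally in this project.
- `--output_format all` generated plain text, subtitles, timestamp data, and JSON output.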
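The right `--fp16` value depends on the hardware, so a script can choose it by asking PyTorch whether a CUDA GPU is available. This is a minimal sketch that assumes the `.venv-whisper` environment created earlier. `whisper_fp16_flag` is a hypothetical helper name, while `torch.cuda.is_available()` is the standard PyTorch check. Whisper's CLI defaults `--device` to CUDA when PyTorch reports a GPU and to CPU otherwise. On a Mac without CUDA, this sketch would therefore select `False`, matching the run described here.

```shell
# Hypothetical helper: print "True" if PyTorch in the venv sees a CUDA GPU,
# otherwise "False", for use as the value of Whisper's --fp16 option.
# Usage: whisper_fp16_flag [venv_python]
whisper_fp16_flag() {
  fp16_py=${1:-.venv-whisper/bin/python}
  if "$fp16_py" -c 'import sys, torch; sys.exit(0 if torch.cuda.is_available() else 1)' 2>/dev/null; then
    echo True
  else
    # No GPU, or torch could not be imported: run in FP32 on CPU.
    echo False
  fi
}
```

The result could then be passed as `--fp16 "$(whisper_fp16_flag)"` in place of the literal `False` in the transcription command.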
The output directory was checked with:
```shell
ls -lh transcript_27
```

Whisper generated these files:
- `transcript_27/27.txt` - plain text transcript
- `transcript_27/27.srt` - subtitle file (SubRip)
- `transcript_27/27.vtt` - web subtitle file (WebVTT)
- `transcript_27/27.tsv` - timestamped tab-separated output
- `transcript_27/27.json` - detailed JSON output
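In a script, this check can be automated so that a failed or interrupted run is caught immediately. This is a sketch that assumes Whisper names each output after the input file's base name, as in the listing above. `check_whisper_outputs` is a hypothetical helper.

```shell
# Hypothetical helper: confirm Whisper wrote all five --output_format all files.
# Usage: check_whisper_outputs OUTPUT_DIR BASENAME   (e.g. transcript_27 27)
check_whisper_outputs() {
  out_dir=$1
  base=$2
  missing=0
  for ext in txt srt vtt tsv json; do
    f="$out_dir/$base.$ext"
    # -s is true only if the file exists and is non-empty.
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Here, `check_whisper_outputs transcript_27 27` returns 0 only if all five files exist and are non-empty.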
The word count was checked with:
```shell
wc -w transcript_27/27.txt
```

`wc -w` reported 4,636 words. It counts whitespace-separated tokens, so this is an approximate word count for the transcript.
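As a rough sanity check that the transcript covers the whole recording, you can divide the word count by the audio duration. Here, 4,636 words over 00:38:47 (2,327 seconds) works out to about 120 words per minute, which is a plausible rate for speech. A much lower figure could hint at skipped sections. This is a sketch, and `words_per_minute` is a hypothetical helper.

```shell
# Hypothetical helper: words per minute of a transcript, given the audio
# duration in seconds. wc -w counts whitespace-separated tokens.
# Usage: words_per_minute FILE DURATION_SECONDS
words_per_minute() {
  words=$(wc -w < "$1")
  awk -v w="$words" -v s="$2" 'BEGIN { printf "%.1f\n", w * 60 / s }'
}
```

If the file contains 4,636 words, `words_per_minute transcript_27/27.txt 2327` would print `119.5`.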