How to transcribe AAC audio with OpenAI Whisper

Whisper Transcription Steps

These are the steps used to transcribe 27.aac with OpenAI Whisper.

1. Install ffmpeg

Whisper needs ffmpeg to read and decode audio files such as .aac. It was installed with Homebrew:

brew install ffmpeg
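Before going further, it can help to confirm that both ffmpeg and ffprobe are actually on the PATH. The following is a minimal sketch using only the Python standard library; the helper name find_tools is hypothetical and not part of Whisper or ffmpeg.

```python
import shutil


def find_tools(names=("ffmpeg", "ffprobe")):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    # find_tools is a hypothetical helper for illustration only.
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

A tool reported as NOT FOUND usually means the install did not finish or the shell needs to be restarted to pick up the new PATH.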

The audio file was checked with:

ffprobe -hide_banner 27.aac

This showed that 27.aac was an AAC audio file with a duration of about 00:38:47.
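If the duration is needed in a script rather than read off the ffprobe banner, ffprobe can print it as a bare number of seconds. The sketch below assumes ffprobe is on the PATH and that the file reports a numeric duration; probe_duration and format_hms are hypothetical helper names.

```python
import subprocess


def format_hms(seconds: float) -> str:
    """Format a duration in seconds as HH:MM:SS, rounded to the nearest second."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def probe_duration(path: str) -> float:
    """Ask ffprobe for the container duration in seconds (assumes ffprobe is installed)."""
    out = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-show_entries", "format=duration",
            "-of", "default=noprint_wrappers=1:nokey=1",
            path,
        ],
        capture_output=True, text=True, check=True,
    ).stdout
    return float(out.strip())
```

For example, format_hms(probe_duration("27.aac")) should give a value close to the 00:38:47 reported above, and format_hms(2327) returns "00:38:47".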

2. Create a Python virtual environment

Python 3.12 was used because it is compatible with the Whisper package and its dependencies.

python3.12 -m venv .venv-whisper

3. Install OpenAI Whisper

Whisper was installed inside the local virtual environment:

.venv-whisper/bin/python -m pip install -U pip openai-whisper

This installed Whisper and its required Python packages, including PyTorch.

4. Run Whisper on the audio file

The audio was transcribed with the English-specific small.en model:

.venv-whisper/bin/whisper 27.aac \
  --model small.en \
  --language English \
  --task transcribe \
  --output_dir transcript_27 \
  --output_format all \
  --model_dir whisper-models \
  --fp16 False

Notes:

  • --model small.en was chosen as a good balance of accuracy and speed for English audio with some noise.
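  • --language English set the language explicitly and skipped language auto-detection; the .en models are English-only in any case.
  • --fp16 False was used because the transcription ran on CPU, where Whisper does not support FP16 and would otherwise warn and fall back to FP32.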
  • --model_dir whisper-models stored the downloaded Whisper model locally in this project.
  • --output_format all generated plain text, subtitles, timestamp data, and JSON output.
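The same settings can also be used from Python instead of the command line. This is a minimal sketch, assuming the openai-whisper package from step 3 is installed in the active environment; transcribe_file is a hypothetical wrapper, while load_model and transcribe are the package's own functions. Unlike --output_format all, it returns the result as a dictionary and writes no files.

```python
def transcribe_file(audio_path, model_name="small.en", model_dir="whisper-models"):
    """Transcribe one audio file with the same settings as the CLI call above."""
    # Imported lazily so this module can be loaded even where Whisper is missing.
    import whisper

    # download_root plays the role of --model_dir: the model is cached there.
    model = whisper.load_model(model_name, download_root=model_dir)

    # language="en" corresponds to --language English; fp16=False suits CPU runs.
    return model.transcribe(
        audio_path, language="en", task="transcribe", fp16=False
    )
```

Calling transcribe_file("27.aac") returns a dictionary whose "text" entry holds the full transcript and whose "segments" entry holds the timed segments.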

5. Check the generated transcript files

The output directory was checked with:

ls -lh transcript_27

Whisper generated these files:

  • transcript_27/27.txt - plain text transcript
  • transcript_27/27.srt - SubRip subtitle file
  • transcript_27/27.vtt - WebVTT subtitle file for web players
  • transcript_27/27.tsv - tab-separated segments with start and end timestamps
  • transcript_27/27.json - detailed JSON output
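The JSON file is the easiest of these to process further. The sketch below assumes the layout Whisper normally writes: a top-level "segments" list whose entries carry "start", "end", and "text". The helper names load_segments and format_timestamp are hypothetical.

```python
import json


def load_segments(json_path):
    """Return (start, end, text) tuples from a Whisper JSON transcript."""
    with open(json_path, encoding="utf-8") as fh:
        data = json.load(fh)
    # Segment text usually starts with a space, so strip it for display.
    return [(seg["start"], seg["end"], seg["text"].strip()) for seg in data["segments"]]


def format_timestamp(seconds):
    """Format seconds as MM:SS, dropping the fractional part."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"
```

Looping over load_segments("transcript_27/27.json") and printing format_timestamp(start) next to each text gives a quick timestamped view for spot-checking against the audio.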

The word count was checked with:

wc -w transcript_27/27.txt

The final transcript contained about 4,636 words.
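As a rough sanity check, the word count can be set against the audio duration. This sketch roughly mirrors wc -w by splitting on whitespace and uses only the figures reported above (about 4,636 words over about 00:38:47, which is 2,327 seconds); the function names are hypothetical.

```python
def count_words(text):
    """Count whitespace-separated words, similar to `wc -w`."""
    return len(text.split())


def words_per_minute(word_count, duration_seconds):
    """Average speaking rate over the whole recording."""
    return word_count / (duration_seconds / 60)


if __name__ == "__main__":
    # Figures taken from the steps above: 4,636 words in 38 min 47 s.
    print(round(words_per_minute(4636, 38 * 60 + 47), 1))  # prints 119.5
```

A rate near 120 words per minute is plausible for speech with pauses; a figure far below that could point to dropped sections, and one far above to repeated or hallucinated text.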
