This project extracts the audio track from a video file, transcribes it with the OpenAI Whisper API, and then feeds the transcript into a prompt for another OpenAI model of your choice.

## Prerequisites
- Python 3.10+ (tested with Python 3.13)
- pip (Python package installer)
- An OpenAI API Key with access to audio transcription
## Create a Virtual Environment

It’s recommended to use a virtual environment:

```bash
python3 -m venv env
source env/bin/activate
```
## Install Dependencies

```bash
pip install requests
pip install openai
```
## Set Up Environment Variables

The script requires an OpenAI API key for transcription. Set the `OPENAI_API_KEY` environment variable:

- On macOS/Linux:

  ```bash
  export OPENAI_API_KEY=your_openai_api_key
  ```

- Alternatively, create a `.env` file in the project root containing:

  ```
  OPENAI_API_KEY=your_openai_api_key
  ```

  and use a package like python-dotenv to load it in your script.
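If you use the `.env` approach, here is a minimal sketch of loading the key at startup with python-dotenv (the error check and message are illustrative, not part of the project's script):

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

# Read OPENAI_API_KEY from a .env file in the project root, if one exists.
load_dotenv()

if not os.environ.get("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; see 'Set Up Environment Variables'.")
```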
## Extract Audio and Transcribe

Run the transcriber script:

```bash
python transcriber.py INPUT_FILE --prompt "Please summarize this"
```
## Process Overview

- The script extracts the audio from the video and saves it as `temp_audio.mp3`.
- It sends the extracted audio file to the OpenAI Whisper API for transcription.
- It passes the transcription output to another LLM prompt of your choice, using `gpt-4o-mini` as the default model (see the sketch below).
- If the API request fails with a `401` error, check your API key and ensure it is set correctly.
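For orientation, here is a minimal sketch of that pipeline, not the project's actual `transcriber.py`. It assumes ffmpeg is available on your PATH for the extraction step and uses the openai Python SDK (v1+) client; the function name and ffmpeg flags are illustrative:

```python
import subprocess

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def transcribe_and_prompt(video_path: str, prompt: str, model: str = "gpt-4o-mini") -> str:
    # 1. Extract the audio track to temp_audio.mp3 (assumes ffmpeg is installed).
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn", "temp_audio.mp3"],
        check=True,
    )

    # 2. Send the extracted audio to the Whisper API for transcription.
    with open("temp_audio.mp3", "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )

    # 3. Feed the transcript, together with the user's prompt, to a second model.
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": f"{prompt}\n\n{transcript.text}"}],
    )
    return response.choices[0].message.content
```

Called as `transcribe_and_prompt("INPUT_FILE", "Please summarize this")`, this mirrors the CLI invocation shown earlier.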