This project extracts the audio track from a video file, transcribes it with the OpenAI Whisper API, and then passes the transcript, together with a prompt, to another model of your choice.

Prerequisites
- Python 3.10+ (tested with Python 3.13)
- pip (Python package installer)
- An OpenAI API Key with access to audio transcription
Create a Virtual Environment
It’s recommended to use a virtual environment:

```bash
python3 -m venv env
source env/bin/activate
```
Install Dependencies
```bash
pip install requests
pip install openai
```
Set Up Environment Variables
The script requires an OpenAI API key for transcription. Set the OPENAI_API_KEY environment variable.

On macOS/Linux:

```bash
export OPENAI_API_KEY=your_openai_api_key
```

Alternatively, create a .env file in the project root containing:

```
OPENAI_API_KEY=your_openai_api_key
```

and use a package like python-dotenv to load it in your script, as sketched below.
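For example, a minimal loader using python-dotenv (an optional extra dependency, installed with `pip install python-dotenv`):

```python
# Sketch: load OPENAI_API_KEY from a .env file via python-dotenv.
import os

from dotenv import load_dotenv

load_dotenv()  # reads key=value pairs from .env into the environment
api_key = os.environ["OPENAI_API_KEY"]  # raises KeyError if the key is missing
```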
Extract audio and transcribe

Run the transcriber script:

```bash
python transcriber.py INPUT_FILE --prompt "Please summarize this"
```
Process Overview
- The script extracts the audio from the video and saves it as temp_audio.mp3.
- It sends the extracted audio file to the OpenAI Whisper API for transcription.
- It feeds the transcription output into another LLM prompt of your choice, using gpt-4o-mini as the default model.
- If the API request fails with a 401 error, check your API key and ensure it is set correctly.
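For reference, here is a minimal sketch of that flow. It assumes ffmpeg is installed and on the PATH and uses the openai v1 Python client; the actual transcriber.py may be implemented differently.

```python
# Sketch of the extract -> transcribe -> prompt pipeline, assuming ffmpeg
# is on the PATH and the openai v1 Python client is installed.
import subprocess
import sys

from openai import OpenAI


def main(input_file: str, prompt: str) -> None:
    # 1. Extract the audio track from the video (-vn drops the video stream;
    #    the .mp3 extension selects an MP3 encoder).
    subprocess.run(
        ["ffmpeg", "-y", "-i", input_file, "-vn", "temp_audio.mp3"],
        check=True,
    )

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # 2. Send the extracted audio to the Whisper API for transcription.
    with open("temp_audio.mp3", "rb") as audio:
        transcript = client.audio.transcriptions.create(
            model="whisper-1", file=audio
        )

    # 3. Feed the transcript plus the user prompt to gpt-4o-mini.
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"{prompt}\n\n{transcript.text}"}],
    )
    print(response.choices[0].message.content)


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
```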