@arthursoares · Created February 21, 2025
Use ffmpeg, Whisper, and another OpenAI LLM to transcribe and summarize a video.

Video Transcriber

This project extracts the audio from a video file, transcribes it with the OpenAI Whisper API, and then runs the transcript through a prompt with another OpenAI model of your choice.

Prerequisites

  • Python 3.10+ (tested with Python 3.13)
  • pip (Python package installer)
  • ffmpeg available on your PATH (the script shells out to it to extract audio)
  • An OpenAI API key with access to audio transcription

Setup

  1. Create a Virtual Environment

    It’s recommended to use a virtual environment:

    python3 -m venv env
    source env/bin/activate
    
  2. Install Dependencies

    pip install requests
    pip install openai
    
  3. Set Up Environment Variables

    The script requires an OpenAI API key for transcription. Set the OPENAI_API_KEY environment variable:

    • On macOS/Linux:
      export OPENAI_API_KEY=your_openai_api_key
      
    • Alternatively, create a .env file in the project root containing:
      OPENAI_API_KEY=your_openai_api_key
      
and use a package like python-dotenv to load it in your script, as in the sketch below.
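
      A minimal sketch of the .env approach, assuming you have also run pip install python-dotenv (this loader is not part of the script below):

        # load_env_example.py — hypothetical helper, not part of transcriber.py
        import os
        from dotenv import load_dotenv

        load_dotenv()  # reads .env from the current directory into os.environ
        if not os.getenv("OPENAI_API_KEY"):
            raise SystemExit("OPENAI_API_KEY is not set; see Setup step 3.")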

Usage

  1. Extract audio and transcribe:

    Run the transcriber script:

    python transcriber.py INPUT_FILE --prompt "Please summarize this"
    
  2. Process Overview

    • The script extracts the audio track from the video with ffmpeg and saves it as temp_audio.mp3 (the equivalent ffmpeg command is shown after this list).
    • It sends the extracted audio file to the OpenAI Whisper API for transcription.
    • It then feeds the transcript, together with your prompt, to a second LLM of your choice, using gpt-4o-mini as the default model.
    • If the API request fails with a 401 error, check your API key and ensure it is set correctly.
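
    For reference, the audio-extraction step in the script is equivalent to running ffmpeg directly:

      ffmpeg -i INPUT_FILE -vn -acodec mp3 temp_audio.mp3 -y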
transcriber.py

#!/usr/bin/env python3
import os
import subprocess
import argparse

import openai
import requests

openai.api_key = os.getenv("OPENAI_API_KEY")


def extract_audio_from_video(video_path, audio_path):
    """Extract the audio track from a video into an MP3 file using ffmpeg."""
    command = [
        "ffmpeg", "-i", video_path,
        "-vn",              # No video
        "-acodec", "mp3",
        audio_path,
        "-y"                # Overwrite the output file if it exists
    ]
    subprocess.run(command, check=True)


def transcribe_audio_whisper(audio_path):
    """
    Transcribe an audio file using OpenAI's Whisper model.
    Uses a direct HTTP request to OpenAI's transcription endpoint.
    """
    url = "https://api.openai.com/v1/audio/transcriptions"
    headers = {
        "Authorization": f"Bearer {openai.api_key}"
    }
    data = {
        "model": "whisper-1"
    }
    with open(audio_path, "rb") as audio_file:
        print("[INFO] Transcribing audio with Whisper...")
        files = {
            "file": audio_file
        }
        response = requests.post(url, headers=headers, data=data, files=files)
    response.raise_for_status()
    result = response.json()
    return result["text"]


def summarize_transcript(transcript, user_prompt):
    """
    Summarize the transcript via OpenAI's Chat Completions API using a direct HTTP request.
    """
    url = "https://api.openai.com/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {openai.api_key}",
        "Content-Type": "application/json"
    }
    system_content = "You are a helpful assistant..."
    user_content = (
        f"The transcript of the video is:\n\n{transcript}\n\n"
        f"User prompt: {user_prompt}\n\n"
        "Please summarize or analyze the transcript."
    )
    payload = {
        "model": "gpt-4o-mini",
        "messages": [
            {"role": "system", "content": system_content},
            {"role": "user", "content": user_content},
        ],
        "temperature": 0.7
    }
    response = requests.post(url, headers=headers, json=payload)
    response.raise_for_status()
    result = response.json()
    return result["choices"][0]["message"]["content"]


def main():
    parser = argparse.ArgumentParser(description="Video Summarizer with OpenAI Whisper + GPT")
    parser.add_argument("video_file", help="Path to the local video file")
    parser.add_argument("--prompt", type=str, default="Please summarize...",
                        help="User-defined prompt to guide the summary.")
    args = parser.parse_args()

    video_file = args.video_file
    user_prompt = args.prompt

    print(f"[INFO] Extracting audio from {video_file}...")
    audio_file = "temp_audio.mp3"
    extract_audio_from_video(video_file, audio_file)
    print(f"[INFO] Audio extracted to {audio_file}")

    transcription_text = transcribe_audio_whisper(audio_file)
    print("[INFO] Transcription complete.")

    summary = summarize_transcript(transcription_text, user_prompt)
    print("[INFO] Summary/Analysis complete.\n")

    print("TRANSCRIPTION:\n", transcription_text)
    print("\n----------------------------------------\n")
    print("SUMMARY / ANALYSIS:\n", summary)


if __name__ == "__main__":
    main()
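
If you would rather reuse these functions from another script than call the CLI, a minimal sketch follows. It assumes the file above is saved as transcriber.py on your import path; "talk.mp4" is a hypothetical input file used only for illustration.

from transcriber import (
    extract_audio_from_video,
    transcribe_audio_whisper,
    summarize_transcript,
)

# Run the same three-step pipeline programmatically.
extract_audio_from_video("talk.mp4", "temp_audio.mp3")  # hypothetical input
text = transcribe_audio_whisper("temp_audio.mp3")
print(summarize_transcript(text, "List the main topics covered."))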