Last active December 17, 2023 09:49
Python script that uses Whisper to create an srt file for every mp4 in the current directory that doesn't already have one.
import os
import whisper
from whisper.utils import get_writer

# Get the current directory path
directory = os.getcwd()

# Loop through all the files in the directory
for file in sorted(os.listdir(directory)):
    # Check if the file has the mp4 extension
    if file.endswith('.mp4'):
        # Get the name of the file with the extension
        name = os.path.splitext(file)[0] + '.mp4'
        srt_file = directory + '/' + name + '.srt'
        # Check if there is a related srt file
        if not os.path.isfile(srt_file):
            # Create srt for all mp4 files that need one
            print('Processing ' + file + '...')
            model = whisper.load_model('large')
            result = model.transcribe(file, fp16=False)
            srt_writer = get_writer('srt', './')
            srt_writer(result, file)
openai-whisper
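The script's "skip files that already have subtitles" check can be pulled out into a small helper so it can be exercised without running a transcription. The following is a minimal sketch, not the gist author's code: the function names `mp4s_missing_srt` and `transcribe_pending` are hypothetical, and accepting both `<name>.srt` and `<name>.mp4.srt` as an existing subtitle file is an assumption made here so that the check does not depend on how the writer names its output. The Whisper calls are the same ones used in the script above.

```python
import os


def mp4s_missing_srt(directory):
    """Return the sorted .mp4 file names in `directory` with no subtitle file yet."""
    pending = []
    for file in sorted(os.listdir(directory)):
        if not file.endswith('.mp4'):
            continue
        stem = os.path.splitext(file)[0]
        # Assumption: the subtitle may be named '<stem>.srt' or '<stem>.mp4.srt',
        # so either one counts as "already transcribed".
        candidates = [stem + '.srt', file + '.srt']
        if not any(os.path.isfile(os.path.join(directory, c)) for c in candidates):
            pending.append(file)
    return pending


def transcribe_pending(directory, model_name='large'):
    """Create an srt for each pending mp4, loading the model a single time."""
    pending = mp4s_missing_srt(directory)
    if not pending:
        return pending
    # Imported lazily so the file check above works without whisper installed.
    import whisper
    from whisper.utils import get_writer

    model = whisper.load_model(model_name)
    srt_writer = get_writer('srt', directory)
    for file in pending:
        print('Processing ' + file + '...')
        result = model.transcribe(os.path.join(directory, file), fp16=False)
        # Same two-argument writer call as in the script above.
        srt_writer(result, file)
    return pending
```

Calling `transcribe_pending(os.getcwd())` would process the current directory in the same way as the script above.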
Hi, I tried to run your script and got an "out of memory" error. I think this happens because the line "model = whisper.load_model('large')" is inside the loop, so the program loads the model again for every file it processes. I fixed it by moving that line outside the loop, as follows: