This Python script processes .vtt subtitle files downloaded with yt-dlp from YouTube or similar platforms. It merges subtitle entries whose segments overlap and cleans the text by removing excess whitespace. The processed subtitles are written to a new text file with a timestamped filename.
- Subtitle Merging: Combines multiple subtitle entries into a single entry, considering overlaps.
- Text Cleaning: Cleans subtitle text by replacing newline characters and reducing multiple spaces to a single space.
- Output: Generates a cleaned and merged text file for each .vtt file in the specified directory.
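
The three features above could be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names (`parse_cues`, `clean_text`, `merge_cues`, `process_directory`), the overlap rule (a cue starts before the previous one ends), and the `name_YYYYMMDD_HHMMSS.txt` output naming are all assumptions made for the example.

```python
import re
from datetime import datetime
from pathlib import Path

# Matches a cue timing line such as "00:00:01.000 --> 00:00:04.000".
# The hours field is optional in WebVTT.
TIMING = re.compile(
    r"((?:\d+:)?\d{2}:\d{2}\.\d{3})\s+-->\s+((?:\d+:)?\d{2}:\d{2}\.\d{3})"
)


def to_seconds(stamp):
    """Convert "HH:MM:SS.mmm" or "MM:SS.mmm" to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def clean_text(text):
    """Replace newlines with spaces and collapse repeated whitespace."""
    return re.sub(r"\s+", " ", text).strip()


def parse_cues(vtt_text):
    """Return a list of (start, end, text) tuples from WebVTT content."""
    cues = []
    for block in re.split(r"\n\s*\n", vtt_text.replace("\r\n", "\n")):
        lines = block.strip().split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                # Everything after the timing line is the cue payload.
                text = clean_text("\n".join(lines[i + 1:]))
                if text:
                    start = to_seconds(match.group(1))
                    end = to_seconds(match.group(2))
                    cues.append((start, end, text))
                break
    return cues


def merge_cues(cues):
    """Merge cues that overlap in time (assumed rule: start < previous end)."""
    merged = []
    for start, end, text in sorted(cues):
        if merged and start < merged[-1][1]:
            prev_start, prev_end, prev_text = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), f"{prev_text} {text}")
        else:
            merged.append((start, end, text))
    return merged


def process_directory(directory):
    """Write one timestamped .txt file per .vtt file; return the new paths."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")  # hypothetical naming scheme
    written = []
    for vtt_path in sorted(Path(directory).glob("*.vtt")):
        cues = merge_cues(parse_cues(vtt_path.read_text(encoding="utf-8")))
        out_path = vtt_path.with_name(f"{vtt_path.stem}_{stamp}.txt")
        body = "\n".join(text for _, _, text in cues) + "\n"
        out_path.write_text(body, encoding="utf-8")
        written.append(out_path)
    return written
```

Under these assumptions, calling `process_directory("subs")` on a hypothetical `subs` folder would leave the original .vtt files untouched and write one merged line of text per group of overlapping cues.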