# Remove duplicated lines from a .vtt file generated by youtube-dl when
# downloading auto-subs from a Youtube video using the --write-auto-sub option
# This script only prints the lines, so save the edited subs as:
#
#     python this_script.py original_sub.vtt > new_sub.vtt

import re
import sys

f = open(sys.argv[1])

patt = re.compile(r'^\d\d:\d\d:\d\d', re.M)
dup_line = ''
for line in f:
    # line = f.readline()
    # Find a line starting with a time stamp: 00:13:23 ...
    res = re.findall(patt, line)
    if res:
        # If so, print this line, then read the next line and store it in
        # dup_line. When the next section starting with a timestamp is found,
        # the line below it is compared with dup_line. If they are equal,
        # skip it and do not print the duplicated line.
        # Otherwise, store the new line as the next candidate duplicate.
        print(line, end='')
        next_line = f.readline()
        if dup_line and next_line == dup_line:
            dup_line = ''
            res = []
            continue
        else:
            dup_line = next_line
            print(dup_line, end='')
            res = []
    else:
        print(line, end='')

f.close()
It is not working for me. I have this:
03:15.654 --> 03:20.325
granting Kyonan University,
a mere private university,
03:15.654 --> 03:20.325
granting Kyonan University,
a mere private university,
It does not remove the duplicates.
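One likely reason, judging only from the sample above: the timestamps there have the form `03:15.654`, with no hours field, so the script's pattern `^\d\d:\d\d:\d\d` never matches them and every line is printed unchanged. Below is a minimal sketch of a different approach, not the gist author's method: it accepts timing lines with or without hours, groups the lines into cues, and drops any cue that is identical to the one before it. The names `TIMING`, `split_cues` and `drop_repeated_cues` are made up for this example.

```python
import re

# Matches a cue timing line. The hours field is optional, so both
# "00:13:23.000 --> ..." and "03:15.654 --> ..." are accepted.
TIMING = re.compile(r'^(?:\d{2,}:)?\d{2}:\d{2}[.,]\d{3}\s+-->\s+')


def split_cues(lines):
    """Split lines into a header (before the first timing line) and cues."""
    header, cues = [], []
    for line in lines:
        if TIMING.match(line):
            cues.append([line])      # a timing line starts a new cue
        elif cues:
            cues[-1].append(line)    # text belongs to the current cue
        else:
            header.append(line)      # e.g. the "WEBVTT" line
    return header, cues


def drop_repeated_cues(lines):
    """Return the lines with every cue equal to the previous kept cue removed."""
    header, cues = split_cues(lines)
    kept = []
    for cue in cues:
        # Compare without blank lines so spacing differences do not matter.
        key = [part.strip() for part in cue if part.strip()]
        if kept and key == kept[-1][0]:
            continue
        kept.append((key, cue))
    return header + [line for _, cue in kept for line in cue]
```

It could be used in the same way as the script above, for example with `sys.stdout.writelines(drop_repeated_cues(open(sys.argv[1]).readlines()))` after `import sys`. It only removes cues whose timing and text are both repeated, as in the sample.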
It is not working for me either.
1021
00:20:18,630 --> 00:20:20,540
aquí desde medellín me despido nos vemos
1022
00:20:20,540 --> 00:20:20,550
aquí desde medellín me despido nos vemos
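In this second sample the text is repeated, but the two timing lines differ (the second cue lasts only 10 ms), so a comparison that includes the timing cannot catch it. A minimal sketch of one way to handle that case, assuming the cues have already been parsed into `(start, end, text)` tuples (a representation invented for this example, as is the name `merge_repeated_text`): consecutive cues with the same text are merged into one cue that keeps the first start time and the last end time.

```python
def merge_repeated_text(cues):
    """Merge consecutive cues that carry the same text.

    `cues` is a list of (start, end, text) tuples, a hypothetical
    representation chosen for this sketch.
    """
    merged = []
    for start, end, text in cues:
        if merged and merged[-1][2].strip() == text.strip():
            # Same text as the previous cue: keep one cue, extend its end time.
            prev_start, _, prev_text = merged[-1]
            merged[-1] = (prev_start, end, prev_text)
        else:
            merged.append((start, end, text))
    return merged


cues = [
    ("00:20:18,630", "00:20:20,540", "aquí desde medellín me despido nos vemos"),
    ("00:20:20,540", "00:20:20,550", "aquí desde medellín me despido nos vemos"),
]
print(merge_repeated_text(cues))
```

Merging rather than dropping the second cue is a design choice: it keeps the subtitle on screen for the full span covered by both cues. Parsing the file into tuples and writing it back out is left out of the sketch.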
I am interested in solving this issue, but I need your help. Could you please share the URLs of the videos whose .vtt files have duplicated lines?
Yes, please share a video that can be tested to update the code :D
This is the video that generated dup lines for me:
https://www.youtube.com/watch?v=ubOqOCukR40&t=941s&ab_channel=GabrielHerrera
Were you able to find a fix? This is becoming a problem for us. Attached is a .vtt file from YouTube that needs duplicates fixed: https://drive.google.com/file/d/163Y-rg2qouJOQ2rjeudQ3dAFrE7TF_2M/view?usp=sharing
Hello, is it OK?
Yes, as we are already iterating through every line:
for line in f