This is a simple script for converting TTML subtitle files to SRT ones. Tested with TTML files on tv.nrk.no.
It assumes the data is structured like this:
<tt>
  <body>
    <div>
      <p>(...)</p>
      <p>(...)</p>
    </div>
  </body>
</tt>
Paragraphs might contain <span> elements that are only used for styling (only italic, as far as I've seen). As far as I know, SubRip doesn't support aligning the text (center, left, right), so there is no support for that as of now.
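To illustrate the structure described above, here is a minimal sketch in Python (not the script's own PHP code) that walks the <p> elements and flattens any <span> into italic markup. It assumes the simplified, namespace-free layout shown above; real TTML files usually declare an XML namespace, which would have to be included in the tag lookups. Treating every <span> as italic and emitting <i> tags (which many players accept in SRT) are also assumptions, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample following the simplified structure above (no namespace).
SAMPLE = """<tt><body><div>
<p begin="00:00:01.000" dur="00:00:02.000">Hello <span style="italic">there</span></p>
<p begin="00:00:04.000" dur="00:00:01.500">Second line</p>
</div></body></tt>"""


def paragraph_text(p):
    """Flatten a <p>, wrapping the text of each <span> child in <i>...</i>."""
    parts = [p.text or ""]
    for child in p:
        inner = "".join(child.itertext())
        if child.tag == "span":
            # Assumption: spans are only ever used for italics.
            parts.append(f"<i>{inner}</i>")
        else:
            parts.append(inner)
        # Text that follows the child element inside the same <p>.
        parts.append(child.tail or "")
    return "".join(parts).strip()


def paragraphs(ttml):
    """Return the flattened text of every tt/body/div/p element."""
    root = ET.fromstring(ttml)
    return [paragraph_text(p) for p in root.findall("./body/div/p")]


for line in paragraphs(SAMPLE):
    print(line)
```

Keeping the flattening in its own function means alignment or other styling could be handled there later without touching the traversal.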
The script also assumes the TTML uses the begin and dur attributes, and not the end attribute, so the time codes are calculated in a function. Adding support for the end attribute should be easy though.
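The time code calculation can be sketched roughly as follows. This is a Python illustration, not the script's actual PHP function, and it assumes both begin and dur (or end) use the HH:MM:SS.mmm clock format; TTML allows other time expressions that are not handled here. The function names are hypothetical.

```python
def parse_clock(value):
    """Parse 'HH:MM:SS.mmm' (fraction optional) into whole milliseconds."""
    hours, minutes, seconds = value.split(":")
    whole, _, frac = seconds.partition(".")
    # Pad or truncate the fraction to exactly three digits.
    millis = int((frac + "000")[:3])
    return (int(hours) * 3600 + int(minutes) * 60 + int(whole)) * 1000 + millis


def srt_time(ms):
    """Format milliseconds as an SRT time code, 'HH:MM:SS,mmm'."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def cue_times(begin, dur=None, end=None):
    """Build the SRT timing line from begin plus either end or dur."""
    start = parse_clock(begin)
    if end is not None:
        stop = parse_clock(end)
    elif dur is not None:
        # With dur, the end time is simply begin + duration.
        stop = start + parse_clock(dur)
    else:
        raise ValueError("cue needs either dur or end")
    return f"{srt_time(start)} --> {srt_time(stop)}"


print(cue_times("00:00:01.000", dur="00:00:02.500"))
print(cue_times("00:07:40.133", end="00:07:45.800"))
```

Working in integer milliseconds avoids floating-point rounding when adding the duration to the start time.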
ttml2srt.php <infile>
Outputs SRT data. Pipe to file if saving needed.
Inspired by: https://gist.github.com/jareware/7af17f2034931608e842 (I gave up on that one when libxmljs wouldn't install and everything went to shit with Node).
mkdir -p ~/.config/youtube-dl
nano ~/.config/youtube-dl/config
Add the following to the file:
--exec 'title=`echo {} | sed "s/\.[a-z0-9]\+$//"`; test -f "${title}"*.ttml && /path/to/ttml2srt.php "${title}"*.ttml > "${title}".srt;'
Replace /path/to/ttml2srt.php with the actual path to the file, and make sure it's executable!
Every time you run youtube-dl now, it will check if a .ttml file with the same title exists in the folder and run the ttml2srt.php script on it. If you download subtitles with --all-subs and more than one is present, the above will likely only convert the last subtitle.
sudo apt-get install php-{cli,xml}
Made by me.
Thanks for the script, it saved me a lot of work, since I have some TTML files that use the 'begin' and 'end' attributes.
Found an issue when converting a subtitle that contained an & character, like this:
<p region="pop317" begin="00:07:40.133" end="00:07:45.800">BACK AT MASELLI & SONS.</span></p>
I solved it by escaping the & like this
It produces the expected output
318
00:07:40,133 --> 00:07:45,800
BACK AT MASELLI & SONS.
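For anyone hitting the same problem, one possible way to handle bare ampersands is to escape them before the XML is parsed. The Python sketch below is an assumption about how this could be done, not necessarily the fix used above, and the function name is hypothetical. It leaves existing entities such as &amp; or &#38; untouched.

```python
import re
import xml.etree.ElementTree as ET

# Match '&' that does not start a named, decimal, or hex entity.
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(xml_text):
    """Replace unescaped '&' characters with '&amp;' so the XML parses."""
    return BARE_AMP.sub("&amp;", xml_text)


# Simplified version of the paragraph above (well-formed apart from the '&').
raw = '<p begin="00:07:40.133" end="00:07:45.800">BACK AT MASELLI & SONS.</p>'
fixed = escape_bare_ampersands(raw)

# The parser turns '&amp;' back into '&', so the SRT text is unchanged.
text = ET.fromstring(fixed).text
print(text)
```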