This is a simple script for converting TTML subtitle files to SRT ones. Tested with TTML files from tv.nrk.no.
It assumes the data is structured like this:
<tt>
<body>
<div>
<p>(...)</p>
<p>(...)</p>
</div>
</body>
</tt>
Paragraphs might contain <span> elements that are only used for styling (only italics, as far as I've seen). As far as I know, SubRip doesn't support aligning the text (center, left, right), so there is no support for that as of now.
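To make the assumed layout concrete, here is a rough Python sketch of walking that structure. It is not the PHP script itself, only an illustration. The sample document, the xmlns value, the style="italic" attribute and the helper names (local_name, paragraph_text, paragraphs) are all made up for the example, and wrapping every <span> in <i> tags is an assumption about what a converter might reasonably do, since many players accept <i> in SRT files.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample following the <tt><body><div><p> layout described above.
SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml">
  <body>
    <div>
      <p begin="00:00:01.000" dur="00:00:02.000">Hello <span style="italic">there</span></p>
      <p begin="00:00:04.000" dur="00:00:01.500">Second line</p>
    </div>
  </body>
</tt>"""


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


def paragraph_text(node):
    """Flatten a <p> element to text, wrapping every <span> in <i>...</i>.

    Assumption: spans only ever mean italics, so the style attribute is ignored.
    """
    parts = [node.text or ""]
    for child in node:
        inner = paragraph_text(child)
        parts.append(f"<i>{inner}</i>" if local_name(child.tag) == "span" else inner)
        parts.append(child.tail or "")
    return "".join(parts)


def paragraphs(ttml):
    """Return (begin, dur, text) for every <p> in document order."""
    root = ET.fromstring(ttml)
    return [
        (p.get("begin"), p.get("dur"), paragraph_text(p).strip())
        for p in root.iter()
        if local_name(p.tag) == "p"
    ]


for begin, dur, text in paragraphs(SAMPLE):
    print(begin, dur, text)
```

Matching on the local tag name keeps the sketch working whether or not the file declares a default namespace.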
The script also assumes the TTML uses the begin and dur attributes, and not the end attribute, so the time codes are calculated in a function. Adding support for the end attribute should be easy though.
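The begin plus dur calculation could look roughly like the Python sketch below. It is an illustration, not the PHP function from the script. It assumes both attributes are clock values of the form HH:MM:SS.mmm, and the function names (parse_clock, format_srt, srt_time_line) are invented for the example.

```python
import re


def parse_clock(value):
    """Parse an 'HH:MM:SS.mmm' clock value into whole milliseconds.

    Assumption: no frame counts or offset times such as '2.5s' are used.
    """
    match = re.fullmatch(r"(\d+):(\d{2}):(\d{2})(?:[.,](\d{1,3}))?", value.strip())
    if match is None:
        raise ValueError(f"unsupported time expression: {value!r}")
    hours, minutes, seconds, fraction = match.groups()
    # Pad '5' to '500' so a short fraction is read as tenths, not milliseconds.
    millis = int((fraction or "0").ljust(3, "0"))
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + millis


def format_srt(millis):
    """Format milliseconds as an SRT timestamp, which uses a comma before the fraction."""
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def srt_time_line(begin, dur):
    """Build the 'start --> end' line of an SRT cue, with end = begin + dur."""
    start = parse_clock(begin)
    end = start + parse_clock(dur)
    return f"{format_srt(start)} --> {format_srt(end)}"


print(srt_time_line("00:00:58.500", "00:00:02.750"))
# prints: 00:00:58,500 --> 00:01:01,250
```

Supporting the end attribute in a sketch like this would just mean formatting parse_clock(end) directly instead of adding the duration.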
ttml2srt.php <infile>
Outputs SRT data to standard output. Redirect it to a file if you need to save it.
Inspired by https://gist.github.com/jareware/7af17f2034931608e842, which I gave up on when libxmljs wouldn't install and everything went to shit with Node.
mkdir -p ~/.config/youtube-dl
nano ~/.config/youtube-dl/config
Add the following to the file:
--exec 'title=`echo {} | sed "s/\.[a-z0-9]\+$//"`; test -f "${title}"*.ttml && /path/to/ttml2srt.php "${title}"*.ttml > "${title}".srt;'
Replace /path/to/ttml2srt.php with the actual path to the file, and make sure it's executable!
Every time you run youtube-dl now, it will check whether a .ttml file with the same title exists in the folder and, if so, run the ttml2srt.php script on it. If you download subtitles with --all-subs and more than one is present, the above will likely only convert the last subtitle.
sudo apt-get install php-{cli,xml}
Made by me.
Glad to hear it!
Interesting. So the & was not encoded as &amp; in the TTML file in the first place? Usually XML files want &s to be &amp; if I'm not entirely wrong. Do you have an example TTML file you could link to or attach here where this is the case?