I've installed aeneas and I want to split text. Here's what I did. Taking the raw transcript from my captioner, I convert it to plain text (on macOS, using textutil):
textutil 2021-01-28\ Machine\ Learning.rtf -convert txt
This gives me 2021-01-28\ Machine\ Learning.txt, which I then process with split.py above (requires nltk):
python split.py
which gives me new_parsed.txt. Why split/fragment the text? aeneas in particular assumes the text you're working with is already fragmented (with is_text_type=plain, that means one fragment per line), so we need to do that ourselves (see split.py). I was inspired by readbeyond/aeneas#242.
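To make the fragmenting step concrete, here is a minimal sketch of the idea. It is not split.py: it swaps nltk's sentence tokenizer for a naive standard-library regex so it runs with no extra installs, and the names fragment and write_fragments are made up for illustration. The goal is just to show the shape of the output, one sentence per line.

```python
import re

# Naive sentence boundary: '.', '!' or '?' followed by whitespace.
# This is a stand-in for nltk's sentence tokenizer (an assumption, not what
# split.py does); it will mis-split abbreviations such as "Dr. Smith".
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def fragment(text):
    """Return the text as a list of fragments, one sentence each."""
    # Collapse newlines and runs of spaces so every fragment fits on one line.
    flat = " ".join(text.split())
    return [s for s in _SENTENCE_END.split(flat) if s]


def write_fragments(src_path, dst_path):
    """Read a transcript and write it back out with one fragment per line."""
    with open(src_path, encoding="utf-8") as src:
        fragments = fragment(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(fragments) + "\n")
    return len(fragments)
```

Something like write_fragments("2021-01-28 Machine Learning.txt", "new_parsed.txt") would then produce a file in the one-fragment-per-line layout. For real transcripts the nltk tokenizer is probably the better choice, since it handles abbreviations and similar edge cases that this regex does not.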
I can then run aeneas on the fragmented text to get an srt file that I can encode into the video with HandBrake:
$ python -m aeneas.tools.execute_task zoom_0.mp4 new_parsed.txt "task_language=eng|os_task_file_format=srt|is_text_type=plain" map.srt
[INFO] Validating config string (specify --skip-validator to bypass)...
[INFO] Validating config string... done
[INFO] Creating task...
[INFO] Creating task... done
[INFO] Executing task...
[INFO] Executing task... done
[INFO] Creating output sync map file...
[INFO] Creating output sync map file... done
[INFO] Created file 'map.srt'
and I'm done!
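Before encoding, it can be worth a quick sanity check on map.srt. The sketch below is my own addition, not part of aeneas: it parses the standard SRT layout (index line, timing line, text, blank line) and assumes aeneas wrote one cue per line of new_parsed.txt, which is what I'd expect for plain text but haven't verified for every case. The function names read_cues and check are hypothetical.

```python
import re

# An SRT timing line looks like "00:00:01,000 --> 00:00:04,200".
_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3}) --> (\d+):(\d{2}):(\d{2}),(\d{3})"
)


def _seconds(h, m, s, ms):
    """Convert the four captured timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0


def read_cues(srt_text):
    """Return a list of (start_seconds, end_seconds, text) tuples."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 2:
            continue
        match = _TIMING.match(lines[1].strip())
        if not match:
            continue
        g = match.groups()
        cues.append((_seconds(*g[:4]), _seconds(*g[4:]), " ".join(lines[2:])))
    return cues


def check(srt_text, fragment_text):
    """Return a list of problems found; an empty list means the checks passed."""
    cues = read_cues(srt_text)
    fragments = [line for line in fragment_text.splitlines() if line.strip()]
    problems = []
    # Assumption: one cue per fragment line.
    if len(cues) != len(fragments):
        problems.append("%d cues but %d fragments" % (len(cues), len(fragments)))
    for i, (start, end, _) in enumerate(cues, 1):
        if end < start:
            problems.append("cue %d ends before it starts" % i)
    return problems
```

To use it, read map.srt and new_parsed.txt into strings and pass them to check; any mismatch in counts is a hint that the fragmenting or the alignment went wrong somewhere.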