@jdittrich
Last active August 26, 2022 10:22
hacky little script to convert plain text transcripts that have timestamps in the beginning of lines into documents that can be imported to otranscribe (https://otranscribe.com/)
r"""
Hacky little script that converts plain-text transcripts
with a timestamp at the beginning of each line
into documents that can be imported into oTranscribe.
The file format it takes looks like:
1:30 I mean yeah uhm
1:32 I dunno
Or, more tech-y, the following regex needs to match each line: ^(\d+):(\d+)(.*)$
where the first (\d+) is minutes, the second (\d+) is seconds, and (.*)$ is the rest of the line.
"""
import re
import sys
# Read the whole transcript; the file name is the first command-line argument.
with open(sys.argv[1], 'r') as f:
    lines = f.readlines()
def convertLineToOTR(line):
    # Raw string so \d reaches the regex engine unchanged.
    timeMatch = re.search(r"^(\d+):(\d+)(.*)$", line)
    minutes = int(timeMatch.group(1))
    seconds = int(timeMatch.group(2))
    # Zero-pad the seconds so "1:05" is not displayed as "1:5".
    timestring = f"{minutes}:{seconds:02d}"
    text = timeMatch.group(3)
    allSeconds = (minutes*60)+seconds
    # Double-escaped quotes: a bare " would end the JSON string, so the output needs \" instead.
    return f'<p><span class=\\"timestamp\\" data-timestamp=\\"{allSeconds}\\">{timestring}</span>{text}<br/></p>'
new_lines = list(map(convertLineToOTR,lines)) #type cast cause return type of map is map
# how to build a JSON the terrible way
header = ['{"text":"']
footer = ['","media":"please reload file","media-time":"000"}']
fullList = header+new_lines+footer
newFile = "".join(fullList)
sys.stdout.write(newFile)
@jdittrich
Author

No, this is not a good example of how to create JSON (use the json module instead: https://docs.python.org/3/library/json.html)
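
A minimal sketch of what that could look like, assuming the same input format and the same three keys the script above writes ("text", "media", "media-time"). The names line_to_otr and build_otr are made up for illustration, and skipping blank lines is an added assumption. Because json.dumps handles the escaping, the HTML uses plain quotes and a transcript line that itself contains a " no longer breaks the output.

```python
import json
import re

TIMESTAMP_LINE = re.compile(r"^(\d+):(\d+)(.*)$")


def line_to_otr(line):
    """Turn one 'M:SS text' line into a paragraph with a timestamp span."""
    match = TIMESTAMP_LINE.match(line)
    if match is None:
        raise ValueError(f"line does not start with a timestamp: {line!r}")
    minutes, seconds = int(match.group(1)), int(match.group(2))
    total_seconds = minutes * 60 + seconds
    # Plain quotes here: json.dumps escapes them when serializing.
    return (f'<p><span class="timestamp" data-timestamp="{total_seconds}">'
            f'{minutes}:{seconds:02d}</span>{match.group(3)}<br/></p>')


def build_otr(lines):
    """Serialize the converted lines with json.dumps instead of string joins."""
    text = "".join(
        line_to_otr(line.rstrip("\r\n"))
        for line in lines
        if line.strip()  # assumption: blank lines are skipped, not errors
    )
    # Keys and values mirror the hand-built header/footer of the script above.
    return json.dumps({"text": text,
                       "media": "please reload file",
                       "media-time": "000"})
```

With this in place, the end of the script would shrink to something like sys.stdout.write(build_otr(lines)).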

@jdittrich
Author

If your lines start somewhat differently, e.g. with "(12:34) text text…" instead of "12:34 text text…", you need to change the regex (^(\d+):(\d+)(.*)$). Use https://www.regexpal.com/ or a similar tool to test the pattern, so you do not need to run the script again and again.
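
As a rough illustration for the "(12:34) text text…" case, one possible pattern escapes the literal parentheses and skips the whitespace after the closing one. PAREN_TIMESTAMP and parse_paren_line are hypothetical names, not part of the script above, and unlike the original regex this variant drops the leading space from the text.

```python
import re

# Hypothetical variant for lines like "(12:34) text": the parentheses are
# escaped so they match literally, and \s* swallows the space after ")".
PAREN_TIMESTAMP = re.compile(r"^\((\d+):(\d+)\)\s*(.*)$")


def parse_paren_line(line):
    """Return (total_seconds, text), or None if the line does not match."""
    match = PAREN_TIMESTAMP.match(line)
    if match is None:
        return None
    minutes, seconds = int(match.group(1)), int(match.group(2))
    return minutes * 60 + seconds, match.group(3)


# Quick check of both formats without running the full conversion.
for sample in ["(12:34) text text", "12:34 text text"]:
    print(repr(sample), "->", parse_paren_line(sample))
```

The second sample returns None because it lacks the parentheses, which makes it easy to spot lines that the new pattern would miss.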

@jdittrich
Author

If there is interest in this, I could invest an hour to make it work in JavaScript, so you could convert files on a website rather than with a Python script.
