@ikwattro
Last active July 27, 2018 11:47

Import the WebVTT files as VideoTranscript and Caption nodes

// List the .vtt files in the directory and create one node per file
CALL ga.nlp.utils.listFiles("/Users/ikwattro/dev/_transcript", ".vtt")
YIELD filePath
MERGE (v:VideoTranscript {path: filePath})
WITH v, filePath
// Parse each file and create one Caption node per parsed caption
CALL ga.nlp.parser.webvtt(filePath)
YIELD startTime, endTime, text
MERGE (c:Caption {id: filePath + startTime + endTime})
SET c.text = text, c.start = startTime, c.end = endTime
MERGE (v)-[:HAS_CAPTION]->(c)
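
To check that the import did what was expected, a query along these lines can count the captions attached to each transcript. This is only a sanity-check sketch: it uses the labels and relationship type created above, and the `captions` alias is just an illustrative name.

```cypher
// Count the Caption nodes linked to each VideoTranscript
MATCH (v:VideoTranscript)-[:HAS_CAPTION]->(c:Caption)
RETURN v.path AS path, count(c) AS captions
ORDER BY captions DESC
```

A transcript that is missing from the result has no captions attached, which may point to a file that could not be parsed.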

Set each transcript's title to the last segment of its file path (the file name)

MATCH (v:VideoTranscript) SET v.title = split(v.path, "/")[-1]
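
The expression can be tried on a literal string before touching any nodes. The file name `example.vtt` below is hypothetical and only serves the illustration; the snippet relies on Cypher's negative list indexing, where `[-1]` selects the last element.

```cypher
// Hypothetical path; the last element of the split list is the file name
WITH "/Users/ikwattro/dev/_transcript/example.vtt" AS path
RETURN split(path, "/")[-1] AS title
// title = "example.vtt"
```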

Convert caption start times from hh:mm:ss.mmm format to milliseconds (the same query works for end times with n.end in place of n.start)

// Split "hh:mm:ss.mmm" into the clock part and the millisecond part
MATCH (n:Caption)
WITH n, split(n.start, ".") AS parts
WITH n, toInteger(parts[1]) AS msa, split(parts[0], ":") AS his
// Convert hours, minutes, and seconds to milliseconds
WITH n, msa,
     toInteger(his[0]) * 60 * 60 * 1000 AS hms,
     toInteger(his[1]) * 60 * 1000 AS mms,
     toInteger(his[2]) * 1000 AS sms
SET n.startTimeMS = hms + mms + sms + msa
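
The end times can be converted the same way. The sketch below mirrors the query above, reading `n.end` instead of `n.start`; the property name `endTimeMS` is an assumption chosen to match `startTimeMS` and does not appear in the original. It also assumes every timestamp includes the hours component: with a shorter mm:ss.mmm value the third element would be missing and the sum would be null.

```cypher
// Same conversion as for start times, applied to the end timestamp
MATCH (n:Caption)
WITH n, split(n.end, ".") AS parts
WITH n, toInteger(parts[1]) AS msa, split(parts[0], ":") AS his
WITH n, msa,
     toInteger(his[0]) * 60 * 60 * 1000 AS hms,
     toInteger(his[1]) * 60 * 1000 AS mms,
     toInteger(his[2]) * 1000 AS sms
// endTimeMS is a hypothetical property name
SET n.endTimeMS = hms + mms + sms + msa
```

As a worked example, the timestamp 00:01:02.500 gives 0 + 60000 + 2000 + 500 = 62500 milliseconds.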