ikwattro/youtube-neo4j-blog-post-nlp.md

Created July 27, 2018 11:58

Create an NLP pipeline

CALL ga.nlp.processor.addPipeline({
  name: 'transcript',
  textProcessor: 'com.graphaware.nlp.processor.stanford.StanfordTextProcessor',
  processingSteps: {tokenize: true, ner: true, dependency: true}
})
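
Before annotating anything, it can help to confirm that the pipeline was registered. The call below is a minimal sketch and assumes the installed GraphAware NLP version exposes `ga.nlp.processor.getPipelines`; if it does, a pipeline named `transcript` should appear in the output.

```cypher
// Assumption: this procedure is available in the installed plugin version.
// Lists the registered NLP pipelines so the "transcript" entry can be checked.
CALL ga.nlp.processor.getPipelines()
```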

Run the caption text analysis

CALL apoc.periodic.iterate(
  'MATCH (n:Caption) RETURN n',
  'CALL ga.nlp.annotate({
     text: n.text,
     id: id(n),
     pipeline: "transcript",
     checkLanguage: false
   })
   YIELD result
   MERGE (n)-[:HAS_ANNOTATED_TEXT]->(result)',
  {batchSize: 1, iterateList: false}
)
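
Once the batch has finished, a quick sanity check is to count the captions that still have no annotation. This sketch uses only the `Caption` label and the `HAS_ANNOTATED_TEXT` relationship from the statement above; a result of 0 suggests every caption was processed.

```cypher
// Count captions that did not get an annotated text attached.
MATCH (n:Caption)
WHERE NOT (n)-[:HAS_ANNOTATED_TEXT]->()
RETURN count(n) AS unannotatedCaptions
```

If the count is not 0, the batch and error counters returned by `apoc.periodic.iterate` are a reasonable place to look next.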

Extract the best keywords from the captions

CALL apoc.periodic.iterate(
  'MATCH (n:Caption)-[:HAS_ANNOTATED_TEXT]->(at) RETURN at',
  'CALL ga.nlp.ml.textRank({
     annotatedText: at,
     useDependencies: true
   })
   YIELD result
   RETURN count(*)',
  {batchSize: 1, iterateList: false}
)
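
The statement above only returns a count, so the extracted keywords have to be read back from the graph. The query below is a sketch under an assumption not shown in this gist: that TextRank stores its results as `Keyword` nodes with a `value` property, linked to the annotated text by a `DESCRIBES` relationship. Adjust the label, relationship type and property if your graph differs.

```cypher
// Assumption: keywords are stored as (:Keyword)-[:DESCRIBES]->(at),
// with the keyword text in the "value" property.
MATCH (n:Caption)-[:HAS_ANNOTATED_TEXT]->(at)<-[:DESCRIBES]-(k:Keyword)
RETURN k.value AS keyword, count(DISTINCT n) AS captions
ORDER BY captions DESC
LIMIT 20
```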
