= Using hierarchical facets

We have a use case with documents that are tagged with keywords from a thesaurus. This gist explains the model and is at the same time an invitation to suggest improvements, since it would be nice to have something that performs better.

== The model

//setup
//hide
[source,cypher]
----
CREATE (doc1:`doc` {`name`:"doc 1"})
CREATE (doc2:`doc` {`name`:"doc 2"})
CREATE (doc3:`doc` {`name`:"doc 3"})
CREATE (doc4:`doc` {`name`:"doc 4"})
CREATE (doc5:`doc` {`name`:"doc 5"})
CREATE (root:`term` {`name`:"root"})
CREATE (term2:`term` {`name`:"term 2"})
CREATE (term3:`term` {`name`:"term 3"})
CREATE (term4:`term` {`name`:"term 4"})
CREATE (term5:`term` {`name`:"term 5"})
CREATE (term6:`term` {`name`:"term 6"})
CREATE (term7:`term` {`name`:"term 7"})
CREATE (term2)-[:BT]->(root)
CREATE (term3)-[:BT]->(root)
CREATE (term4)-[:BT]->(term2)
CREATE (term5)-[:BT]->(term2)
CREATE (term6)-[:BT]->(term3)
CREATE (term7)-[:BT]->(term3)
CREATE (doc1)-[:HAS_TERM]->(term2)
CREATE (doc1)-[:HAS_TERM]->(term4)
CREATE (doc1)-[:HAS_TERM]->(term5)
CREATE (doc2)-[:HAS_TERM]->(root)
CREATE (doc2)-[:HAS_TERM]->(term3)
CREATE (doc3)-[:HAS_TERM]->(term6)
CREATE (doc3)-[:HAS_TERM]->(term4)
CREATE (doc3)-[:HAS_TERM]->(term2)
CREATE (doc4)-[:HAS_TERM]->(term7)
CREATE (doc5)-[:HAS_TERM]->(term2)
CREATE (doc5)-[:HAS_TERM]->(term3)
----

//graph

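To check the hierarchy created by the setup, a query along the following lines lists each term together with the terms directly below it. It is only a sketch: it assumes that `BT` stands for "broader term", so that `(child)-[:BT]->(parent)` points from the narrower term to the broader one, which matches the setup data. The aliases `broader` and `narrower` are made up for illustration.

[source,cypher]
----
// one row per parent term, with the names of its direct children
MATCH (child:term)-[:BT]->(parent:term)
RETURN parent.name AS broader, collect(child.name) AS narrower
ORDER BY broader
----

//table
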
== Get docs for each term

=== DIRECT

This query returns, for each term, the docs that are directly linked to it (the doc has a `HAS_TERM` relationship to the term).

[source,cypher]
----
MATCH (d:doc)-[:HAS_TERM]->(t:term)
RETURN t.name AS term, collect(d.name) AS docs
ORDER BY term
----

//table

=== DIRECT and INDIRECT

This query returns, for each term, the docs that are linked to it either directly (the doc has a `HAS_TERM` relationship to the term) or indirectly (the doc is linked to one of the more detailed terms below it, reached by following `BT` relationships).

[source,cypher]
----
MATCH (d:doc)-[:HAS_TERM]->()-[:BT*0..]->(t)
// DISTINCT: a doc tagged with several terms below t would otherwise appear once per path
RETURN t.name AS term, collect(DISTINCT d.name) AS docs
ORDER BY term
----

//table

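The query above lists every term. To look at a single facet, the pattern can be anchored on one term instead. The sketch below does this for "term 2"; the choice of term is arbitrary and only serves as an illustration.

[source,cypher]
----
// anchor on one term, then collect the docs tagged with it or with any term below it
MATCH (t:term {name:"term 2"})
MATCH (d:doc)-[:HAS_TERM]->()-[:BT*0..]->(t)
RETURN t.name AS term, collect(DISTINCT d.name) AS docs
----

//table

With the sample data from the setup, this should list doc 1, doc 3 and doc 5 (in no guaranteed order), since those are the docs tagged with "term 2", "term 4" or "term 5".
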
== Get count for root and its children

[source,cypher]
----
// the selected term itself
MATCH (d:doc)-[:HAS_TERM]->()-[:BT*0..]->(t2:term {name:"root"})
RETURN t2.name AS term, count(DISTINCT d) AS docs
ORDER BY docs DESC
UNION
// its direct children
MATCH (d:doc)-[:HAS_TERM]->()-[:BT*0..]->(t:term)-[:BT]->(t2:term {name:"root"})
RETURN t.name AS term, count(DISTINCT d) AS docs
ORDER BY docs DESC
----

//table

== Get count for "term 3" and its children

[source,cypher]
----
// the selected term itself
MATCH (d:doc)-[:HAS_TERM]->()-[:BT*0..]->(t2:term {name:"term 3"})
RETURN t2.name AS term, count(DISTINCT d) AS docs
ORDER BY docs DESC
UNION
// its direct children
MATCH (d:doc)-[:HAS_TERM]->()-[:BT*0..]->(t:term)-[:BT]->(t2:term {name:"term 3"})
RETURN t.name AS term, count(DISTINCT d) AS docs
ORDER BY docs DESC
----

//table

== Question

The main question: is there a faster way based on pure Neo4j? Running this approach with 10k docs, a thesaurus with 12k terms, and 1.2M `[:HAS_TERM]` relationships takes over 20 seconds to get the counts for "root" and its direct children.

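One variation that might be worth profiling is to start from the selected term, collect the term itself plus its direct children in one step, and only then expand to the docs, so that no `UNION` is needed. This is an untested sketch rather than a known fix: the variable names `parent` and `facet` are made up here, and whether it is actually faster on the large data set would have to be measured. An index on the `name` property of `term` nodes may also help the initial lookup.

[source,cypher]
----
// find the selected term first
MATCH (parent:term {name:"root"})
// 0 hops = the term itself, 1 hop = its direct children
MATCH (facet:term)-[:BT*0..1]->(parent)
// docs tagged with the facet term or with any term below it
MATCH (d:doc)-[:HAS_TERM]->()-[:BT*0..]->(facet)
RETURN facet.name AS term, count(DISTINCT d) AS docs
ORDER BY docs DESC
----

//table

As in the `UNION` version, terms without any docs do not show up in the result.
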