Last active
August 29, 2015 14:01
Hierarchical facets
= Using hierarchical facets
We have a use case with documents that are tagged with keywords from a thesaurus. This gist explains the model and is at the same time an invitation to suggest improvements, because it would be nice to have something that performs better.
NOTE: In this example the thesaurus is quite regular. In real life the thesaurus has branches of varying depth, and docs are tagged with both leaf and non-leaf nodes (i.e. nodes without and with children, respectively).
== The model
//setup
[source,cypher]
----
CREATE (doc1:`doc` {`name`:"doc 1"})
CREATE (doc2:`doc` {`name`:"doc 2"})
CREATE (doc3:`doc` {`name`:"doc 3"})
CREATE (doc4:`doc` {`name`:"doc 4"})
CREATE (doc5:`doc` {`name`:"doc 5"})
CREATE (root:`term` {`name`:"root"})
CREATE (term2:`term` {`name`:"term 2"})
CREATE (term3:`term` {`name`:"term 3"})
CREATE (term4:`term` {`name`:"term 4"})
CREATE (term5:`term` {`name`:"term 5"})
CREATE (term6:`term` {`name`:"term 6"})
CREATE (term7:`term` {`name`:"term 7"})
CREATE (term2)-[:BT]->(root)
CREATE (term3)-[:BT]->(root)
CREATE (term4)-[:BT]->(term2)
CREATE (term5)-[:BT]->(term2)
CREATE (term6)-[:BT]->(term3)
CREATE (term7)-[:BT]->(term3)
CREATE (doc1)-[:HAS_TERM]->(term2)
CREATE (doc1)-[:HAS_TERM]->(term4)
CREATE (doc1)-[:HAS_TERM]->(term5)
CREATE (doc2)-[:HAS_TERM]->(root)
CREATE (doc2)-[:HAS_TERM]->(term3)
CREATE (doc3)-[:HAS_TERM]->(term6)
CREATE (doc3)-[:HAS_TERM]->(term4)
CREATE (doc3)-[:HAS_TERM]->(term2)
CREATE (doc4)-[:HAS_TERM]->(term7)
CREATE (doc5)-[:HAS_TERM]->(term2)
CREATE (doc5)-[:HAS_TERM]->(term3)
----
//graph
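To look at the thesaurus hierarchy on its own, a query along the following lines can help. It is only a sketch: it assumes `BT` stands for "broader term" (each relationship points from the narrower term to its parent, as in the setup above), and the aliases `broader` and `narrower` are made up for this example.
[source,cypher]
----
// List every term that has children, together with its direct children
MATCH (child:term)-[:BT]->(parent:term)
RETURN parent.name AS broader, collect(child.name) AS narrower ORDER BY broader
----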
== Get docs for each term
=== DIRECT
This query returns, for each term, the docs to which it is directly linked (the doc has a HAS_TERM relationship to the term itself).
[source,cypher]
----
MATCH (d:doc)-[:HAS_TERM]->(t:term)
RETURN t.name AS term, collect(DISTINCT d.name) AS docs ORDER BY term
----
//table
=== DIRECT and INDIRECT
This query returns, for each term, the docs to which it is linked, both directly (the doc is tagged with the term itself) and indirectly (the doc is tagged with one of the term's more detailed terms).
[source,cypher]
----
MATCH (d:doc)-[:HAS_TERM|BT*0..]->(t:term)
RETURN t.name AS term, collect(DISTINCT d.name) AS docs ORDER BY term
----
//table
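The variable-length pattern in the previous query mixes two relationship types. In this model HAS_TERM only runs from a doc to a term and BT only from a term to a term, so every matching path is one HAS_TERM hop followed by zero or more BT hops. Assuming the same holds for the real data, the query can be spelled out more explicitly. This is a sketch of an equivalent formulation, not a tested optimisation.
[source,cypher]
----
// One HAS_TERM hop to the tagged term, then zero or more BT hops up to t
MATCH (d:doc)-[:HAS_TERM]->(:term)-[:BT*0..]->(t:term)
RETURN t.name AS term, collect(DISTINCT d.name) AS docs ORDER BY term
----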
== Get count for root and its children
[source,cypher]
----
MATCH (d:doc)-[:HAS_TERM|BT*0..]->(t:term)-[:BT*0..1]->(t2:term {name:"root"})
RETURN t.name AS term, count(DISTINCT d) AS docs ORDER BY docs DESC
----
//table
== Get count for "term 3" and its children
[source,cypher]
----
MATCH (d:doc)-[:HAS_TERM|BT*0..]->(t:term)-[:BT*0..1]->(t2:term {name:"term 3"})
RETURN t.name AS term, count(DISTINCT d) AS docs ORDER BY docs DESC
----
//table
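The two count queries differ only in the name of the starting term. One possible way to avoid the duplication is to bind the name once with `WITH` and to split the pattern into two `MATCH` clauses, so that the starting term and its direct children are found first. The variable `termName` is invented for this example, and the sketch assumes BT forms a tree as in the setup above.
[source,cypher]
----
WITH "term 3" AS termName
// Find the starting term itself and its direct children (0 or 1 BT hops)
MATCH (t:term)-[:BT*0..1]->(t2:term {name: termName})
// Collect the docs tagged with t or with any term below t
MATCH (d:doc)-[:HAS_TERM|BT*0..]->(t)
RETURN t.name AS term, count(DISTINCT d) AS docs ORDER BY docs DESC
----
Changing the string in the first line to "root" gives the counts for the root and its children.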
== Question
The main question: is there a faster way using pure Neo4j? With 10k docs, a thesaurus of 12k terms and 1.2M [:HAS_TERM] relationships, this approach takes over 20 seconds to get the counts for "root" and its direct children.
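To see where the time goes, the query can be prefixed with `PROFILE`, assuming a Neo4j version that supports it. The sketch below profiles the root query unchanged; the resulting plan lists the rows and db hits per operator, which should show whether the variable-length expansion or the DISTINCT aggregation dominates.
[source,cypher]
----
// Run the root count query and return its execution plan with db hits
PROFILE
MATCH (d:doc)-[:HAS_TERM|BT*0..]->(t:term)-[:BT*0..1]->(t2:term {name:"root"})
RETURN t.name AS term, count(DISTINCT d) AS docs ORDER BY docs DESC
----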