@jexp
Forked from tomzeppenfeldt/hierarchicalfacets
Last active August 29, 2015 14:01
= Using hierarchical facets
We have a use case with documents that are tagged with keywords from a thesaurus. This gist explains the model and is also an invitation to suggest improvements, because it would be nice to have something that performs better.
== The model
//setup
//hide
[source,cypher]
----
CREATE (doc1:`doc` {`name`:"doc 1"})
CREATE (doc2:`doc` {`name`:"doc 2"})
CREATE (doc3:`doc` {`name`:"doc 3"})
CREATE (doc4:`doc` {`name`:"doc 4"})
CREATE (doc5:`doc` {`name`:"doc 5"})
CREATE (root:`term` {`name`:"root"})
CREATE (term2:`term` {`name`:"term 2"})
CREATE (term3:`term` {`name`:"term 3"})
CREATE (term4:`term` {`name`:"term 4"})
CREATE (term5:`term` {`name`:"term 5"})
CREATE (term6:`term` {`name`:"term 6"})
CREATE (term7:`term` {`name`:"term 7"})
CREATE (term2)-[:BT]->(root)
CREATE (term3)-[:BT]->(root)
CREATE (term4)-[:BT]->(term2)
CREATE (term5)-[:BT]->(term2)
CREATE (term6)-[:BT]->(term3)
CREATE (term7)-[:BT]->(term3)
CREATE (doc1)-[:HAS_TERM]->(term2)
CREATE (doc1)-[:HAS_TERM]->(term4)
CREATE (doc1)-[:HAS_TERM]->(term5)
CREATE (doc2)-[:HAS_TERM]->(root)
CREATE (doc2)-[:HAS_TERM]->(term3)
CREATE (doc3)-[:HAS_TERM]->(term6)
CREATE (doc3)-[:HAS_TERM]->(term4)
CREATE (doc3)-[:HAS_TERM]->(term2)
CREATE (doc4)-[:HAS_TERM]->(term7)
CREATE (doc5)-[:HAS_TERM]->(term2)
CREATE (doc5)-[:HAS_TERM]->(term3)
----
//graph
== Get docs for each term
=== DIRECT
This query returns, for each term, the docs that are linked to it directly (the doc has a `HAS_TERM` relationship to the term).
[source,cypher]
----
MATCH (d:doc)-[:HAS_TERM]->(t:term)
RETURN t.name AS term,collect(d.name) AS docs
ORDER BY term
----
//table
=== DIRECT and INDIRECT
This query returns, for each term, the docs that are linked to it both directly (the doc has a `HAS_TERM` relationship to the term) and indirectly (the doc is linked to one of the more detailed terms below it, reached through `BT` relationships).
[source,cypher]
----
MATCH (d:doc)-[:HAS_TERM]->()-[:BT*0..]->(t)
RETURN t.name AS term,collect(DISTINCT d.name) AS docs
ORDER BY term
----
//table
== Get count for root and its children
[source,cypher]
----
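// Docs at or below the root; DISTINCT counts a doc once even if several of its terms lead to the root
MATCH (d:doc)-[:HAS_TERM]->()-[:BT*0..]->(t2:term {name:"root"})
RETURN t2.name AS term, count(DISTINCT d) AS docs
ORDER BY docs DESC
UNION
// Docs at or below each direct child of the root
MATCH (d:doc)-[:HAS_TERM]->()-[:BT*0..]->(t:term)-[:BT]->(t2:term {name:"root"})
RETURN t.name AS term, count(DISTINCT d) AS docs
ORDER BY docs DESC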
----
//table
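The two `UNION` branches above can probably be folded into a single pattern. The following is a minimal sketch, not a tested replacement: it assumes that `[:BT*0..1]` is an acceptable way to select the root together with its direct children, and it uses `count(DISTINCT d)` so that a doc tagged with several terms below the same facet is counted only once.

[source,cypher]
----
// Sketch: the root itself (0 hops) and its direct children (1 hop) in one pattern
MATCH (t:term)-[:BT*0..1]->(:term {name:"root"})
// All docs tagged with t or with any term below t
MATCH (d:doc)-[:HAS_TERM]->()-[:BT*0..]->(t)
RETURN t.name AS term, count(DISTINCT d) AS docs
ORDER BY docs DESC
----
//table

On the sample graph this should return root with 5 docs, term 3 with 4 and term 2 with 3. Whether it is any faster on a large graph has not been measured here.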
== Get count for "term 3" and its children
[source,cypher]
----
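// Docs at or below "term 3"; DISTINCT counts a doc once even if several of its terms lead to "term 3"
MATCH (d:doc)-[:HAS_TERM]->()-[:BT*0..]->(t2:term {name:"term 3"})
RETURN t2.name AS term, count(DISTINCT d) AS docs
ORDER BY docs DESC
UNION
// Docs at or below each direct child of "term 3"
MATCH (d:doc)-[:HAS_TERM]->()-[:BT*0..]->(t:term)-[:BT]->(t2:term {name:"term 3"})
RETURN t.name AS term, count(DISTINCT d) AS docs
ORDER BY docs DESC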
----
//table
== Question
The main question: is there a faster way using pure Neo4j? With 10k docs, a thesaurus of 12k terms and 1.2M [:HAS_TERM] relationships, this approach takes over 20 seconds to get the counts for "root" and its direct children.
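One possible direction, offered only as a sketch and not measured against the dataset described above, is to pay the traversal cost once and store the result on each term. The property name `doc_count` is hypothetical. The stored value goes stale whenever docs or `HAS_TERM` relationships change, so this only fits if the counts are needed without further filtering and can be refreshed after updates.

[source,cypher]
----
// Sketch: store the number of distinct docs at or below each term
MATCH (t:term)
OPTIONAL MATCH (d:doc)-[:HAS_TERM]->()-[:BT*0..]->(t)
WITH t, count(DISTINCT d) AS docs
// doc_count is a hypothetical property name, not part of the original model
SET t.doc_count = docs
RETURN t.name AS term, t.doc_count AS docs
ORDER BY docs DESC
----
//table

Reading the counts for a facet and its children would then only touch term nodes, for example by matching `(t:term)-[:BT*0..1]->(:term {name:"root"})` and returning `t.doc_count`.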