Skip to content

Instantly share code, notes, and snippets.

@philastrophist
Forked from jexp/fetch_tree.adoc
Created September 21, 2020 14:52
Show Gist options
  • Save philastrophist/f24bb271561f659a7ddf5f55a419d643 to your computer and use it in GitHub Desktop.
Save philastrophist/f24bb271561f659a7ddf5f55a419d643 to your computer and use it in GitHub Desktop.
Fetch a Tree with Neo4j

Fetch a Tree with Neo4j

Today I came across a really interesting StackOverflow question:

Given a forest of trees in a Neo4j REST server, I`m trying to return a single tree given the root vertex. Being each tree quite large, I need a de-duplicated list of all vertices and edges in order to be able to reconstruct the full tree on the client side.

CREATE (r:root)
FOREACH (i IN range(1,5)|
         CREATE (r)-[:PARENT]->(c:child { id:i }));

MATCH (c:child)
         FOREACH (j IN range(1,5)|
            CREATE (c)-[:PARENT]->(:child { id:c.id*10+j }));

How many nodes and rels do we have in the graph ? One root node, 5 children and 25 grandchildren, makes 31 nodes, 30 relationships.

match (n)-[r]-()
return count(distinct n) as nodes, count(distinct r) as rels

Answer by Brian Underwood

Brian answered the question with this suggestion:

if you want to get all of the nodes and relationships in one go, you may need to execute two queries:

MATCH p = (r:root)-[*]->(x)
WHERE NOT(x-->())
UNWIND nodes(p) AS Vertex
RETURN DISTINCT Vertex;
MATCH p = (r:root)-[*]->(x)
WHERE NOT(x-->())
UNWIND rels(p) AS Edge
RETURN DISTINCT startNode(Edge), endNode(Edge);

Update 1: Combine queries

I thought about it that it should be easy to combine both queries into one:

MATCH p = (r:root)-[*]->(x)
WHERE NOT(x-->())
UNWIND nodes(p) AS Vertex
WITH collect(DISTINCT Vertex) as nodes, collect(rels(p)) as paths
UNWIND paths AS Edges
UNWIND Edges as Edge
WITH nodes, collect(distinct Edge) as rels
RETURN size(nodes),size(rels),nodes,rels

Update 2: Only return ids

Speed it up by just ignoring the additional (expensive) where condition and let distinct do its job.

MATCH p = (r:root)-[*]->(x)
UNWIND nodes(p) AS Vertex
WITH collect(DISTINCT id(Vertex)) as nodes, collect(rels(p)) as paths
UNWIND paths AS Edges
UNWIND Edges as Edge
WITH nodes, collect(distinct [id(startNode(Edge)),id(endNode(Edge))]) as rels
RETURN size(nodes),size(rels),nodes,rels

Update 3: The "Cypher" Way

Actually cypher returns one path per match-step, so even the sub-paths that build up the tree are returned, so each x is returned once from the match. (Those intermediate ones were filtered out before).

So we can just use them and collect them in a distinct list. That’s for the nodes.

For the rels we only need to look at the last relationship before x as the others were already returned by previous paths. We collect the distinct last rels and extract the id’s of the start- and end-nodes to be returned.

MATCH p = (:root)-[r*]->(x)
WITH collect(DISTINCT id(x)) as nodes, [r in collect(distinct last(r)) | [id(startNode(r)),id(endNode(r))]] as rels
RETURN size(nodes),size(rels), nodes, rels

Update 4: Including the root node

If you also want to include the root node, just change * to *0.. like here:

MATCH p = (:root)-[r*0..]->(x)
WITH collect(DISTINCT id(x)) as nodes, [r in collect(distinct last(r)) | [id(startNode(r)),id(endNode(r))]] as rels
RETURN size(nodes),size(rels), nodes, rels
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment