Using the graph to control unique id generation

Introduction

This gist was prompted by Nigel Small’s tweet of a query to generate a unique id for a node (and is posted here with his agreement). It inspired me to think about how it could be used in a full example, unrestricted by Twitter’s 140 characters. I have also looked at how we could generate different sets of unique ids for different labels.

Auto-incrementing #Neo4j counter MERGE (x:Counter {name:'foo'}) ON CREATE SET x.count = 0 ON MATCH SET x.count = x.count + 1 RETURN x.count
— Nigel Small (@technige) December 16, 2013

Background

Unique identifiers are required in many different use cases, and are heavily used in SQL and NoSQL databases, as well as in spreadsheets and flat-file databases. Many SQL databases, for example Microsoft SQL Server, have a feature to auto-generate a unique id for each row in a table. This feature is very useful because the developer does not need to worry about uniqueness.

Neo4j does provide a unique id for each node and relationship, but these ids are not persistent. The id can be accessed by returning id(node) or id(relationship). This id is unique, but it can change if the database store is compacted. This compaction currently only occurs when the database is restarted, but it does mean the ids are volatile. Neo4j also reuses the ids of deleted nodes and relationships, so given an old id you may expect a Person and get a Fruit.

The solution below can be implemented by developers using Neo4j, so that for each type of node (or even relationship) a reliably unique id can be generated, even when multiple threads are accessing the database at the same time. The query can be executed whenever a new node or relationship is being created.

Implementation Note

If this were a real project I would execute the unique id generating query as a separate transaction. That keeps it thread-safe without blocking the server for onerous amounts of time. Since there is no guarantee that every number will end up being used as an id, it is not a problem to generate an id and have a delay before using it.

If the id generation were not in a transaction, duplicates could be created when more than one thread creates the same type of node at the same time.
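To see why the read-increment-write step has to be atomic, here is a minimal Python sketch rather than Neo4j code. The CounterStore class and its next_id method are hypothetical stand-ins for the UniqueId node, and the lock plays the role that the transaction plays in the database. That mapping is an assumption made for illustration; Neo4j's real locking is more involved.

```python
import threading


class CounterStore:
    """Hypothetical in-memory stand-in for the UniqueId counter node."""

    def __init__(self):
        self._count = 0
        # Assumption: this lock plays the role of the transaction's write lock.
        self._lock = threading.Lock()

    def next_id(self):
        # Read, increment and write happen as one atomic step.
        with self._lock:
            self._count += 1
            return self._count


store = CounterStore()
generated = []
generated_lock = threading.Lock()


def worker(n):
    """Request n ids and record each one."""
    for _ in range(n):
        new_id = store.next_id()
        with generated_lock:
            generated.append(new_id)


threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# 4 threads x 1000 requests: 4000 ids handed out, none of them duplicated.
print(len(generated), len(set(generated)))
```

Without the lock inside next_id, two threads could read the same count before either writes it back, which is the duplicate scenario described above.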

Create a Person with a unique id

This query will generate a unique id and then create a Person node using that id. In this instance the ON CREATE line is executed (as the id node does not already exist), and the ON MATCH line will be ignored.

// get unique id
MERGE (id:UniqueId{name:'Person'})
ON CREATE SET id.count = 1
ON MATCH SET id.count = id.count + 1
WITH id.count AS uid
// create Person node
CREATE (p:Person{id:uid,firstName:'Bob',lastName:'Jones'})
RETURN p AS person

Create another Person with a unique id

This query generates another unique id, using the same counter logic as the previous query, and then creates another Person node using that id. This time the ON CREATE line is not executed (as the id node already exists) and the ON MATCH line is executed instead.

// get unique id
MERGE (id:UniqueId{name:'Person'})
ON CREATE SET id.count = 1
ON MATCH SET id.count = id.count + 1
WITH id.count AS uid
// create Person node
CREATE (p:Person{id:uid,firstName:'Gabriel',lastName:'Smith'})
RETURN p AS person
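The two runs above behave like a get-or-create followed by an increment. As a rough model only (plain Python, not Cypher), the ON CREATE and ON MATCH branches can be sketched as below. The merge_counter function and the counters dict are hypothetical names introduced for this illustration.

```python
counters = {}  # hypothetical stand-in for the UniqueId nodes, keyed by name


def merge_counter(name):
    """Mimic MERGE ... ON CREATE SET count = 1 ON MATCH SET count = count + 1."""
    if name not in counters:
        counters[name] = 1    # ON CREATE branch: the counter node did not exist
    else:
        counters[name] += 1   # ON MATCH branch: the counter node already existed
    return counters[name]


bob_id = merge_counter("Person")      # first run takes the ON CREATE branch
gabriel_id = merge_counter("Person")  # second run takes the ON MATCH branch
print(bob_id, gabriel_id)             # prints: 1 2
```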

Show that all Person nodes have a unique id

This query simply finds all Person nodes so that we can see that they have unique ids.

MATCH (p:Person)
RETURN p AS person
ORDER BY p.id

Create a Book with a unique id

This query will generate a unique id for a Book node, showing that we can have different sets of unique ids for different types of node if we wish.

// get unique id
MERGE (id:UniqueId{name:'Book'})
ON CREATE SET id.count = 1
ON MATCH SET id.count = id.count + 1
WITH id.count AS uid
// create Book node
CREATE (b:Book{id:uid,title:'1984',author:'George Orwell'})
RETURN b AS book

Create another Book with a unique id

This query shows that the unique id generator for Book nodes works in the same way as the one for Person nodes.

// get unique id
MERGE (id:UniqueId{name:'Book'})
ON CREATE SET id.count = 1
ON MATCH SET id.count = id.count + 1
WITH id.count AS uid
// create Book node
CREATE (b:Book{id:uid,title:'The Lion, The Witch & The Wardrobe',author:'C S Lewis'})
RETURN b AS book

Show that all Person and Book nodes have a unique id, which is unique to their label

The Person nodes and Book nodes have ids that are unique within their own label, but not across all nodes in the database.

// return all nodes that are not UniqueId nodes
MATCH (n)
WHERE NOT (n:UniqueId)
RETURN str(labels(n)) AS type, n AS node
ORDER BY type, node.id

Make sure that a duplicate Book is not created

This query goes further and uses MERGE to make sure that the Book node being created is unique (that is, no Book node with the same title and author already exists). In this case the Book node is matched, not created, so the ON MATCH clause is executed and the count is decremented again.

This decrementing would only work if the id were generated in the same query or transaction (as it is here). If the id were generated in another transaction then we could not decrement the counter as another thread may have incremented it again already. We would just have to accept that some ids might not be used.

// get unique id
MERGE (id:UniqueId{name:'Book'})
ON CREATE SET id.count = 1
ON MATCH SET id.count = id.count + 1
WITH id.count AS uid, id
// create or match Book node
MERGE (b:Book{title:"The Lion, The Witch & The Wardrobe",author:"C S Lewis"})
ON CREATE SET b.id = uid
ON MATCH SET id.count = id.count - 1
RETURN b AS book, id AS id_generator
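The reserve-then-give-back logic of this query can be modelled in a few lines of Python. This is only a sketch: merge_book, counters and books are hypothetical names, the starting state assumes the two Book nodes created earlier already hold ids 1 and 2, and "Animal Farm" is illustrative data that does not appear in the queries above.

```python
# Assumed starting state: two Books exist, so the Book counter stands at 2.
counters = {"Book": 2}
books = {
    ("1984", "George Orwell"): 1,
    ("The Lion, The Witch & The Wardrobe", "C S Lewis"): 2,
}  # (title, author) -> id


def merge_book(title, author):
    """Reserve an id, then hand it back if the Book already exists."""
    counters["Book"] += 1          # ON MATCH on the counter: reserve the next id
    uid = counters["Book"]
    key = (title, author)
    if key in books:
        counters["Book"] -= 1      # ON MATCH on the Book: give the id back
    else:
        books[key] = uid           # ON CREATE on the Book: use the reserved id
    return books[key]


existing = merge_book("The Lion, The Witch & The Wardrobe", "C S Lewis")
print(existing, counters["Book"])  # prints: 2 2

created = merge_book("Animal Farm", "George Orwell")  # illustrative new title
print(created, counters["Book"])   # prints: 3 3
```

The give-back step is only safe here because reserving and releasing happen in one uninterrupted call, which mirrors the single-transaction condition described above.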

Show how we could create non-numeric unique ids

This query will generate a unique id for a Place node, showing that we can generate unique ids as strings, rather than numeric values.

// get unique id
MERGE (id:UniqueId{name:'Place',str:'pl_'})
ON CREATE SET id.count = 1
ON MATCH SET id.count = id.count + 1
WITH id.str + id.count AS uid
// create Place node
CREATE (p:Place{id:uid,name:'London'})
RETURN p AS place
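As a small aside on the string form, the sketch below builds the same kind of prefixed id in Python. The make_string_id function and its optional width parameter are hypothetical additions for illustration. The zero-padding is not part of the query above; it is shown because plain string ids sort as text, so 'pl_10' comes before 'pl_2'.

```python
def make_string_id(prefix, count, width=0):
    """Join a prefix and a counter value, optionally zero-padding the number."""
    # Python needs an explicit str(); zfill(0) leaves the digits unchanged.
    return prefix + str(count).zfill(width)


plain = [make_string_id("pl_", n) for n in (2, 10)]
padded = [make_string_id("pl_", n, width=4) for n in (2, 10)]

print(sorted(plain))   # prints: ['pl_10', 'pl_2']
print(sorted(padded))  # prints: ['pl_0002', 'pl_0010']
```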

Show how we could create unique ids for relationships

This query will generate a unique id for a relationship.

// get unique id
MERGE (id:UniqueId{name:'LIVES_IN'})
ON CREATE SET id.count = 1
ON MATCH SET id.count = id.count + 1
WITH id.count AS uid
// create LIVES_IN relationship
MATCH (pe:Person{firstName:'Bob'}), (pl:Place{name:'London'})
MERGE (pe)-[r:LIVES_IN{id:uid}]->(pl)
RETURN pe AS person, r AS relationship, pl AS place

Id for all nodes

If you simply wanted a unique id for every node in the database then the query is even simpler: use a single id-generating node. You would then go on to create whatever node or relationship you wished.

You could have a global id for each node, and also a label id for each node label (or relationship type) if you wanted, e.g. properties called gid and lid/tid (or whatever you like). Each of these ids would need to be generated before the node or relationship was created.

// get unique id
MERGE (id:GlobalUniqueId)
ON CREATE SET id.count = 1
ON MATCH SET id.count = id.count + 1
RETURN id.count AS generated_id
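To make the gid and lid idea concrete, here is a minimal Python model of running both generators side by side. It is a sketch under assumptions: next_ids, global_counter and label_counters are hypothetical names, and the in-memory counters stand in for the GlobalUniqueId node and the per-label UniqueId nodes.

```python
from collections import defaultdict

global_counter = {"count": 0}      # stand-in for the single GlobalUniqueId node
label_counters = defaultdict(int)  # stand-in for one UniqueId node per label


def next_ids(label):
    """Return (gid, lid): a database-wide id and an id within the label."""
    global_counter["count"] += 1
    label_counters[label] += 1
    return global_counter["count"], label_counters[label]


ids = [(label, *next_ids(label)) for label in ["Person", "Person", "Book", "Place"]]
for label, gid, lid in ids:
    print(label, gid, lid)
# prints:
# Person 1 1
# Person 2 2
# Book 3 1
# Place 4 1
```

The gid never repeats across labels, while each lid sequence restarts at 1 for its own label.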

Future Development

For the moment Neo4j does not have a feature to automatically generate a persistent unique id, but I foresee it being a feature at some point. There are other ways of generating unique ids, including SnowMaker (Windows Azure) and writing a Java extension for Neo4j.

If you have any good resources, or comments for making this example better, please post below.

About Me

I am a Web Application Developer, based in Hampshire, UK. I started looking at Neo4j around 2 months ago, and have become obsessed. I wrote most of this gist on the evening of 16th December, after Nigel tweeted, which shows how obsessed I can become. Graphs really are everywhere!

tekiegirl/uniqueId.adoc