@kbastani
Created August 14, 2013 19:06
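Follow the steps in this gist to find and delete duplicate nodes, by property and index, using the Neo4j graph database web admin console. Run the query that collects the duplicate IDs first, validate its output, and only then run the delete query.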
// Delete duplicate nodes as a list collected from the output of neo4j-cypher-duplicate-get-node.txt
START n=node(1120038,1120039,1120040,1120042,1120044,1120048,1120049,1120050,1120053,1120067,1120068)
// Replace IDs above with the IDs from CommaSeparatedListOfIds in neo4j-duplicate-get-node.txt
// [r?] makes the relationship optional, so duplicates that have no relationships are deleted too
MATCH n-[r?]-()
DELETE r, n
// Collect IDs of indexed nodes with duplicated unique properties
START n=node:invoices("PO_NUMBER:(\"112233\")")
WITH n
ORDER BY id(n) DESC // Descending, so the most recent duplicate (highest id) comes first and is the one selected for deletion
WITH n.Key? as DuplicateKey, COUNT(n) as ColCount, COLLECT(id(n)) as ColNode
WITH DuplicateKey, ColCount, ColNode, HEAD(ColNode) as DuplicateId
WHERE ColCount > 1 AND (DuplicateKey is not null) AND (DuplicateId is not null)
WITH DuplicateKey, ColCount, ColNode, DuplicateId
ORDER BY DuplicateId
RETURN DuplicateKey, ColCount, DuplicateId
//RETURN COLLECT(DuplicateId) as CommaSeparatedListOfIds
// ** Toggle comments for the return statements above to validate duplicate records
// ** Do not proceed to delete without validating
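The queries above use the legacy Cypher syntax (START, index lookups, and the `?` suffix), which was removed in later Neo4j releases. As a rough sketch only, the same two steps might look like this on Neo4j 2.3 or later. The `Invoice` label is a hypothetical stand-in for the `invoices` index and is not part of the original gist; adjust it and the property names to your schema, and validate the first query's output before running the second.

```cypher
// Step 1: list one duplicate per key (the highest id, i.e. the most recent node).
// Assumption: nodes carry a hypothetical :Invoice label instead of the legacy index.
MATCH (n:Invoice)
WHERE n.PO_NUMBER = "112233" AND n.Key IS NOT NULL
WITH n
ORDER BY id(n) DESC
WITH n.Key AS DuplicateKey, count(n) AS ColCount, collect(id(n)) AS ColNode
WHERE ColCount > 1
RETURN DuplicateKey, ColCount, head(ColNode) AS DuplicateId
ORDER BY DuplicateId;

// Step 2: after validating the output, delete the listed nodes.
// DETACH DELETE removes each node together with any relationships it has.
// Replace the IDs below with the DuplicateId values returned by step 1.
MATCH (n)
WHERE id(n) IN [1120038, 1120039, 1120040]
DETACH DELETE n;
```

As in the original queries, each pass removes only one duplicate per key, so keys with more than two copies need the two steps repeated until step 1 returns no rows.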