Cypher Constraint and Index Syntax Proposal

This proposoal follows three design principles:

Make it easy to understand the affected part of the graph in relationship to querying it (Rationale: Minimize mental effort for using constraints and indices)
Re-use existing concepts as much as possible (i.e. patterns and expressions) (Rationale: Minimize amount of new concepts that need to be learned)
Use the same imperative form as all existing update and schema commands (verb first, like in CREATE INDEX) and try to use similiar syntax structure for both constraint and index creation (Rationale: Ensure Cypher has a homogeneous syntax structure to give the whole language a consistent look and feel)

Constraint Syntax

In order to make it easy to understand the part of the graph affected by a constraint, the syntax ensures each constraint is easily translatable into a Cypher query such that the constraint holds if evaluating the query returns no results. To achieve the other two design goals, all constraints follow the same general form that is based on re-using patterns and expressions:

// Formerly: CREATE CONSTRAINT ...
ASSERT <Pattern> IMPLIES <Predicate>

// Fromerly: DROP CONSTRAINT ...
RETRACT <Pattern> IMPLIES <Predicate>

Using a pair of verbs has precedence in Cypher (SET/REMOVE, CREATE/DELETE) and treating ASSERT and RETRACT as a pair of dual verbs has at least precedence in logic programming. These general forms are easiliy translated into a Cypher query such that a constraint validates iff

MATCH <Pattern> WHERE NOT <Predicate> RETURN *

returns no results.

Let’s look at concrete examples for how this constraint syntax could be used.

Mandatory Property Constraint

ASSERT (u:User) IMPLIES has(u.email)

Alternatively:

ASSERT (u:User) IMPLIES u.email IS NOT NULL

Similarly, relationships could be constrained in the same way:

ASSERT ()-[r:ROAD]->() IMPLIES has(r.distance)

Uniqueness Constraint

ASSERT (some:User), (other:User)
IMPLIES some.email <> other.email

While this syntax is somewhat technical, it gives a very accurate description of what the constraint means in terms of a normal predicate over a cross product. It also extends directly to a multicomponent example:

ASSERT (some:User), (other:User)
IMPLIES [some.first, some.last] <> [other.first, other.last]

Propery Value Limitations

The proposed syntax naturally leads to more complex "Node Predicate Constraints":

ASSERT (u:User)
IMPLIES toLower(u.login) = u.login AND u.age > 18

it is easy to apply it to the the :ROAD example from the RFCP:

ASSERT ()-[r:ROAD]->()
IMPLIES
  has(r.distance)
  AND r.distace IS NUMBER
  AND NOT isInfinite(r.distance)
  AND r.distance > 0

The only new syntax introduced here is overloading the IS operator for type tests. This hinges on specifying the type system and type names (out of scope here).

The condition should only hold, if and only if r has a distance property. This could be expressed using NOT has(r.distance) OR <Predicate>. Alternatively, shortcut syntax might be helpful:

ASSERT ()-[r:ROAD]->()
WHERE has(r.distance)
IMPLIES
  r.distace IS NUMBER
  AND NOT isInfinite(r.distance)
  AND r.distance > 0

The remaining examples could be written as:

ASSERT (u:User) IMPLIES u.email =~ {regex}
ASSERT (u:User) IMPLIES u.active IS BOOLEAN
ASSERT (v:Vehicle) IMPLIES v.locations IS LIST OF GEOPOINTS

Required Relationships

ASSERT (p:Person) IMPLIES (p)-[:LIVES_AT->()
ASSERT (s:Sink) IMPLIES ()-[:FLOW]->(s)
ASSERT (l:Place) IMPLIES (l)-[:NEIGHBOUR]-()

This syntax hinges on the fact that pattern predicates evaluate to true if at least a single matching relationship is found.

More complex cardinality requirements may be written using size

ASSERT (swede:SwedishCitizen) IMPLIES size( (swede)-[:MARRIED_TO]-() ) <= 1
ASSERT (p:Person) IMPLIES size( (p)<-[:PARENT_OF]-() ) = 2
ASSERT (m:BuddhistMonk) IMPLIES NOT size( (m)-[:OWNS]->(:Thing) ) > 14

Endpoint requirements

Endpoint requirements are represented nicely using label predicates.

ASSERT (l)-[:OWNS]->() IMPLIES (l:Person) OR (l:Organization)
ASSERT ()-[:OWNS]->(r) IMPLIES (r:Vehicle) OR (r:Building) OR (r:Item) OR (r:Organisation)
ASSERT (l)-[:OWNS]->(:Organisation) IMPLIES (l:Person)
ASSERT (l)-[:WORKS_FOR]->(r) IMPLIES (l:Employee) AND ( (r:Employee) OR (r:Manager) )
ASSERT ()-[:OWNS]->(r) IMPLIES NOT (r:Person)

It is noteworthy how the pattern just selects the graph elements to be constrained while the predicate actually constrains them.

It also could be a good idea to investigate extending label tests to support alternatives (r:Employee|Manager)

Label Coexistence

The first example, "A node may not have both a :Person label and an :Organization label" could either be split up:

ASSERT (p:Person) IMPLIES NOT (p:Organization)
ASSERT (o:Organization) IMPLIES NOT (o:Person)

or alternatively it could be expressed in the same way as unique constraints:

ASSERT (p:Person), (o:Organization) IMPLIES p <> o

The second example translates directly:

ASSERT (u:User) IMPLIES (u:Person)

Beyond constraints: Derived Properties and Relationships

The suggested syntax pattern is very flexible and could also go beyond expressing plain constraints, like automatically computing properties from other properties or even inferring nodes and relationships:

// Infer a property
ASSERT (u:User)
INFERS u.last + ", " + u.first + " ( " + u.title + ")" AS u.name

// Infer property unless it exists
ASSERT (u:User)
INFERS MISSING u.birthyear - now().year AS u.age

// Infer relationship unless there is already one
ASSERT (a:Person)-[:FRIEND]-(b:Person)-[:FRIEND]-(c:Person)
INFERS MISSING (a:Person)-[:FRIEND]->(c:Person)

Syntax alternatives

If we find ASSERT and RETRACT to technical, we could instead adopt some of these word pairs:

ENSURE <Pattern> IMPLIES <Predicate>
IGNORE <Pattern> IMPLIES <Predicate>

CREATE CONSTRAINT CONFIRM <Pattern> IMPLIES <Predicate>
DROP CONSTRAINT RETRACT <Pattern> IMPLIES <Predicate>

CREATE CONSTRAINT ON <Pattern> ENSURE <Predicate>
DROP CONSTRAINT ON <Pattern> ENSURE <Predicate>

Indexing Syntax

Indexing should again re-use pattern syntax to describe matches to be indexed. Additionally it needs to list the index keys, and a placeholder for configuring the type of index and perhaps some implementation parameters.

The suggested basic form first specifies the index type and the indexed pattern followed by specifying the indexed columns. This form can be further differentiated by explicitly giving the index key, filtering the pattern using a predicate, and by describing ordering requirements (if sensible for the given index type).

CREATE [<Type>] INDEX ON <Pattern>
[WHERE <Predicate>]
[RETURN <Columns> (BY <Columns>|ORDER BY <Columns [ASC|DESC]>)
[USING <Parameters>]

Here are some examples:

// Default index
CREATE INDEX ON (n:Person) RETURN n BY n.name

// Hash index with extra parameter
CREATE HASH INDEX ON (n:Person) RETURN n BY n.name USING bucketSize=512

// Index on projection
CREATE INDEX ON (n:Person) RETURN n BY toLower(n.name)

// Only index nodes that fulfill a given predicate
CREATE INDEX ON (u:User) WHERE u.lastSeenAt = 'Dec-24' RETURN u BY u.login

// Sorted index keys (taken from ORDER BY)
CREATE BTREE INDEX
ON (a:Person)-[r:KNOWS]->(b:Person)
WHERE b.isRich
RETURN * ORDER BY r.since DESC

boggle/constraint-and-index-proposal.asciidoc