SHACL rules infer new triples. The input is a data graph and a shapes graph which has rules; the output is a graph of inferred triples.
Whether, and how, the output graph is added back into the data graph is not part of the SHACL rules process. If being run to enrich the data graph, the process can be done inside a transaction updating the data graph.
The rule system is also capable of triple pattern matching - what datalog calls a "query" - where only necessary rules are run. (This is "backward chaining")
Once repeated application of rules is considered, there does not seem to be useful subsets of features. Fortunately, simple evaluation of (positive i.e. without negation) datalog is not a major piece of implementation. Run rules until no change; the implementation does not need to inspect the rule set structure. The literature has several better algorithms where implementation isn't particularly complicated. Datalog evaluation is amenable to parallel processing.
A SHACL rule is:
- a condition for the rule, called the "body" in datalog literature.
- For SHACL, BGP + filter (still datalog? ...) + assignment.
- a conclusion, called the "head" in datalog literature.
- a triples template
Notes:
- Filters must not use
NOT EXISTS
orEXISTS
. - Node expressions used as filters must be single valued, must not read the data diretcly and must not be aggregations
- No property paths allowed. (Adding seq paths to the BGP matching is natural, and maybe alt paths.)
- Assignment,
BIND
, needs great care - it can be used to create non-datalog infinite rules. For example, incrementing a value (example below). Without a guard, that will lead to infinite execution as it will trigger itself.
The primary syntax is RDF.
sh:rule
sh:head ( [ sh:subject [ sh:var "x" ] ] ;
sh:predicate rdf:type ;
sh:object ex:Square ;
] );
sh:body (
[ sh:subject [ sh:var "x" ] ;
sh:predicate ex:width ;
sh:object [ sh:var "w" ]
]
[ sh:subject [ sh:var "x" ] ;
sh:predicate ex:height ;
sh:object [ sh:var "h" ]
]
[ sh:filter [ sparql:equals ( [ sh:var "w" ] [ sh:var "h" ] ) ] ]
) ;
.
A compact syntax will be needed:
RULE { ?x rdf:type ex:Square }
WHERE {
?x ex:width ?w ;
ex:height ?h .
FILTER( ?w = ?h )
}
Ex1. A simple renaming:
RULE { ?x foaf:name ?name } WHERE { ?x vcard:fn ?name }
Ex2. Infer:
RULE { ?x :grandChildOf ?y } WHERE { ?x :childOf ?Z . ?Z :childOf ?y }
Ex3. Disjunction:
RULE { ?x :childOf ?y } WHERE { ?x :motherOf ?y }
RULE { ?x :childOf ?y } WHERE { ?x :fatherOf ?y }
Combining ex2 and ex3, calcaulates ":grandChildOf" using ":motherOf/:fatherOf" data.
Ex4. Recursion (written naively):
RULE { ?x rdfs:subClassOf ?y } WHERE { ?x rdfs:subClassOf ?Z . ?Z rdfs:subClassOf ?y }
Ex5. Mutual recursion:
# Given data
:value1 :value 1 .
:value2 :value 2 .
:value3 :value 3 .
:value4 :value 4 .
## A new fact - a rule with empty condition.
RULE { :x1 rdf:type ex:Odd } WHERE {}
# Successor of an odd integer is even.
RULE { ?x rdf:type ex:Even } WHERE {
?y rdf:type ex:Odd ; ?y :value ?yVal .
?x :value ?xVal
FILTER (yVal+1 = ?xVal )
}
# Successor of an even integer is odd
RULE { ?x rdf:type ex:Odd } WHERE {
?y rdf:type ex:Even ; ?y :value ?yVal .
?x :value ?xVal
FILTER (yVal+1 = ?xVal )
}
Ex 6: this rule is non-terminating, and not datalog because it creates a new RDF term:
RULE { :s :p ?incV } WHERE { :s :p ?v BIND(?v + 1 AS ?incV ) }
Rules can be used to generate new data but also to answer a goal (datalog - "query" or "goal") by solving a pattern using rules. This is backward chaining.
Rule, Rule Set (Datalog - "program")
Base Graph - Data Graph - (Datalog - EDB - Extensional Database)
Inference Graph - computed triples, exising during rule evaluation - (Datalog - IDB - Intensional Database)
Fact : either a rule with an empty condition or a triple from the starting data graph.
Rule R1 depends on rule R2 is R1 has a triple pattern that appears in the head of R2.
Rule R1 depends on rule R2 if there is some rule R such that R1 dependsOn R and R depends on R2
(i.e. "depends on" is transitive)
Stratification (of positive datalog)
Stratum zero is the initial data graph.
Write strat(X) for the stratification number of rule X
If R1 depends on R2 then the strat(R1) must be greater than or equal to strat(R2)
When choosing to stratify where possible, if strat(R1) equals strat(R2), then there is a mutual recusion.
Evalaution can be done strata-by-strata in stratification number order.
If strat(R1) is strictly greater than strat(R2), R2 can be completely calculated before R1 is run.
Naïve evaluation is "run until no change" bottom-up, (forward-chaining) i.e. total materialization.
Semi-naïve, "Jacobi", "Gause-Seidel" - more efficent materialization
"Top-down evaluation strategies begin with a query or goal." https://en.wikipedia.org/wiki/Datalog#Semi-na%C3%AFve_evaluation
- Magic sets
- SLD resolution(SLD = Selective Linear Definite)
Generalizations to consider later:
- named relationship (workspace, not retained, useful in practice)
- quads - where triples are arity 3, quads are arity 4.
- Negation is non-existence of a triple
Stratification is required and a condition that any rule R with a negated conclusion atom must only depend on strata 0..strat(R)-1, and not strat(R).
Being based on datalog gives a solid foundation but some tasks can be hard, for example, increment a property value by one.
For this, SHACL could have "run once" rules. These can even remove triples. They are simialr to SPARQL Updates.
RULE
DEL { :s :p ?v }
ADD { :s :p ?incV }
WHERE { :s :p ?v BIND (?v + 1 AS ?incV ) }
c.f. SPARQL update DELETE-INSERT-WHERE
https://en.wikipedia.org/wiki/Datalog#Syntax
<program> ::= <rule> <program> | ""
<rule> ::= <atom> ":-" <atom-list> "."
<atom> ::= <relation> "(" <term-list> ")"
<atom-list> ::= <atom> | <atom> "," <atom-list> | ""
<term> ::= <constant> | <variable>
<term-list> ::= <term> | <term> "," <term-list> | ""
It can be written in the graph pattern part.
As a (node) expression, it can be written inside other expressions like
! EXISTS
- which isNOT EXISTS
.Analysing the rules to find rule dependencies would need to analyse expressions as well a triple patterns and some - like "or" - make that complicated.