Skip to content

Instantly share code, notes, and snippets.

@afs
Last active July 7, 2025 13:57
Show Gist options
  • Save afs/1b5d7f24daf013894d76a062ee78bb9f to your computer and use it in GitHub Desktop.
Save afs/1b5d7f24daf013894d76a062ee78bb9f to your computer and use it in GitHub Desktop.
Notes on SHACL Rules

SHACL Rules

SHACL rules infer new triples. The input is a data graph and a shapes graph which has rules; the output is a graph of inferred triples.

Whether, and how, the output graph is added back into the data graph is not part of the SHACL rules process. If being run to enrich the data graph, the process can be done inside a transaction updating the data graph.

The rule system is also capable of triple pattern matching - what datalog calls a "query" - where only necessary rules are run. (This is "backward chaining")

Once repeated application of rules is considered, there does not seem to be useful subsets of features. Fortunately, simple evaluation of (positive i.e. without negation) datalog is not a major piece of implementation. Run rules until no change; the implementation does not need to inspect the rule set structure. The literature has several better algorithms where implementation isn't particularly complicated. Datalog evaluation is amenable to parallel processing.

Rule

A SHACL rule is:

  • a condition for the rule, called the "body" in datalog literature.
    • For SHACL, BGP + filter (still datalog? ...) + assignment.
  • a conclusion, called the "head" in datalog literature.
    • a triples template

Notes:

  • Filters must not use NOT EXISTS or EXISTS.
  • Node expressions used as filters must be single valued, must not read the data diretcly and must not be aggregations
  • No property paths allowed. (Adding seq paths to the BGP matching is natural, and maybe alt paths.)
  • Assignment, BIND, needs great care - it can be used to create non-datalog infinite rules. For example, incrementing a value (example below). Without a guard, that will lead to infinite execution as it will trigger itself.

The primary syntax is RDF.

sh:rule 
  sh:head ( [ sh:subject [ sh:var "x" ] ] ;
              sh:predicate rdf:type ;
              sh:object ex:Square ;
            ] );
  sh:body ( 
    [ sh:subject [ sh:var "x" ] ;
      sh:predicate ex:width ;
      sh:object [ sh:var "w" ]
    ]
    [ sh:subject [ sh:var "x" ] ;
      sh:predicate ex:height ;
      sh:object [ sh:var "h" ]
    ]
    [ sh:filter [ sparql:equals ( [ sh:var "w" ] [ sh:var "h" ] ) ] ]
  ) ;
  .

A compact syntax will be needed:

RULE  { ?x rdf:type ex:Square }
WHERE { 
  ?x ex:width ?w ;
     ex:height ?h .
  FILTER( ?w = ?h )
}

Examples

Ex1. A simple renaming:

    RULE { ?x foaf:name ?name } WHERE { ?x vcard:fn ?name }

Ex2. Infer:

    RULE { ?x :grandChildOf ?y } WHERE { ?x :childOf ?Z . ?Z :childOf ?y }

Ex3. Disjunction:

     RULE { ?x :childOf ?y } WHERE { ?x :motherOf ?y }
     RULE { ?x :childOf ?y } WHERE { ?x :fatherOf ?y }

Combining ex2 and ex3, calcaulates ":grandChildOf" using ":motherOf/:fatherOf" data.

Ex4. Recursion (written naively):

    RULE { ?x rdfs:subClassOf ?y } WHERE { ?x rdfs:subClassOf ?Z . ?Z rdfs:subClassOf ?y }

Ex5. Mutual recursion:

    # Given data 
    :value1 :value 1 .
    :value2 :value 2 .
    :value3 :value 3 .
    :value4 :value 4 .
    ## A new fact - a rule with empty condition.
    RULE { :x1 rdf:type ex:Odd } WHERE {}

    # Successor of an odd integer is even.
    RULE { ?x rdf:type ex:Even } WHERE {
           ?y rdf:type ex:Odd ; ?y :value ?yVal .
           ?x :value ?xVal
           FILTER (yVal+1 = ?xVal )
        }
    # Successor of an even integer is odd
    RULE { ?x rdf:type ex:Odd } WHERE {
           ?y rdf:type ex:Even ; ?y :value ?yVal .
           ?x :value ?xVal
           FILTER (yVal+1 = ?xVal )
        }

Ex 6: this rule is non-terminating, and not datalog because it creates a new RDF term:

    RULE { :s :p ?incV } WHERE { :s :p ?v BIND(?v + 1 AS ?incV ) }

Queries

Rules can be used to generate new data but also to answer a goal (datalog - "query" or "goal") by solving a pattern using rules. This is backward chaining.

Terminology

Rule, Rule Set (Datalog - "program")

Base Graph - Data Graph - (Datalog - EDB - Extensional Database)

Inference Graph - computed triples, exising during rule evaluation - (Datalog - IDB - Intensional Database)

Fact : either a rule with an empty condition or a triple from the starting data graph.

Dependency relationship

Rule R1 depends on rule R2 is R1 has a triple pattern that appears in the head of R2.

Rule R1 depends on rule R2 if there is some rule R such that R1 dependsOn R and R depends on R2

(i.e. "depends on" is transitive)

Stratification

Stratification (of positive datalog)

Stratum zero is the initial data graph.

Write strat(X) for the stratification number of rule X

If R1 depends on R2 then the strat(R1) must be greater than or equal to strat(R2)

When choosing to stratify where possible, if strat(R1) equals strat(R2), then there is a mutual recusion.

Evalaution can be done strata-by-strata in stratification number order.

If strat(R1) is strictly greater than strat(R2), R2 can be completely calculated before R1 is run.

Execution

Bottom-up evaluation ("forward chaining")

Naïve evaluation is "run until no change" bottom-up, (forward-chaining) i.e. total materialization.

Semi-naïve, "Jacobi", "Gause-Seidel" - more efficent materialization

Top-down evaluation ("backward chaining")

"Top-down evaluation strategies begin with a query or goal." https://en.wikipedia.org/wiki/Datalog#Semi-na%C3%AFve_evaluation

Later

Generalizations to consider later:

  • named relationship (workspace, not retained, useful in practice)
  • quads - where triples are arity 3, quads are arity 4.

Negation

  • Negation is non-existence of a triple
    Stratification is required and a condition that any rule R with a negated conclusion atom must only depend on strata 0..strat(R)-1, and not strat(R).

Production Rules

Being based on datalog gives a solid foundation but some tasks can be hard, for example, increment a property value by one.

For this, SHACL could have "run once" rules. These can even remove triples. They are simialr to SPARQL Updates.

RULE
DEL { :s :p ?v }
ADD { :s :p ?incV } 
WHERE { :s :p ?v BIND (?v + 1 AS ?incV ) }

c.f. SPARQL update DELETE-INSERT-WHERE

SHACL Advanced Feature Note

SHACL-AF Triple rules

SHACL-AF Property Value Rules (2018)

SHACL-AF Construct rules ("SPARQL rules")

Datalog abstract syntax

https://en.wikipedia.org/wiki/Datalog#Syntax

<program> ::= <rule> <program> | ""
<rule> ::= <atom> ":-" <atom-list> "."
<atom> ::= <relation> "(" <term-list> ")"
<atom-list> ::= <atom> | <atom> "," <atom-list> | ""
<term> ::= <constant> | <variable>
<term-list> ::= <term> | <term> "," <term-list> | ""
@simonstey
Copy link

How would you write a Datalog rule to remove triples? i.e. inferring the non-existence of a fact?

https://en.wikipedia.org/wiki/Negation_as_failure ?

@simonstey
Copy link

The primary syntax is RDF.

sh:rule 
  sh:head ( [ sh:subject [ sh:var "x" ] ] ;
              sh:predicate rdf:type ;
              sh:object ex:Square ;
            ] );
  sh:body ( 
    [ sh:subject [ sh:var "x" ] ;
      sh:predicate ex:width ;
      sh:object [ sh:var "w" ]
    ]
    [ sh:subject [ sh:var "x" ] ;
      sh:predicate ex:height ;
      sh:object [ sh:var "h" ]
    ]
    [ sh:filter [ sparql:equals ( [ sh:var "w" ] [ sh:var "h" ] ) ] ]
  ) ;
  .

A compact syntax will be needed:

RULE  { ?x rdf:type ex:Square }
WHERE { 
  ?x ex:width ?w ;
     ex:height ?h .
  FILTER( ?w = ?h )
}

so would we then scrap the current rule syntax from shacl-af https://www.w3.org/TR/shacl-af/#rules ?

@afs
Copy link
Author

afs commented May 12, 2025

Instead of 'a condition for the rule, called the "head" in datalog literature.' shouldn't this be 'a condition for the rule, called the "body" in datalog literature.'?

Thanks - fixed.

@afs
Copy link
Author

afs commented May 12, 2025

Why no EXISTS in Filters? Wouldn't it be just a triple pattern or binding value to match? It seems unproblematic compared to NOT EXISTS.

It can be written in the graph pattern part.

As a (node) expression, it can be written inside other expressions like ! EXISTS - which is NOT EXISTS.

Analysing the rules to find rule dependencies would need to analyse expressions as well a triple patterns and some - like "or" - make that complicated.

@afs
Copy link
Author

afs commented May 12, 2025

How would you write a Datalog rule to remove triples? i.e. inferring the non-existence of a fact?

Inference - generally - is deducing new information and making it available. That might be by putting it back in the data graph, or it might be generating a separate graph with inferred triples, or working backward to see if a fact can be proved or used for query rewriting.

Negation in datalog is the absence of a fact, not removal. Transformation of the data graph is more in the domain of SPARQL Update, which may be based on the conclusions of an inference. Removing during inference isn't datalog.

We could add some "Production Rules" features around a datalog core. But these mutating operations are more like "programming" than "inference".

I listed negation under "later" so do if time allows because I think the WG can declare success with "positive datalog" if it chooses to / all that time allows. "Semi-positive negation" (only looking in the original data graph) is a reasonably simple step. The more complete "straified negation" puts more work on an implementation.

@afs
Copy link
Author

afs commented May 12, 2025

so would we then scrap the current rule syntax from shacl-af https://www.w3.org/TR/shacl-af/#rules ?

No need to scrap the syntax - the note above uses the syntax style ([ sh:var "x" ] is a node expression).

There are three different "rules" in SHACL-AF.

  • "Triple Rules" should be just a "short form" of the general case.
  • SHACL-AF Property Value Rules (added in the 2018 edition) look to be a "short form" as well.
  • "SPARQL Rules" ("CONSTRUCT") can be retained but the rule engine can't analyse them without a full SPARQL subsystem, they have subqueries and aggregation, and they can cause unbound variables in the CONSTRUCT template ("head"). They would have to be user-controlled. See the "Production Rules" -- something like "run the well-behaved rules, then run the ad hoc features".

@HolgerKnublauch
Copy link

If would use INFER { ... } WHERE { ... } instead of RULE { ... } WHERE { ... }, aligns better with CONSTRUCT.

@afs
Copy link
Author

afs commented May 13, 2025

Prototype on RDF syntax. Not much consideration on detailed naming.

Compact syntax to RDF, then rebuild the compact syntax from the RDF.

SHACL Inference Rules Syntax

Input:

PREFIX : <http://example/>

RULE { ?x :bothPositive true . } WHERE { ?x :p ?v1  FILTER ( ?v1 > 0 )  ?x :q ?v2  FILTER ( ?v2 > 0 )  }
RULE { ?x :oneIsZero true . } WHERE { ?x :p ?v1 ;  :q ?v2  FILTER ( ( ?v1 = 0 ) || ( ?v2 = 0 ) )  }

RDF:

PREFIX :    <http://example/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX sh:  <http://www.w3.org/ns/shacl#>

:ruleSet-1
  sh:ruleSet (
    [
      rdf:type sh:Rule;
      sh:body (
        [
          sh:object [ sh:var "v1" ];
          sh:predicate :p;
          sh:subject [ sh:var "x" ]
        ]
        [ sh:sparqlExpr "( ?v1 > 0 )" ]
        [
          sh:object [ sh:var "v2" ];
          sh:predicate :q;
          sh:subject [ sh:var "x" ]
        ]
        [ sh:sparqlExpr "( ?v2 > 0 )" ]
      );
      sh:head (
        [
          sh:object true;
          sh:predicate :bothPositive;
          sh:subject [ sh:var "x" ]
        ]
      )
    ]
    [
      rdf:type sh:Rule;
      sh:body (
        [
          sh:object [ sh:var "v1" ];
          sh:predicate :p;
          sh:subject [ sh:var "x" ]
        ]
        [
          sh:object [ sh:var "v2" ];
          sh:predicate :q;
          sh:subject [ sh:var "x" ]
        ]
        [ sh:sparqlExpr "( ( ?v1 = 0 ) || ( ?v2 = 0 ) )" ]
      );
      sh:head (
        [
          sh:object true;
          sh:predicate :oneIsZero;
          sh:subject [ sh:var "x" ]
        ]
      )
    ]
  ) .

Rebuild:

PREFIX :    <http://example/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX sh:  <http://www.w3.org/ns/shacl#>

RULE { ?x :bothPositive true . } WHERE { ?x :p ?v1  FILTER ( ?v1 > 0 )  ?x :q ?v2  FILTER ( ?v2 > 0 )  }
RULE { ?x :oneIsZero true . } WHERE { ?x :p ?v1 ;  :q ?v2  FILTER ( ( ?v1 = 0 ) || ( ?v2 = 0 ) )  }

@afs
Copy link
Author

afs commented May 25, 2025

Grammar: (functional; a bit messy; includes a few unused elements)

HTML
BNF

Changes:

  • node expressions written in RDF form.
  • DATA block - these are datalog "facts" - a rule with a head and no body.
Illustration

Input:

PREFIX : <http://example/>

DATA { :x :p 1 ; :q 2 . }

RULE { ?x :bothPositive true . } WHERE { ?x :p ?v1  FILTER ( ?v1 > 0 )  ?x :q ?v2  FILTER ( ?v2 > 0 )  }
RULE { ?x :oneIsZero true . } WHERE { ?x :p ?v1 ;  :q ?v2  FILTER ( ( ?v1 = 0 ) || ( ?v2 = 0 ) )  }

RDF:

PREFIX :       <http://example/>
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX sh:     <http://www.w3.org/ns/shacl#>
PREFIX sparql: <http://www.w3.org/ns/sparql#>

:ruleSet-1
  rdf:type sh:RuleSet;
  sh:data (
    <<( :x :p 1 )>>
    <<( :x :q 2 )>>
  );
  sh:ruleSet (
    [
      rdf:type sh:Rule;
      sh:body (
        [
          sh:object [ sh:var "v1" ];
          sh:predicate :p;
          sh:subject [ sh:var "x" ]
        ]
        [
          sh:expr [
            sparql:greaterThan (
              [ sh:var "v1" ]
              0
            )
          ]
        ]
        [
          sh:object [ sh:var "v2" ];
          sh:predicate :q;
          sh:subject [ sh:var "x" ]
        ]
        [
          sh:expr [
            sparql:greaterThan (
              [ sh:var "v2" ]
              0
            )
          ]
        ]
      );
      sh:head (
        [
          sh:object true;
          sh:predicate :bothPositive;
          sh:subject [ sh:var "x" ]
        ]
      )
    ]
    [
      rdf:type sh:Rule;
      sh:body (
        [
          sh:object [ sh:var "v1" ];
          sh:predicate :p;
          sh:subject [ sh:var "x" ]
        ]
        [
          sh:object [ sh:var "v2" ];
          sh:predicate :q;
          sh:subject [ sh:var "x" ]
        ]
        [
          sh:expr [
            sparql:function-or (
              [
                sparql:equals (
                  [ sh:var "v1" ]
                  0
                )
              ]
              [
                sparql:equals (
                  [ sh:var "v2" ]
                  0
                )
              ]
            )
          ]
        ]
      );
      sh:head (
        [
          sh:object true;
          sh:predicate :oneIsZero;
          sh:subject [ sh:var "x" ]
        ]
      )
    ]
  ) .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment