@elrayle
Last active August 2, 2016 17:20
Understanding Persistence Strategies for ActiveTriples


Last Update: 2016-07-29


Persistence Strategies

PersistenceStrategy defines an abstract persistence interface. It is a module that is included in the concrete implementation strategy classes. It defines methods that the concrete strategy must implement.
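The abstract-interface pattern described above can be sketched in plain Ruby. This is only an illustration: the names SketchPersistenceStrategy, SketchInMemoryStrategy, and SketchIncompleteStrategy are hypothetical and are not taken from the ActiveTriples source. The sketch assumes only that abstract methods raise until a concrete strategy overrides them.

```ruby
# Hypothetical sketch, not the ActiveTriples source: an abstract strategy
# module whose required methods raise until a concrete class overrides them.
module SketchPersistenceStrategy
  # Concrete behavior shared by every strategy in this sketch.
  def persisted?
    @persisted ||= false
  end

  # Abstract methods: the including class must implement these.
  def persist!
    raise NotImplementedError, "#{self.class} must implement persist!"
  end

  def reload
    raise NotImplementedError, "#{self.class} must implement reload"
  end
end

# A concrete strategy that records saves in a plain Array (an assumption
# made for illustration; a real strategy would write to a repository).
class SketchInMemoryStrategy
  include SketchPersistenceStrategy

  def initialize(store)
    @store = store
  end

  def persist!
    @store << :saved
    @persisted = true
  end
end

# A strategy that forgets to implement the abstract methods.
class SketchIncompleteStrategy
  include SketchPersistenceStrategy
end
```

Calling persist! on SketchIncompleteStrategy raises NotImplementedError, which mirrors the "abstract method" error behavior described for PersistenceStrategy.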

RepositoryStrategy operates on each object as a single unit. Persisting and destroying an object with RepositoryStrategy directly affects the triplestore repository, adding or removing the triple statements for that object. Triples that belong to other objects are not updated, even when this object links to them through one of its own statements.

ParentStrategy operates on each object in the context of the graph created by the parent chain with the final_parent holding all triples for that graph. Persisting and destroying an object with ParentStrategy adds or removes the triple statements from the final_parent. Each object in the graph has one-and-only-one parent creating a graph that has a tree structure. Objects in the graph can refer to each other in triples creating a more complex non-directional graph, but the chain of ancestors through @parent is a uni-directional tree with the terminal node (final_parent) being the object where @parent.nil? is true.
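The ancestor chain described above can be modeled with a few lines of plain Ruby. This is a sketch under stated assumptions: SketchNode and its methods are hypothetical names, not the ActiveTriples classes, and treating a parentless node as its own final parent is a simplification made only for this example.

```ruby
# Hypothetical model of the parent chain; class and method names are
# illustrative and not taken from ActiveTriples.
class SketchNode
  attr_reader :name, :parent

  def initialize(name, parent = nil)
    @name = name
    @parent = parent
  end

  # Enumerate ancestors from the immediate parent up to the root.
  def ancestors
    Enumerator.new do |yielder|
      node = parent
      while node
        yielder << node
        node = node.parent
      end
    end
  end

  # The terminal node of the chain: the ancestor whose parent is nil.
  # In this sketch a node with no parent is its own final parent.
  def final_parent
    ancestors.to_a.last || self
  end
end

pp_node = SketchNode.new('pp')
cp_node = SketchNode.new('cp', pp_node)
gp_node = SketchNode.new('gp1', cp_node)

puts gp_node.ancestors.map(&:name).inspect  # ["cp", "pp"]
puts gp_node.final_parent.name              # pp
```

Because each node holds exactly one parent reference, the walk always terminates at a single root, which is the tree property the paragraph above relies on.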

The remainder of this document provides more detailed descriptions of expected behaviors and code that demonstrates the persistence behaviors for each strategy by example.

Comparing Parent Strategy and Repository Strategy

The following table shows methods that override methods in PersistenceStrategy which is an abstract definition of the persistence interface. Some methods in this module are implemented and usable by the concrete implementation strategy classes. Others raise an error stating the methods are abstract and must be implemented by the concrete strategy class.

| Method | Repository Strategy | Parent Strategy |
| --- | --- | --- |
| Knows about | itself (obj) | itself (obj) and parent object (parent) |
| erase_old_resource | delete statements from obj | delete statements from final parent (highest parent in ancestor chain) |
| persist! | save obj to repository | save obj to final_parent |
| persisted? | no change | true only if obj.persisted? AND parent.persisted? |
| destroy | no change | delete obj.statements from final_parent AND delete all statements in parent that have obj.rdf_subject as subject or object |
| reload | query repository | query final_parent |

General definitions:

persist

  • save object's statements in the triplestore
  • can persist with persist! method
  • can check persistence with persisted? method which returns the value of @persisted
QUESTION: Does @persisted reflect whether the object has changed since the last persist? Should it return to false if any changes are made to the object?

destroy

  • remove object's statements from the triplestore
  • clear the variable holding the object's statements so that it responds to empty? with true
  • can destroy with destroy method (destroy! is an alias of destroy)
  • can check destruction with destroyed? method which returns the value of @destroyed
QUESTION: Does @destroyed reflect whether the object has been removed from the triplestore? If the user sets values in the remaining object, should the @destroyed flag and the @persisted flag both be set to false?

Expected Behaviors for Repository Strategy

See general definition of Repository Strategy above in the Persistence Strategies section.

obj.persist!

  • Put all obj's statements into the repository.
  • Set @persisted for this obj to true.

obj.persisted?

  • Returns true if @persisted for this obj is set to true.

obj.destroy (alias destroy!)

  • Destroy obj (via PersistenceStrategy #destroy)
    • Empty object of all statements.
    • All statements are removed from triplestore.
    • Object continues to exist as a variable with rdf_subject unchanged and all other properties set to [].

Expected Behaviors for Parent Strategy

See general definition of Parent Strategy above in the Persistence Strategies section.

NOTE: Destroying objects is not as tightly coupled to final_parent. Destroying final_parent does not destroy its descendants.

obj.persist!

  • Put all obj's statements into final_parent.
  • Set @persisted for this obj to true.
NOTE: final_parent has RepositoryStrategy, so when it is persisted, all statements (including those for all descendants of final_parent) are persisted to the repository. The descendants' statements are persisted because they are copied into the final_parent by ParentStrategy.
QUESTION: It appears that even without persisting each descendant object, the parent has all statements from all descendants. All changes to properties are set in the final_parent as well. Where is the code that makes that happen?

obj.persisted?

  • Returns true only if obj and every ancestor from obj up to final_parent return true for persisted?; otherwise returns false.
QUESTION: I don't see where descendant objects get @persisted set to true. But it must happen because obj.persisted? returns true for all descendants after final_parent.persist! is called.
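The persisted? rule for ParentStrategy described above can be modeled as a check that walks up the parent chain. This is a minimal sketch with hypothetical names (SketchResource and its persisted= writer are not part of ActiveTriples), and it does not answer where ActiveTriples itself sets @persisted on descendants.

```ruby
# Hypothetical model: a child counts as persisted only when it and every
# ancestor up to the final parent are persisted.
class SketchResource
  attr_reader :parent
  attr_writer :persisted

  def initialize(parent = nil)
    @parent = parent
    @persisted = false
  end

  def persisted?
    return @persisted if parent.nil?   # final parent: own flag only
    @persisted && parent.persisted?    # child: own flag AND the chain above
  end
end

root = SketchResource.new
child = SketchResource.new(root)
grandchild = SketchResource.new(child)

# Flag the descendants but not the root.
child.persisted = true
grandchild.persisted = true
before_root = grandchild.persisted?   # false: root is not persisted yet

root.persisted = true
after_root = grandchild.persisted?    # true: the whole chain is persisted
```

In this model, flagging a descendant alone is never enough; the result only becomes true once the final parent is also flagged.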

obj.destroy (alias destroy!)

  • Remove all statements from final_parent that come from obj
  • Remove all statements from obj's parent that have obj as the subject or object of the triple
  • Destroy obj (via PersistenceStrategy #destroy)
    • Empty object of all statements.
    • All statements are removed from triplestore.
    • Object continues to exist as a variable with rdf_subject unchanged and all other properties set to [].
NOTE: Destroying the final_parent does NOT destroy the children.
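The removal rule in the bullets above (drop every statement that has obj as its subject or object) can be illustrated on a plain list of triples. This sketch shows the intended rule only; sketch_destroy and the Array-of-Arrays graph are hypothetical stand-ins, not the ActiveTriples implementation.

```ruby
# Hypothetical model: triples are [subject, predicate, object] arrays and the
# final parent's graph is a plain Array. Not the ActiveTriples implementation.
def sketch_destroy(final_parent_graph, subject)
  # Drop every triple that mentions the destroyed subject on either end.
  final_parent_graph.reject { |s, _p, o| s == subject || o == subject }
end

graph = [
  ['pp',  'title', 'Parent'],
  ['pp',  'child', 'cp'],
  ['cp',  'child', 'gp1'],
  ['cp',  'child', 'gp2'],
  ['gp1', 'title', 'Grandchild #1'],
  ['gp2', 'title', 'Grandchild #2']
]

remaining = sketch_destroy(graph, 'gp1')
remaining.each { |triple| puts triple.inspect }
```

Here both the gp1 title triple and the cp-to-gp1 link are removed, while statements about pp, cp, and gp2 are left in place.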

Examples

Classes for use by all strategies

require 'active_triples'
require 'linkeddata'

ActiveTriples::Repositories.add_repository :default, RDF::Repository.new
r = ActiveTriples::Repositories.repositories[:default]

class DummyGrandchildResource
  include ActiveTriples::RDFSource
  configure repository: :default, type: RDF::URI('http://www.example.com/type/Grandchild')  
  property :title, predicate: 'http://www.example.com/title'
end

class DummyChildResource
  include ActiveTriples::RDFSource
  configure repository: :default, type: RDF::URI('http://www.example.com/type/Child')  
  property :title, predicate: 'http://www.example.com/title'
  property :child, predicate: 'http://www.example.com/child', class_name: DummyGrandchildResource
end

class DummyResource
  include ActiveTriples::RDFSource
  configure repository: :default, type: RDF::URI('http://www.example.com/type/Parent')  
  property :title, predicate: 'http://www.example.com/title'
  property :child, predicate: 'http://www.example.com/child', class_name: DummyChildResource
end

Repository Strategy in Action

Setting up objects to use Repository Strategy

# ------------------------------------------------------
#  Testing effects of RepositoryStrategy
# ------------------------------------------------------
# Do not pass parent to children when you create them
pr = DummyResource.new('http://www.example.com/pr')
cr = DummyChildResource.new('http://www.example.com/cr')
gr = DummyGrandchildResource.new('http://www.example.com/gr')

# Without parent specified, all will use RepositoryStrategy for persistence
pr.persistence_strategy  # => #<ActiveTriples::RepositoryStrategy>
cr.persistence_strategy  # => #<ActiveTriples::RepositoryStrategy>
gr.persistence_strategy  # => #<ActiveTriples::RepositoryStrategy>

# Cannot show ancestors of children.
# NOTE: Repository Strategy does not define methods final_parent and ancestors.


# Add statements to resources
pr.title = 'Parent with children using RepositoryStrategy'
pr.child = cr
cr.title = 'Child using RepositoryStrategy'
cr.child = gr
gr.title = 'Grandchild using RepositoryStrategy'

Persisting using Repository Strategy

# Where do statements live?
# Statements live only on the object that is the subject of the triple.
puts pr.dump :ntriples   # 4 statements: pr.type, pr.title, pr.child, cr.type
puts cr.dump :ntriples   # 4 statements: cr.type, cr.title, cr.child, gr.type
puts gr.dump :ntriples   # 2 statements: gr.type, gr.title

puts r.dump :ntriples    # will see 0 statements because no objects have been persisted

# What happens when child is persisted?  
# Only child object (gr) is persisted because all are separate.
gr.persist!
pr.persisted?  # false
cr.persisted?  # false
gr.persisted?  # true

puts r.dump :ntriples    # 2 statements: gr.type, gr.title

# What happens when parent is persisted?  
# Only parent object (pr) is persisted because all are separate.
pr.persist!
pr.persisted?  # true
cr.persisted?  # false
gr.persisted?  # true

puts r.dump :ntriples    # 6 statements: gr.type, gr.title, pr.type, pr.title, pr.child, AND cr.type (see question below)
QUESTION: What should be persisted for properties that define class_name?
  1. none of child's triples
  2. all of child's triples
  3. only type triple for child
NOTE: Use of the term child here does not imply a parent-child relationship. 'child' is the name of the property in this specific example. Relationships defined using class_name are part of a non-directional graph.

Resuming using Repository Strategy

pr_ = DummyResource.new('http://www.example.com/pr')  # read pr from the repository into pr_ variable
puts pr_.dump :ntriples               # 3 statements: pr.type, pr.title, pr.child -- DOES NOT have cr.type even though pr does - (see question below)
puts pr_.child.first.dump :ntriples   # 1 statement: cr.type -- See question above about what is saved for properties with class_name defined.
puts pr_.dump :ntriples               # 4 statements: pr.type, pr.title, pr.child, AND cr.type -- After dumping child, cr.type is now one of the triples for pr_ (see question below)

cr.persist!
puts r.dump :ntriples    # 8 statements: gr.type, gr.title, pr.type, pr.title, pr.child, cr.type, cr.title, cr.child

pr_.reload                            # re-read from repository (see question below)
puts pr_.dump :ntriples               # 4 statements: pr.type, pr.title, pr.child, AND cr.type (see question below)
puts pr_.child.first.dump :ntriples   # 1 statement: cr.type -- Still 1 statement even though cr has 3 triples in the repository 
QUESTION: Why doesn't pr_ have cr.type statement before dumping pr_.child, but does afterward?
Observation: Calling pr_.reload prior to dumping child triples has no impact. pr_ still does not have cr.type. Should resuming pr bring along cr triples, at minimum type (depends on the answer to the previous question about what should be persisted when using class_name)?

Destroying using Repository Strategy

# What happens when child is deleted?  
# No change except to deleted object (gr) because all are separate.
gr.destroy
puts pr.dump :ntriples  # unaffected -- 4 statements
puts cr.dump :ntriples  # unaffected -- 4 statements -- NOTE: cr still refers to the deleted gr
puts gr.dump :ntriples  # destroyed -- 0 statements

pr.destroyed?   # false 
cr.destroyed?   # false
gr.destroyed?   # true

pr.empty?       # false
cr.empty?       # false
gr.empty?       # true

puts r.dump :ntriples    # 6 statements: pr.type, pr.title, pr.child, cr.type, cr.title AND cr.child even though gr was destroyed

# What happens when parent is deleted?  
# No change except to deleted object (pr) because all are separate.
pr.destroy
puts pr.dump :ntriples  # destroyed -- 0 statements
puts cr.dump :ntriples  # unaffected -- 4 statements
puts gr.dump :ntriples  # destroyed previously - 0 statements

pr.destroyed?   # true 
cr.destroyed?   # false
gr.destroyed?   # true - destroyed previously

pr.empty?       # true
cr.empty?       # false
gr.empty?       # true - destroyed previously

puts r.dump :ntriples    # 3 statements: cr.type, cr.title, cr.child

# Finish cleaning up the repository
cr.destroy
puts r.dump :ntriples    # 0 statements

Parent Strategy in Action

Setting up objects to use Parent Strategy

# ------------------------------------------------------
#  Testing effects of ParentStrategy
# ------------------------------------------------------
# Pass parent to children when you create them
pp = DummyResource.new('http://www.example.com/pp')
cp = DummyChildResource.new('http://www.example.com/cp',pp)
gp1 = DummyGrandchildResource.new('http://www.example.com/gp1',cp)
gp2 = DummyGrandchildResource.new('http://www.example.com/gp2',cp)
gp3 = DummyGrandchildResource.new('http://www.example.com/gp3',cp)

# With parent specified, the children will use ParentStrategy for persistence
pp.persistence_strategy   # => #<ActiveTriples::RepositoryStrategy>
cp.persistence_strategy   # => #<ActiveTriples::ParentStrategy>
gp1.persistence_strategy  # => #<ActiveTriples::ParentStrategy>
gp2.persistence_strategy  # => #<ActiveTriples::ParentStrategy>
gp3.persistence_strategy  # => #<ActiveTriples::ParentStrategy>

# Show ancestors of children
cp.persistence_strategy.final_parent      # => #<DummyResource...>
cp.persistence_strategy.ancestors.to_a    # => [#<DummyResource:...>]
gp1.persistence_strategy.final_parent     # => #<DummyResource...>
gp1.persistence_strategy.ancestors.to_a   # => [#<DummyChildResource...>,#<DummyResource:...>]
gp2.persistence_strategy.final_parent     # => #<DummyResource...>
gp2.persistence_strategy.ancestors.to_a   # => [#<DummyChildResource...>,#<DummyResource:...>]
gp3.persistence_strategy.final_parent     # => #<DummyResource...>
gp3.persistence_strategy.ancestors.to_a   # => [#<DummyChildResource...>,#<DummyResource:...>]

# What are the attributes of the resources?
pp.attributes    # => {"id"=>"http://www.example.com/pp", "title"=>[], "child"=>[]}
cp.attributes    # => {"id"=>"http://www.example.com/cp", "title"=>[], "child"=>[]}
gp1.attributes   # => {"id"=>"http://www.example.com/gp1", "title"=>[]}
gp2.attributes   # => {"id"=>"http://www.example.com/gp2", "title"=>[]}
gp3.attributes   # => {"id"=>"http://www.example.com/gp3", "title"=>[]}

# Add statements to resources
pp.title = 'Parent with children using ParentStrategy'
pp.child = cp
cp.title = 'Child using ParentStrategy'
cp.child = [gp1,gp2,gp3]
gp1.title = 'Grandchild #1 using ParentStrategy'
gp2.title = 'Grandchild #2 using ParentStrategy'
gp3.title = 'Grandchild #3 using ParentStrategy'

Persisting using Parent Strategy

# Where do statements live?
# Statements live on object that is subject AND final_parent (pp).
puts pp.dump :ntriples    # 14 statements: pp.type, pp.title, pp.child, cp.type, cp.title, (3) cp.child, gp1.type, gp1.title, gp2.type, gp2.title, gp3.type, gp3.title
puts cp.dump :ntriples    # 3 statements: cp.type, cp.title, cp.child
puts gp1.dump :ntriples   # 2 statements: gp1.type, gp1.title
puts gp2.dump :ntriples   # 2 statements: gp2.type, gp2.title
puts gp3.dump :ntriples   # 2 statements: gp3.type, gp3.title

# What happens when child is persisted?  
# NONE - A child cannot be persisted outside of its parent.
gp1.persist!
pp.persisted?   # false
cp.persisted?   # false
gp1.persisted?  # false - A child cannot be persisted outside of its parent.
gp2.persisted?  # false
gp3.persisted?  # false

puts r.dump :ntriples    # 0 statements as the repository is only updated via final_parent (pp)

# What happens when parent is persisted?  
# Parent and all children are persisted
pp.persist!
pp.persisted?   # true
cp.persisted?   # true
gp1.persisted?  # true
gp2.persisted?  # true
gp3.persisted?  # true

puts r.dump :ntriples    # 14 statements: pp.type, pp.title, pp.child, cp.type, cp.title, (3) cp.child, gp1.type, gp1.title, gp2.type, gp2.title, gp3.type, gp3.title

Resuming using Parent Strategy

pp_ = DummyResource.new('http://www.example.com/pp')  # read pp from the repository into pp_ variable
puts pp_.dump :ntriples               # 3 statements: pp.type, pp.title, pp.child -- DOES NOT resume any triples from other objects since it is a different instance from pp and is not the final_parent for any other objects (See also question below)
puts pp_.child.first.dump :ntriples   # 1 statement: cp.type -- See question above about what is saved for properties with class_name defined.
puts pp_.dump :ntriples               # 4 statements: pp.type, pp.title, pp.child, AND cp.type -- After dumping child, cp.type is now one of the triples for pp_ (see question below)

pp_.persistence_strategy  # => #<ActiveTriples::RepositoryStrategy>

cp.persist!
puts r.dump :ntriples    # no change because persistence is handled by final_parent and cp was persisted previously via pp.persist!

pp_.reload                            # re-read from repository (see question below)
puts pp_.dump :ntriples               # 4 statements: pp.type, pp.title, pp.child, AND cp.type (see question below)
puts pp_.child.first.dump :ntriples   # 1 statement: cp.type -- Still 1 statement even though cp has 3 triples in the repository 

# Resume child
cp_ = DummyResource.new('http://www.example.com/cp')  # read cp from the repository into cp_ variable
puts cp_.dump :ntriples               # 6 statements: (2) cp.type, cp.title, (3) cp.child -- ??? Why two types?  Child type expected.  Parent type not expected.  (Possibly because cp_ is instantiated above as DummyResource rather than DummyChildResource.)
puts cp_.child.first.dump :ntriples   # 1 statement: gpX.type -- See question above about what is saved for properties with class_name defined.
puts cp_.dump :ntriples               # 9 statements: cp.type, cp.title, (3) cp.child, AND gp1.type, gp2.type, and gp3.type -- After dumping one of the children, gpX.type is added for all the children to the triples of cp_ (see question below)

cp_.persistence_strategy  # => #<ActiveTriples::RepositoryStrategy>
QUESTION: See question at end of Resuming using Repository Strategy. The same question applies here.

Destroying using Parent Strategy

# What happens to statements when child is deleted?  
# Statements are removed from object (gp1) and final_parent (pp).
gp1.destroy
puts pp.dump :ntriples   # 12 statements: pp.type, pp.title, pp.child, cp.type, cp.title, (2) cp.child, gp2.type, gp2.title, gp3.type, gp3.title -- removed statements with gp1 as subject or object -- almost, see note below
WARNING: FAILED TO REMOVE from final_parent (pp)...
<http://www.example.com/cp> <http://www.example.com/child> <http://www.example.com/gp1>
puts cp.dump :ntriples   # 4 statements - cp.type, cp.title, (2) cp.child -- removed statements with gp1 as subject or object
puts gp1.dump :ntriples  # destroyed -- 0 statements
puts gp2.dump :ntriples  # unaffected -- 2 statements 
puts gp3.dump :ntriples  # unaffected -- 2 statements

pp.destroyed?    # false
cp.destroyed?    # false
gp1.destroyed?   # true
gp2.destroyed?   # false
gp3.destroyed?   # false

pp.empty?        # false
cp.empty?        # false
gp1.empty?       # true
gp2.empty?       # false
gp3.empty?       # false

puts r.dump :ntriples    # 14 statements: pp.type, pp.title, pp.child, cp.type, cp.title, (3) cp.child, gp1.type, gp1.title, gp2.type, gp2.title, gp3.type, gp3.title -- gp1 still exists as the repository is only updated via final_parent (pp)

pp.persist!
puts r.dump :ntriples    # 14 statements (unchanged): pp.type, pp.title, pp.child, cp.type, cp.title, (3) cp.child, gp1.type, gp1.title, gp2.type, gp2.title, gp3.type, gp3.title -- gp1 still exists as the repository is only updated via final_parent (pp)

# What happens if parent is deleted?  
# Expected all children to be destroyed as well, but only the parent is (see question below)
pp.destroy
puts pp.dump :ntriples   # destroyed -- 0 statements
puts cp.dump :ntriples   # unaffected -- 4 statements -- shouldn't it be destroyed -- cp.parent still points to pp
puts gp1.dump :ntriples  # destroyed previously
puts gp2.dump :ntriples  # unaffected -- 2 statements -- shouldn't it be destroyed
puts gp3.dump :ntriples  # unaffected -- 2 statements -- shouldn't it be destroyed

pp.destroyed?    # true
cp.destroyed?    # false
gp1.destroyed?   # true - destroyed previously
gp2.destroyed?   # false
gp3.destroyed?   # false

pp.empty?        # true
cp.empty?        # false
gp1.empty?       # true - destroyed previously
gp2.empty?       # false
gp3.empty?       # false

cp.persistence_strategy.final_parent      # => #<DummyResource...>

puts r.dump :ntriples    # 11 statements: cp.type, cp.title, (3) cp.child, gp1.type, gp1.title, gp2.type, gp2.title, gp3.type, gp3.title
QUESTION: I observed the following behaviors after destroying pp. Are these the expected behaviors?
pp              # => #<DummyResource:0x3fd3f21c9480(#<DummyResource:0x007fa7e4392900>)> -- as expected -- Similar to Hash#clear, empties object, but leaves class of object the same and empty? returns true 
pp.persisted?   # => true -- Isn't destroying a change?  Should persisted? return false after destroying? Or at least returned to false since it is no longer persisted in the triplestore?
pp.destroyed?   # => true -- as expected
pp.empty?       # => true -- as expected
pp.title        # => []   -- as expected since object remains, but is empty
pp.rdf_subject  # => #<RDF::URI:0x3fd3f21c9098 URI:http://www.example.com/pp> -- should this be cleared as part of the delete?
cp.parent       # => #<DummyResource:0x3fd3f21c9480(#<DummyResource:0x007fa7e4392900>)> -- EXPECTED it to be deleted too; actually expected the whole graph under pp to be deleted
puts r.dump :ntriples  # => 11 statements when I would expect there to be none.  I expect deleting final_parent would delete all objects in the graph under final_parent
elrayle commented Feb 16, 2016

I expected the delete of pp to delete all descendants, but it doesn't. Can you comment on the philosophy behind this?