@kasei
Last active August 29, 2015 14:04
Notes on PerlRDF Traits

Methods marked with a * are ones for which the role can provide a default implementation.

Classes

IRI class

Variable class

PatternTermType class (either an RDF Term or a variable)

  • subtype TermType
  • subtype Variable

TermType classes (an RDF Term)

  • IRI()
  • Blank()
  • LanguageLiteral(lang : LanguageCode) @@ related: MooseX::Types::Locale::Language
  • TypedLiteral(dt : IRI)

Roles

Term Role

  • type : TermType
  • value : Str

Binding Role

  • value($key) : Term

Triple Role : Binding

  • subject*
  • predicate*
  • object*
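As a rough illustration of what a starred default could look like, the sketch below defines subject, predicate, and object purely in terms of value($key). It uses a plain Perl package rather than a Moose role so that it runs without CPAN modules; the package name Sketch::Triple and the hash-based storage are assumptions made for the example, not part of the design.

```perl
use strict;
use warnings;

package Sketch::Triple {
    # Hypothetical constructor: store terms under their position names.
    sub new {
        my ($class, %terms) = @_;
        return bless { %terms }, $class;
    }

    # The one method a Binding must supply: look a term up by key.
    sub value {
        my ($self, $key) = @_;
        return $self->{$key};
    }

    # Defaults for the starred accessors, each defined via value($key).
    sub subject   { return $_[0]->value('subject') }
    sub predicate { return $_[0]->value('predicate') }
    sub object    { return $_[0]->value('object') }
}

package main;

my $t = Sketch::Triple->new(
    subject   => 'ex:a',
    predicate => 'ex:knows',
    object    => 'ex:b',
);
print $t->subject, ' ', $t->predicate, ' ', $t->object, "\n";
```

A Quad could presumably follow the same pattern with one extra accessor for graph.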

Quad Role : Binding

  • subject*
  • predicate*
  • object*
  • graph*

Result Role : Binding

  • join($result : Result)* : Result
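One plausible reading of the default join is the usual compatible-bindings merge: two results join when they agree on every variable they share. The sketch below assumes a result is a plain hashref from variable name to term string, and that an incompatible join yields undef; the notes specify neither, and join_results is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Result: a hashref mapping variable names to
# term values (plain strings here, not Term objects).
sub join_results {
    my ($left, $right) = @_;
    for my $var (keys %$left) {
        # A variable bound on both sides must be bound to the same value.
        next unless exists $right->{$var};
        return if $left->{$var} ne $right->{$var};
    }
    # Compatible: the joined result is the union of both sets of bindings.
    return { %$left, %$right };
}

my $joined = join_results(
    { s => 'http://example.org/a', name => '"Alice"' },
    { s => 'http://example.org/a', age  => '"42"' },
);
print join(', ', map { "$_=$joined->{$_}" } sort keys %$joined), "\n";
```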

Query trees and graphs

DirectedAcyclicGraph Role

This will be used to represent query algebra trees, query plans, and expressions.

  • children : ArrayRef[DirectedAcyclicGraph]
  • copy_replacing_children(@children)
  • walk* (walk all paths, possibly redundant)
  • cover* (like walk, but visits each node only once)
  • rewrite @@ not sure what the API will be here
  • match($pattern : Match) @@ need to define the Match class/API
  • extract_nodes(does => $class_or_role, matches => \&filter)* @@ uses cover() to find all nodes matching some criteria
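The difference between walk and cover only shows up when a node is reachable by more than one path. A minimal sketch, assuming nodes are plain hashrefs with a name and a children array (a hypothetical shape chosen for the example) and that node identity is the reference address:

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical node shape: { name => Str, children => ArrayRef[node] }.

# walk: follow every path, so a node shared by two parents is visited twice.
sub walk {
    my ($node, $cb) = @_;
    $cb->($node);
    walk($_, $cb) for @{ $node->{children} };
    return;
}

# cover: like walk, but remember visited nodes by reference address so that
# each node is reported only once.
sub cover {
    my ($node, $cb, $seen) = @_;
    $seen ||= {};
    return if $seen->{ refaddr($node) }++;
    $cb->($node);
    cover($_, $cb, $seen) for @{ $node->{children} };
    return;
}

# A diamond: "shared" is a child of both "left" and "right".
my $shared = { name => 'shared', children => [] };
my $left   = { name => 'left',   children => [$shared] };
my $right  = { name => 'right',  children => [$shared] };
my $root   = { name => 'root',   children => [$left, $right] };

my (@walked, @covered);
walk($root,  sub { push @walked,  $_[0]{name} });
cover($root, sub { push @covered, $_[0]{name} });
print "walk:  @walked\n";
print "cover: @covered\n";
```

Here walk reports the shared node once per parent, while cover reports it once in total, which is the behaviour extract_nodes would want.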

QueryTree Role : DirectedAcyclicGraph

  • in_scope_variables : ArrayRef[Variable]
  • necessarily_bound_variables : ArrayRef[Variable]
  • required_variables : ArrayRef[Variable]

Algebra Role : QueryTree

Plan Role : QueryTree, Auditable

Auditable Role

  • cost

Models and Stores

TripleStore Role

  • get_triples(s, p, o)
  • count_triples(s, p, o)*
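A default count_triples can be written once against get_triples, leaving concrete stores free to override it with something faster. The sketch below makes two assumptions the notes do not settle: get_triples returns a code ref that yields one triple per call and undef when exhausted, and undef in a pattern position is a wildcard. Plain inheritance stands in for role composition, and both package names are hypothetical.

```perl
use strict;
use warnings;

package Sketch::TripleStore {
    # Default count_triples*: drain whatever get_triples returns.
    sub count_triples {
        my ($self, @pattern) = @_;
        my $iter  = $self->get_triples(@pattern);
        my $count = 0;
        $count++ while defined $iter->();
        return $count;
    }
}

package Sketch::MemoryStore {
    our @ISA = ('Sketch::TripleStore');

    sub new {
        my ($class, @triples) = @_;
        return bless { triples => [@triples] }, $class;
    }

    # The one required method. undef in a position acts as a wildcard.
    sub get_triples {
        my ($self, @pattern) = @_;
        my @matches = grep {
            my $t = $_;
            !grep { defined $pattern[$_] && $pattern[$_] ne $t->[$_] } 0 .. 2
        } @{ $self->{triples} };
        return sub { shift @matches };
    }
}

package main;

my $store = Sketch::MemoryStore->new(
    [ 'ex:a', 'ex:knows', 'ex:b' ],
    [ 'ex:a', 'ex:knows', 'ex:c' ],
    [ 'ex:b', 'ex:name',  '"Bob"' ],
);
print $store->count_triples('ex:a', undef, undef), "\n";
```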

MutableTripleStore Role

  • add_triple(s, p, o)
  • remove_triple(s, p, o)

CacheableTripleStore Role

  • last_modified_date_for_triples(s, p, o)

QuadStore Role

  • get_quads(s, p, o, g)
  • count_quads(s, p, o, g)*
  • get_graphs()*

MutableQuadStore Role

  • add_quad(s, p, o, g)
  • remove_quad(s, p, o, g)

CacheableQuadStore Role

  • last_modified_date_for_quads(s, p, o, g)

Model Role

  • get_quads(s, p, o, g)
  • get_bindings(s, p, o, g)* @@ this is like get_quads; both return things that will act like variable binding sets (with one having variable names as keys, the other the position names like 'subject'). maybe triples/quads/results all need to conform to a shared role?
  • get_graphs()*
  • count_quads(s, p, o, g)*

MutableModel Role

  • add_quad(q)
  • remove_quad(q)
  • create_graph(g)
  • drop_graph(g)
  • clear_graph(g)

BulkUpdatableModel Role : MutableModel

  • begin_bulk_updates()
  • end_bulk_updates()
  • @@ TODO: or maybe a perform_bulk_updates(&) method?
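If the block-taking variant were adopted, it could be layered on top of the begin/end pair. The following is only a sketch of that idea: Sketch::BulkModel is a hypothetical stand-in that records calls in a log, and the choice to call end_bulk_updates even when the callback dies is an assumption, not something the notes decide.

```perl
use strict;
use warnings;

package Sketch::BulkModel {
    sub new {
        my ($class) = @_;
        return bless { log => [] }, $class;
    }

    # Stand-ins for the methods named in the notes; they only record
    # that they were called.
    sub begin_bulk_updates { push @{ $_[0]{log} }, 'begin' }
    sub end_bulk_updates   { push @{ $_[0]{log} }, 'end' }
    sub add_quad           { push @{ $_[0]{log} }, "add $_[1]" }

    # Hypothetical perform_bulk_updates: run the callback between begin and
    # end, and still call end if the callback dies.
    sub perform_bulk_updates {
        my ($self, $code) = @_;
        $self->begin_bulk_updates;
        my $ok  = eval { $code->($self); 1 };
        my $err = $@;
        $self->end_bulk_updates;
        die $err unless $ok;
        return;
    }
}

package main;

my $model = Sketch::BulkModel->new;
$model->perform_bulk_updates(sub {
    my ($m) = @_;
    $m->add_quad($_) for qw(q1 q2);
});
print join(', ', @{ $model->{log} }), "\n";
```

The appeal of this shape is that callers cannot forget the matching end_bulk_updates call.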

CacheableModel Role

  • last_modified_date_for_quads(s, p, o, g)

QueryPlanner Role

  • plans_for_algebra($algebra) : Maybe[List[Plan]]

Parsers

A concrete Parser must conform to one of PushParser, PullParser, or AtOnceParser. It must also conform to one of the roles indicating the type of data it will generate: TripleParser, QuadParser, MixedStatementParser (both triples and quads), or ResultParser.

Parser Role

  • has 'canonical_media_type' => (isa => 'Str')
  • has 'media_types' => (isa => 'ArrayRef[Str]')

PushParser Role : Parser

  • parse_cb_from_io($io, $base, \&handler)
  • parse_cb_from_bytes($data, $base, \&handler)

PullParser Role : Parser

  • parse_iter_from_io($io, $base)
  • parse_iter_from_bytes($data, $base)

AtOnceParser Role : Parser

  • parse_list_from_io($io, $base)
  • parse_list_from_bytes($data, $base)
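The three parsing styles are closely related: given a push parser, an at-once parser can be had by collecting the callback arguments. The sketch below shows that relationship with a toy line-based format. It is not an N-Triples parser, $base is accepted but ignored, and the free functions only borrow the method names from the roles above.

```perl
use strict;
use warnings;

# Toy push parser: one whitespace-separated "s p o ." statement per line.
# This is NOT a real N-Triples parser; $base is accepted but unused.
sub parse_cb_from_bytes {
    my ($data, $base, $handler) = @_;
    for my $line (split /\n/, $data) {
        next unless $line =~ /\S/;
        my ($s, $p, $o) = split ' ', $line;
        $handler->([ $s, $p, $o ]);
    }
    return;
}

# At-once parsing expressed through the push API: collect every callback
# argument into a list and return it.
sub parse_list_from_bytes {
    my ($data, $base) = @_;
    my @triples;
    parse_cb_from_bytes($data, $base, sub { push @triples, $_[0] });
    return @triples;
}

my @triples = parse_list_from_bytes("ex:a ex:p ex:b .\nex:b ex:p ex:c .\n", undef);
print scalar(@triples), " triples\n";
```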

TripleParser Role : Parser

QuadParser Role : Parser

MixedStatementParser Role : Parser

ResultParser Role : Parser

Serializers

A concrete Serializer must conform to one of the roles indicating the type of data it will consume: TripleSerializer, QuadSerializer, MixedStatementSerializer (both triples and quads), or ResultSerializer.

Serializer Role

  • canonical_media_type
  • media_types
  • serialize_iter_to_io($io, $iter)
  • serialize_list_to_io($io, @list)
  • serialize_iter_to_bytes($iter)
  • serialize_list_to_bytes(@list)

AbbreviatingSerializer Role : Serializer

  • has base => (isa => 'IRI')
  • has prefixes => (isa => 'HashRef[IRI]')

TripleSerializer Role : Serializer

QuadSerializer Role : Serializer

MixedStatementSerializer Role : Serializer

ResultSerializer Role : Serializer

Errors

Error types:

  • MethodInvocationError
  • DatabaseError
  • SerializationError
  • ParserError
  • ComparisonError
  • FilterEvaluationError
  • TypeError
  • ExecutionError
  • PermissionError

ParserErrors relating to data (e.g. parsing RDF or SPARQL content) may conform to a role that specifies where in the data the error was encountered. Depending on the type of parsing error and input method, such an error may provide location data with differing specificity. Examples include:

  • Line number (LineError)
  • Specific byte offset (LocationError where the location element is a DataLocation)
  • Starting and ending line/column pairs (RangeError where the from and to elements are TextLocations)

subtype 'Line', as 'Int', where { $_ > 0 };

subtype 'Column', as 'Int', where { $_ > 0 };

subtype 'Offset', as 'Int', where { $_ >= 0 };

DataLocation Role : Location

  • offset : Offset

TextLocation class : Location

  • line : Line
  • column : Column

Range class

  • from : Location
  • to : Location

LineError Role # Error on some line of text @@ maybe this should be a RangeError with the range of the entire line?

  • line : Line

LocationError Role # Error at a specific location (line/column or byte offset) of data

  • location : Location

RangeError Role # Error at a specific range (from/to location) of data

  • range : Range

Fundamental Types

  • Term
  • Variable
  • PatternTerm (Term+Variable)
  • Triple
  • Quad
  • Triple Pattern @@ better way to synthesize this out of Triple+PatternTerm?
  • Quad Pattern @@ better way to synthesize this out of Quad+PatternTerm?
  • List[Triple]
  • List[Quad]
  • List[Triple Pattern]
  • List[Quad Pattern]
  • Variable Bindings
  • Iterator[Triple]
  • Iterator[Quad]
  • Iterator[Variable Bindings]

kjetilk commented Jul 31, 2014

This looks good to me. There are a few things I don't quite understand, one of which is probably just due to my relative ignorance of the type system: I don't understand the connection between the types at the end and the classes at the start, e.g. the Term type and the TermType class.

I was also wondering where RDF::Trine::Pattern belongs in this. It is a useful module now; is it just a List[Triple Pattern] here? Can we extend it with e.g. selectivity estimates?

kasei commented Aug 5, 2014

The "Fundamental Types" section is somewhat separate from the other stuff. I wanted to try to list all the sorts of core data we might use/expect in programming with the system. I'd like to use that at some point to try to think more about possible APIs that are simpler than what we've got now -- something like a matrix of these types, with the interesting cells being the places where there are potential API calls. For example, the pair (triple pattern × triple) probably indicates a method to produce a variable binding. Some other ideas:

  • (Variable Binding × Variable) → Term (accessing a specific variable value from a result)
  • (List[Triple Pattern] × Variable Bindings) → List[Triple] (producing a graph by instantiating a CONSTRUCT pattern)
  • (List[Quad] × Term) → List[Triple] (extracting a single graph from a dataset)
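Two of these cells are small enough to sketch directly: matching a triple pattern against a triple to get bindings, and instantiating a list of patterns from bindings. The conventions here are assumptions made for the example only: variables are strings starting with "?", everything else is a term, triples are three-element array refs, and match_triple and instantiate are hypothetical names.

```perl
use strict;
use warnings;

# Assumed convention for this sketch: a variable is a string starting
# with "?", any other string is a term.
sub is_variable { return $_[0] =~ /^\?/ }

# (Triple Pattern x Triple) -> Variable Bindings, or undef if no match.
sub match_triple {
    my ($pattern, $triple) = @_;
    my %bindings;
    for my $i (0 .. 2) {
        my ($p, $t) = ($pattern->[$i], $triple->[$i]);
        if (is_variable($p)) {
            # A repeated variable must match the same term each time.
            return if exists $bindings{$p} && $bindings{$p} ne $t;
            $bindings{$p} = $t;
        }
        elsif ($p ne $t) {
            return;
        }
    }
    return \%bindings;
}

# (List[Triple Pattern] x Variable Bindings) -> List[Triple].
# A pattern that mentions an unbound variable produces no triple.
sub instantiate {
    my ($patterns, $bindings) = @_;
    my @triples;
    PATTERN: for my $pattern (@$patterns) {
        my @triple;
        for my $p (@$pattern) {
            if (is_variable($p)) {
                next PATTERN unless exists $bindings->{$p};
                push @triple, $bindings->{$p};
            }
            else {
                push @triple, $p;
            }
        }
        push @triples, \@triple;
    }
    return @triples;
}

my $bindings = match_triple(
    [ '?s',   'ex:knows', '?o' ],
    [ 'ex:a', 'ex:knows', 'ex:b' ],
);
my @out = instantiate(
    [ [ '?o', 'ex:knownBy', '?s' ], [ '?s', 'ex:p', '?x' ] ],
    $bindings,
);
print "@{ $out[0] }\n";
```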

I found that trying to think about it like this helped me in seeing where we're currently missing potentially useful APIs.

kasei commented Aug 5, 2014

TermType is my attempt at simplifying how Terms are represented. Using TermType, a Term is just a pair of ($value, $type) where $value is just a string and $type is one of:

  • IRI
  • Blank
  • LanguageLiteral($lang)
  • TypedLiteral($dt)

Note that LanguageLiteral and TypedLiteral are parameterized with either a language tag or a datatype IRI. This simplifies the API for dealing with terms, especially for comparing term objects, which would just be ($a->type == $b->type and $a->value eq $b->value) instead of the weird ternary comparison we do now on literals by checking for a language tag, a datatype, or a simple literal.
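To make the pair idea concrete, here is a minimal sketch in plain Perl. It encodes each type as a canonical string so that two equal types compare equal with eq; the comparison above writes == on type objects, which would need interning or operator overloading instead. The array-ref term layout, the term and term_equal helpers, and lowercasing the language tag are all assumptions of this sketch.

```perl
use strict;
use warnings;

# Sketch only: a type is a canonical string, a Term is a [$value, $type] pair.
sub IRI             { return 'IRI' }
sub Blank           { return 'Blank' }
# Assumption: language tags are normalized to lowercase before comparing.
sub LanguageLiteral { return 'LanguageLiteral(' . lc($_[0]) . ')' }
sub TypedLiteral    { return 'TypedLiteral(' . $_[0] . ')' }

sub term {
    my ($value, $type) = @_;
    return [ $value, $type ];
}

# Term equality reduces to comparing both halves of the pair.
sub term_equal {
    my ($x, $y) = @_;
    return $x->[1] eq $y->[1] && $x->[0] eq $y->[0];
}

my $xsd_string = 'http://www.w3.org/2001/XMLSchema#string';
my $typed   = term('chat', TypedLiteral($xsd_string));
my $english = term('chat', LanguageLiteral('en'));
my $french  = term('chat', LanguageLiteral('fr'));

print term_equal($english, term('chat', LanguageLiteral('EN'))) ? "equal\n" : "different\n";
print term_equal($english, $french) ? "equal\n" : "different\n";
```

With this encoding there is no special case for literals: a language-tagged "chat" and a typed "chat" differ simply because their types differ.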
