@kasei
Last active December 22, 2015 06:19
Data Flows and Types in Managing RDF and SPARQL data

Data types

The types of data we will concern ourselves with:

  • [T]riples
  • [Q]uads
  • [M]ixed triples and quads
  • [S]PARQL Results (mapping variable names to RDF Terms)
  • [O]perations (insert/delete operations for RDF Term tuples)
  • [B]ytes (flat, unstructured, or opaque data)

Data of type t is either structured, written [t], or serialized, written [t'] (the prime marks the serialized form).
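As a minimal sketch of this notation, the six kinds of data and the structured/serialized distinction could be modelled as below. The names `Kind`, `DataType`, and `parse_type` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    """The six kinds of data, keyed by the one-letter codes used in the text."""
    T = "triples"
    Q = "quads"
    M = "mixed triples and quads"
    S = "SPARQL results"
    O = "operations"
    B = "bytes"


@dataclass(frozen=True)
class DataType:
    """A kind of data plus a flag for structured ([t]) vs. serialized ([t'])."""
    kind: Kind
    serialized: bool = False

    def __str__(self) -> str:
        # Render in the text's notation: T for structured, T' for serialized.
        return self.kind.name + ("'" if self.serialized else "")


def parse_type(label: str) -> DataType:
    """Read a label such as "Q" or "Q'" into a DataType (hypothetical helper)."""
    serialized = label.endswith("'")
    return DataType(Kind[label.rstrip("'")], serialized)
```

For instance, `parse_type("Q'")` gives a serialized quad type, and `str()` on it prints `Q'` again.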

Data Sources and Sinks

The potential sources of data and their respective types:

  • variable [*, *']
  • IO handle [*']
  • iterator [T,Q,M,S,O]
  • URL [*'] # the actual type should be determined at dereference time by the content-type header
  • model [T,Q]
  • store [T,Q]

The potential destinations (sinks) for data and their respective types:

  • variable [*, *']
  • IO handle [B']
  • iterator [*]
  • model [M,O]
  • store [M,O] # implementation dependent: some stores might accept only triples, while others accept both triples and quads

An Input is a typed source (source[type]). An Output is a typed sink (sink[type]).
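One way to picture typed sources and sinks is as a lookup from each source or sink to the set of types it may carry. The sketch below is illustrative only: the table and class names are hypothetical, `*` is read as "any structured type" and `*'` as "any serialized type" (an assumption about the notation), and the store entry ignores the implementation-dependent caveat.

```python
from dataclasses import dataclass

STRUCTURED = frozenset({"T", "Q", "M", "S", "O", "B"})
SERIALIZED = frozenset(t + "'" for t in STRUCTURED)

# Types each source can produce, following the list of sources.
SOURCE_TYPES = {
    "variable": STRUCTURED | SERIALIZED,            # [*, *']
    "io_handle": SERIALIZED,                        # [*']
    "iterator": frozenset({"T", "Q", "M", "S", "O"}),
    "url": SERIALIZED,                              # [*'], fixed at dereference time
    "model": frozenset({"T", "Q"}),
    "store": frozenset({"T", "Q"}),
}

# Types each sink can accept, following the list of sinks.
SINK_TYPES = {
    "variable": STRUCTURED | SERIALIZED,            # [*, *']
    "io_handle": frozenset({"B'"}),
    "iterator": STRUCTURED,                         # [*] (assumed: any structured type)
    "model": frozenset({"M", "O"}),
    "store": frozenset({"M", "O"}),                 # real stores may accept less
}


def _check(table, role, name, type_):
    """Raise if `name` is unknown or cannot carry `type_`."""
    allowed = table.get(name)
    if allowed is None:
        raise ValueError(f"unknown {role}: {name!r}")
    if type_ not in allowed:
        raise TypeError(f"{role} {name!r} cannot carry type {type_}")


@dataclass(frozen=True)
class Input:
    """A typed source, source[type]."""
    source: str
    type: str

    def __post_init__(self) -> None:
        _check(SOURCE_TYPES, "source", self.source, self.type)


@dataclass(frozen=True)
class Output:
    """A typed sink, sink[type]."""
    sink: str
    type: str

    def __post_init__(self) -> None:
        _check(SINK_TYPES, "sink", self.sink, self.type)
```

With these tables, `Input("store", "Q")` is accepted, while `Input("store", "S")` raises a `TypeError` because a store is not a source of SPARQL results.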

Parsers and Serializers

The parsers and their respective types:

The serializers and their respective types:

The parsing process is: Input x Parser -> Output

The serialization process is: Input x Serializer -> Output

The valid typings for these processes are:

source[T'] x parser[T',M'] -> sink[T,M]
source[Q'] x parser[Q',M'] -> sink[Q,M]
source[M'] x parser[M'] -> sink[M]
source[S'] x parser[S'] -> sink[S]
source[O'] x parser[O'] -> sink[O]

source[T] x serializer[T,M] -> sink[B']
source[Q] x serializer[Q,M] -> sink[B']
source[M] x serializer[M] -> sink[B']
source[S] x serializer[S,B] -> sink[B']
source[O] x serializer[O] -> sink[B']

Type checking on parsing and serializing

Parsing

Parse(source[t'], parser[u'], sink[v])
	check:
		# make sure that the types make sense: source, parser, and sink share one base type
		t == u == v
		or
		(t',u',v) in:
			(T', [T', M'], [T, M])
			(Q', [Q', M'], [Q, M])
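The parsing check can be sketched as a small predicate over type labels. This is an illustration under assumptions: types are passed as strings such as "T'" and "M", and the function name `check_parse` is hypothetical.

```python
def check_parse(source: str, parser: str, sink: str) -> bool:
    """Return True if Parse(source[t'], parser[u'], sink[v]) is well typed.

    `source` and `parser` must be serialized types (ending in a prime);
    `sink` must be a structured type (no prime).
    """
    if not source.endswith("'") or not parser.endswith("'"):
        return False
    if sink.endswith("'"):
        return False

    # Compare base types, i.e. the labels with the prime removed.
    t, u, v = source.rstrip("'"), parser.rstrip("'"), sink
    if t == u == v:
        return True

    # Triples or quads may also pass through a mixed parser
    # and/or land in a mixed sink.
    return t in ("T", "Q") and u in (t, "M") and v in (t, "M")
```

For example, `check_parse("T'", "M'", "T")` holds, while `check_parse("M'", "T'", "T")` does not, since a mixed source cannot be narrowed to triples by the parser alone.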

Serializing

Serialize(source[t], serializer[u], sink[B'])
	check:
		# make sure that the types make sense
		t == u
		or
		(t, u) in:
			(T, M)
			(Q, M)
			(S, B)
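The serialization check can be sketched the same way. The set of extra pairs mirrors the three listed above; the name `check_serialize` and the string encoding of types are assumptions made for illustration.

```python
# (source type, serializer type) pairs allowed in addition to exact matches.
EXTRA_PAIRS = {("T", "M"), ("Q", "M"), ("S", "B")}


def check_serialize(source: str, serializer: str, sink: str = "B'") -> bool:
    """Return True if Serialize(source[t], serializer[u], sink[B']) is well typed."""
    # Serialization always writes to a sink of serialized bytes.
    if sink != "B'":
        return False
    return source == serializer or (source, serializer) in EXTRA_PAIRS
```

So `check_serialize("Q", "M")` holds (quads fit a mixed serializer), but `check_serialize("M", "T")` does not: mixed data would first need a cast to triples.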

Type conversion

Data whose type does not satisfy these checks must pass through a casting function before it can be parsed or serialized. For example, to serialize quads into a triple format like N-Triples, a function that drops the graph from each statement (yielding triples) can be used to cast the store (typed as a [Q]uad source) to a [T]riple source:

# (Using a more functional syntax)
Serialize(drop-graph(store[Q])[T], N-Triples[T]) -> IOHandle[T']
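A rough Python rendering of this example follows. It is a sketch, not a real serializer: the `Triple` and `Quad` classes are hypothetical, a plain list stands in for the store, and terms are assumed to be strings already written in N-Triples syntax, so no escaping is performed.

```python
import io
from typing import Iterable, Iterator, NamedTuple, TextIO


class Triple(NamedTuple):
    s: str
    p: str
    o: str


class Quad(NamedTuple):
    s: str
    p: str
    o: str
    g: str


def drop_graph(quads: Iterable[Quad]) -> Iterator[Triple]:
    """Cast a [Q]uad source to a [T]riple source by discarding each graph."""
    for q in quads:
        yield Triple(q.s, q.p, q.o)


def serialize_ntriples(triples: Iterable[Triple], out: TextIO) -> None:
    """Write one line per triple; terms are assumed to be pre-encoded."""
    for t in triples:
        out.write(f"{t.s} {t.p} {t.o} .\n")


# A list stands in for the store: the same triple asserted in two graphs.
store = [
    Quad("<http://example.org/s>", "<http://example.org/p>", '"o"', "<http://example.org/g1>"),
    Quad("<http://example.org/s>", "<http://example.org/p>", '"o"', "<http://example.org/g2>"),
]

out = io.StringIO()
serialize_ntriples(drop_graph(store), out)
```

Because the cast only discards the graph, a triple that appears in several graphs is emitted once per graph; removing such duplicates would be a separate step.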

Some useful casting functions are:

  • drop-graph(source[Q,M]) -> source[T]
  • add-graph(source[T,M], iri[R]) -> source[Q]
  • map-statement(source[T,Q,M], positionToNameMap) -> source[S]
  • construct(source[S], template[t: T,Q,M]) -> source[t]
  • insert(source[T,Q,M]) -> source[O]
  • delete(source[T,Q,M]) -> source[O]
  • filter(source[t], block) -> source[t]
  • map(source[t], block, u) -> source[u]
  • cast(source[B], t) -> source[t] # e.g. for guessing a format type from the filename extension
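
Several of these casts can be sketched as Python generators over plain tuples. The representation is an assumption made for illustration: a triple is a 3-tuple, a quad is a 4-tuple, a SPARQL result is a dict from variable name to term, and an operation is an ("insert", statement) pair. The sketch also assumes that add-graph leaves existing quads unchanged, which the list above does not specify.

```python
from typing import Dict, Iterable, Iterator, Tuple

Statement = Tuple[str, ...]  # a triple (s, p, o) or a quad (s, p, o, g)


def add_graph(statements: Iterable[Statement], graph: str) -> Iterator[Statement]:
    """add-graph: source[T,M] -> source[Q].

    Triples receive `graph`; quads keep their own graph (an assumption).
    """
    for st in statements:
        yield st if len(st) == 4 else (*st, graph)


def map_statement(
    statements: Iterable[Statement], position_to_name: Dict[int, str]
) -> Iterator[Dict[str, str]]:
    """map-statement: source[T,Q,M] -> source[S].

    Each statement becomes one result row; positions a statement lacks
    (e.g. position 3 on a triple) are left unbound.
    """
    for st in statements:
        yield {
            name: st[pos]
            for pos, name in position_to_name.items()
            if pos < len(st)
        }


def insert(statements: Iterable[Statement]) -> Iterator[Tuple[str, Statement]]:
    """insert: source[T,Q,M] -> source[O]."""
    for st in statements:
        yield ("insert", st)


triples = [("<a>", "<b>", "<c>")]
quads = list(add_graph(triples, "<g>"))
rows = list(map_statement(quads, {0: "s", 3: "g"}))
ops = list(insert(triples))
```

Writing the casts as generators keeps them lazy, so they can be chained in front of a serializer without materializing the whole source.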