Last active August 7, 2022 00:28
SPARQL strawman for functions and operators over lists

I took a few hours to hack up a proof-of-concept for this SPARQL use-case:

Another related example. Given these column values

  • project: 123
  • PI_IDS: 1858722 (contact); 1883064; 3150248;

I want it to be converted to this:

<project/123> principalInvestigatorContact <researcher/1858722>;
  principalInvestigator <researcher/1883064>, <researcher/3150248>.

<researcher/1858722> a Researcher; name "BUCK, JOCHEN".
<researcher/1883064> a Researcher; name "LEVIN, LONNY R".
<researcher/3150248> a Researcher; name "VISCONTI, PABLO E.".

In issue 64, I said this:

I'll also note that this sort of support for composite types at the query level could impact discussions around other tickets like #6 (where, for example, multi-binding-producing functions are re-framed as single-composite-value-producing functions, and we add operators to construct, iterate over, and unpack from these values).
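To illustrate that re-framing, here is a minimal Python sketch. It is not Attean code. It models solution mappings as dicts, and the helper names `split_multi`, `split_list`, and `explode` are hypothetical. The sketch compares two approaches. In the first, a function fans out into several bindings directly. In the second, a function returns one list value and a separate unpack operator iterates over it. Both give the same solutions once the intermediate list variable is dropped.

```python
from typing import Any, Dict, Iterable, Iterator, List

# A solution mapping, modelled as a dict from variable name to value.
Solution = Dict[str, Any]

def split_multi(sols: Iterable[Solution], src: str, sep: str, out: str) -> Iterator[Solution]:
    # Multi-binding style: the function itself fans out into one solution per piece.
    for sol in sols:
        for piece in str(sol[src]).split(sep):
            yield {**sol, out: piece}

def split_list(sols: Iterable[Solution], src: str, sep: str, out: str) -> Iterator[Solution]:
    # Composite style (BIND-like): exactly one solution out, holding a list value.
    for sol in sols:
        yield {**sol, out: str(sol[src]).split(sep)}

def explode(sols: Iterable[Solution], src: str, out: str) -> Iterator[Solution]:
    # Unpack operator: one solution per element of the list bound to `src`.
    for sol in sols:
        for item in sol[src]:
            yield {**sol, out: item}

data: List[Solution] = [{"ids": "1883064; 3150248"}]
direct = list(split_multi(data, "ids", "; ", "id"))
# Split into a list value, explode it, then drop the intermediate list variable.
reframed = [
    {k: v for k, v in sol.items() if k != "idList"}
    for sol in explode(split_list(data, "ids", "; ", "idList"), "idList", "id")
]
print(direct == reframed)  # True
```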

Using that as a jumping-off point, I extended Attean with:

  • New extension functions operating over literals with datatype ex:List:
    • ex:split(xsd:string, xsd:string) -> ex:List
    • ex:zip(ex:List, ex:List) -> ex:List
    • ex:listGet(ex:List, xsd:integer) -> RDFTerm
  • A new EXPLODE operator that syntactically mirrors BIND but can produce any number of results per input solution
    • EXPLODE(expr AS ?var): expr must evaluate to an ex:List; the operator produces one result for each element of the encoded list, with ?var bound to that element
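The following is a rough Python model of these extensions. Its semantics are assumptions, not a description of Attean's implementation. `ex:split` is taken to use a literal (non-regex) separator. `ex:zip` is assumed to stop at the shorter list. `ex:listGet` is assumed to be zero-based, as the query's `ex:listGet(?pair, 0)` suggests. `EXPLODE` is assumed to drop a solution whose list is empty. The Python function names are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Solution = Dict[str, Any]

def ex_split(s: str, sep: str) -> List[str]:
    # Hypothetical semantics: literal separator, like Python's str.split.
    return s.split(sep)

def ex_zip(a: List[Any], b: List[Any]) -> List[List[Any]]:
    # Hypothetical semantics: pair elements positionally, stopping at the shorter list.
    return [[x, y] for x, y in zip(a, b)]

def ex_list_get(lst: List[Any], i: int) -> Any:
    # Zero-based index, matching ex:listGet(?pair, 0) and ex:listGet(?pair, 1).
    return lst[i]

def explode(sols: Iterable[Solution], expr: Callable[[Solution], List[Any]], var: str) -> Iterator[Solution]:
    # EXPLODE(expr AS ?var): like BIND, but one output solution per list element,
    # so an empty list produces no output for that input solution.
    for sol in sols:
        for item in expr(sol):
            yield {**sol, var: item}

sols = [{"pairs": ex_zip(ex_split("A; B", "; "), ex_split("1; 2", "; "))}]
exploded = list(explode(sols, lambda s: s["pairs"], "pair"))
print([ex_list_get(s["pair"], 1) for s in exploded])  # ['1', '2']
```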

With these changes, this query constructs the desired results:

PREFIX ex: <>
CONSTRUCT {
	?project ex:principalInvestigatorContact ?piContact ;
	  ex:principalInvestigator ?pi .

	?researcher a ex:Researcher ;
		ex:name ?name .
} WHERE {
	# Original data
	VALUES (?project_id ?ids ?names) {
		("123"
		 "1858722 (contact); 1883064; 3150248;"
		 "BUCK, JOCHEN (contact); LEVIN, LONNY R; VISCONTI, PABLO E.;")
	}

	# Split names and ids into individual records, contained in an ex:List-typed
	# literal.
	BIND(ex:split(?ids, "; ") AS ?idList)
	BIND(ex:split(?names, "; ") AS ?nameList)

	# Make a single list of (name, id) pairs
	BIND(ex:zip(?nameList, ?idList) AS ?pairs)
	# Make one result per (name, id) pair
	EXPLODE(?pairs AS ?pair)
	# Extract the name and id from the pair ("with annotation" because they
	# might contain the trailing " (contact)" string)
	BIND(ex:listGet(?pair, 0) AS ?nameWithAnnotation)
	BIND(ex:listGet(?pair, 1) AS ?idWithAnnotation)
	# Strip off the " (contact)" annotation, if present
	BIND(REPLACE(?nameWithAnnotation, " [(]contact[)]", "") AS ?name)
	BIND(REPLACE(?idWithAnnotation, " [(]contact[)]", "") AS ?id)

	# Set a flag if this record is marked as the contact
	BIND(STRENDS(?idWithAnnotation, " (contact)") AS ?isContact)
	# Construct the ?researcher IRI
	BIND(URI(CONCAT("researcher/", ?id)) AS ?researcher)
	# Construct the ?project IRI
	BIND(URI(CONCAT("project/", ?project_id)) AS ?project)
	# Using IRI() with either the bound ?researcher value or the (necessarily)
	# unbound ?undef binds ?piContact only when ?isContact is true, and ?pi only
	# when it is false. Evaluating the unbound ?undef raises an error, which
	# leaves the BIND target variable unbound.
	BIND(IRI(IF(?isContact, ?researcher, ?undef)) AS ?piContact)
	BIND(IRI(IF(?isContact, ?undef, ?researcher)) AS ?pi)
}

As discussed in the original issue 14, I think the VALUES … OF … syntax might make this a bit cleaner (instead of the use of ex:listGet), but otherwise this seems reasonable to me, if a bit verbose.
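The Python sketch below traces what the query computes for a single input row, without needing an Attean build. It makes two assumptions. First, the names column uses the same `; `-separated format as the IDs, with the contact annotation on the first entry. Second, the trailing `;` is stripped before splitting. A plain split on `; ` would otherwise leave the last element as `3150248;`. `None` stands in for an unbound variable. This turns two things into an explicit filter: the IF/?undef trick, and CONSTRUCT's rule of skipping template triples that contain an unbound variable. The function name `construct` is hypothetical.

```python
import re
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]
# Same pattern as the query's REPLACE calls.
ANNOTATION = re.compile(r" [(]contact[)]")

def construct(project_id: str, ids: str, names: str) -> List[Triple]:
    project = f"project/{project_id}"
    # Assumption: drop the trailing ";" first, so the last element is not "3150248;".
    id_list = ids.rstrip(";").split("; ")
    name_list = names.rstrip(";").split("; ")
    triples: List[Triple] = []
    # ex:zip + EXPLODE: one iteration per (name, id) pair.
    for name_ann, id_ann in zip(name_list, id_list):
        name = ANNOTATION.sub("", name_ann)          # REPLACE on the name
        rid = ANNOTATION.sub("", id_ann)             # REPLACE on the id
        is_contact = id_ann.endswith(" (contact)")   # STRENDS
        researcher = f"researcher/{rid}"
        # None plays the role of an unbound variable (the IF/?undef trick).
        pi_contact: Optional[str] = researcher if is_contact else None
        pi: Optional[str] = None if is_contact else researcher
        template = [
            (project, "ex:principalInvestigatorContact", pi_contact),
            (project, "ex:principalInvestigator", pi),
            (researcher, "a", "ex:Researcher"),
            (researcher, "ex:name", name),
        ]
        # CONSTRUCT omits any template triple containing an unbound variable.
        triples.extend(t for t in template if None not in t)
    return triples

# Hypothetical names value, assumed to follow the same format as PI_IDS.
NAMES = "BUCK, JOCHEN (contact); LEVIN, LONNY R; VISCONTI, PABLO E.;"
for t in construct("123", "1858722 (contact); 1883064; 3150248;", NAMES):
    print(t)
```

Each (name, id) pair contributes exactly one of the two PI triples, plus its type and name triples.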

kasei commented Aug 7, 2022

