I took a few hours to hack up a proof-of-concept for this SPARQL use-case:
Another related example. Given these column values
- project: 123
- PI_IDS: 1858722 (contact); 1883064; 3150248;
- PI_NAMEs: BUCK, JOCHEN (contact); LEVIN, LONNY R; VISCONTI, PABLO E.;
I want it converted to this
<project/123> principalInvestigatorContact <researcher/1858722>;
  principalInvestigator <researcher/1883064>, <researcher/3150248>.
<researcher/1858722> a Researcher; name "BUCK, JOCHEN".
<researcher/1883064> a Researcher; name "LEVIN, LONNY R".
<researcher/3150248> a Researcher; name "VISCONTI, PABLO E.".
In issue 64, I said this:
I'll also note that this sort of support for composite types at the query level could impact discussions around other tickets like #6 (where, for example, multi-binding-producing functions are re-framed as single-composite-value-producing functions, and we add operators to construct, iterate over, and unpack from these values).
Using that as a jumping-off point, I extended Attean with:
- New extension functions operating over literals with datatype ex:List:
  - ex:split(xsd:string, xsd:string) -> ex:List
  - ex:zip(ex:List, ex:List) -> ex:List
  - ex:listGet(ex:List, xsd:integer) -> RDFTerm
- A new EXPLODE operator which syntactically mirrors BIND, but which produces any number of results: with expr evaluating to an ex:List, EXPLODE(expr AS ?var) produces one result for each element of the encoded list
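To make the intended semantics concrete, here is a rough Python model of the three functions and of EXPLODE. It is only a sketch, not the Attean implementation: the Python names (`split`, `zip_lists`, `list_get`, `explode`) are made up for illustration, rows are modelled as plain dicts, and the handling of empty pieces and of lists with unequal lengths is an assumption borrowed from Python's own `str.split` and `zip`. The zero-based index for `list_get` matches how ex:listGet is used in the query below.

```python
def split(text, sep):
    """Model of ex:split: break a string on a separator into a list."""
    # Assumption: empty pieces are kept, as Python's str.split does.
    return text.split(sep)


def zip_lists(a, b):
    """Model of ex:zip: pair up the elements of two lists by position."""
    # Assumption: the result is truncated to the shorter input.
    return [[x, y] for x, y in zip(a, b)]


def list_get(lst, index):
    """Model of ex:listGet: return the element at a zero-based index."""
    return lst[index]


def explode(rows, expr, var):
    """Model of EXPLODE(expr AS ?var).

    For every input row, evaluate expr to a list and emit one output row
    per element, with that element bound to var.
    """
    for row in rows:
        for element in expr(row):
            yield {**row, var: element}


if __name__ == "__main__":
    pairs = zip_lists(split("A; B", "; "), split("1; 2", "; "))
    for row in explode([{"pairs": pairs}], lambda r: r["pairs"], "pair"):
        print(list_get(row["pair"], 0), list_get(row["pair"], 1))
```

The difference from BIND is visible in `explode`: a BIND would extend each row once, whereas here a single input row can turn into zero, one, or many output rows.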
With these changes, this query constructs the desired results:
PREFIX ex: <http://example.org/>
CONSTRUCT {
?project ex:principalInvestigatorContact ?piContact ;
ex:principalInvestigator ?pi .
?researcher a ex:Researcher ;
ex:name ?name .
}
WHERE {
# Original data
VALUES (?project_id ?ids ?names) {
(
"123"
"1858722 (contact); 1883064; 3150248;"
"BUCK, JOCHEN (contact); LEVIN, LONNY R; VISCONTI, PABLO E.;"
)
}
# Split names and ids into individual records, contained in an ex:List-typed
# literal.
BIND(ex:split(?ids, "; ") AS ?idList)
BIND(ex:split(?names, "; ") AS ?nameList)
# Make a single list of (name, id) pairs
BIND(ex:zip(?nameList, ?idList) AS ?pairs)
# Make one result per (name, id) pair
EXPLODE(?pairs AS ?pair)
# Extract the name and id from the pair ("with annotation" because they
# might contain the trailing " (contact)" string)
BIND(ex:listGet(?pair, 0) AS ?nameWithAnnotation)
BIND(ex:listGet(?pair, 1) AS ?idWithAnnotation)
# Strip off the " (contact)" annotation, if present
BIND(REPLACE(?nameWithAnnotation, " [(]contact[)]", "") AS ?name)
BIND(REPLACE(?idWithAnnotation, " [(]contact[)]", "") AS ?id)
# Set a flag if this record is marked as the contact
BIND(STRENDS(?idWithAnnotation, " (contact)") AS ?isContact)
# Construct the ?researcher IRI
BIND(URI(CONCAT("researcher/", ?id)) AS ?researcher)
# Construct the ?project
BIND(URI(CONCAT("project/", ?project_id)) AS ?project)
# IF() selects either the bound ?researcher value or the (necessarily)
# unbound ?undef, so ?piContact is bound only if ?isContact is true, and
# ?pi only if it is false. In the other case, evaluating the unbound
# ?undef raises an error, which leaves the variable being bound by BIND
# unbound.
BIND(IRI(IF(?isContact, ?researcher, ?undef)) AS ?piContact)
BIND(IRI(IF(?isContact, ?undef, ?researcher)) AS ?pi)
}
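For anyone who wants to check the row-by-row logic without an engine that supports these extensions, the WHERE clause can be traced in plain Python. This is a sketch under assumptions, not a translation of the actual implementation: `None` stands in for an unbound variable, the helper `split_records` is hypothetical, and it strips the trailing ";" before splitting because how ex:split treats that trailing separator is not stated here.

```python
import re

project_id = "123"
ids = "1858722 (contact); 1883064; 3150248;"
names = "BUCK, JOCHEN (contact); LEVIN, LONNY R; VISCONTI, PABLO E.;"


def split_records(text):
    # Assumption: the trailing ";" should not leave a stray character on
    # the last record, so it is removed before splitting on "; ".
    return text.rstrip(";").split("; ")


rows = []
# zip + loop plays the role of ex:zip followed by EXPLODE.
for name_raw, id_raw in zip(split_records(names), split_records(ids)):
    # Strip off the " (contact)" annotation, if present.
    name = re.sub(r" [(]contact[)]", "", name_raw)
    rid = re.sub(r" [(]contact[)]", "", id_raw)
    is_contact = id_raw.endswith(" (contact)")
    researcher = f"researcher/{rid}"
    rows.append({
        "project": f"project/{project_id}",
        "researcher": researcher,
        "name": name,
        # Exactly one of these two is bound in each row.
        "piContact": researcher if is_contact else None,
        "pi": None if is_contact else researcher,
    })

for row in rows:
    print(row)
```

Each row leaves one of `piContact` and `pi` unset, which is why the CONSTRUCT template only emits one of the two project triples per researcher.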
As discussed in the original issue 14, I think the VALUES … OF … syntax might make this a bit cleaner (instead of the use of ex:listGet), but otherwise this seems reasonable to me, if a bit verbose.
And here's a complete script that will demonstrate this.