Skip to content

Instantly share code, notes, and snippets.

@pwin
Last active May 2, 2019 08:46
Show Gist options
  • Save pwin/027b637ce0b278a65ad655783a9b1cdb to your computer and use it in GitHub Desktop.
Save pwin/027b637ce0b278a65ad655783a9b1cdb to your computer and use it in GitHub Desktop.
A script of an introduction to SPARQL and http://statistics.gov.scot for the ProductForge Hack event - 26 Jan 2018

ProductForge 26 January 2018

https://goo.gl/WPeY5r

General resources:

DEMO

1: Go to http://yasgui.org

2: Paste http://statistics.gov.scot/sparql into the endpoint box

3: let's see if there are any triples there (Basic Graph Pattern - BGP)

    ask 
    {?s ?p ?o}

4: is there at least one named graph?

ask {
graph ?g {?s ?p ?o}
}

5: How do we know it isn't just returning True all the time?

ask {
graph ?g 
{?s <http://something.org/text> ?o}
}

6: Lets look at some graphs

select distinct ?g
{graph ?g
{?s ?p ?o }
}
limit 200

6: How many graphs are there?

select (count(distinct ?g) as ?numberOfGraphs)
{graph ?g
{?s ?p ?o }
}

6a: What Classes of "thing" are there?

select distinct ?o
{graph ?g
{?s a ?o }
}

6b: the 'a' in the above is a shorthand for rdf:type

So we could have looked for the Classes of 'thing' in the following way

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select distinct ?s
{graph ?g
{?s rdf:type rdfs:Class }
}

7: Having spotted what looks like an interesting graph earlier, (just from the URI) let's rummage around

SELECT * WHERE {
  graph <http://statistics.gov.scot/graph/p1-BMI-clinical>{
  ?sub ?pred ?obj .
}
}
LIMIT 10

7a: noticing the 'cube' predicates, let's explore http://purl.org/linked-data/cube#structure

7b: let's look to see if there is a qb:DataSet

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX qb: <http://purl.org/linked-data/cube#>
select distinct ?s
{graph ?g
{?s rdf:type qb:DataSet }
}

8: Just speculatively picking a 'structure' URI, we can describe it

describe <http://statistics.gov.scot/data/structure/p1-BMI-clinical>

9: ...we can grub around more using 'describe'

describe <http://statistics.gov.scot/def/code-list/p1-BMI-clinical/weightCategory>

10: let's explore this 'code list' to see what it is about

SELECT * WHERE {
  ?sub ?pred <http://statistics.gov.scot/def/concept/weight-category/clinical-overweight> .
}
LIMIT 10
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?obj ?objLabel WHERE {
  <http://statistics.gov.scot/def/concept-scheme/weight-category> ?pred ?obj ;.
   ?obj    rdfs:label | dc:title ?objLabel .
}
LIMIT 100

11: So, back to the query earlier

SELECT * WHERE {
  graph <http://statistics.gov.scot/graph/p1-BMI-clinical>{
  ?sub ?pred ?obj .
}
}
LIMIT 10

we can see several potentially interesting cube components for http://statistics.gov.scot/data/structure/p1-BMI-clinical

12: Find some observations

PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT * WHERE {
  graph <http://statistics.gov.scot/graph/p1-BMI-clinical>{
  ?s a <http://purl.org/linked-data/cube#Observation> ;.
}
}
LIMIT 10

13: describe one

describe <http://statistics.gov.scot/data/p1-BMI-clinical/government-year/2007-2008/S08000015/gender/female/weight-category/clinical-underweight/percent-of-p1-children/ratio>

Then pull the description to create a more general query

<http://statistics.gov.scot/data/p1-BMI-clinical/government-year/2007-2008/S08000015/gender/female/weight-category/clinical-underweight/percent-of-p1-children/ratio> a <http://purl.org/linked-data/cube#Observation> ;
	<http://purl.org/linked-data/sdmx/2009/dimension#refArea> <http://statistics.gov.scot/id/statistical-geography/S08000015> ;
	<http://purl.org/linked-data/sdmx/2009/attribute#unitMeasure> <http://statistics.gov.scot/def/concept/measure-units/percent-of-p1-children> ;
	<http://purl.org/linked-data/sdmx/2009/dimension#refPeriod> <http://reference.data.gov.uk/id/government-year/2007-2008> ;
	<http://purl.org/linked-data/cube#measureType> <http://statistics.gov.scot/def/measure-properties/ratio> ;
	<http://purl.org/linked-data/cube#dataSet> <http://statistics.gov.scot/data/p1-BMI-clinical> ;
	<http://statistics.gov.scot/def/dimension/gender> <http://statistics.gov.scot/def/concept/gender/female> ;
	<http://statistics.gov.scot/def/measure-properties/ratio> 6.3E-1 ;
	<http://statistics.gov.scot/def/dimension/weightCategory> <http://statistics.gov.scot/def/concept/weight-category/clinical-underweight> .

Replace the entity URIs for the objects above with variables, and stick a select verb around it to create a query

select * {
  graph <http://statistics.gov.scot/graph/p1-BMI-clinical>{
?obs a <http://purl.org/linked-data/cube#Observation> ;
	<http://purl.org/linked-data/sdmx/2009/dimension#refArea> ?refArea ;
	<http://purl.org/linked-data/sdmx/2009/attribute#unitMeasure> ?uom ;
	<http://purl.org/linked-data/sdmx/2009/dimension#refPeriod> ?refPeriod ;
	<http://purl.org/linked-data/cube#measureType> ?measureType ;
	<http://purl.org/linked-data/cube#dataSet> ?dataSet ;
	<http://statistics.gov.scot/def/dimension/gender> ?gender ;
	<http://statistics.gov.scot/def/measure-properties/ratio> ?ratio ;
	<http://statistics.gov.scot/def/dimension/weightCategory> ?weightCategory .
  }
  }
limit 10

Now we need some labels

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select * {
  graph <http://statistics.gov.scot/graph/p1-BMI-clinical>{
?obs a <http://purl.org/linked-data/cube#Observation> ;
	<http://purl.org/linked-data/sdmx/2009/dimension#refArea> ?refArea ;
	<http://purl.org/linked-data/sdmx/2009/attribute#unitMeasure> ?uom ;
	<http://purl.org/linked-data/sdmx/2009/dimension#refPeriod> ?refPeriod ;
	<http://purl.org/linked-data/cube#measureType> ?measureType ;
	<http://purl.org/linked-data/cube#dataSet> ?dataSet ;
	<http://statistics.gov.scot/def/dimension/gender> ?gender ;
	<http://statistics.gov.scot/def/measure-properties/ratio> ?ratio ;
	<http://statistics.gov.scot/def/dimension/weightCategory> ?weightCategory .
    ?refArea rdfs:label ?refAreaLabel.
    ?refPeriod rdfs:label ?refPeriodLabel .
    ?gender rdfs:label ?genderLabel.
}
}
limit 10

But this doesn't return anything. Spoiler-alert: The labels are somewhere else - different graph/s

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?g ?g1 ?g2 ?g3 ?g4 ?refArea ?refAreaLabel ?refPeriodLabel ?genderLabel ?weightCategoryLabel ?ratio  WHERE {
  graph <http://statistics.gov.scot/graph/p1-BMI-clinical>{
?obs a <http://purl.org/linked-data/cube#Observation> ;
	<http://purl.org/linked-data/sdmx/2009/dimension#refArea> ?refArea ;
	<http://purl.org/linked-data/sdmx/2009/attribute#unitMeasure> ?uom ;
	<http://purl.org/linked-data/sdmx/2009/dimension#refPeriod> ?refPeriod ;
	<http://purl.org/linked-data/cube#measureType> ?measureType ;
	<http://purl.org/linked-data/cube#dataSet> ?dataSet ;
	<http://statistics.gov.scot/def/dimension/gender> ?gender ;
	<http://statistics.gov.scot/def/measure-properties/ratio> ?ratio ;
	<http://statistics.gov.scot/def/dimension/weightCategory> ?weightCategory .

}
  graph ?g {
    ?refArea rdfs:label ?refAreaLabel.
      }
  graph ?g1 {
    ?refPeriod rdfs:label ?refPeriodLabel .
      }
  graph ?g2 {
    ?refPeriod rdfs:label ?refPeriodLabel .
      }
  graph ?g3 {
	?gender rdfs:label ?genderLabel.
      }
  graph ?g4 {
	?weightCategory  rdfs:label ?weightCategoryLabel .
      }
}
LIMIT 10

We can in this case remove the specification for individual named graphs and use the "Union Graph" - (think sets) Also, instead of everything we just return the labels

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?refAreaLabel ?refPeriodLabel ?genderLabel ?weightCategoryLabel ?ratio {
?obs a <http://purl.org/linked-data/cube#Observation> ;
	<http://purl.org/linked-data/sdmx/2009/dimension#refArea> ?refArea ;
	<http://purl.org/linked-data/sdmx/2009/attribute#unitMeasure> ?uom ;
	<http://purl.org/linked-data/sdmx/2009/dimension#refPeriod> ?refPeriod ;
	<http://purl.org/linked-data/cube#measureType> ?measureType ;
	<http://purl.org/linked-data/cube#dataSet> ?dataSet ;
	<http://statistics.gov.scot/def/dimension/gender> ?gender ;
	<http://statistics.gov.scot/def/measure-properties/ratio> ?ratio ;
	<http://statistics.gov.scot/def/dimension/weightCategory> ?weightCategory .
    ?refArea rdfs:label ?refAreaLabel.
    ?refPeriod rdfs:label ?refPeriodLabel .
    ?gender rdfs:label ?genderLabel.
	?weightCategory  rdfs:label ?weightCategoryLabel .
}
#limit 10

From this we can use the yasgui embedded pivot table or charts to create some views on the results. We can use FILTER() conditions to focus the query better

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?refAreaLabel ?refPeriodLabel ?genderLabel ?weightCategoryLabel ?ratio {
?obs a <http://purl.org/linked-data/cube#Observation> ;
	<http://purl.org/linked-data/sdmx/2009/dimension#refArea> ?refArea ;
	<http://purl.org/linked-data/sdmx/2009/attribute#unitMeasure> ?uom ;
	<http://purl.org/linked-data/sdmx/2009/dimension#refPeriod> ?refPeriod ;
	<http://purl.org/linked-data/cube#measureType> ?measureType ;
	<http://purl.org/linked-data/cube#dataSet> ?dataSet ;
	<http://statistics.gov.scot/def/dimension/gender> ?gender ;
	<http://statistics.gov.scot/def/measure-properties/ratio> ?ratio ;
	<http://statistics.gov.scot/def/dimension/weightCategory> ?weightCategory .
    ?refArea rdfs:label ?refAreaLabel.
    ?refPeriod rdfs:label ?refPeriodLabel .
    ?gender rdfs:label ?genderLabel.
	?weightCategory  rdfs:label ?weightCategoryLabel .
  FILTER(lcase(?refAreaLabel) = "scotland" && contains(?weightCategoryLabel,"Clinical - Healthy")  &&  ?genderLabel = "Male")
}

Once we have a suitable query we need to think about how to use this in an application. We can use tools like Postman to help.

Using Postman to test, the "&&" cause problems, so use several FILTER() statements

Accept=application/SPARQL-Results+xml
query=PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?refAreaLabel ?refPeriodLabel ?genderLabel ?weightCategoryLabel ?ratio {
?obs a <http://purl.org/linked-data/cube#Observation> ;
	<http://purl.org/linked-data/sdmx/2009/dimension#refArea> ?refArea ;
	<http://purl.org/linked-data/sdmx/2009/attribute#unitMeasure> ?uom ;
	<http://purl.org/linked-data/sdmx/2009/dimension#refPeriod> ?refPeriod ;
	<http://purl.org/linked-data/cube#measureType> ?measureType ;
	<http://purl.org/linked-data/cube#dataSet> ?dataSet ;
	<http://statistics.gov.scot/def/dimension/gender> ?gender ;
	<http://statistics.gov.scot/def/measure-properties/ratio> ?ratio ;
	<http://statistics.gov.scot/def/dimension/weightCategory> ?weightCategory .
    ?refArea rdfs:label ?refAreaLabel.
    ?refPeriod rdfs:label ?refPeriodLabel .
    ?gender rdfs:label ?genderLabel.
	?weightCategory  rdfs:label ?weightCategoryLabel .
   FILTER(lcase(?refAreaLabel) = "scotland")
   FILTER(contains(?weightCategoryLabel,"Clinical - Healthy"))
   FILTER(?genderLabel = "Male")
  }
  order by ?refPeriodLabel
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment