I had been planning for months to put into readable form some tests I did to check how triple stores deal with GeoSPARQL. A Twitter discussion finally gave me the motivation to start these notes. The word "benchmarking", which I used in that discussion, is actually not the right one. It is not my intention to measure performance (how quickly the store loads data, how many trillion triples it manages, how fast it answers a federated query over tens of services): my main interest here is simply to load some geodata, run a basic GeoSPARQL query and see how the stores behave. I decided to use Docker and run the images with the default configurations available on Docker Hub; in case of failure I sometimes investigated a little further.
For the tests I used a simple dataset containing the 26 Swiss cantons, serialized using the GeoSPARQL vocabulary. I ended up with the following three files:
- cantons84.nt: the 26 Swiss cantons as WKT. The coordinate reference system is CRS84, but it is not encoded in the data
- cantons84CRS.nt: same as the previous file, but with the CRS encoded in the data
- cantons95.nt: the 26 Swiss cantons as WKT in EPSG:2056. The CRS is not encoded in the data
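GeoSPARQL 1.0 allows a geo:wktLiteral to start with the IRI of its coordinate reference system in angle brackets; when no IRI is given, CRS84 (WGS 84 longitude/latitude) is assumed. This is presumably the difference between cantons84.nt and cantons84CRS.nt. The following Python sketch, built around a hypothetical helper `split_wkt_literal`, shows how such a literal splits into CRS and geometry. It is not how any of the tested stores parses literals, and the EPSG:2056 coordinates in the usage example are illustrative, not taken from the files.

```python
import re

# Default CRS for a GeoSPARQL wktLiteral that carries no explicit CRS IRI (OGC CRS84, lon/lat).
CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"

# Optional CRS IRI in angle brackets followed by whitespace, then the WKT geometry text.
_WKT_LITERAL = re.compile(r"^\s*(?:<([^>]+)>\s+)?(.*\S)\s*$", re.DOTALL)


def split_wkt_literal(lexical: str) -> tuple[str, str]:
    """Hypothetical helper: return (crs_iri, wkt) for the lexical form of a geo:wktLiteral."""
    match = _WKT_LITERAL.match(lexical)
    if match is None:
        raise ValueError(f"not a WKT literal: {lexical!r}")
    crs, wkt = match.groups()
    # No IRI present: fall back to the GeoSPARQL default CRS.
    return (crs or CRS84, wkt)


if __name__ == "__main__":
    print(split_wkt_literal("POINT(7.13 46.47)"))
    print(split_wkt_literal("<http://www.opengis.net/def/crs/EPSG/0/2056> POINT(2600000 1200000)"))
```

A literal without a prefix is therefore always read as longitude/latitude, which matters for data such as cantons95.nt.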
A basic spatial query: what is at a given point? For the test I used the first of the files above (cantons84.nt) and the point (7.13 46.47). The GeoSPARQL query is:
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT DISTINCT *
WHERE {
  ?s a geo:Geometry .
  ?s geo:asWKT ?Geom .
  FILTER (geof:sfContains(?Geom, "Point(7.13 46.47)"^^geo:wktLiteral))
}
It should return exactly one resource, since a given point lies in only one canton.
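To check the result count the same way on every store, the query can also be sent over the standard SPARQL 1.1 Protocol and the solutions counted from the SPARQL JSON results. The Python sketch below uses only the standard library. The endpoint URL in the usage note is a hypothetical placeholder: each Docker image exposes its SPARQL endpoint at a different path.

```python
import json
import urllib.parse
import urllib.request

# The test query from above: which geometries contain the point (7.13 46.47)?
QUERY = """
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT DISTINCT *
WHERE {
  ?s a geo:Geometry .
  ?s geo:asWKT ?Geom .
  FILTER (geof:sfContains(?Geom, "Point(7.13 46.47)"^^geo:wktLiteral))
}
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol query request (POST with a URL-encoded form body)."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def count_results(results_json: str) -> int:
    """Count the solutions in a SPARQL 1.1 JSON results document."""
    return len(json.loads(results_json)["results"]["bindings"])


def run(endpoint: str) -> int:
    # Sends the query over the network; the endpoint URL is store-specific.
    with urllib.request.urlopen(build_request(endpoint, QUERY)) as response:
        return count_results(response.read().decode("utf-8"))
```

Calling `run("http://localhost:8890/sparql")` (a hypothetical endpoint) should return 1 on a store that handles the query correctly.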
Virtuoso
- Homepage: http://vos.openlinksw.com/owiki/wiki/VOS
- Docker image: tenforce/virtuoso
- cantons84.nt: data is loaded, but the literal datatype is changed from geosparql#wktLiteral to virtrdf#Geometry (Virtuoso's built-in datatype for geometries)
- cantons84CRS.nt: data is not loaded. An invalid format error is returned
- cantons95.nt: data is not loaded. An invalid format error is returned
- Query: it does not run (see Remarks)
Virtuoso has built-in geospatial functions, which can be more or less easily mapped to the GeoSPARQL ones. The query should be modified as follows:
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT DISTINCT *
WHERE {
  ?s a geo:Geometry .
  ?s geo:asWKT ?Geom .
  FILTER (bif:st_contains(?Geom, bif:st_point(7.13, 46.47)))
}
The query returns 4 objects, which is wrong (see this issue live here). I assume that geospatial queries in Virtuoso are evaluated against the bounding box of each geometry instead of the actual geometry. As geospatial functionality will be required and used more and more in the future, this issue is quite crucial.
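If the bounding-box assumption above is right, the wrong count is easy to reproduce: a point can lie inside the bounding box of a polygon without lying inside the polygon itself, so a bounding-box test can match several neighbouring cantons. The Python sketch below compares a bounding-box check with an even-odd ray-casting point-in-polygon test on a hypothetical L-shaped polygon. It illustrates the technique only and is not Virtuoso's implementation.

```python
Polygon = list[tuple[float, float]]

# Hypothetical L-shaped polygon; the closing edge back to the first vertex is implied.
L_SHAPE: Polygon = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]


def in_bbox(polygon: Polygon, x: float, y: float) -> bool:
    """True if (x, y) lies inside the axis-aligned bounding box of the polygon."""
    xs = [px for px, _ in polygon]
    ys = [py for _, py in polygon]
    return min(xs) <= x <= max(xs) and min(ys) <= y <= max(ys)


def in_polygon(polygon: Polygon, x: float, y: float) -> bool:
    """Even-odd rule: toggle for each edge crossed by a ray going right (+x) from (x, y).

    Points exactly on the boundary are not handled consistently by this sketch.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # The edge straddles the horizontal line through y (so y1 != y2 here).
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


if __name__ == "__main__":
    print(in_bbox(L_SHAPE, 3, 3), in_polygon(L_SHAPE, 3, 3))
```

For the point (3, 3), `in_bbox` returns True while `in_polygon` returns False: the kind of false positive that would make a point query return more than one canton.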
The issue with the spatial query has been fixed: https://twitter.com/kidehen/status/936279887744241666.
Let's hope this gives us the chance to have some other related issues fixed.
GeoSPARQL support in the develop/7 branch is documented here: https://github.com/openlink/virtuoso-opensource/blob/develop/7/README.GeoSPARQL.md. To be tested.
Parliament
- Homepage: http://parliament.semwebcentral.org/
- Docker image: daxid/parliament-triplestore
- cantons84.nt: data is loaded
- cantons84CRS.nt: data is loaded
- cantons95.nt: data is loaded
- Query: before running the query I had to index the data: this is just a couple of clicks. The query then runs and returns 1 object, as expected.
Jena Fuseki
- Homepage: https://jena.apache.org/documentation/serving_data/
- Docker image: stain/jena-fuseki
- cantons84.nt: data is loaded
- cantons84CRS.nt: data is loaded
- cantons95.nt: data is loaded
- Query: it does not run
Fuseki has dedicated literal datatypes for coordinates and built-in geospatial functions. I was not able to use any of these spatial functions for the test use case: I assume that the WKT serialization is not recognised as a geometry, so the spatial functions have no effect on the data.
Blazegraph
- Homepage: https://www.blazegraph.com/
- Docker image: lyrasis/blazegraph
- cantons84.nt: data is loaded
- cantons84CRS.nt: data is loaded
- cantons95.nt: data is loaded
- Query: it does not run
Same remarks as for Jena Fuseki.
GraphDB
- Homepage: http://graphdb.ontotext.com/
- Docker image: ontotext/graphdb:se
- cantons84.nt: data is loaded
- cantons84CRS.nt: data is loaded
- cantons95.nt: data is loaded
- Query: it runs and returns 1 object, as expected
Stardog
- Homepage: http://stardog.com/
- Docker image: bluepeppers/stardog
- Software version: Stardog 5 Enterprise (30-day trial)
- cantons84.nt: (see Remarks)
- cantons84CRS.nt: (see Remarks)
- cantons95.nt: (see Remarks)
- Query: it does not run (see Remarks)
The Docker image runs the Stardog Community edition, which has no spatial support. I decided to try the Enterprise edition (30-day trial), configured as explained here. I got tired of trying to load the data: very often I got an invalid format error, and sometimes the data was loaded but only after several minutes (the last test with cantons84CRS.nt took 8 minutes). Then the query did not run. Looking at the documentation, I discovered that Stardog supports very few GeoSPARQL functions (and sfContains is not one of them). The functions do not follow the GeoSPARQL standard (e.g. they use within instead of sfWithin), and IMHO the way they are used in queries does not follow the standard either.
I tried again on a different Linux machine:
- create a spatially enabled DB
- load cantons84.nt: milliseconds
- load cantons84CRS.nt: still takes a couple of minutes
I stopped here. In any case, IMHO the GeoSPARQL support is not sufficient.
Strabon
- Homepage: http://www.strabon.di.uoa.gr/
TBD
Apache Marmotta
- Homepage: http://marmotta.apache.org/
- Docker image: apache/marmotta
- cantons84.nt: data is loaded
- cantons84CRS.nt: data is loaded
- cantons95.nt: data is loaded
- Query: it does not run. I assume the PostgreSQL database in the Docker image is not PostGIS-enabled
I tried again on a different Linux machine:
- docker pull/run kartoza/postgis (this provides a PostGIS-enabled database)
- install Marmotta via the installer
- load cantons84CRS.nt
I was still not able to run the query. I then added the point as a resource to the store and tried:
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT *
WHERE {
  ?s a geo:Geometry .
  ?s geo:asWKT ?Geom .
  ?s2 a geo:Geometry ; rdfs:label "point" .
  ?s2 geo:asWKT ?Geom2 .
  FILTER (geof:sfContains(?Geom, ?Geom2))
}
It returns "error while evaluating the query". I have the impression I am missing something and/or doing something wrong... dunno. By the way, the latest release dates from 2014-12-05.
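This does not work, obviously: when no CRS is given, GeoSPARQL assumes WGS 84 longitude/latitude (CRS84, i.e. EPSG:4326 with longitude first), and the EPSG:2056 values are out of range for longitude and latitude. The SRID should be encoded in the data strings there.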
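A quick sanity check along these lines can catch data such as cantons95.nt before loading: if a literal carries no CRS IRI, its coordinates should be valid CRS84 longitude/latitude values. The sketch below, with a hypothetical helper `out_of_lonlat_range`, flags coordinate pairs outside [-180, 180] × [-90, 90]. It assumes 2D WKT with plain decimal numbers (no Z/M values, no exponent notation) and is not a full WKT parser; the EPSG:2056 point in the usage note is illustrative.

```python
import re

# A 2D coordinate pair in WKT: two plain decimal numbers separated by whitespace.
_NUMBER_PAIR = re.compile(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)")


def out_of_lonlat_range(wkt: str) -> list[tuple[float, float]]:
    """Hypothetical helper: return the pairs that cannot be CRS84 longitude/latitude."""
    bad = []
    for lon_text, lat_text in _NUMBER_PAIR.findall(wkt):
        lon, lat = float(lon_text), float(lat_text)
        if not (-180 <= lon <= 180 and -90 <= lat <= 90):
            bad.append((lon, lat))
    return bad


if __name__ == "__main__":
    print(out_of_lonlat_range("POINT(7.13 46.47)"))
    print(out_of_lonlat_range("POINT(2600000 1200000)"))
```

The CRS84 test point passes, while `out_of_lonlat_range("POINT(2600000 1200000)")` returns `[(2600000.0, 1200000.0)]`, a sign that the literal needs an explicit EPSG:2056 IRI.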