@p1d1d1
Last active March 7, 2022 21:57
GeoSPARQL support: Triple Stores comparison

For months I have been planning to write up some tests I did to check how triple stores deal with GeoSPARQL. A Twitter discussion eventually gave me the motivation to start these notes. The word "benchmarking" I used in that discussion is actually not the right one. It is not my intention to measure performance (how quickly a store loads data, how many trillions of triples it manages, how fast it answers a federated query over tens of services): my main interest here is simply to load some geodata, run a basic GeoSPARQL query and see how the stores behave. I decided to use Docker and run the images with the default configurations available on Docker Hub; in some cases of failure I quickly investigated further.

Dataset

For the tests I used a simple dataset containing the 26 Swiss Cantons, serialized using the GeoSPARQL vocabulary. I ended up with the following three files:

  • cantons84.nt: the 26 Swiss Cantons as WKT. Coordinate reference system is CRS84, but it is not encoded in the data
  • cantons84CRS.nt: same as the file before, but with CRS encoded in the data
  • cantons95.nt: the 26 Swiss Cantons as WKT and EPSG:2056. CRS not encoded in the data
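
To make the difference between the files concrete, here is a minimal Python sketch of what "CRS encoded in the data" means for a GeoSPARQL WKT literal: the CRS IRI is written in angle brackets in front of the WKT text. The subject IRI and the sample coordinates are hypothetical (the actual files are not reproduced in these notes); the CRS IRIs are the usual OGC ones for CRS84 and EPSG:2056.

```python
# Sketch only: builds N-Triples lines similar in shape to the test files.
# The subject IRI and the coordinates are illustrative assumptions.
GEO = "http://www.opengis.net/ont/geosparql#"
CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"
EPSG_2056 = "http://www.opengis.net/def/crs/EPSG/0/2056"


def wkt_triple(subject, wkt, crs=None):
    """Return one N-Triples line with a geo:wktLiteral object.

    If `crs` is given, its IRI is prefixed to the WKT text, which is how
    GeoSPARQL encodes the coordinate reference system inside the literal.
    """
    lexical = f"<{crs}> {wkt}" if crs else wkt
    return f'<{subject}> <{GEO}asWKT> "{lexical}"^^<{GEO}wktLiteral> .'


subject = "http://example.org/canton/geometry/1"  # hypothetical IRI

# Shape of cantons84.nt: no CRS in the literal (CRS84 is implied).
print(wkt_triple(subject, "POINT(7.13 46.47)"))
# Shape of cantons84CRS.nt: CRS84 stated explicitly.
print(wkt_triple(subject, "POINT(7.13 46.47)", CRS84))
# What cantons95.nt would look like if the CRS were encoded (it is not).
print(wkt_triple(subject, "POINT(2600000 1200000)", EPSG_2056))
```

Without the prefix, a GeoSPARQL-aware store has to assume CRS84, which matters for the third file because its coordinates are in EPSG:2056.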

The spatial query

A basic spatial query: what is at a given point? For the test I used the first file listed above (cantons84.nt) and the point (7.13 46.47). The GeoSPARQL query is:

PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT DISTINCT * 
WHERE {
    ?s a geo:Geometry.
    ?s geo:asWKT ?Geom.
    FILTER (geof:sfContains(?Geom, "Point(7.13 46.47)"^^geo:wktLiteral))}

and it should return only one resource: at a given point there is only one Canton.
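
For repeating the test against several stores, the query can be generated and wrapped in a standard SPARQL Protocol GET request. The following is a minimal sketch using only the Python standard library; the endpoint URL is a hypothetical placeholder (each store exposes its own), and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def contains_query(lon, lat):
    """Build the point-in-geometry GeoSPARQL query for a lon/lat point."""
    return (
        "PREFIX geo: <http://www.opengis.net/ont/geosparql#>\n"
        "PREFIX geof: <http://www.opengis.net/def/function/geosparql/>\n"
        "SELECT DISTINCT * WHERE {\n"
        "  ?s a geo:Geometry .\n"
        "  ?s geo:asWKT ?Geom .\n"
        f'  FILTER (geof:sfContains(?Geom, "POINT({lon} {lat})"^^geo:wktLiteral))\n'
        "}"
    )


def sparql_request(endpoint, query):
    """Wrap a query in a SPARQL Protocol GET request asking for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Hypothetical endpoint; replace with the SPARQL endpoint of the store under test.
request = sparql_request("http://localhost:3030/cantons/sparql",
                         contains_query(7.13, 46.47))
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would execute it; a store with working GeoSPARQL support should then return a single binding for `?s`.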

Results

Virtuoso Open Source

Loading and querying data

  • cantons84.nt: data is loaded, but the literal datatype is changed from geosparql#wktLiteral to virtrdf#Geometry (which is a Virtuoso built-in literal datatype for geometries)
  • cantons84CRS.nt: data is not loaded. An invalid format error is returned
  • cantons95.nt: data is not loaded. An invalid format error is returned
  • Query: it does not run (see Remarks)

Remarks

Virtuoso has built-in geospatial functions, which can be more or less easily mapped to the GeoSPARQL ones. The query should be modified as follows:

PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT DISTINCT * 
WHERE {
    ?s a geo:Geometry.
    ?s geo:asWKT ?Geom.
    FILTER (bif:st_contains (?Geom, bif:st_point (7.13, 46.47)))}

The query returns 4 objects, which is wrong (see this issue live here). I assume that geospatial queries in Virtuoso are evaluated against the bounding box of the geometry rather than the geometry itself. As more and more geospatial functionality will be required and used in the future, this issue is quite crucial.
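
Why a bounding-box test can return several Cantons for one point is easy to show with a toy example. The sketch below is plain Python and is not Virtuoso's implementation; it only illustrates the assumption above. The L-shaped polygon and the test points are made up for the illustration.

```python
def bbox_contains(polygon, point):
    """True if the point lies inside the polygon's bounding box."""
    px, py = point
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return min(xs) <= px <= max(xs) and min(ys) <= py <= max(ys)


def polygon_contains(polygon, point):
    """True if the point lies inside the polygon (ray-casting test)."""
    px, py = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through the point?
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


# Made-up L-shaped "canton": the upper-right corner of its bounding box is empty.
l_shape = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]

print(bbox_contains(l_shape, (3, 3)), polygon_contains(l_shape, (3, 3)))
print(bbox_contains(l_shape, (1, 1)), polygon_contains(l_shape, (1, 1)))
```

The point (3, 3) passes the bounding-box test but fails the exact test. Since the bounding boxes of neighbouring Cantons overlap, a bounding-box-only check would plausibly match several of them for a single point.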

Update

The issue with the spatial query has been fixed: https://twitter.com/kidehen/status/936279887744241666.

Let's hope this gives us the chance to have some other related issues fixed.

GeoSPARQL support in version develop/7: https://github.com/openlink/virtuoso-opensource/blob/develop/7/README.GeoSPARQL.md To be tested.


Parliament™

Loading and querying data

  • cantons84.nt: data is loaded
  • cantons84CRS.nt: data is loaded
  • cantons95.nt: data is loaded
  • Query: before running the query I had to index the data: this is just a couple of clicks. The query then runs and returns 1 object, as expected.

Jena Fuseki

Loading and querying data

  • cantons84.nt: data is loaded
  • cantons84CRS.nt: data is loaded
  • cantons95.nt: data is loaded
  • Query: it does not run

Remarks

Fuseki has dedicated literal datatypes for coordinates and built-in geospatial functions. I have not been able to use any of these spatial functions for the use case of the test: I assume that the WKT serialization is not recognised as a geometry, so the spatial functions have no effect on the data.


Blazegraph

Loading and querying data

  • cantons84.nt: data is loaded
  • cantons84CRS.nt: data is loaded
  • cantons95.nt: data is loaded
  • Query: it does not run

Remarks

See Jena Fuseki


GraphDB

Loading and querying data

  • cantons84.nt: data is loaded
  • cantons84CRS.nt: data is loaded
  • cantons95.nt: data is loaded
  • Query: it runs and returns 1 object, as expected

Stardog

  • Homepage: http://stardog.com/
  • Docker image: bluepeppers/stardog
  • Software version: Stardog 5 Enterprise (30-day trial)

Loading and querying data

  • cantons84.nt: (see Remarks)
  • cantons84CRS.nt: (see Remarks)
  • cantons95.nt: (see Remarks)
  • Query: it does not run (see Remarks)

Remarks

The Docker image runs the Stardog community edition, which has no spatial support. I decided to try the Enterprise edition (30-day trial), configured as explained here. I got bored trying to upload the data: very often I received an invalid format error, and sometimes the data was loaded but only after minutes of waiting (the last test with cantons84CRS.nt took 8 minutes). Then the query does not run. Looking at the documentation I discovered that Stardog supports very few GeoSPARQL functions (and sfContains is not one of them). The functions do not follow the GeoSPARQL standard (e.g. they use within instead of sfWithin). The way these functions are used in queries does not, IMHO, follow the standard either.

Update

I have tried again on a different Linux machine, with these steps:

  • create a spatially enabled DB
  • load cantons84.nt: milliseconds
  • load cantons84CRS.nt: still needs a couple of minutes

I stop here. In any case the GeoSPARQL support is IMHO not enough.


Strabon

TBD


Apache Marmotta

Loading and querying data

  • cantons84.nt: data is loaded
  • cantons84CRS.nt: data is loaded
  • cantons95.nt: data is loaded
  • Query: it does not run. I assume the postgresql database in the Docker image is not "postgis-enabled"

Update

I have tried again on a different Linux machine, with these steps:

  • docker pull/run kartoza/postgis (this provides a postgis-enabled database)
  • install Marmotta via the Installer
  • load cantons84CRS.nt

I am not able to run the query. Then I added the point as resource into the store and tried with:

PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT DISTINCT * 
WHERE {
    ?s a geo:Geometry.
    ?s geo:asWKT ?Geom.
    ?s2 a geo:Geometry; rdfs:label "point".
    ?s2 geo:asWKT ?Geom2.
    FILTER (geof:sfContains (?Geom, ?Geom2))}

It returns "error while evaluating the query". I have the impression I am missing something and/or doing something wrong... dunno. By the way, the latest version is from 2014-12-05.

p1d1d1 commented Sep 19, 2018

cantons95.nt: the 26 Swiss Cantons as WKT and EPSG:2056. CRS not encoded in the data

This does not work, obviously, as GeoSPARQL assumes 4326 longlat and values are out of thresholds for latitudes and longitudes. SRID should be in data strings there.

Correct, but did it just for test purposes. Strange that it gets loaded in some stores, e.g. Parliament, but then not sure about spatial queries: didn't test it.

tom-ch1 commented Mar 19, 2021

currently, the issue that geospatial queries in Virtuoso are based on the bounding box of the geometry instead of the real geometry is present once again:
There is a very silly workaround: converting the geometry to text and parsing the text back into a geometry (e.g. bif:st_geomfromtext(bif:st_astext(?Stopwkt))). Here are links which demonstrate Bug and Workaround:

situx commented Mar 6, 2022

Hi!
I just stumbled across this gist in a presentation shared online and if you are not aware of it, you might be interested in the GeoSPARQL Compliance benchmark we developed last year:
https://doi.org/10.3390/ijgi10070487
The tests are fully documented in the paper, the results are fully reproducible using the HOBBIT benchmarking platform and the results were generated last year.
Would be awesome to get feedback on the test queries we use and if you would use different ones.
