I had some days left on a physical machine we used for an EU FP7 research project, so I took the chance to compare 3 triplestores (update: added some more based on comments here) that I or my colleagues worked with in the past months. I do not want to imply anything with this test; it's just me playing around and having fun with RDF. If you have any comments, add them here.
The test platform comprises a dedicated server, not a virtual machine, with the following specification:
- 2 x Intel Xeon E5-2620 v2, 2 x (6 x 2.10 GHz) (appears as 24 cores in htop)
- 128 GB buffered ECC RAM
- 1000 GB SSD (Samsung 840 EVO)
- Ubuntu 14.04
The dataset contains 5 million triples (including some which are not valid RDF, as "NA" is declared as xsd:int). It describes transports between entities, each with a date. To optimize query execution time for this particular use case, we want to materialize (i.e. pre-compute and store) some inferred triples so we don't have to go through all the data every time.
Source: http://ktk.netlabs.org/misc/bfs/blv.nt (622 MB)
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix pobo: <http://purl.obolibrary.org/obo/> .
<http://foodsafety.data.admin.ch/move/0> a schema:TransferAction ;
    schema:fromLocation <http://foodsafety.data.admin.ch/business/50454> ;
    schema:toLocation <http://foodsafety.data.admin.ch/business/50415> ;
    dc:date "2012-01-01"^^xsd:date ;
    pobo:UO_0000189 "1"^^xsd:int .
There are around 900'000 schema:TransferAction instances in there. We torture the server with the following CONSTRUCT (well, INSERT) query:
PREFIX blv: <http://blv.ch/>
PREFIX schema: <http://schema.org/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
INSERT {
  ?othermove blv:notBefore ?move .
}
WHERE {
  ?move a schema:TransferAction ;
    dc:date ?date ;
    schema:toLocation ?toFarm .
  ?othermove a schema:TransferAction ;
    dc:date ?otherdate ;
    schema:fromLocation ?toFarm .
  FILTER (?date <= ?otherdate)
}
After successful execution, I check how many triples were generated:
SELECT (COUNT(*) AS ?c) WHERE {?s <http://blv.ch/notBefore> ?o}
This should return around 30 million triples.
Note that I did not do any optimization of the configurations. My idea was to take what vendors ship by default and see how long it takes, because that's what users usually do ;)
- Homepage: http://virtuoso.openlinksw.com/
- Version: Virtuoso version 07.20.3215 on Linux (x86_64-unknown-linux-gnu), Single Server Edition
- Host: docker, image tenforce/virtuoso
- Query execution time: 23 minutes
Loading RDF was fast; I did it with iSQL according to the documentation of the Docker image (roughly as sketched below). Virtuoso does not seem to use more than one core: during the whole execution time I had 100% load on one of the 24 cores while the rest did nothing.
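For reference, the standard Virtuoso bulk loader run from an iSQL session looks roughly like this. The directory, file name and graph IRI are assumptions (and the directory has to be listed in DirsAllowed in virtuoso.ini), so adjust them to your setup:
-- register the N-Triples file for the bulk loader
ld_dir('/data', 'blv.nt', 'http://foodsafety.data.admin.ch/');
-- load everything that was registered and persist it
rdf_loader_run();
checkpoint;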
- Homepage: http://stardog.com/
- Version: 4.0.5, Enterprise license (1 month trial key)
- Host: docker, image java:latest, as there is no public docker image available.
- Run: default configuration, started with stardog-admin server start
- Query execution time: 4.00 minutes
Loading was fast; I did it with stardog data add on the command line (see the sketch below). I had the impression there is some query optimization going on: in the beginning there was not much activity on the different cores, but after a while the box became busier and I saw quite some load on all cores. By far the fastest query execution time.
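A minimal loading sequence with the Stardog 4 CLI could look like this; the database name 'blv' is an assumption for the example:
# create an empty database (name is made up for this example)
stardog-admin db create -n blv
# add the N-Triples dump to it
stardog data add blv blv.nt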
- Homepage: https://www.blazegraph.com/
- Version: 2.1.0
- Host: docker, image java:latest, as there is no public docker image available.
- Run: java -server -Xmx8g -jar blazegraph.jar
- Query execution time: 33 minutes
I first used a docker image but didn't notice that it was the old 1.x version. I ran into a bug while executing the query on a 24-core machine and was asked to retry with 2.x, so make sure you use 2.x as well; all docker images seem to be 1.x. Loading was fast: I loaded the file from its URI in the SPARQL UPDATE web interface (see the statement below). Blazegraph was the most active on all cores; I basically had quite some load on them the whole time. I also tried with 64 GB of memory allocated to the JVM, but that was apparently not a bottleneck.
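Loading from a URI is just a standard SPARQL 1.1 Update LOAD statement pasted into the UPDATE tab of the web interface, something like:
# pull the N-Triples file from its URI into the default graph
LOAD <http://ktk.netlabs.org/misc/bfs/blv.nt>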
- Homepage: https://jena.apache.org/documentation/serving_data/
- Version: 2.0.1-SNAPSHOT
- Host: docker, image stain/jena-fuseki
- Query execution time: TODO (did not finish; see update below)
I started the docker image and loaded the data with tdbloader into /fuseki/databases/blv (roughly as sketched below). After that I created a new database in the web interface, which apparently didn't override the TDB store. The loading time is fast. While executing the query there is high load on all cores.
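Inside the container, such a TDB bulk load would look something like this; the path of the staged N-Triples file is an assumption:
# bulk load the staged file into the TDB database directory Fuseki points at
tdbloader --loc=/fuseki/databases/blv /staging/blv.nt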
UPDATE 27.4.2016: I increased -Xmx to 8 GB and after around 6 hours I ran out of heap space. Not sure if we get anywhere without optimizing it (and I don't really know how).
- Homepage: http://ontotext.com/products/graphdb/
- Version: GraphDB Free 7.0
- Host: docker, image java:latest, as there is no public docker image available.
- Run: ~/graphdb-free-7.0.0/bin# ./graphdb
- Query execution time: 16 minutes
I created a new default store configuration and didn't change any of the default settings regarding cache size etc. Loading was done via URL and was fast (an alternative via the REST API is sketched below). I see load on only one core.
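If you prefer the command line over the workbench, GraphDB also speaks the RDF4J-style REST API, so something like the following sketch should work as well; the port and the repository name are assumptions based on the defaults:
# POST the N-Triples file into a repository called 'blv' (name and port are assumptions)
curl -X POST -H 'Content-Type: application/n-triples' \
  --data-binary @blv.nt \
  http://localhost:7200/repositories/blv/statements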
- Homepage: http://www.ontos.com/products/ontoquad/
- Version: 0.6.0
- Host: docker, built from the Dockerfile found in ontoquad-docker.txz
- Query execution time: 31 minutes (default config, polymorphic2)
- Query execution time: 14 minutes (polymorphic2, no transaction)
After consulting the documentation in Confluence I managed to upload the file as Triples, which I had copied into the docker image. Loading is fast. The default query execution timeout was too low; I could change it in the web interface but for some reason it never seemed to get stored, so I changed it in the config file itself before building the docker image. Same problem with transactions: I disabled them in the config for the second round.
We took a look on our side and saw similar results for Blazegraph. We did have some optimizations of the query that ran much faster, but opened ticket BLZG-1902 nonetheless.