@ktk
Last active May 6, 2016 17:36
SPARQL performance tests

SPARQL CONSTRUCT comparison

I had some days left on a physical machine we used for an EU FP7 research project, so I took the chance to compare 3 triplestores (update: added some more based on comments here) that I or my colleagues worked with in the past months. I do not want to imply anything with this test; it's just me playing around and having fun with RDF. If you have any comments, add them here.

Hardware

The test platform comprises a dedicated server, not a virtual machine, with the following specification:

  • 2 x Intel Xeon E5 2620V2, 2 x (6 x 2.10 GHz) (appears as 24 cores in htop)
  • 128 GB buffered ECC RAM
  • 1000 GB SSD (Samsung 840 EVO)
  • Ubuntu 14.04

Dataset

The dataset contains 5 million triples (including some which are not valid RDF, as "NA" is declared as xsd:int). Each record describes a transport between two entities on a given date. To optimize query execution time for this particular use case, we want to infer/materialize (what's the right word here?) some triples so we don't have to go through all the data all the time.

Source: (http://ktk.netlabs.org/misc/bfs/blv.nt) (622MB)

@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix pobo: <http://purl.obolibrary.org/obo/> .


<http://foodsafety.data.admin.ch/move/0> a schema:TransferAction ;
  schema:fromLocation <http://foodsafety.data.admin.ch/business/50454> ;
  schema:toLocation <http://foodsafety.data.admin.ch/business/50415> ;
  dc:date "2012-01-01"^^xsd:date ;
  pobo:UO_0000189 "1"^^xsd:int .
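The invalid literals mentioned above ("NA" typed as xsd:int) can be spotted before loading. The following is a minimal sketch, assuming the file is line-based N-Triples with the literal in object position. The function name `invalid_int_lines` and the second sample line (`move/1`) are made up for illustration. It only checks the lexical form, not the 32-bit value range.

```python
import re

XSD_INT = "http://www.w3.org/2001/XMLSchema#int"

# Matches a typed literal at the end of an N-Triples statement:
# "lexical form"^^<datatype IRI> .
TYPED_LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"\^\^<([^>]+)>\s*\.\s*$')
INT_LEXICAL = re.compile(r"[+-]?[0-9]+")


def invalid_int_lines(lines):
    """Yield (line_number, lexical_form) for xsd:int literals that are not integers."""
    for number, line in enumerate(lines, start=1):
        match = TYPED_LITERAL.search(line)
        if match is None:
            continue  # object is an IRI, a plain literal, or a blank node
        lexical, datatype = match.groups()
        if datatype == XSD_INT and not INT_LEXICAL.fullmatch(lexical):
            yield number, lexical


# Two sample statements; the second one is a made-up example of the "NA" case.
sample = [
    '<http://foodsafety.data.admin.ch/move/0> <http://purl.obolibrary.org/obo/UO_0000189> "1"^^<http://www.w3.org/2001/XMLSchema#int> .',
    '<http://foodsafety.data.admin.ch/move/1> <http://purl.obolibrary.org/obo/UO_0000189> "NA"^^<http://www.w3.org/2001/XMLSchema#int> .',
]
print(list(invalid_int_lines(sample)))
```

For the real file, passing an open file object (for example `open("blv.nt", encoding="utf-8")`) instead of `sample` would stream it line by line.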

There are around 900'000 TransferActions in there. We torture the server with the following CONSTRUCT (well, INSERT) query:

PREFIX blv: <http://blv.ch/>
PREFIX schema: <http://schema.org/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

INSERT {
    ?othermove blv:notBefore ?move .
}
WHERE {

    ?move a schema:TransferAction ;
    dc:date ?date ;
    schema:toLocation ?toFarm .

    ?othermove a schema:TransferAction ;
    dc:date ?otherdate ;
    schema:fromLocation ?toFarm .

    FILTER (?date <= ?otherdate)

} 
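To make the semantics of the query concrete, here is a small in-memory sketch of what the WHERE clause computes: a join of every move with the moves that leave its destination on the same day or later. This is plain Python, not how any of the tested stores evaluate it. The `Move` type, the function name and the three sample moves are made up for illustration.

```python
from collections import defaultdict
from datetime import date
from typing import NamedTuple


class Move(NamedTuple):
    iri: str
    from_location: str
    to_location: str
    day: date


def not_before_pairs(moves):
    """Return (othermove, move) pairs as the WHERE clause would bind them."""
    # Index moves by the farm they depart from, so each ?move only
    # meets the ?othermove candidates that share ?toFarm.
    departures = defaultdict(list)
    for other in moves:
        departures[other.from_location].append(other)

    pairs = []
    for move in moves:
        for other in departures.get(move.to_location, []):
            if move.day <= other.day:  # FILTER (?date <= ?otherdate)
                pairs.append((other.iri, move.iri))
    return pairs


# Hypothetical sample data: move/0 arrives at farm/B, two moves leave farm/B.
moves = [
    Move("move/0", "farm/A", "farm/B", date(2012, 1, 1)),
    Move("move/1", "farm/B", "farm/C", date(2012, 1, 5)),
    Move("move/2", "farm/B", "farm/D", date(2011, 12, 30)),
]
print(not_before_pairs(moves))
```

Only `move/1` leaves farm/B on or after the arrival date of `move/0`, so a single pair results. In the real dataset the number of pairs grows with the product of arrivals and departures per farm, which is why around 900'000 moves can produce around 30 million triples.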

After successful execution, I check how many triples were generated:

SELECT  (COUNT(*) AS ?c) WHERE {?s <http://blv.ch/notBefore> ?o}

The result should be around 30 million triples.
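The count can also be fetched over HTTP instead of through each store's web interface. The sketch below assumes the store exposes a standard SPARQL 1.1 Protocol endpoint that accepts GET requests with a `query` parameter and returns SPARQL JSON results. The endpoint URL is hypothetical and the function names are made up for illustration; the actual network call is left commented out.

```python
import json
import urllib.parse
import urllib.request

COUNT_QUERY = "SELECT (COUNT(*) AS ?c) WHERE {?s <http://blv.ch/notBefore> ?o}"


def build_count_request(endpoint):
    """Build a SPARQL 1.1 Protocol GET request for the count query."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": COUNT_QUERY})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def read_count(response_body):
    """Extract the value of ?c from a SPARQL JSON results document."""
    results = json.loads(response_body)
    return int(results["results"]["bindings"][0]["c"]["value"])


# Hypothetical endpoint; every store uses its own host, port and path.
request = build_count_request("http://localhost:8080/sparql")

# Uncomment to actually run the query against a live endpoint:
# with urllib.request.urlopen(request) as response:
#     print(read_count(response.read()))
```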

Results

Note that I did not do any optimization on the configurations. My idea was to take what vendors ship by default and see how long it takes. Because that's what users usually do ;)

Virtuoso

  • Homepage: http://virtuoso.openlinksw.com/
  • Version: Virtuoso version 07.20.3215 on Linux (x86_64-unknown-linux-gnu), Single Server Edition
  • Host: docker, image tenforce/virtuoso
  • Query execution time: 23 minutes

Remarks

Loading RDF was fast; I did it with iSQL, following the documentation of the Docker image. Virtuoso does not seem to use more than one core: during the whole execution time I had 100% load on one of the 24 cores while the rest did nothing.

Stardog

  • Homepage: http://stardog.com/
  • Version: 4.0.5, Enterprise license (1 month trial key)
  • Host: docker, image java:latest as there is no public docker image available.
  • Run: Default configuration started with stardog-admin server start
  • Query execution time: 4.00 minutes

Remarks

Loading was fast; I did it with stardog data add on the command line. I had the impression there is some query optimization going on. In the beginning there was not much activity on the different cores. After a while the box became busier and I saw quite some load on all cores. By far the fastest query execution time.

Blazegraph

  • Homepage: https://www.blazegraph.com/
  • Version: 2.1.0
  • Host: docker, image java:latest as there is no public docker image available.
  • Run: java -server -Xmx8g -jar blazegraph.jar
  • Query execution time: 33 minutes

Remarks

I first used a docker image but didn't notice that it was the old 1.x version. I ran into a bug while executing the query on a 24 core machine and they asked me to retry with 2.x, so make sure you use 2.x as well, as all docker images seem to be 1.x. Loading was fast; I loaded the data from a URI in the SPARQL UPDATE web interface. Blazegraph was the most active on all cores; I had quite some load on them basically the whole time. I also tried with 64GB of memory allocated to the VM, but that was apparently not a bottleneck.

Jena Fuseki

Remarks

I started the docker image and loaded the data with tdbloader into /fuseki/databases/blv. After that I created a new database in the web interface which apparently didn't override the TDB store. The loading time is fast. While executing the query there is high load on all cores.

UPDATE 27.4.2016: I increased -Xmx to 8GB and after around 6 hours I ran out of heap space. Not sure if we get anywhere without optimizing it (and I don't really know how).

Ontotext GraphDB

  • Homepage: http://ontotext.com/products/graphdb/
  • Version: GraphDB Free 7.0
  • Host: docker, image java:latest as there is no public docker image available.
  • Run: ~/graphdb-free-7.0.0/bin# ./graphdb
  • Query execution time: 16 minutes

Remarks

I created a new default store configuration and didn't change any of the default settings regarding cache size etc. I loaded the data via URL; loading was fast. I see load on only one core.

Ontos OntoQuad

  • Homepage: http://www.ontos.com/products/ontoquad/
  • Version: 0.6.0
  • Host: docker, built from Dockerfile found in ontoquad-docker.txz
  • Query execution time: 31 minutes (default config, polymorphic2)
  • Query execution time: 14 minutes (polymorphic2, no transaction)

Remarks

After consulting the documentation in Confluence I managed to upload the file as Triples, which I copied into the docker image. Loading is fast. The default query execution timeout was too low; I could change it in the web interface, but I think it never got stored for some reason. So I changed it in the config file itself before I built the docker image. Same problem with transactions: I disabled them in the config for the second round.

@ktk
Author

ktk commented Apr 27, 2016

@serwan

serwan commented Apr 27, 2016

Hi and thanks for the bench. For the GC overhead problem, the -XX:+UseG1GC java option could help (http://www.oracle.com/technetwork/tutorials/tutorials-1876574.html).

@ktk
Author

ktk commented Apr 27, 2016

@serwan tnx I did some changes and it's running for quite a while now. Will post once I manage to get some results

@martin-voigt

martin-voigt commented Apr 27, 2016

Hi, interesting. Since the data is from/for LINDAS, we are also interested. We tested with our QUAD store with default config in a CentOS 7 VirtualBox.

Hardware:
Intel Core i7-6700 3.4GHz
20GB RAM

Results:
index-type polymorphic2, transactions off
Query execution time: 5min 30sec

index-type polymorphic2, transactions on
Query execution time: 13min 48sec

index-type polymorphic6monolith, transactions off
Query execution time: 12min 44sec

index-type polymorphic6monolith, transactions on
Query execution time: 1h 16min 25sec

The query time heavily depends on the index type and on whether transaction mode is on.

@ktk
Author

ktk commented Apr 27, 2016

@martin-voigt I was about to register for your store, can I get a docker to test on my hardware?

@beebs-systap

We took a look on our side and saw similar results for Blazegraph. We did have some optimizations in the query that ran much faster, but opened a ticket at BLZG-1902.

@ktk
Author

ktk commented May 6, 2016

@beebs-systap thanks for the issue, interesting!
