[Parameters]
MaxSortedTopRows = 250000
[SPARQL]
ResultSetMaxRows = 10000000
MaxQueryCostEstimationTime = 36000 ; in seconds
MaxQueryExecutionTime = 36000 ; in seconds
Working around the DBpedia SPARQL endpoint's MaxSortedTopRows limit via LIMIT & OFFSET
The DBpedia SPARQL endpoint is configured with the following INI setting:
`MaxSortedTopRows = 40000`
This setting caps sorted result sets: a query that combines ORDER BY with OFFSET and LIMIT fails once OFFSET + LIMIT exceeds 40000 rows.
To work around this limitation, wrap the sorted part in a subquery: the inner query performs the ordering (making better use of the temporary storage associated with this kind of query), while the outer query applies OFFSET and LIMIT. An example takes the form:
SELECT ?p ?s
WHERE
{
  {
    SELECT DISTINCT ?p ?s
    FROM <http://dbpedia.org>
    WHERE
    {
      ?s ?p <http://dbpedia.org/resource/Germany>
    }
    ORDER BY ASC(?p)
  }
}
OFFSET 50000
LIMIT 1000
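For result sets spanning many pages, the same pattern can be driven programmatically. A minimal Python sketch (the page size of 1000 and the graph/resource from the example above are assumptions, not requirements) that generates the paged query strings:

```python
# Paging sketch: ORDER BY lives inside the subquery, so the outer
# OFFSET/LIMIT is not capped by MaxSortedTopRows.
def build_page_query(offset, limit=1000):
    """Build one page of the subquery-based paging pattern."""
    return (
        "SELECT ?p ?s WHERE { "
        "{ SELECT DISTINCT ?p ?s FROM <http://dbpedia.org> WHERE { "
        "?s ?p <http://dbpedia.org/resource/Germany> } ORDER BY ASC(?p) } "
        f"}} OFFSET {offset} LIMIT {limit}"
    )

# Successive pages use offsets 0, 1000, 2000, ...
queries = [build_page_query(page * 1000) for page in range(3)]
```

Each query string can then be sent to the endpoint (e.g. with curl, as below) until a page comes back with fewer than `limit` rows.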
Bulk loading
cd virtuosodb/bin
rlwrap ./isql
SQL> ld_dir ('/tmp', 'ids.nt', 'http://jakub');
SQL> select * from load_list;
SQL> rdf_loader_run();
Use SSDs; otherwise loading chokes on I/O. Use multiple database stripes (though you have to know the size of the loaded database in advance), and use a lot of RAM.
curl -F "format=text/tab-separated-values" -F "[email protected]" http://localhost:8890/sparql > result.tsv
curl -F "format=text/plain" -F "[email protected]" http://localhost:8890/sparql > result.nt
Other serializations can be requested with -F "format=text/ntriples" or -F "format=text/n3".
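The same request can be issued from Python's standard library alone. A hedged sketch: it sends the fields url-encoded rather than multipart as curl -F does (Virtuoso's endpoint accepts both); the endpoint URL matches the examples above:

```python
from urllib import parse, request

def sparql_request(query_text, fmt="text/tab-separated-values",
                   endpoint="http://localhost:8890/sparql"):
    """Build a POST request equivalent to the curl call above."""
    body = parse.urlencode({"query": query_text, "format": fmt}).encode()
    return request.Request(
        endpoint, data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"})

req = sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 10")
# request.urlopen(req).read() would return the TSV bytes
# (requires a running endpoint, so it is not executed here)
```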
The dump procedure published at http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFDatasetDump does not work as-is; it fails with:
*** Error 22023: [Virtuoso Driver][Virtuoso Server]SR601: Argument 1 of http_ttl_triple() should be an array of special format at line 1 of Top-Level: dump_one_graph ('http://example.org/news', '/example/dump/data_', 1000000000)
Fixing env
Replacing the env argument with vector (0, 0, 0) removes the problem. http_ttl_triple can also be replaced by http_nt_triple to get N-Triples:
CREATE PROCEDURE dump_one_graph
( IN srcgraph VARCHAR ,
IN out_file VARCHAR ,
IN file_length_limit INTEGER := 1000000000
)
{
DECLARE file_name VARCHAR;
DECLARE env,
ses ANY;
DECLARE ses_len,
max_ses_len,
file_len,
file_idx INTEGER;
SET ISOLATION = 'uncommitted';
max_ses_len := 10000000;
file_len := 0;
file_idx := 1;
file_name := sprintf ('%s%06d.ttl', out_file, file_idx);
string_to_file ( file_name || '.graph',
srcgraph,
-2
);
string_to_file ( file_name,
sprintf ( '# Dump of graph <%s>, as of %s\n@base <> .\n',
srcgraph,
CAST (NOW() AS VARCHAR)
),
-2
);
env := vector (0, 0, 0);
ses := string_output ();
FOR (SELECT * FROM ( SPARQL DEFINE input:storage ""
SELECT ?s ?p ?o { GRAPH `iri(?:srcgraph)` { ?s ?p ?o } }
) AS sub OPTION (LOOP)) DO
{
http_nt_triple (env, "s", "p", "o", ses);
ses_len := length (ses);
IF (ses_len > max_ses_len)
{
file_len := file_len + ses_len;
IF (file_len > file_length_limit)
{
http (' .\n', ses);
string_to_file (file_name, ses, -1);
gz_compress_file (file_name, file_name||'.gz');
file_delete (file_name);
file_len := 0;
file_idx := file_idx + 1;
file_name := sprintf ('%s%06d.ttl', out_file, file_idx);
string_to_file ( file_name,
sprintf ( '# Dump of graph <%s>, as of %s (part %d)\n@base <> .\n',
srcgraph,
CAST (NOW() AS VARCHAR),
file_idx),
-2
);
env := vector (0, 0, 0); -- N-Triples variant: plain vector instead of VECTOR (dict_new (16000), 0, '', '', '', 0, 0, 0, 0, 0)
}
ELSE
string_to_file (file_name, ses, -1);
ses := string_output ();
}
}
IF (LENGTH (ses))
{
http (' .\n', ses);
string_to_file (file_name, ses, -1);
gz_compress_file (file_name, file_name||'.gz');
file_delete (file_name);
}
}
See also: https://joernhees.de/blog/2014/11/10/setting-up-a-local-dbpedia-2014-mirror-with-virtuoso-7-1-0/
The amended dump_one_graph procedure for producing N-Triples must also use vector (0, 0, 0) in place of the second VECTOR (dict_new (16000), 0, '', '', '', 0, 0, 0, 0, 0) call.
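Once the dump finishes, the gzipped parts can be sanity-checked from Python. A sketch under assumptions: the data_*.ttl.gz pattern mirrors the out_file prefix from the example invocation above, and the line count is only an approximation of the triple count (http_nt_triple output is essentially one triple per line):

```python
import glob
import gzip

def count_triples(pattern="data_*.ttl.gz"):
    """Approximate triple count: non-empty, non-comment lines per part."""
    total = 0
    for part in sorted(glob.glob(pattern)):
        with gzip.open(part, "rt", encoding="utf-8") as f:
            total += sum(1 for line in f
                         if line.strip() and not line.startswith("#"))
    return total
```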