[Parameters]
MaxSortedTopRows = 250000
[SPARQL]
ResultSetMaxRows = 10000000
MaxQueryCostEstimationTime = 36000 ; in seconds
MaxQueryExecutionTime = 36000 ; in seconds
Working around the DBpedia SPARQL endpoint's MaxSortedTopRows limit via LIMIT & OFFSET
The DBpedia SPARQL endpoint is configured with the following INI setting:
`MaxSortedTopRows = 40000`
This setting caps sorted result sets: a query that combines ORDER BY with OFFSET and LIMIT fails once OFFSET + LIMIT exceeds 40000 rows.
To work around this limitation, wrap the sorted part in a subquery: the inner query performs the ordering (making better use of the temporary storage associated with this kind of query), while the outer query applies OFFSET and LIMIT. An example takes the form:
SELECT ?p ?s
WHERE
{
  {
    SELECT DISTINCT ?p ?s
    FROM <http://dbpedia.org>
    WHERE
    {
      ?s ?p <http://dbpedia.org/resource/Germany>
    }
    ORDER BY ASC(?p)
  }
}
OFFSET 50000
LIMIT 1000
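For result sets spanning many pages, the same pattern can be driven programmatically. A minimal Python sketch (the page size of 1000 and the graph/resource from the example above are assumptions, not requirements) that generates the paged query strings:

```python
# Paging sketch: ORDER BY lives inside the subquery, so the outer
# OFFSET/LIMIT is not capped by MaxSortedTopRows.
def build_page_query(offset, limit=1000):
    """Build one page of the subquery-based paging pattern."""
    return (
        "SELECT ?p ?s WHERE { "
        "{ SELECT DISTINCT ?p ?s FROM <http://dbpedia.org> WHERE { "
        "?s ?p <http://dbpedia.org/resource/Germany> } ORDER BY ASC(?p) } "
        f"}} OFFSET {offset} LIMIT {limit}"
    )

# Successive pages use offsets 0, 1000, 2000, ...
queries = [build_page_query(page * 1000) for page in range(3)]
```

Each query string can then be sent to the endpoint (e.g. with curl, as below) until a page comes back with fewer than `limit` rows.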
Bulk loading
cd virtuosodb/bin
rlwrap ./isql
SQL> ld_dir ('/tmp', 'ids.nt', 'http://jakub');
SQL> select * from load_list;
SQL> rdf_loader_run();
Use SSDs; otherwise loading chokes on I/O. Use multiple database stripes (though you have to know the size of the loaded database in advance), and use a lot of RAM.
curl -F "format=text/tab-separated-values" -F "[email protected]" http://localhost:8890/sparql > result.tsv
curl -F "format=text/plain" -F "[email protected]" http://localhost:8890/sparql > result.nt
Other serializations can be requested with -F "format=text/ntriples" or -F "format=text/n3".
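The same request can be issued from Python's standard library alone. A hedged sketch: it sends the fields url-encoded rather than multipart as curl -F does (Virtuoso's endpoint accepts both); the endpoint URL matches the examples above:

```python
from urllib import parse, request

def sparql_request(query_text, fmt="text/tab-separated-values",
                   endpoint="http://localhost:8890/sparql"):
    """Build a POST request equivalent to the curl call above."""
    body = parse.urlencode({"query": query_text, "format": fmt}).encode()
    return request.Request(
        endpoint, data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"})

req = sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 10")
# request.urlopen(req).read() would return the TSV bytes
# (requires a running endpoint, so it is not executed here)
```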
The dump procedure published at http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFDatasetDump does not work as-is; it fails with:
*** Error 22023: [Virtuoso Driver][Virtuoso Server]SR601: Argument 1 of http_ttl_triple() should be an array of special format at line 1 of Top-Level: dump_one_graph ('http://example.org/news', '/example/dump/data_', 1000000000)
Fixing env
Replacing the env argument with vector (0, 0, 0) removes the problem. http_ttl_triple can also be replaced by http_nt_triple to get N-Triples:
CREATE PROCEDURE dump_one_graph
( IN srcgraph VARCHAR ,
IN out_file VARCHAR ,
IN file_length_limit INTEGER := 1000000000
)
{
DECLARE file_name VARCHAR;
DECLARE env,
ses ANY;
DECLARE ses_len,
max_ses_len,
file_len,
file_idx INTEGER;
SET ISOLATION = 'uncommitted';
max_ses_len := 10000000;
file_len := 0;
file_idx := 1;
file_name := sprintf ('%s%06d.ttl', out_file, file_idx);
string_to_file ( file_name || '.graph',
srcgraph,
-2
);
string_to_file ( file_name,
sprintf ( '# Dump of graph <%s>, as of %s\n@base <> .\n',
srcgraph,
CAST (NOW() AS VARCHAR)
),
-2
);
env := vector (0, 0, 0);
ses := string_output ();
FOR (SELECT * FROM ( SPARQL DEFINE input:storage ""
SELECT ?s ?p ?o { GRAPH `iri(?:srcgraph)` { ?s ?p ?o } }
) AS sub OPTION (LOOP)) DO
{
http_nt_triple (env, "s", "p", "o", ses);
ses_len := length (ses);
IF (ses_len > max_ses_len)
{
file_len := file_len + ses_len;
IF (file_len > file_length_limit)
{
http (' .\n', ses);
string_to_file (file_name, ses, -1);
gz_compress_file (file_name, file_name||'.gz');
file_delete (file_name);
file_len := 0;
file_idx := file_idx + 1;
file_name := sprintf ('%s%06d.ttl', out_file, file_idx);
string_to_file ( file_name,
sprintf ( '# Dump of graph <%s>, as of %s (part %d)\n@base <> .\n',
srcgraph,
CAST (NOW() AS VARCHAR),
file_idx),
-2
);
env := vector (0, 0, 0); -- N-Triples variant: plain vector instead of VECTOR (dict_new (16000), 0, '', '', '', 0, 0, 0, 0, 0)
}
ELSE
string_to_file (file_name, ses, -1);
ses := string_output ();
}
}
IF (LENGTH (ses))
{
http (' .\n', ses);
string_to_file (file_name, ses, -1);
gz_compress_file (file_name, file_name||'.gz');
file_delete (file_name);
}
}
See also: https://joernhees.de/blog/2014/11/10/setting-up-a-local-dbpedia-2014-mirror-with-virtuoso-7-1-0/
The amended dump_one_graph procedure for producing N-Triples must also use vector (0, 0, 0) in place of the second VECTOR (dict_new (16000), 0, '', '', '', 0, 0, 0, 0, 0) call.
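Once the dump finishes, the gzipped parts can be sanity-checked from Python. A sketch under assumptions: the data_*.ttl.gz pattern mirrors the out_file prefix from the example invocation above, and the line count is only an approximation of the triple count (http_nt_triple output is essentially one triple per line):

```python
import glob
import gzip

def count_triples(pattern="data_*.ttl.gz"):
    """Approximate triple count: non-empty, non-comment lines per part."""
    total = 0
    for part in sorted(glob.glob(pattern)):
        with gzip.open(part, "rt", encoding="utf-8") as f:
            total += sum(1 for line in f
                         if line.strip() and not line.startswith("#"))
    return total
```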