Created January 27, 2014 22:11
Gist cgravier/8658389
Generates BSBM (http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/) datasets of approximately 100k, 200k, 500k, 1M, 5M, 10M, 25M, 50M and 100M triples in N-Triples format.
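#!/bin/bash
# Number of BSBM products (-pc) for each target size:
# ~100k, 200k, 500k, 1M, 5M, 10M, 25M, 50M and 100M triples.
set -e  # stop if the generator fails instead of renaming a missing file
datasetssize=( 256 527 1369 2808 14212 28453 71431 143700 288114 )
for dim in "${datasetssize[@]}"
do
  echo "Generating dataset for $dim products..."
  # Run the BSBM generator with N-Triples output (-s nt) into datasettmp.nt
  java -cp .:lib/bsbm.jar:lib/jdom.jar:lib/log4j-1.2.12.jar:lib/ssj.jar -Xmx256M benchmark.generator.Generator -pc "$dim" -s nt -fn datasettmp
  # N-Triples is line-based (one triple per line), so the line count is the triple count
  NB=$(wc -l < datasettmp.nt | tr -d '[:space:]')
  mv datasettmp.nt "dataset_${NB}.nt"
  echo "done."
done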
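The script takes the line count of datasettmp.nt as its triple count, which works because N-Triples puts each triple on its own line. N-Triples files may also contain comment and blank lines, though, so if a generator ever emitted them the count would be inflated. A minimal sketch of a stricter count, using a hypothetical helper count_triples (not part of BSBM):

```bash
#!/bin/bash
# count_triples (hypothetical helper): counts lines in an N-Triples file
# that are neither blank nor comments (lines starting with '#').
count_triples() {
  # grep exits 1 when nothing matches; '|| true' keeps 'set -e' scripts running
  grep -c -v -E '^[[:space:]]*(#|$)' "$1" || true
}
```

In the generation loop it could stand in for the wc -l line as NB=$(count_triples datasettmp.nt).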
With the settings in this gist, I lazily rename the generated files using:
mv dataset_99914.nt dataset_100k.nt
mv dataset_200007.nt dataset_200k.nt
mv dataset_500037.nt dataset_500k.nt
mv dataset_1000000.nt dataset_1M.nt
mv dataset_5000000.nt dataset_5M.nt
mv dataset_10000159.nt dataset_10M.nt
mv dataset_25000172.nt dataset_25M.nt
mv dataset_50000144.nt dataset_50M.nt
mv dataset_99999805.nt dataset_100M.nt
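Instead of hard-coding each mv, the label can be derived from the triple count itself. A minimal sketch, assuming the dataset_<count>.nt names produced by the script; size_label and rename_datasets are hypothetical helpers, and the rounding rule (nearest thousand below one million, nearest million above) is an assumption chosen to reproduce the names listed above:

```bash
#!/bin/bash
# size_label (hypothetical): rounds a triple count to a short label.
# Rounds to the nearest thousand; if that reaches 1000k, rounds to the nearest million instead.
size_label() {
  local n=$1
  local k=$(( (n + 500) / 1000 ))
  if (( k < 1000 )); then
    echo "${k}k"
  else
    echo "$(( (n + 500000) / 1000000 ))M"
  fi
}

# rename_datasets (hypothetical): renames dataset_<count>.nt files in a directory
# (default: current directory) to dataset_<label>.nt, never overwriting existing files.
rename_datasets() {
  local dir=${1:-.} f count
  for f in "$dir"/dataset_*.nt; do
    [ -e "$f" ] || continue                 # glob matched nothing
    count=${f##*/dataset_}
    count=${count%.nt}
    [[ $count =~ ^[0-9]+$ ]] || continue    # skip already-labelled files such as dataset_1M.nt
    mv -n "$f" "$dir/dataset_$(size_label "$count").nt"
  done
}
```

For example, size_label 99999805 prints 100M, and running rename_datasets in the output directory would perform the same renames as the list above.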