Here are some preliminary tests to see whether compressing the grounded factor graph binaries makes sense.
- Tested 1/6/2016 on raiders6 with 112 threads on an ext4 filesystem backed by an SSD RAID.
- Uses a 2MB factors binary from the spouse example, repeated 1000 times to create ~2GB of data.
- Confirmed that plain reading/writing without compression adds no extra overhead, since those IOs have to happen even with compression.
The pbzip2 option seems promising: decompression reads at 60-70% of the raw read rate while taking 10x less space, and in turn less IO. Compressed output is written at only 15% of the raw write rate (33MB/s), but that may be less problematic if the grounding dump is the bottleneck.
- The compression numbers here may not extrapolate well to multiple grounding/dump processes that have to unload data from the database, run format_converter, and then run pbzip2/lbzip2/pigz/gzip/bzip2 at the end, although they all seem IO bound.
- However, the decompression side will probably extrapolate better as the sampler (single process) simply loads everything into memory.
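As a quick check on the percentages quoted above, the ratios can be recomputed from the `dd` summary lines recorded further down this page. This is only a sketch of the arithmetic: the figures are copied from those runs, and the variable names are made up for illustration.

```shell
# Recompute the pbzip2 ratios from the dd summaries recorded on this page.
ratios=$(awk 'BEGIN {
    raw_read  = 689          # MB/s, uncompressed read
    raw_write = 219          # MB/s, uncompressed write
    pb_hi     = 478          # MB/s, cat | pbzip2 -d
    pb_lo     = 421          # MB/s, pbzip2 --stdout -d
    pb_write  = 32.9         # MB/s, compressed bytes written by pbzip2
    raw_bytes = 2019290000   # size of the repeated factors file
    bz2_bytes = 168043549    # size of the pbzip2 output
    printf "read: %.0f%%-%.0f%% of raw\n", 100 * pb_lo / raw_read, 100 * pb_hi / raw_read
    printf "write: %.0f%% of raw\n", 100 * pb_write / raw_write
    printf "size: %.1fx smaller\n", raw_bytes / bz2_bytes
}')
echo "$ratios"
```

With these inputs the read rate comes out at 61%-69% of raw, the write rate at 15% of raw, and the output about 12x smaller, which matches the rounded figures in the summary.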
First, a baseline to check that raw read/write performance is not a bottleneck.
$ yes factors.bin | head -1000 | xargs cat | pv -cN out | dd of=factors1000x.bin
out: 1.88GiB 0:00:09 [ 208MiB/s] [ <=> ]
3943925+1 records in
3943925+1 records out
2019290000 bytes (2.0 GB) copied, 9.23773 s, 219 MB/s
$ cat factors1000x.bin | pv -cN out | dd of=/dev/null
out: 1.88GiB 0:00:02 [ 657MiB/s] [ <=> ]
3943925+1 records in
3943925+1 records out
2019290000 bytes (2.0 GB) copied, 2.93028 s, 689 MB/s
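The `yes factors.bin | head -1000 | xargs cat` idiom used throughout these runs concatenates one file with itself: 1000 copies of the 2,019,290-byte factors binary give the 2,019,290,000 bytes reported by `dd`. A minimal sketch of the same idea as a reusable function follows; `make_repeated` is a hypothetical name, and it assumes the source path contains no whitespace or quotes, since `xargs` would split on them.

```shell
# make_repeated SRC COUNT DEST: write COUNT back-to-back copies of SRC to DEST.
# (Hypothetical helper for illustration; not part of the original test setup.)
make_repeated() {
    src=$1 count=$2 dest=$3
    # yes prints the file name forever, head keeps the first COUNT lines,
    # and xargs passes those names to cat, which concatenates the copies.
    yes "$src" | head -n "$count" | xargs cat >"$dest"
}
```

For example, `make_repeated factors.bin 1000 factors1000x.bin` would produce the same test file as the first command above, without the `pv` and `dd` instrumentation.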
$ yes factors.bin | head -1000 | xargs cat | pv -c -N in | pbzip2 | pv -c -N out | dd of=factors.bin.bz2
in: 1.88GiB 0:00:04 [ 396MiB/s] [ <=> ]
out: 160MiB 0:00:05 [31.4MiB/s] [ <=> ]
327983+450 records in
328210+1 records out
168043549 bytes (168 MB) copied, 5.11465 s, 32.9 MB/s
$ yes factors.bin | head -1000 | xargs cat | pv -c -N in | lbzip2 | pv -c -N out | dd of=factors.bin.bz2-lbzip2
in: 1.88GiB 0:00:05 [ 335MiB/s] [ <=> ]
out: 159MiB 0:00:05 [26.8MiB/s] [ <=> ]
327441+434 records in
327665+1 records out
167764497 bytes (168 MB) copied, 5.96772 s, 28.1 MB/s
$ yes factors.bin | head -1000 | xargs cat | pv -c -N in | bzip2 | pv -c -N out | dd of=factors.bin.bz2
in: 1.88GiB 0:02:09 [14.9MiB/s] [ <=> ]
out: 162MiB 0:02:09 [1.26MiB/s] [ <=> ]
332436+1 records in
332436+1 records out
170207238 bytes (170 MB) copied, 129.156 s, 1.3 MB/s
$ yes factors.bin | head -1000 | xargs cat | pv -c -N in | pigz | pv -c -N out | dd of=factors.bin.gz
in: 1.88GiB 0:00:02 [ 885MiB/s] [ <=> ]
out: 186MiB 0:00:02 [84.8MiB/s] [ <=> ]
377139+10625 records in
382715+1 records out
195950103 bytes (196 MB) copied, 2.20403 s, 88.9 MB/s
$ yes factors.bin | head -1000 | xargs cat | pv -c -N in | pigz | pv -c -N out | dd of=factors.bin.gz
in: 1.88GiB 0:00:02 [ 854MiB/s] [ <=> ]
out: 186MiB 0:00:02 [82.1MiB/s] [ <=> ]
377254+10462 records in
382715+1 records out
195950103 bytes (196 MB) copied, 2.24638 s, 87.2 MB/s
$ yes factors.bin | head -1000 | xargs cat | pv -c -N in | gzip | pv -c -N out | dd of=factors.bin.gz2
in: 1.88GiB 0:00:36 [52.5MiB/s] [ <=> ]
out: 187MiB 0:00:36 [5.11MiB/s] [ <=> ]
383666+1 records in
383666+1 records out
196437406 bytes (196 MB) copied, 36.6749 s, 5.4 MB/s
$ cat factors.bin.bz2 | pv -cN in | pbzip2 -d | pv -cN out | dd of=/dev/null
in: 160MiB 0:00:03 [ 43MiB/s] [ <=> ]
out: 1.88GiB 0:00:04 [ 455MiB/s] [ <=> ]
3943509+581 records in
3943925+1 records out
2019290000 bytes (2.0 GB) copied, 4.22549 s, 478 MB/s
$ pbzip2 --stdout -d factors.bin.bz2 | pv -cN out | dd of=/dev/null
out: 1.88GiB 0:00:04 [ 401MiB/s] [ <=> ]
3943801+199 records in
3943925+1 records out
2019290000 bytes (2.0 GB) copied, 4.79894 s, 421 MB/s
$ cat factors.bin.bz2 | pv -cN in | lbzip2 -d | pv -cN out | dd of=/dev/null
in: 160MiB 0:00:01 [ 121MiB/s] [ <=> ]
out: 1.88GiB 0:00:05 [ 384MiB/s] [ <=> ]
3943632+409 records in
3943925+1 records out
2019290000 bytes (2.0 GB) copied, 5.006 s, 403 MB/s
$ lbzip2 --stdout -d factors.bin.bz2 | pv -cN out | dd of=/dev/null
out: 1.88GiB 0:00:04 [ 411MiB/s] [ <=> ]
3943236+971 records in
3943925+1 records out
2019290000 bytes (2.0 GB) copied, 4.67484 s, 432 MB/s
$ cat factors.bin.bz2 | pv -cN in | bzip2 -d | pv -cN out | dd of=/dev/null
out: 1.88GiB 0:00:42 [45.6MiB/s] [ <=> ]
in: 160MiB 0:00:42 [ 3.8MiB/s] [ <=> ]
3943925+1 records in
3943925+1 records out
2019290000 bytes (2.0 GB) copied, 42.2235 s, 47.8 MB/s
$ cat factors.bin.gz | pv -cN in | pigz -d | pv -cN out | dd of=/dev/null
in: 186MiB 0:00:10 [17.4MiB/s] [ <=> ]
out: 1.88GiB 0:00:10 [ 178MiB/s] [ <=> ]
3943925+1 records in
3943925+1 records out
2019290000 bytes (2.0 GB) copied, 10.7662 s, 188 MB/s
$ pigz --stdout -d factors.bin.gz | pv -cN out | dd of=/dev/null
out: 1.88GiB 0:00:11 [ 172MiB/s] [ <=> ]
3943925+1 records in
3943925+1 records out
2019290000 bytes (2.0 GB) copied, 11.1827 s, 181 MB/s
$ cat factors.bin.gz | pv -cN in | gzip -d | pv -cN out | dd of=/dev/null
in: 186MiB 0:00:10 [ 18MiB/s] [ <=> ]
out: 1.88GiB 0:00:10 [ 185MiB/s] [ <=> ]
3943925+1 records in
3943925+1 records out
2019290000 bytes (2.0 GB) copied, 10.3734 s, 195 MB/s
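To repeat these measurements on another machine, the runs above could be wrapped in a small loop. The following is a sketch under a few assumptions: `bench_codecs` is a hypothetical helper name, it leaves out `pv` and relies only on the last line of the `dd` summary for throughput, and it adds a `cmp` round-trip check that the original runs did not perform. Tools that are not installed are skipped.

```shell
# bench_codecs FILE TOOL...: compress and decompress FILE with each TOOL
# and print the dd throughput summary for both directions.
bench_codecs() {
    src=$1; shift
    tmp=$(mktemp -d) || return 1
    for tool in "$@"; do
        # Skip compressors that are not available on this machine.
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: skipped (not installed)"
            continue
        fi
        # Compress through dd so its summary reports the output throughput.
        echo "$tool compress: $("$tool" -c <"$src" | dd of="$tmp/out.$tool" 2>&1 | tail -n 1)"
        # Decompress to /dev/null, as in the runs above.
        echo "$tool decompress: $("$tool" -dc <"$tmp/out.$tool" | dd of=/dev/null 2>&1 | tail -n 1)"
        # Verify that decompression reproduces the input byte for byte.
        if "$tool" -dc <"$tmp/out.$tool" | cmp -s - "$src"; then
            echo "$tool roundtrip: ok"
        else
            echo "$tool roundtrip: MISMATCH"
        fi
    done
    rm -rf "$tmp"
}
```

A call such as `bench_codecs factors1000x.bin pbzip2 lbzip2 pigz gzip bzip2` would cover the same tools as this page. Each compressor gets its own output file, so the bzip2 run does not overwrite the pbzip2 result.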