@netj
Last active January 7, 2016 00:46
Some overhead numbers on compressing factor graph binaries

Here are some preliminary tests to see whether compressing the grounded factor graph binaries makes sense.

  • Tested 1/6/2016 on raiders6 with 112 threads on an ext4 filesystem backed by an SSD RAID.
  • Uses a 2MB factors binary from the spouse example, repeated 1000 times to create ~2GB of data.
  • Confirmed that reading/writing without compression adds no extra overhead, as that IO has to happen even with compression.

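The pbzip2 option seems promising: it gives 60-70% of the uncompressed read performance (421-478MB/s vs. 689MB/s) while using roughly 10x less space (168MB vs. 2.0GB), and in turn less IO. On the write side, the compressed output rate drops to 15% of the uncompressed write rate (33MB/s vs. 219MB/s), but that may be less problematic if the grounding dump is the bottleneck. Note that 33MB/s counts compressed bytes reaching dd; pv shows the uncompressed input being consumed at about 396MiB/s.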
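
As a quick check on the percentages above, the ratios can be recomputed from the dd summary lines in this note. This is only a sketch; `pct` is a helper name made up here and was not part of the original tests.

```shell
# pct A B: print A as a rounded whole-number percentage of B.
# (Hypothetical helper, introduced only for this illustration.)
pct() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", 100 * a / b }'; }

# All figures are copied from the dd summary lines in this note (MB/s).
pct 478 689     # pbzip2 -d fed by cat, relative to the uncompressed read
pct 421 689     # pbzip2 -d --stdout, relative to the uncompressed read
pct 32.9 219    # pbzip2 compressed-output rate, relative to the uncompressed write

# Space saving: uncompressed bytes divided by pbzip2-compressed bytes.
awk 'BEGIN { printf "%.1fx\n", 2019290000 / 168043549 }'
```

The first two calls give the 60-70% read range, the third gives the 15% write figure, and the last line shows the space saving is a little better than the rounded "10x".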

  • The compression numbers here may not extrapolate well to multiple grounding/dump processes that have to unload data from the database, run format_converter, and then run pbzip2/lbzip2/pigz/gzip/bzip2 at the end, although they all seem IO bound.
  • However, the decompression side will probably extrapolate better as the sampler (single process) simply loads everything into memory.
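
On the decompression side, a single consumer could read the factors through a pipe regardless of how the file was compressed. The following is a minimal sketch under that assumption; `open_factors` is a hypothetical helper, and how the real sampler would be wired to it is not covered by these tests.

```shell
# open_factors FILE: write the uncompressed contents of FILE to stdout,
# choosing a decompressor from the file extension.
# (Hypothetical helper; not part of the sampler or the grounding scripts.)
open_factors() {
    case "$1" in
        *.bz2)
            # Prefer the parallel decompressors when they are installed.
            if command -v pbzip2 >/dev/null 2>&1; then pbzip2 -dc "$1"
            elif command -v lbzip2 >/dev/null 2>&1; then lbzip2 -dc "$1"
            else bzip2 -dc "$1"
            fi ;;
        *.gz)
            if command -v pigz >/dev/null 2>&1; then pigz -dc "$1"
            else gzip -dc "$1"
            fi ;;
        *)
            # Uncompressed file: pass it through unchanged.
            cat "$1" ;;
    esac
}
```

For example, `open_factors factors.bin.bz2 | wc -c` uses `wc -c` as a stand-in consumer and should report the uncompressed size.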

No compression

Just to check that raw read/write performance is not a bottleneck.

Writing

$ yes factors.bin | head -1000 | xargs cat | pv -cN out | dd of=factors1000x.bin
      out: 1.88GiB 0:00:09 [ 208MiB/s] [                      <=>                                                                                          ]
3943925+1 records in
3943925+1 records out
2019290000 bytes (2.0 GB) copied, 9.23773 s, 219 MB/s

Reading

$ cat factors1000x.bin | pv -cN out | dd of=/dev/null
      out: 1.88GiB 0:00:02 [ 657MiB/s] [      <=>                                                                                                          ]
3943925+1 records in
3943925+1 records out
2019290000 bytes (2.0 GB) copied, 2.93028 s, 689 MB/s

Compression

pbzip2

$ yes factors.bin | head -1000 | xargs cat | pv -c -N in | pbzip2 | pv -c -N out | dd of=factors.bin.bz2
       in: 1.88GiB 0:00:04 [ 396MiB/s] [           <=>                                                                                                     ]
      out:  160MiB 0:00:05 [31.4MiB/s] [             <=>                                                                                                   ]
327983+450 records in
328210+1 records out
168043549 bytes (168 MB) copied, 5.11465 s, 32.9 MB/s

lbzip2

$ yes factors.bin | head -1000 | xargs cat | pv -c -N in | lbzip2 | pv -c -N out | dd of=factors.bin.bz2-lbzip2
       in: 1.88GiB 0:00:05 [ 335MiB/s] [             <=>                                                                                                   ]
      out:  159MiB 0:00:05 [26.8MiB/s] [             <=>                                                                                                   ]
327441+434 records in
327665+1 records out
167764497 bytes (168 MB) copied, 5.96772 s, 28.1 MB/s

bzip2

$ yes factors.bin | head -1000 | xargs cat | pv -c -N in | bzip2 | pv -c -N out | dd of=factors.bin.bz2
       in: 1.88GiB 0:02:09 [14.9MiB/s] [                                                                   <=>                                             ]
      out:  162MiB 0:02:09 [1.26MiB/s] [                                                                   <=>                                             ]
332436+1 records in
332436+1 records out
170207238 bytes (170 MB) copied, 129.156 s, 1.3 MB/s

pigz

$ yes factors.bin | head -1000 | xargs cat | pv -c -N in | pigz | pv -c -N out | dd of=factors.bin.gz
       in: 1.88GiB 0:00:02 [ 885MiB/s] [      <=>                                                                                                          ]
      out:  186MiB 0:00:02 [84.8MiB/s] [      <=>                                                                                                          ]
377139+10625 records in
382715+1 records out
195950103 bytes (196 MB) copied, 2.20403 s, 88.9 MB/s

$ yes factors.bin | head -1000 | xargs cat | pv -c -N in | pigz | pv -c -N out | dd of=factors.bin.gz
       in: 1.88GiB 0:00:02 [ 854MiB/s] [      <=>                                                                                                          ]
      out:  186MiB 0:00:02 [82.1MiB/s] [      <=>                                                                                                          ]
377254+10462 records in
382715+1 records out
195950103 bytes (196 MB) copied, 2.24638 s, 87.2 MB/s

gzip

$ yes factors.bin | head -1000 | xargs cat | pv -c -N in | gzip | pv -c -N out | dd of=factors.bin.gz2
       in: 1.88GiB 0:00:36 [52.5MiB/s] [                                                                                  <=>                              ]
      out:  187MiB 0:00:36 [5.11MiB/s] [                                                                                  <=>                              ]
383666+1 records in
383666+1 records out
196437406 bytes (196 MB) copied, 36.6749 s, 5.4 MB/s
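
To compare the compression runs side by side, the byte counts and elapsed times from the dd lines above can be turned into a compression ratio and an effective uncompressed input rate (total input bytes divided by elapsed time). This is a rough sketch: `summarize` is a made-up helper name, and the pigz row uses the first of the two pigz runs.

```shell
# summarize: read "tool compressed_bytes seconds" rows on stdin and print
# the compression ratio and the effective uncompressed input rate.
# total is the uncompressed size in bytes reported by dd in this note.
summarize() {
    awk -v total=2019290000 '{
        printf "%-7s %5.1fx %7.1f MB/s\n", $1, total / $2, total / $3 / 1e6
    }'
}

# Rows copied from the dd summary lines of the compression runs.
summarize <<'EOF'
pbzip2 168043549 5.11465
lbzip2 167764497 5.96772
bzip2  170207238 129.156
pigz   195950103 2.20403
gzip   196437406 36.6749
EOF
```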

Decompression

pbzip2 -d

$ cat factors.bin.bz2 | pv -cN in | pbzip2 -d | pv -cN out | dd of=/dev/null
       in:  160MiB 0:00:03 [  43MiB/s] [        <=>                                                                                                        ]
      out: 1.88GiB 0:00:04 [ 455MiB/s] [           <=>                                                                                                     ]
3943509+581 records in
3943925+1 records out
2019290000 bytes (2.0 GB) copied, 4.22549 s, 478 MB/s

pbzip2 -d --stdout

$ pbzip2 --stdout -d factors.bin.bz2 | pv -cN out | dd of=/dev/null
      out: 1.88GiB 0:00:04 [ 401MiB/s] [           <=>                                                                                                     ]
3943801+199 records in
3943925+1 records out
2019290000 bytes (2.0 GB) copied, 4.79894 s, 421 MB/s

lbzip2 -d

$ cat factors.bin.bz2 | pv -cN in | lbzip2 -d | pv -cN out | dd of=/dev/null
       in:  160MiB 0:00:01 [ 121MiB/s] [    <=>                                                                                                            ]
      out: 1.88GiB 0:00:05 [ 384MiB/s] [           <=>                                                                                                     ]
3943632+409 records in
3943925+1 records out
2019290000 bytes (2.0 GB) copied, 5.006 s, 403 MB/s

lbzip2 -d --stdout

$ lbzip2 --stdout -d factors.bin.bz2 | pv -cN out | dd of=/dev/null
      out: 1.88GiB 0:00:04 [ 411MiB/s] [           <=>                                                                                                     ]
3943236+971 records in
3943925+1 records out
2019290000 bytes (2.0 GB) copied, 4.67484 s, 432 MB/s

bzip2 -d

$ cat factors.bin.bz2 | pv -cN in | bzip2 -d | pv -cN out | dd of=/dev/null
      out: 1.88GiB 0:00:42 [45.6MiB/s] [                                                                                                <=>                ]
       in:  160MiB 0:00:42 [ 3.8MiB/s] [                                                                                                <=>                ]
3943925+1 records in
3943925+1 records out
2019290000 bytes (2.0 GB) copied, 42.2235 s, 47.8 MB/s

pigz -d

$ cat factors.bin.gz | pv -cN in | pigz -d | pv -cN out | dd of=/dev/null
       in:  186MiB 0:00:10 [17.4MiB/s] [                        <=>                                                                                        ]
      out: 1.88GiB 0:00:10 [ 178MiB/s] [                        <=>                                                                                        ]
3943925+1 records in
3943925+1 records out
2019290000 bytes (2.0 GB) copied, 10.7662 s, 188 MB/s

pigz -d --stdout

$ pigz --stdout -d factors.bin.gz | pv -cN out | dd of=/dev/null
      out: 1.88GiB 0:00:11 [ 172MiB/s] [                          <=>                                                                                      ]
3943925+1 records in
3943925+1 records out
2019290000 bytes (2.0 GB) copied, 11.1827 s, 181 MB/s

gzip -d

$ cat factors.bin.gz | pv -cN in | gzip -d | pv -cN out | dd of=/dev/null
       in:  186MiB 0:00:10 [  18MiB/s] [                        <=>                                                                                        ]
      out: 1.88GiB 0:00:10 [ 185MiB/s] [                        <=>                                                                                        ]
3943925+1 records in
3943925+1 records out
2019290000 bytes (2.0 GB) copied, 10.3734 s, 195 MB/s
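
The decompression runs above all follow the same pattern, so they could be repeated with a small loop. The sketch below is not the procedure used for the numbers in this note: it drops pv and dd, times each run only to whole seconds with `date`, and skips tools that are not installed. `bench_decompress` is a hypothetical name.

```shell
# bench_decompress FILE TOOL...: for each TOOL that is installed, stream FILE
# through "TOOL -d" and print the decompressed byte count and elapsed seconds.
# (Hypothetical helper; timing resolution is one second.)
bench_decompress() {
    file=$1; shift
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: not installed, skipped"
            continue
        fi
        start=$(date +%s)
        # wc -c counts the decompressed bytes, like dd's byte total.
        bytes=$(cat "$file" | "$tool" -d | wc -c)
        end=$(date +%s)
        # $((bytes)) strips any padding some wc implementations add.
        echo "$tool: $((bytes)) bytes in $((end - start))s"
    done
}
```

For example, `bench_decompress factors.bin.bz2 pbzip2 lbzip2 bzip2` would cover the three bzip2-format runs, and `bench_decompress factors.bin.gz pigz gzip` the gzip-format ones.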