4-core machine with svof loaded:

| relative | ns/op | op/s | err% | total | Implementations
|---------:|--------------------:|--------------------:|--------:|----------:|:----------------
| 100.0% | 180,631.04 | 5,536.15 | 2.2% | 4.58 | `plain for loop`
| 96.9% | 186,443.52 | 5,363.55 | 3.5% | 4.55 | `std::copy_if`
| 110.4% | 163,570.31 | 6,113.58 | 1.4% | 4.00 | `tbb::parallel_for_each, no acc.`
| 108.3% | 166,760.68 | 5,996.62 | 2.6% | 4.13 | `tbb::parallel_for, automatic grain`
| 107.8% | 167,554.00 | 5,968.22 | 2.2% | 4.08 | `tbb::parallel_for, 10 grainsize`