@pashu123
Created April 10, 2025 16:48
Running with warp reduction
---------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
---------------------------------------------------------------------------------------------------------
BM__4_14336_4096/process_time/real_time 0.510 ms 0.489 ms 1394 items_per_second=1.96019k/s
BM__4_14336_4096/process_time/real_time 0.512 ms 0.518 ms 1394 items_per_second=1.95449k/s
BM__4_14336_4096/process_time/real_time 0.512 ms 0.519 ms 1394 items_per_second=1.9534k/s
BM__4_14336_4096/process_time/real_time 0.513 ms 0.521 ms 1394 items_per_second=1.95046k/s
BM__4_14336_4096/process_time/real_time 0.514 ms 0.521 ms 1394 items_per_second=1.94656k/s
BM__4_14336_4096/process_time/real_time_mean 0.512 ms 0.513 ms 5 items_per_second=1.95302k/s
BM__4_14336_4096/process_time/real_time_median 0.512 ms 0.519 ms 5 items_per_second=1.9534k/s
BM__4_14336_4096/process_time/real_time_stddev 0.001 ms 0.014 ms 5 items_per_second=5.04898/s
BM__4_14336_4096/process_time/real_time_cv 0.26 % 2.69 % 5 items_per_second=0.26%
Running with vector distribution
---------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
---------------------------------------------------------------------------------------------------------
BM__4_14336_4096/process_time/real_time 0.285 ms 0.292 ms 2468 items_per_second=3.50526k/s
BM__4_14336_4096/process_time/real_time 0.285 ms 0.293 ms 2468 items_per_second=3.50538k/s
BM__4_14336_4096/process_time/real_time 0.287 ms 0.295 ms 2468 items_per_second=3.48339k/s
BM__4_14336_4096/process_time/real_time 0.288 ms 0.295 ms 2468 items_per_second=3.47494k/s
BM__4_14336_4096/process_time/real_time 0.285 ms 0.292 ms 2468 items_per_second=3.51345k/s
BM__4_14336_4096/process_time/real_time_mean 0.286 ms 0.293 ms 5 items_per_second=3.49649k/s
BM__4_14336_4096/process_time/real_time_median 0.285 ms 0.293 ms 5 items_per_second=3.50526k/s
BM__4_14336_4096/process_time/real_time_stddev 0.001 ms 0.002 ms 5 items_per_second=16.4286/s
BM__4_14336_4096/process_time/real_time_cv 0.47 % 0.58 % 5 items_per_second=0.47%
#########################################################################
Running with warp reduction
---------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
---------------------------------------------------------------------------------------------------------
BM__4_4096_14336/process_time/real_time 0.511 ms 0.506 ms 1386 items_per_second=1.95632k/s
BM__4_4096_14336/process_time/real_time 0.512 ms 0.519 ms 1386 items_per_second=1.95271k/s
BM__4_4096_14336/process_time/real_time 0.512 ms 0.519 ms 1386 items_per_second=1.95226k/s
BM__4_4096_14336/process_time/real_time 0.513 ms 0.520 ms 1386 items_per_second=1.94802k/s
BM__4_4096_14336/process_time/real_time 0.510 ms 0.516 ms 1386 items_per_second=1.96144k/s
BM__4_4096_14336/process_time/real_time_mean 0.512 ms 0.516 ms 5 items_per_second=1.95415k/s
BM__4_4096_14336/process_time/real_time_median 0.512 ms 0.519 ms 5 items_per_second=1.95271k/s
BM__4_4096_14336/process_time/real_time_stddev 0.001 ms 0.006 ms 5 items_per_second=5.02837/s
BM__4_4096_14336/process_time/real_time_cv 0.26 % 1.15 % 5 items_per_second=0.26%
Running with vector distribution
---------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
---------------------------------------------------------------------------------------------------------
BM__4_4096_14336/process_time/real_time 0.297 ms 0.304 ms 2378 items_per_second=3.36988k/s
BM__4_4096_14336/process_time/real_time 0.300 ms 0.308 ms 2378 items_per_second=3.33147k/s
BM__4_4096_14336/process_time/real_time 0.300 ms 0.308 ms 2378 items_per_second=3.33434k/s
BM__4_4096_14336/process_time/real_time 0.300 ms 0.308 ms 2378 items_per_second=3.3303k/s
BM__4_4096_14336/process_time/real_time 0.299 ms 0.307 ms 2378 items_per_second=3.34504k/s
BM__4_4096_14336/process_time/real_time_mean 0.299 ms 0.307 ms 5 items_per_second=3.34221k/s
BM__4_4096_14336/process_time/real_time_median 0.300 ms 0.308 ms 5 items_per_second=3.33434k/s
BM__4_4096_14336/process_time/real_time_stddev 0.001 ms 0.002 ms 5 items_per_second=16.5279/s
BM__4_4096_14336/process_time/real_time_cv 0.49 % 0.63 % 5 items_per_second=0.49%
#########################################################################
Running with warp reduction
----------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
----------------------------------------------------------------------------------------------------------
BM__4_128256_4096/process_time/real_time 4.25 ms 4.22 ms 165 items_per_second=235.524/s
BM__4_128256_4096/process_time/real_time 4.25 ms 4.23 ms 165 items_per_second=235.052/s
BM__4_128256_4096/process_time/real_time 4.25 ms 4.22 ms 165 items_per_second=235.474/s
BM__4_128256_4096/process_time/real_time 4.25 ms 4.23 ms 165 items_per_second=235.437/s
BM__4_128256_4096/process_time/real_time 4.25 ms 4.22 ms 165 items_per_second=235.489/s
BM__4_128256_4096/process_time/real_time_mean 4.25 ms 4.23 ms 5 items_per_second=235.395/s
BM__4_128256_4096/process_time/real_time_median 4.25 ms 4.22 ms 5 items_per_second=235.474/s
BM__4_128256_4096/process_time/real_time_stddev 0.004 ms 0.006 ms 5 items_per_second=0.194664/s
BM__4_128256_4096/process_time/real_time_cv 0.08 % 0.13 % 5 items_per_second=0.08%
Running with vector distribution
----------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
----------------------------------------------------------------------------------------------------------
BM__4_128256_4096/process_time/real_time 2.24 ms 2.22 ms 313 items_per_second=446.493/s
BM__4_128256_4096/process_time/real_time 2.24 ms 2.21 ms 313 items_per_second=447.348/s
BM__4_128256_4096/process_time/real_time 2.23 ms 2.21 ms 313 items_per_second=447.707/s
BM__4_128256_4096/process_time/real_time 2.23 ms 2.21 ms 313 items_per_second=447.819/s
BM__4_128256_4096/process_time/real_time 2.23 ms 2.21 ms 313 items_per_second=448.742/s
BM__4_128256_4096/process_time/real_time_mean 2.23 ms 2.21 ms 5 items_per_second=447.622/s
BM__4_128256_4096/process_time/real_time_median 2.23 ms 2.21 ms 5 items_per_second=447.707/s
BM__4_128256_4096/process_time/real_time_stddev 0.004 ms 0.003 ms 5 items_per_second=0.813816/s
BM__4_128256_4096/process_time/real_time_cv 0.18 % 0.13 % 5 items_per_second=0.18%
#########################################################################
Running with warp reduction
--------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
--------------------------------------------------------------------------------------------------------
BM__4_4096_4096/process_time/real_time 0.164 ms 0.173 ms 4297 items_per_second=6.11157k/s
BM__4_4096_4096/process_time/real_time 0.164 ms 0.173 ms 4297 items_per_second=6.11039k/s
BM__4_4096_4096/process_time/real_time 0.163 ms 0.173 ms 4297 items_per_second=6.12874k/s
BM__4_4096_4096/process_time/real_time 0.164 ms 0.174 ms 4297 items_per_second=6.1099k/s
BM__4_4096_4096/process_time/real_time 0.164 ms 0.174 ms 4297 items_per_second=6.10506k/s
BM__4_4096_4096/process_time/real_time_mean 0.164 ms 0.173 ms 5 items_per_second=6.11313k/s
BM__4_4096_4096/process_time/real_time_median 0.164 ms 0.173 ms 5 items_per_second=6.11039k/s
BM__4_4096_4096/process_time/real_time_stddev 0.000 ms 0.000 ms 5 items_per_second=9.07144/s
BM__4_4096_4096/process_time/real_time_cv 0.15 % 0.22 % 5 items_per_second=0.15%
Running with vector distribution
--------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
--------------------------------------------------------------------------------------------------------
BM__4_4096_4096/process_time/real_time 0.099 ms 0.105 ms 8217 items_per_second=10.129k/s
BM__4_4096_4096/process_time/real_time 0.099 ms 0.110 ms 8217 items_per_second=10.0527k/s
BM__4_4096_4096/process_time/real_time 0.099 ms 0.110 ms 8217 items_per_second=10.0725k/s
BM__4_4096_4096/process_time/real_time 0.100 ms 0.110 ms 8217 items_per_second=9.99812k/s
BM__4_4096_4096/process_time/real_time 0.100 ms 0.110 ms 8217 items_per_second=10.0484k/s
BM__4_4096_4096/process_time/real_time_mean 0.099 ms 0.109 ms 5 items_per_second=10.0601k/s
BM__4_4096_4096/process_time/real_time_median 0.099 ms 0.110 ms 5 items_per_second=10.0527k/s
BM__4_4096_4096/process_time/real_time_stddev 0.000 ms 0.002 ms 5 items_per_second=47.2627/s
BM__4_4096_4096/process_time/real_time_cv 0.47 % 1.94 % 5 items_per_second=0.47%
#########################################################################
Running with warp reduction
--------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
--------------------------------------------------------------------------------------------------------
BM__4_1024_4096/process_time/real_time 0.079 ms 0.089 ms 10905 items_per_second=12.7153k/s
BM__4_1024_4096/process_time/real_time 0.081 ms 0.092 ms 10905 items_per_second=12.3528k/s
BM__4_1024_4096/process_time/real_time 0.081 ms 0.093 ms 10905 items_per_second=12.2919k/s
BM__4_1024_4096/process_time/real_time 0.082 ms 0.093 ms 10905 items_per_second=12.2529k/s
BM__4_1024_4096/process_time/real_time 0.081 ms 0.093 ms 10905 items_per_second=12.2976k/s
BM__4_1024_4096/process_time/real_time_mean 0.081 ms 0.092 ms 5 items_per_second=12.3821k/s
BM__4_1024_4096/process_time/real_time_median 0.081 ms 0.093 ms 5 items_per_second=12.2976k/s
BM__4_1024_4096/process_time/real_time_stddev 0.001 ms 0.002 ms 5 items_per_second=189.634/s
BM__4_1024_4096/process_time/real_time_cv 1.50 % 1.83 % 5 items_per_second=1.53%
Running with vector distribution
--------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
--------------------------------------------------------------------------------------------------------
BM__4_1024_4096/process_time/real_time 0.059 ms 0.069 ms 13775 items_per_second=16.966k/s
BM__4_1024_4096/process_time/real_time 0.059 ms 0.070 ms 13775 items_per_second=16.8248k/s
BM__4_1024_4096/process_time/real_time 0.062 ms 0.073 ms 13775 items_per_second=16.2226k/s
BM__4_1024_4096/process_time/real_time 0.066 ms 0.077 ms 13775 items_per_second=15.2294k/s
BM__4_1024_4096/process_time/real_time 0.065 ms 0.076 ms 13775 items_per_second=15.411k/s
BM__4_1024_4096/process_time/real_time_mean 0.062 ms 0.073 ms 5 items_per_second=16.1308k/s
BM__4_1024_4096/process_time/real_time_median 0.062 ms 0.073 ms 5 items_per_second=16.2226k/s
BM__4_1024_4096/process_time/real_time_stddev 0.003 ms 0.004 ms 5 items_per_second=793.437/s
BM__4_1024_4096/process_time/real_time_cv 4.95 % 5.33 % 5 items_per_second=4.92%
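
To compare the two strategies, divide the warp reduction mean by the vector distribution mean for each shape. The sketch below does that with the real_time means copied from the `_mean` rows above. The names `mean_ms` and `speedup` are illustrative and are not part of the benchmark harness.

```python
# Mean real_time in milliseconds, copied from the *_mean rows of the log:
# (warp reduction, vector distribution) per benchmark.
mean_ms = {
    "BM__4_14336_4096": (0.512, 0.286),
    "BM__4_4096_14336": (0.512, 0.299),
    "BM__4_128256_4096": (4.25, 2.23),
    "BM__4_4096_4096": (0.164, 0.099),
    "BM__4_1024_4096": (0.081, 0.062),
}


def speedup(warp_ms: float, vector_ms: float) -> float:
    """Return how many times faster vector distribution ran than warp reduction."""
    return warp_ms / vector_ms


if __name__ == "__main__":
    for name, (warp, vector) in mean_ms.items():
        print(f"{name}: {speedup(warp, vector):.2f}x")
```

The ratios use the rounded means printed in the log, so they are only as precise as those three significant digits. The `BM__4_1024_4096` vector distribution run also reports a coefficient of variation near 5 %, so its ratio is the least reliable of the five.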