
@geohot
Created September 19, 2024 01:38
tinygrad 0.7 openpilot 0.9.7 run
comma@tiny24:/data/openpilot/tinygrad_repo$ python3 openpilot/compile2.py https://github.com/commaai/openpilot/raw/v0.9.7/selfdrive/modeld/models/supercombo.onnx
https://github.com/commaai/openpilot/raw/v0.9.7/selfdrive/modeld/models/supercombo.onnx: 100%|███████████████████████████████████████████| 51.5M/51.5M [00:00<00:00, 88.2MB/s]
cache is out of date, clearing it
/usr/local/pyenv/versions/3.11.4/lib/python3.11/site-packages/pyopencl/__init__.py:528: CompilerWarning: Non-empty compiler output encountered. Set the environment variable PYOPENCL_COMPILER_OUTPUT=1 to see more.
lambda: self._prg.build(options_bytes, devices),
190 schedule items depend on the input, 462 don't
7 inputs
13: rewrite input, image dtype dtypes.imageh((16, 2048, 4)), (View(shape=(1, 16, 32, 64, 2), strides=(0, 8192, 256, 4, 1), offset=0, mask=None, contiguous=False), View(shape=(1, 16, 32, 128), strides=(0, 4096, 128, 1), offset=0, mask=None, contiguous=True))
24: rewrite input, image dtype dtypes.imageh((8, 2048, 4)), (View(shape=(1, 8, 16, 128, 2), strides=(0, 8192, 512, 4, 1), offset=0, mask=None, contiguous=False), View(shape=(1, 8, 16, 256), strides=(0, 4096, 256, 1), offset=0, mask=None, contiguous=True))
51: rewrite input, image dtype dtypes.imageh((4, 2048, 4)), (View(shape=(1, 4, 8, 256, 2), strides=(0, 8192, 1024, 4, 1), offset=0, mask=None, contiguous=False), View(shape=(1, 4, 8, 512), strides=(0, 4096, 512, 1), offset=0, mask=None, contiguous=True))
62: rewrite input, image dtype dtypes.imageh((4, 4096, 4)), (View(shape=(1, 4, 8, 512, 2), strides=(0, 16384, 2048, 4, 1), offset=0, mask=None, contiguous=False), View(shape=(1, 4, 8, 1024), strides=(0, 8192, 1024, 1), offset=0, mask=None, contiguous=True))
73: rewrite output, output shape 1, image dtype dtypes.imageh((1, 128, 4)) prod 512
79: rewrite output, output shape 10, image dtype dtypes.imageh((10, 128, 4)) prod 5120
80: rewrite output, output shape 10, image dtype dtypes.imageh((10, 128, 4)) prod 5120
86: rewrite output, output shape 800, image dtype dtypes.imageh((10, 24, 4)) prod 960
86: rewrite input, image dtype dtypes.imageh((10, 24, 4)), (View(shape=(1, 8, 10, 10), strides=(0, 12, 96, 1), offset=0, mask=None, contiguous=False),)
87: rewrite output, output shape 80, image dtype dtypes.imageh((10, 24, 4)) prod 960
88: rewrite output, output shape 800, image dtype dtypes.imageh((10, 24, 4)) prod 960
89: rewrite output, output shape 80, image dtype dtypes.imageh((10, 24, 4)) prod 960
90: rewrite output, output shape 800, image dtype dtypes.imageh((10, 24, 4)) prod 960
95: rewrite output, output shape 10, image dtype dtypes.imageh((10, 128, 4)) prod 5120
96: rewrite output, output shape 10, image dtype dtypes.imageh((10, 128, 4)) prod 5120
100: rewrite output, output shape 512, image dtype dtypes.imageh((10, 128, 4)) prod 5120
169: rewrite output, output shape 512, image dtype dtypes.imageh((10, 128, 4)) prod 5120
182: rewrite input, image dtype dtypes.imageh((1, 1239, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=0, mask=((0, 1), (0, 4955)), contiguous=False),)
182: rewrite input, image dtype dtypes.imageh((1, 132, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-4955, mask=((0, 1), (4955, 5483)), contiguous=False),)
182: rewrite input, image dtype dtypes.imageh((1, 2, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5483, mask=((0, 1), (5483, 5491)), contiguous=False),)
182: rewrite input, image dtype dtypes.imageh((1, 66, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5491, mask=((0, 1), (5491, 5755)), contiguous=False),)
182: rewrite input, image dtype dtypes.imageh((1, 26, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5755, mask=((0, 1), (5755, 5857)), contiguous=False),)
182: rewrite input, image dtype dtypes.imageh((1, 1, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5857, mask=((0, 1), (5857, 5860)), contiguous=False),)
182: rewrite input, image dtype dtypes.imageh((1, 2, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5860, mask=((0, 1), (5860, 5868)), contiguous=False),)
182: rewrite input, image dtype dtypes.imageh((1, 12, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5868, mask=((0, 1), (5868, 5916)), contiguous=False),)
182: rewrite input, image dtype dtypes.imageh((1, 8, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5916, mask=((0, 1), (5916, 5948)), contiguous=False),)
182: rewrite input, image dtype dtypes.imageh((1, 3, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5948, mask=((0, 1), (5948, 5960)), contiguous=False),)
182: rewrite input, image dtype dtypes.imageh((1, 2, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5960, mask=((0, 1), (5960, 5966)), contiguous=False),)
182: rewrite input, image dtype dtypes.imageh((1, 3, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5966, mask=((0, 1), (5966, 5978)), contiguous=False),)
182: rewrite input, image dtype dtypes.imageh((1, 3, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5978, mask=((0, 1), (5978, 5990)), contiguous=False),)
182: rewrite input, image dtype dtypes.imageh((1, 1, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5990, mask=((0, 1), (5990, 5992)), contiguous=False),)
182: rewrite input, image dtype dtypes.imageh((1, 128, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5992, mask=((0, 1), (5992, 6504)), contiguous=False),)
**** running real kernels 150/183 images ****
*** 0 E_32768_4_6 arg 3 sz [512, 1, 1] [64, 1, 1] OPs 0M/ 0.00G mem 0.26 GB tm 272.90us/ 0.27ms ( 2.88 GFLOPS, 28.82 GB/s)
*** 1 r_64_32_16_6_3_3_4_4_4 arg 4 sz [2, 1, 64] [8, 32, 1] OPs 234M/ 0.00G mem 0.26 GB tm 1244.16us/ 1.52ms ( 188.79 GFLOPS, 3.71 GB/s)
*** 2 r_32_16_16_4_4_3_3 arg 4 sz [2, 1, 8] [8, 16, 4] OPs 4M/ 0.24G mem 0.26 GB tm 88.83us/ 1.61ms ( 50.17 GFLOPS, 29.53 GB/s)
*** 3 r_32_16_16_16_4_4_4 arg 4 sz [2, 2, 16] [8, 8, 2] OPs 18M/ 0.24G mem 0.26 GB tm 101.12us/ 1.71ms ( 186.65 GFLOPS, 5.27 GB/s)
*** 4 r_32_16_16_4_4_3_3n1 arg 4 sz [4, 4, 4] [4, 4, 8] OPs 2M/ 0.26G mem 0.26 GB tm 42.75us/ 1.75ms ( 58.25 GFLOPS, 61.35 GB/s)
*** 5 r_32_16_16_7_4_4_7 arg 4 sz [4, 4, 1] [4, 4, 32] OPs 12M/ 0.26G mem 0.26 GB tm 160.77us/ 1.91ms ( 80.71 GFLOPS, 81.57 GB/s)
*** 6 r_32_16_48_16_4_4_4 arg 4 sz [3, 4, 4] [16, 4, 8] OPs 56M/ 0.27G mem 0.26 GB tm 295.17us/ 2.21ms ( 191.83 GFLOPS, 3.64 GB/s)
*** 7 r_32_16_16_48_4_4_4 arg 6 sz [2, 1, 32] [8, 16, 1] OPs 50M/ 0.33G mem 0.26 GB tm 256.00us/ 2.46ms ( 198.14 GFLOPS, 5.22 GB/s)
*** 8 r_32_16_16_4_4_3_3n1 arg 4 sz [4, 4, 4] [4, 4, 8] OPs 2M/ 0.38G mem 0.26 GB tm 53.76us/ 2.52ms ( 46.32 GFLOPS, 48.79 GB/s)
*** 9 r_32_16_16_7_4_4_7 arg 4 sz [4, 4, 1] [4, 4, 32] OPs 12M/ 0.38G mem 0.26 GB tm 168.96us/ 2.68ms ( 76.80 GFLOPS, 77.61 GB/s)
*** 10 r_32_16_48_16_4_4_4 arg 4 sz [3, 4, 4] [16, 4, 8] OPs 56M/ 0.40G mem 0.27 GB tm 302.85us/ 2.99ms ( 186.97 GFLOPS, 3.55 GB/s)
*** 11 r_32_16_16_48_4_4_4 arg 6 sz [2, 1, 32] [8, 16, 1] OPs 50M/ 0.45G mem 0.27 GB tm 275.97us/ 3.26ms ( 183.81 GFLOPS, 4.84 GB/s)
*** 12 r_16_8_16_7_7_4_4_4 arg 3 sz [4, 1, 4] [4, 8, 4] OPs 12M/ 0.50G mem 0.27 GB tm 168.19us/ 3.43ms ( 76.37 GFLOPS, 22.36 GB/s)
*** 13 E_128_32_4_4n1 arg 3 sz [2, 128, 1] [16, 1, 1] OPs 0M/ 0.52G mem 0.27 GB tm 43.01us/ 3.47ms ( 1.52 GFLOPS, 9.15 GB/s)
*** 14 r_16_8_32_32_4_4_4 arg 4 sz [4, 1, 8] [8, 8, 2] OPs 17M/ 0.52G mem 0.27 GB tm 96.77us/ 3.57ms ( 184.21 GFLOPS, 3.05 GB/s)
*** 15 r_16_8_32_4_4_3_3 arg 4 sz [4, 2, 4] [8, 4, 4] OPs 1M/ 0.54G mem 0.27 GB tm 25.86us/ 3.60ms ( 48.16 GFLOPS, 50.80 GB/s)
*** 16 r_16_8_32_7_4_4_7 arg 4 sz [8, 1, 1] [4, 8, 16] OPs 6M/ 0.54G mem 0.27 GB tm 87.04us/ 3.68ms ( 74.54 GFLOPS, 75.44 GB/s)
*** 17 r_16_8_96_32_4_4_4 arg 4 sz [6, 2, 2] [16, 4, 8] OPs 53M/ 0.54G mem 0.27 GB tm 275.97us/ 3.96ms ( 193.78 GFLOPS, 2.26 GB/s)
*** 18 r_16_8_32_96_4_4_4 arg 6 sz [2, 1, 2] [16, 8, 8] OPs 50M/ 0.60G mem 0.27 GB tm 270.08us/ 4.23ms ( 187.09 GFLOPS, 2.79 GB/s)
*** 19 r_16_8_32_4_4_3_3 arg 4 sz [4, 2, 4] [8, 4, 4] OPs 1M/ 0.65G mem 0.27 GB tm 32.00us/ 4.26ms ( 38.91 GFLOPS, 41.05 GB/s)
*** 20 r_16_8_32_7_4_4_7 arg 4 sz [8, 1, 1] [4, 8, 16] OPs 6M/ 0.65G mem 0.27 GB tm 97.02us/ 4.36ms ( 66.87 GFLOPS, 67.68 GB/s)
*** 21 r_16_8_96_32_4_4_4 arg 4 sz [6, 2, 2] [16, 4, 8] OPs 53M/ 0.65G mem 0.27 GB tm 288.77us/ 4.65ms ( 185.19 GFLOPS, 2.16 GB/s)
*** 22 r_16_8_32_96_4_4_4 arg 6 sz [2, 1, 2] [16, 8, 8] OPs 50M/ 0.71G mem 0.27 GB tm 292.10us/ 4.94ms ( 172.99 GFLOPS, 2.58 GB/s)
*** 23 r_8_4_32_7_7_4_4_4 arg 3 sz [2, 4, 2] [16, 1, 4] OPs 6M/ 0.76G mem 0.27 GB tm 88.32us/ 5.03ms ( 72.72 GFLOPS, 21.72 GB/s)
*** 24 E_32_64_4_4 arg 3 sz [4, 32, 1] [16, 1, 1] OPs 0M/ 0.77G mem 0.27 GB tm 20.99us/ 5.05ms ( 1.56 GFLOPS, 9.41 GB/s)
*** 25 r_8_4_64_64_4_4_4 arg 4 sz [4, 4, 2] [16, 1, 4] OPs 17M/ 0.77G mem 0.27 GB tm 104.96us/ 5.15ms ( 164.84 GFLOPS, 2.51 GB/s)
*** 26 r_8_4_64_4_4_3_3 arg 4 sz [8, 1, 4] [8, 4, 2] OPs 0M/ 0.78G mem 0.27 GB tm 16.13us/ 5.17ms ( 38.60 GFLOPS, 40.98 GB/s)
*** 27 r_8_4_64_7_4_4_7 arg 4 sz [4, 2, 4] [16, 2, 2] OPs 3M/ 0.78G mem 0.27 GB tm 49.15us/ 5.22ms ( 66.00 GFLOPS, 67.20 GB/s)
*** 28 r_8_4_192_64_4_4_4 arg 4 sz [24, 1, 1] [8, 4, 8] OPs 51M/ 0.79G mem 0.27 GB tm 275.97us/ 5.50ms ( 188.08 GFLOPS, 2.39 GB/s)
*** 29 r_8_4_64_192_4_4_4 arg 6 sz [4, 4, 1] [16, 1, 8] OPs 50M/ 0.84G mem 0.27 GB tm 275.20us/ 5.77ms ( 183.25 GFLOPS, 2.63 GB/s)
*** 30 r_8_4_64_4_4_3_3 arg 4 sz [8, 1, 4] [8, 4, 2] OPs 0M/ 0.89G mem 0.27 GB tm 23.81us/ 5.79ms ( 26.15 GFLOPS, 27.76 GB/s)
*** 31 r_8_4_64_7_4_4_7 arg 4 sz [4, 2, 4] [16, 2, 2] OPs 3M/ 0.89G mem 0.27 GB tm 57.09us/ 5.85ms ( 56.83 GFLOPS, 57.86 GB/s)
*** 32 r_8_4_192_64_4_4_4 arg 4 sz [24, 1, 1] [8, 4, 8] OPs 51M/ 0.89G mem 0.27 GB tm 297.98us/ 6.15ms ( 174.19 GFLOPS, 2.21 GB/s)
*** 33 r_8_4_64_192_4_4_4 arg 6 sz [4, 4, 1] [16, 1, 8] OPs 50M/ 0.94G mem 0.27 GB tm 292.10us/ 6.44ms ( 172.65 GFLOPS, 2.48 GB/s)
*** 34 r_8_4_64_4_4_3_3 arg 4 sz [8, 1, 4] [8, 4, 2] OPs 0M/ 1.00G mem 0.27 GB tm 19.97us/ 6.46ms ( 31.18 GFLOPS, 33.10 GB/s)
*** 35 r_8_4_64_7_4_4_7 arg 4 sz [4, 2, 4] [16, 2, 2] OPs 3M/ 1.00G mem 0.27 GB tm 53.76us/ 6.52ms ( 60.34 GFLOPS, 61.44 GB/s)
*** 36 r_8_4_192_64_4_4_4 arg 4 sz [24, 1, 1] [8, 4, 8] OPs 51M/ 1.00G mem 0.27 GB tm 296.96us/ 6.81ms ( 174.79 GFLOPS, 2.22 GB/s)
*** 37 r_8_4_64_192_4_4_4 arg 6 sz [4, 4, 1] [16, 1, 8] OPs 50M/ 1.05G mem 0.27 GB tm 293.12us/ 7.11ms ( 172.05 GFLOPS, 2.47 GB/s)
*** 38 r_8_4_64_4_4_3_3 arg 4 sz [8, 1, 4] [8, 4, 2] OPs 0M/ 1.10G mem 0.27 GB tm 18.94us/ 7.12ms ( 32.86 GFLOPS, 34.89 GB/s)
*** 39 r_8_4_64_7_4_4_7 arg 4 sz [4, 2, 4] [16, 2, 2] OPs 3M/ 1.10G mem 0.27 GB tm 53.76us/ 7.18ms ( 60.34 GFLOPS, 61.44 GB/s)
*** 40 r_8_4_192_64_4_4_4 arg 4 sz [24, 1, 1] [8, 4, 8] OPs 51M/ 1.11G mem 0.27 GB tm 297.98us/ 7.48ms ( 174.19 GFLOPS, 2.21 GB/s)
*** 41 r_8_4_64_192_4_4_4 arg 6 sz [4, 4, 1] [16, 1, 8] OPs 50M/ 1.16G mem 0.27 GB tm 295.17us/ 7.77ms ( 170.85 GFLOPS, 2.45 GB/s)
*** 42 r_8_4_64_4_4_3_3 arg 4 sz [8, 1, 4] [8, 4, 2] OPs 0M/ 1.21G mem 0.27 GB tm 20.99us/ 7.79ms ( 29.66 GFLOPS, 31.49 GB/s)
*** 43 r_8_4_64_7_4_4_7 arg 4 sz [4, 2, 4] [16, 2, 2] OPs 3M/ 1.21G mem 0.27 GB tm 54.78us/ 7.85ms ( 59.21 GFLOPS, 60.29 GB/s)
*** 44 r_8_4_192_64_4_4_4 arg 4 sz [24, 1, 1] [8, 4, 8] OPs 51M/ 1.21G mem 0.27 GB tm 297.98us/ 8.15ms ( 174.19 GFLOPS, 2.21 GB/s)
*** 45 r_8_4_64_192_4_4_4 arg 6 sz [4, 4, 1] [16, 1, 8] OPs 50M/ 1.26G mem 0.27 GB tm 295.94us/ 8.44ms ( 170.41 GFLOPS, 2.44 GB/s)
*** 46 r_8_4_64_4_4_3_3 arg 4 sz [8, 1, 4] [8, 4, 2] OPs 0M/ 1.31G mem 0.27 GB tm 22.78us/ 8.46ms ( 27.33 GFLOPS, 29.01 GB/s)
*** 47 r_8_4_64_7_4_4_7 arg 4 sz [4, 2, 4] [16, 2, 2] OPs 3M/ 1.31G mem 0.27 GB tm 55.81us/ 8.52ms ( 58.13 GFLOPS, 59.18 GB/s)
*** 48 r_8_4_192_64_4_4_4 arg 4 sz [24, 1, 1] [8, 4, 8] OPs 51M/ 1.32G mem 0.27 GB tm 299.01us/ 8.82ms ( 173.59 GFLOPS, 2.20 GB/s)
*** 49 r_8_4_64_192_4_4_4 arg 6 sz [4, 4, 1] [16, 1, 8] OPs 50M/ 1.37G mem 0.27 GB tm 294.91us/ 9.11ms ( 171.00 GFLOPS, 2.45 GB/s)
*** 50 r_4_2_64_7_7_4_4_4 arg 3 sz [4, 2, 1] [16, 1, 4] OPs 3M/ 1.42G mem 0.27 GB tm 50.94us/ 9.16ms ( 63.04 GFLOPS, 20.30 GB/s)
*** 51 E_8_128_4_4n1 arg 3 sz [1, 8, 1] [128, 1, 1] OPs 0M/ 1.42G mem 0.27 GB tm 15.10us/ 9.18ms ( 1.08 GFLOPS, 6.64 GB/s)
*** 52 r_4_2_128_128_4_4_4 arg 4 sz [2, 2, 4] [64, 1, 1] OPs 17M/ 1.42G mem 0.27 GB tm 97.02us/ 9.28ms ( 175.62 GFLOPS, 6.10 GB/s)
*** 53 r_8_128_4_4_3_3 arg 4 sz [8, 1, 1] [16, 8, 1] OPs 0M/ 1.44G mem 0.27 GB tm 9.98us/ 9.29ms ( 31.18 GFLOPS, 33.95 GB/s)
*** 54 r_8_128_7_4_4_7 arg 4 sz [4, 4, 1] [32, 2, 1] OPs 1M/ 1.44G mem 0.27 GB tm 48.13us/ 9.33ms ( 33.70 GFLOPS, 35.13 GB/s)
*** 55 r_4_2_384_128_4_4_4 arg 4 sz [24, 1, 1] [16, 2, 4] OPs 51M/ 1.44G mem 0.27 GB tm 300.29us/ 9.64ms ( 170.23 GFLOPS, 5.69 GB/s)
*** 56 r_4_2_128_384_4_4_4 arg 6 sz [2, 2, 1] [64, 1, 4] OPs 50M/ 1.49G mem 0.27 GB tm 290.05us/ 9.93ms ( 173.70 GFLOPS, 6.00 GB/s)
*** 57 r_8_128_4_4_3_3 arg 4 sz [8, 1, 1] [16, 8, 1] OPs 0M/ 1.54G mem 0.27 GB tm 15.10us/ 9.94ms ( 20.61 GFLOPS, 22.44 GB/s)
*** 58 r_8_128_7_4_4_7 arg 4 sz [4, 4, 1] [32, 2, 1] OPs 1M/ 1.54G mem 0.27 GB tm 50.94us/ 9.99ms ( 31.84 GFLOPS, 33.19 GB/s)
*** 59 r_4_2_384_128_4_4_4 arg 4 sz [24, 1, 1] [16, 2, 4] OPs 51M/ 1.55G mem 0.27 GB tm 342.02us/ 10.33ms ( 149.46 GFLOPS, 5.00 GB/s)
*** 60 r_4_2_128_384_4_4_4 arg 6 sz [2, 2, 1] [64, 1, 4] OPs 50M/ 1.60G mem 0.27 GB tm 342.02us/ 10.68ms ( 147.31 GFLOPS, 5.09 GB/s)
*** 61 r_4_2_128_3_3_4_4_4 arg 3 sz [8, 1, 1] [16, 2, 4] OPs 1M/ 1.65G mem 0.27 GB tm 29.18us/ 10.70ms ( 40.42 GFLOPS, 20.35 GB/s)
*** 62 r_256_16_4_2 arg 3 sz [256, 1, 1] [4, 16, 1] OPs 0M/ 1.65G mem 0.27 GB tm 52.74us/ 10.76ms ( 1.26 GFLOPS, 2.60 GB/s)
*** 63 r_16_16_4_16_4 arg 4 sz [16, 1, 1] [4, 16, 1] OPs 0M/ 1.65G mem 0.27 GB tm 19.97us/ 10.78ms ( 6.57 GFLOPS, 6.69 GB/s)
*** 64 r_256_16_4_4 arg 4 sz [256, 1, 1] [4, 16, 1] OPs 0M/ 1.65G mem 0.27 GB tm 29.95us/ 10.81ms ( 4.55 GFLOPS, 4.59 GB/s)
*** 65 r_256_4_32 arg 4 sz [8, 1, 1] [32, 1, 1] OPs 0M/ 1.65G mem 0.27 GB tm 81.92us/ 10.89ms ( 7.21 GFLOPS, 1.70 GB/s)
*** 66 r_512_16_4_16_4 arg 4 sz [512, 1, 1] [4, 16, 1] OPs 4M/ 1.65G mem 0.27 GB tm 254.98us/ 11.14ms ( 16.46 GFLOPS, 16.51 GB/s)
*** 67 r_128_16_4_32_4 arg 4 sz [128, 1, 1] [4, 16, 1] OPs 2M/ 1.65G mem 0.27 GB tm 156.93us/ 11.30ms ( 13.37 GFLOPS, 13.41 GB/s)
*** 68 r_256_16_4_8_4 arg 4 sz [256, 1, 1] [4, 16, 1] OPs 1M/ 1.66G mem 0.27 GB tm 77.06us/ 11.38ms ( 13.63 GFLOPS, 13.70 GB/s)
*** 69 r_128_16_4_16_4 arg 5 sz [128, 1, 1] [4, 16, 1] OPs 1M/ 1.66G mem 0.27 GB tm 83.97us/ 11.46ms ( 12.51 GFLOPS, 12.56 GB/s)
*** 70 r_256_16_4_8_4 arg 4 sz [256, 1, 1] [4, 16, 1] OPs 1M/ 1.66G mem 0.27 GB tm 66.05us/ 11.53ms ( 15.91 GFLOPS, 15.98 GB/s)
*** 71 r_128_16_4_16_4 arg 5 sz [128, 1, 1] [4, 16, 1] OPs 1M/ 1.66G mem 0.27 GB tm 71.17us/ 11.60ms ( 14.76 GFLOPS, 14.82 GB/s)
*** 72 r_128_16_4_8_4 arg 3 sz [128, 1, 1] [4, 16, 1] OPs 0M/ 1.66G mem 0.27 GB tm 40.96us/ 11.64ms ( 12.80 GFLOPS, 12.85 GB/s)
*** 73 r_16_8_4 arg 3 sz [1, 1, 1] [16, 1, 1] OPs 0M/ 1.66G mem 0.27 GB tm 12.29us/ 11.65ms ( 0.13 GFLOPS, 0.25 GB/s)
*** 74 r_128_16_4_8_4n1 arg 5 sz [128, 1, 1] [4, 16, 1] OPs 0M/ 1.66G mem 0.27 GB tm 41.98us/ 11.69ms ( 12.51 GFLOPS, 12.59 GB/s)
*** 75 E_201_4 arg 3 sz [25.125, 1, 1] [8, 1, 1] OPs 0M/ 1.66G mem 0.27 GB tm 5.89us/ 11.70ms ( 0.14 GFLOPS, 1.37 GB/s)
*** 76 r_128_201_4_4 arg 3 sz [8, 1, 1] [16, 1, 1] OPs 0M/ 1.66G mem 0.27 GB tm 81.15us/ 11.78ms ( 10.15 GFLOPS, 10.18 GB/s)
*** 77 E_10_128_4 arg 5 sz [2, 2.5, 1] [64, 4, 1] OPs 0M/ 1.66G mem 0.27 GB tm 7.94us/ 11.79ms ( 7.10 GFLOPS, 2.06 GB/s)
*** 78 r_10_128_128_4_4 arg 5 sz [8, 1.25, 1] [16, 8, 1] OPs 5M/ 1.66G mem 0.27 GB tm 67.07us/ 11.86ms ( 78.40 GFLOPS, 8.46 GB/s)
*** 79 r_10_16_8_4 arg 2 sz [10, 1, 1] [16, 1, 1] OPs 0M/ 1.67G mem 0.27 GB tm 9.98us/ 11.87ms ( 0.51 GFLOPS, 1.03 GB/s)
*** 80 r_10_16_8_4n1 arg 3 sz [10, 1, 1] [16, 1, 1] OPs 0M/ 1.67G mem 0.27 GB tm 9.98us/ 11.88ms ( 1.54 GFLOPS, 1.03 GB/s)
*** 81 E_10_128_4n1 arg 6 sz [64, 1, 1] [2, 10, 1] OPs 0M/ 1.67G mem 0.27 GB tm 7.17us/ 11.88ms ( 2.86 GFLOPS, 3.44 GB/s)
*** 82 r_10_384_128_4_4 arg 3 sz [6, 1, 1] [64, 10, 1] OPs 15M/ 1.67G mem 0.27 GB tm 162.05us/ 12.05ms ( 97.06 GFLOPS, 9.96 GB/s)
*** 83 E_10_128_4n2 arg 3 sz [2, 10, 1] [64, 1, 1] OPs 0M/ 1.68G mem 0.27 GB tm 6.14us/ 12.05ms ( 0.83 GFLOPS, 3.67 GB/s)
*** 84 E_24_16_4_4 arg 3 sz [8, 1.5, 1] [2, 16, 1] OPs 0M/ 1.68G mem 0.27 GB tm 9.98us/ 12.06ms ( 0.62 GFLOPS, 4.92 GB/s)
*** 85 r_10_8_3_16_4_4 arg 3 sz [1.5, 8, 1] [2, 1, 10] OPs 0M/ 1.68G mem 0.27 GB tm 13.06us/ 12.07ms ( 9.41 GFLOPS, 2.02 GB/s)
*** 86 E_8_10_10 arg 2 sz [2.5, 2.5, 2] [4, 4, 4] OPs 0M/ 1.68G mem 0.27 GB tm 5.12us/ 12.08ms ( 0.16 GFLOPS, 1.25 GB/s)
*** 87 r_80_10 arg 2 sz [5, 1, 1] [16, 1, 1] OPs 0M/ 1.68G mem 0.27 GB tm 4.86us/ 12.08ms ( 0.16 GFLOPS, 0.72 GB/s)
*** 88 E_80_10 arg 3 sz [5, 10, 1] [2, 8, 1] OPs 0M/ 1.68G mem 0.27 GB tm 5.89us/ 12.09ms ( 0.41 GFLOPS, 1.14 GB/s)
*** 89 r_80_10n1 arg 2 sz [20, 1, 1] [4, 1, 1] OPs 0M/ 1.68G mem 0.27 GB tm 6.14us/ 12.10ms ( 0.13 GFLOPS, 0.57 GB/s)
*** 90 E_8_10_10n1 arg 3 sz [5, 2.5, 2] [2, 4, 4] OPs 0M/ 1.68G mem 0.27 GB tm 6.14us/ 12.10ms ( 0.13 GFLOPS, 1.09 GB/s)
*** 91 E_10_24_4 arg 2 sz [6, 1.25, 1] [4, 8, 1] OPs 0M/ 1.68G mem 0.27 GB tm 6.91us/ 12.11ms ( 0.14 GFLOPS, 0.83 GB/s)
*** 92 E_128_4_4_3 arg 3 sz [16, 1, 1] [8, 1, 1] OPs 0M/ 1.68G mem 0.27 GB tm 9.98us/ 12.12ms ( 0.62 GFLOPS, 4.92 GB/s)
*** 93 r_10_8_16_3_4_4 arg 3 sz [4, 2, 2.5] [4, 4, 4] OPs 0M/ 1.68G mem 0.27 GB tm 7.17us/ 12.13ms ( 17.14 GFLOPS, 3.41 GB/s)
*** 94 r_10_128_128_4_4n1 arg 5 sz [16, 1, 1] [8, 10, 1] OPs 5M/ 1.68G mem 0.27 GB tm 65.28us/ 12.19ms ( 80.47 GFLOPS, 8.53 GB/s)
*** 95 r_10_16_8_4 arg 2 sz [10, 1, 1] [16, 1, 1] OPs 0M/ 1.69G mem 0.27 GB tm 9.98us/ 12.20ms ( 0.51 GFLOPS, 1.03 GB/s)
*** 96 r_10_16_8_4n1 arg 3 sz [10, 1, 1] [16, 1, 1] OPs 0M/ 1.69G mem 0.27 GB tm 8.96us/ 12.21ms ( 1.72 GFLOPS, 1.15 GB/s)
*** 97 E_10_128_4n1 arg 6 sz [64, 1, 1] [2, 10, 1] OPs 0M/ 1.69G mem 0.27 GB tm 9.73us/ 12.22ms ( 2.11 GFLOPS, 2.53 GB/s)
*** 98 r_10_512_128_4_4 arg 4 sz [8, 1, 1] [64, 10, 1] OPs 21M/ 1.69G mem 0.27 GB tm 251.14us/ 12.47ms ( 84.08 GFLOPS, 8.59 GB/s)
*** 99 r_10_128_512_4_4 arg 3 sz [16, 1, 1] [8, 10, 1] OPs 20M/ 1.71G mem 0.27 GB tm 260.10us/ 12.73ms ( 80.63 GFLOPS, 8.26 GB/s)
*** 100 r_128_4_10 arg 4 sz [32, 1, 1] [4, 1, 1] OPs 0M/ 1.73G mem 0.27 GB tm 7.68us/ 12.74ms ( 2.00 GFLOPS, 3.20 GB/s)
*** 101 E_128_4 arg 2 sz [32, 1, 1] [4, 1, 1] OPs 0M/ 1.73G mem 0.27 GB tm 4.86us/ 12.74ms ( 0.00 GFLOPS, 0.63 GB/s)
*** 102 r_256_16_4_8_4 arg 4 sz [256, 1, 1] [4, 16, 1] OPs 1M/ 1.73G mem 0.27 GB tm 75.01us/ 12.82ms ( 14.01 GFLOPS, 14.08 GB/s)
*** 103 r_128_16_4_16_4n1 arg 3 sz [128, 1, 1] [4, 16, 1] OPs 1M/ 1.73G mem 0.27 GB tm 80.64us/ 12.90ms ( 13.00 GFLOPS, 13.04 GB/s)
*** 104 E_128_4n1 arg 4 sz [32, 1, 1] [4, 1, 1] OPs 0M/ 1.73G mem 0.27 GB tm 5.89us/ 12.91ms ( 0.26 GFLOPS, 1.04 GB/s)
*** 105 r_256_16_4_8_4 arg 4 sz [256, 1, 1] [4, 16, 1] OPs 1M/ 1.73G mem 0.27 GB tm 73.98us/ 12.98ms ( 14.20 GFLOPS, 14.27 GB/s)
*** 106 r_128_16_4_16_4n2 arg 7 sz [128, 1, 1] [4, 16, 1] OPs 1M/ 1.73G mem 0.27 GB tm 84.74us/ 13.06ms ( 12.41 GFLOPS, 12.50 GB/s)
*** 107 r_64_16_4_8_4 arg 4 sz [64, 1, 1] [4, 16, 1] OPs 0M/ 1.73G mem 0.27 GB tm 25.86us/ 13.09ms ( 10.16 GFLOPS, 10.24 GB/s)
*** 108 r_64_16_4_4_4 arg 4 sz [64, 1, 1] [4, 16, 1] OPs 0M/ 1.73G mem 0.27 GB tm 23.04us/ 13.11ms ( 5.71 GFLOPS, 5.78 GB/s)
*** 109 r_64_16_4_4_4n1 arg 5 sz [64, 1, 1] [4, 16, 1] OPs 0M/ 1.73G mem 0.27 GB tm 20.22us/ 13.13ms ( 6.52 GFLOPS, 6.61 GB/s)
*** 110 r_1239_64_4_4 arg 3 sz [19.359375, 1, 1] [64, 1, 1] OPs 2M/ 1.73G mem 0.27 GB tm 104.96us/ 13.24ms ( 24.18 GFLOPS, 24.37 GB/s)
*** 111 r_16_16_4_8_4 arg 4 sz [16, 1, 1] [4, 16, 1] OPs 0M/ 1.74G mem 0.27 GB tm 13.06us/ 13.25ms ( 5.03 GFLOPS, 5.13 GB/s)
*** 112 r_16_16_4_4 arg 4 sz [16, 1, 1] [4, 16, 1] OPs 0M/ 1.74G mem 0.27 GB tm 8.96us/ 13.26ms ( 0.93 GFLOPS, 0.97 GB/s)
*** 113 r_16_16_4_4n1 arg 5 sz [16, 1, 1] [4, 16, 1] OPs 0M/ 1.74G mem 0.27 GB tm 8.70us/ 13.27ms ( 0.96 GFLOPS, 1.01 GB/s)
*** 114 r_132_4_16_4 arg 3 sz [4, 132, 1] [16, 1, 1] OPs 0M/ 1.74G mem 0.27 GB tm 23.04us/ 13.29ms ( 2.93 GFLOPS, 3.03 GB/s)
*** 115 r_4_16_4_8_4 arg 4 sz [4, 1, 1] [4, 16, 1] OPs 0M/ 1.74G mem 0.27 GB tm 13.06us/ 13.31ms ( 1.26 GFLOPS, 1.34 GB/s)
*** 116 r_4_4_4_4 arg 4 sz [1, 1, 1] [4, 1, 1] OPs 0M/ 1.74G mem 0.27 GB tm 7.17us/ 13.31ms ( 0.08 GFLOPS, 0.09 GB/s)
*** 117 r_4_4_4_4n1 arg 5 sz [2, 1, 1] [2, 1, 1] OPs 0M/ 1.74G mem 0.27 GB tm 6.91us/ 13.32ms ( 0.08 GFLOPS, 0.10 GB/s)
*** 118 r_2_4_4_4 arg 3 sz [4, 1, 1] [1, 2, 1] OPs 0M/ 1.74G mem 0.27 GB tm 5.12us/ 13.33ms ( 0.05 GFLOPS, 0.06 GB/s)
*** 119 r_8_16_4_8_4 arg 4 sz [8, 1, 1] [4, 16, 1] OPs 0M/ 1.74G mem 0.27 GB tm 15.10us/ 13.34ms ( 2.17 GFLOPS, 2.25 GB/s)
*** 120 r_8_8_4_4 arg 4 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.74G mem 0.27 GB tm 8.19us/ 13.35ms ( 0.26 GFLOPS, 0.28 GB/s)
*** 121 r_8_8_4_4n1 arg 5 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.74G mem 0.27 GB tm 7.17us/ 13.36ms ( 0.30 GFLOPS, 0.33 GB/s)
*** 122 r_66_4_4_8 arg 3 sz [4, 2.0625, 1] [1, 32, 1] OPs 0M/ 1.74G mem 0.27 GB tm 8.19us/ 13.36ms ( 2.06 GFLOPS, 2.20 GB/s)
*** 123 r_16_16_4_8_4 arg 4 sz [16, 1, 1] [4, 16, 1] OPs 0M/ 1.74G mem 0.27 GB tm 14.08us/ 13.38ms ( 4.66 GFLOPS, 4.75 GB/s)
*** 124 r_16_16_4_4 arg 4 sz [16, 1, 1] [4, 16, 1] OPs 0M/ 1.74G mem 0.27 GB tm 9.73us/ 13.39ms ( 0.86 GFLOPS, 0.89 GB/s)
*** 125 r_16_16_4_4n1 arg 5 sz [16, 1, 1] [4, 16, 1] OPs 0M/ 1.74G mem 0.27 GB tm 8.70us/ 13.40ms ( 0.96 GFLOPS, 1.01 GB/s)
*** 126 r_26_4_16_4 arg 3 sz [4, 26, 1] [16, 1, 1] OPs 0M/ 1.74G mem 0.27 GB tm 11.78us/ 13.41ms ( 1.13 GFLOPS, 1.18 GB/s)
*** 127 r_4_16_4_8_4 arg 4 sz [4, 1, 1] [4, 16, 1] OPs 0M/ 1.74G mem 0.27 GB tm 12.03us/ 13.42ms ( 1.36 GFLOPS, 1.45 GB/s)
*** 128 r_4_4_4_4 arg 4 sz [1, 1, 1] [4, 1, 1] OPs 0M/ 1.74G mem 0.27 GB tm 8.96us/ 13.43ms ( 0.06 GFLOPS, 0.07 GB/s)
*** 129 r_4_4_4_4n1 arg 5 sz [2, 1, 1] [2, 1, 1] OPs 0M/ 1.74G mem 0.27 GB tm 10.24us/ 13.44ms ( 0.05 GFLOPS, 0.07 GB/s)
*** 130 r_4_4_4 arg 3 sz [4, 1, 1] [1, 1, 1] OPs 0M/ 1.74G mem 0.27 GB tm 5.12us/ 13.44ms ( 0.03 GFLOPS, 0.03 GB/s)
*** 131 r_8_16_4_8_4 arg 4 sz [8, 1, 1] [4, 16, 1] OPs 0M/ 1.74G mem 0.27 GB tm 12.03us/ 13.46ms ( 2.73 GFLOPS, 2.82 GB/s)
*** 132 r_8_8_4_4 arg 4 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.74G mem 0.27 GB tm 11.01us/ 13.47ms ( 0.19 GFLOPS, 0.21 GB/s)
*** 133 r_8_8_4_4n1 arg 5 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.74G mem 0.27 GB tm 11.01us/ 13.48ms ( 0.19 GFLOPS, 0.22 GB/s)
*** 134 r_2_4_4_8 arg 3 sz [2, 2, 1] [2, 1, 1] OPs 0M/ 1.74G mem 0.27 GB tm 5.89us/ 13.48ms ( 0.09 GFLOPS, 0.10 GB/s)
*** 135 r_128_16_4_32_4 arg 4 sz [128, 1, 1] [4, 16, 1] OPs 2M/ 1.74G mem 0.27 GB tm 153.86us/ 13.64ms ( 13.64 GFLOPS, 13.68 GB/s)
*** 136 r_256_16_4_8_4 arg 4 sz [256, 1, 1] [4, 16, 1] OPs 1M/ 1.74G mem 0.27 GB tm 59.90us/ 13.70ms ( 17.54 GFLOPS, 17.62 GB/s)
*** 137 r_128_16_4_16_4 arg 5 sz [128, 1, 1] [4, 16, 1] OPs 1M/ 1.74G mem 0.27 GB tm 66.82us/ 13.76ms ( 15.72 GFLOPS, 15.79 GB/s)
*** 138 r_256_16_4_8_4 arg 4 sz [256, 1, 1] [4, 16, 1] OPs 1M/ 1.74G mem 0.27 GB tm 58.11us/ 13.82ms ( 18.08 GFLOPS, 18.17 GB/s)
*** 139 r_128_16_4_16_4 arg 5 sz [128, 1, 1] [4, 16, 1] OPs 1M/ 1.74G mem 0.27 GB tm 67.84us/ 13.89ms ( 15.48 GFLOPS, 15.55 GB/s)
*** 140 r_256_16_4_8_4 arg 4 sz [256, 1, 1] [4, 16, 1] OPs 1M/ 1.74G mem 0.27 GB tm 59.14us/ 13.95ms ( 17.77 GFLOPS, 17.85 GB/s)
*** 141 r_128_16_4_16_4 arg 5 sz [128, 1, 1] [4, 16, 1] OPs 1M/ 1.75G mem 0.27 GB tm 67.84us/ 14.02ms ( 15.48 GFLOPS, 15.55 GB/s)
*** 142 r_256_16_4_8_4 arg 4 sz [256, 1, 1] [4, 16, 1] OPs 1M/ 1.75G mem 0.27 GB tm 57.86us/ 14.08ms ( 18.16 GFLOPS, 18.25 GB/s)
*** 143 r_128_16_4_16_4 arg 5 sz [128, 1, 1] [4, 16, 1] OPs 1M/ 1.75G mem 0.27 GB tm 68.86us/ 14.14ms ( 15.25 GFLOPS, 15.32 GB/s)
*** 144 r_16_16_4_8_4 arg 4 sz [16, 1, 1] [4, 16, 1] OPs 0M/ 1.75G mem 0.27 GB tm 13.06us/ 14.16ms ( 5.03 GFLOPS, 5.13 GB/s)
*** 145 r_16_16_4_4 arg 4 sz [16, 1, 1] [4, 16, 1] OPs 0M/ 1.75G mem 0.27 GB tm 11.01us/ 14.17ms ( 0.76 GFLOPS, 0.79 GB/s)
*** 146 r_16_16_4_4n1 arg 5 sz [16, 1, 1] [4, 16, 1] OPs 0M/ 1.75G mem 0.27 GB tm 9.98us/ 14.18ms ( 0.84 GFLOPS, 0.88 GB/s)
*** 147 r_12_4_16_4 arg 3 sz [4, 12, 1] [16, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 9.98us/ 14.19ms ( 0.62 GFLOPS, 0.65 GB/s)
*** 148 r_8_16_4_8_4 arg 4 sz [8, 1, 1] [4, 16, 1] OPs 0M/ 1.75G mem 0.27 GB tm 12.03us/ 14.20ms ( 2.73 GFLOPS, 2.82 GB/s)
*** 149 r_8_8_4_4 arg 4 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 11.01us/ 14.21ms ( 0.19 GFLOPS, 0.21 GB/s)
*** 150 r_8_8_4_4n1 arg 5 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 9.98us/ 14.22ms ( 0.21 GFLOPS, 0.24 GB/s)
*** 151 r_8_4_4_8 arg 3 sz [4, 4, 1] [1, 2, 1] OPs 0M/ 1.75G mem 0.27 GB tm 6.91us/ 14.23ms ( 0.30 GFLOPS, 0.32 GB/s)
*** 152 r_8_16_4_8_4 arg 4 sz [8, 1, 1] [4, 16, 1] OPs 0M/ 1.75G mem 0.27 GB tm 13.06us/ 14.24ms ( 2.51 GFLOPS, 2.60 GB/s)
*** 153 r_8_8_4_4 arg 4 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 11.01us/ 14.25ms ( 0.19 GFLOPS, 0.21 GB/s)
*** 154 r_8_8_4_4n1 arg 5 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 11.01us/ 14.26ms ( 0.19 GFLOPS, 0.22 GB/s)
*** 155 r_3_4_4_8 arg 3 sz [4, 3, 1] [1, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 9.22us/ 14.27ms ( 0.08 GFLOPS, 0.10 GB/s)
*** 156 r_8_16_4_8_4 arg 4 sz [8, 1, 1] [4, 16, 1] OPs 0M/ 1.75G mem 0.27 GB tm 12.03us/ 14.28ms ( 2.73 GFLOPS, 2.82 GB/s)
*** 157 r_8_8_4_4 arg 4 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 9.98us/ 14.29ms ( 0.21 GFLOPS, 0.23 GB/s)
*** 158 r_8_8_4_4n1 arg 5 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 11.01us/ 14.31ms ( 0.19 GFLOPS, 0.22 GB/s)
*** 159 r_2_4_4_8n1 arg 3 sz [4, 1, 1] [1, 2, 1] OPs 0M/ 1.75G mem 0.27 GB tm 6.14us/ 14.31ms ( 0.08 GFLOPS, 0.10 GB/s)
*** 160 r_8_16_4_8_4 arg 4 sz [8, 1, 1] [4, 16, 1] OPs 0M/ 1.75G mem 0.27 GB tm 11.78us/ 14.32ms ( 2.79 GFLOPS, 2.89 GB/s)
*** 161 r_8_8_4_4 arg 4 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 11.01us/ 14.33ms ( 0.19 GFLOPS, 0.21 GB/s)
*** 162 r_8_8_4_4n1 arg 5 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 11.01us/ 14.35ms ( 0.19 GFLOPS, 0.22 GB/s)
*** 163 r_3_4_4_8 arg 3 sz [4, 3, 1] [1, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 9.98us/ 14.36ms ( 0.08 GFLOPS, 0.09 GB/s)
*** 164 r_8_16_4_8_4 arg 4 sz [8, 1, 1] [4, 16, 1] OPs 0M/ 1.75G mem 0.27 GB tm 12.80us/ 14.37ms ( 2.56 GFLOPS, 2.65 GB/s)
*** 165 r_8_8_4_4 arg 4 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 11.01us/ 14.38ms ( 0.19 GFLOPS, 0.21 GB/s)
*** 166 r_8_8_4_4n1 arg 5 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 11.26us/ 14.39ms ( 0.19 GFLOPS, 0.21 GB/s)
*** 167 r_3_4_4_8 arg 3 sz [4, 3, 1] [1, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 10.24us/ 14.40ms ( 0.07 GFLOPS, 0.09 GB/s)
*** 168 r_10_128_512_4_4n1 arg 3 sz [8, 1.25, 1] [16, 8, 1] OPs 20M/ 1.75G mem 0.27 GB tm 256.00us/ 14.66ms ( 81.92 GFLOPS, 8.39 GB/s)
*** 169 r_128_4_10n1 arg 4 sz [32, 1, 1] [4, 1, 1] OPs 0M/ 1.77G mem 0.27 GB tm 8.70us/ 14.67ms ( 1.76 GFLOPS, 2.82 GB/s)
*** 170 E_129_4 arg 4 sz [8.0625, 1, 1] [16, 1, 1] OPs 0M/ 1.77G mem 0.27 GB tm 5.89us/ 14.67ms ( 0.18 GFLOPS, 0.88 GB/s)
*** 171 r_32_129_4_4 arg 4 sz [8, 1, 1] [4, 1, 1] OPs 0M/ 1.77G mem 0.27 GB tm 49.92us/ 14.72ms ( 2.65 GFLOPS, 2.68 GB/s)
*** 172 r_128_16_4_2_4 arg 4 sz [128, 1, 1] [4, 16, 1] OPs 0M/ 1.77G mem 0.27 GB tm 23.04us/ 14.74ms ( 5.73 GFLOPS, 5.83 GB/s)
*** 173 r_32_16_4_8_4 arg 5 sz [32, 1, 1] [4, 16, 1] OPs 0M/ 1.77G mem 0.27 GB tm 19.71us/ 14.76ms ( 6.67 GFLOPS, 6.75 GB/s)
*** 174 r_128_16_4_2_4 arg 4 sz [128, 1, 1] [4, 16, 1] OPs 0M/ 1.77G mem 0.27 GB tm 22.02us/ 14.79ms ( 6.00 GFLOPS, 6.10 GB/s)
*** 175 r_32_16_4_8_4 arg 5 sz [32, 1, 1] [4, 16, 1] OPs 0M/ 1.77G mem 0.27 GB tm 16.90us/ 14.80ms ( 7.78 GFLOPS, 7.88 GB/s)
*** 176 r_128_16_4_2_4 arg 4 sz [128, 1, 1] [4, 16, 1] OPs 0M/ 1.77G mem 0.27 GB tm 22.02us/ 14.83ms ( 6.00 GFLOPS, 6.10 GB/s)
*** 177 r_32_16_4_8_4 arg 5 sz [32, 1, 1] [4, 16, 1] OPs 0M/ 1.77G mem 0.27 GB tm 18.18us/ 14.84ms ( 7.23 GFLOPS, 7.32 GB/s)
*** 178 r_128_16_4_2_4 arg 4 sz [128, 1, 1] [4, 16, 1] OPs 0M/ 1.77G mem 0.27 GB tm 22.27us/ 14.87ms ( 5.93 GFLOPS, 6.03 GB/s)
*** 179 r_32_16_4_8_4 arg 5 sz [32, 1, 1] [4, 16, 1] OPs 0M/ 1.77G mem 0.27 GB tm 17.15us/ 14.88ms ( 7.66 GFLOPS, 7.76 GB/s)
*** 180 r_4_16_2_4 arg 3 sz [4, 1, 1] [16, 1, 1] OPs 0M/ 1.77G mem 0.27 GB tm 6.91us/ 14.89ms ( 0.15 GFLOPS, 0.19 GB/s)
*** 181 E_128_4n2 arg 4 sz [16, 1, 1] [8, 1, 1] OPs 0M/ 1.77G mem 0.27 GB tm 5.89us/ 14.90ms ( 0.17 GFLOPS, 0.87 GB/s)
*** 182 E_1626_4 arg 31 sz [25.40625, 1, 1] [64, 1, 1] OPs 0M/ 1.77G mem 0.27 GB tm 51.97us/ 14.95ms ( 3.63 GFLOPS, 15.52 GB/s)
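
To see where the time goes in a log like the one above, the `***` kernel lines can be parsed and grouped by kernel name. This is a minimal sketch and not part of tinygrad or openpilot: the regex, `parse_kernels`, and `time_by_kernel` are hypothetical helpers, and the column meanings are an assumption read off the log itself (`tm Xus/ Yms` looks like per-kernel time followed by a running total, since 272.90us + 1244.16us matches the 1.52ms on kernel 1).

```python
import re
from collections import defaultdict

# Assumed layout of one kernel line, inferred from the log above:
# *** <idx> <name> arg <n> sz [global] [local] OPs <M>M/ <G>G mem <GB> GB
#     tm <us>us/ <ms>ms ( <GFLOPS> GFLOPS, <GB/s> GB/s)
KERNEL_RE = re.compile(
    r"\*\*\*\s+(?P<idx>\d+)\s+(?P<name>\S+)\s+arg\s+\d+\s+sz\s+\[.*?\]\s+\[.*?\]\s+"
    r"OPs\s+\d+M/\s*[\d.]+G\s+mem\s+[\d.]+\s+GB\s+"
    r"tm\s+(?P<us>[\d.]+)us/\s*(?P<cum_ms>[\d.]+)ms\s+"
    r"\(\s*(?P<gflops>[\d.]+)\s+GFLOPS,\s*(?P<gbps>[\d.]+)\s+GB/s\)"
)


def parse_kernels(log_text):
    """Return one dict per kernel line; lines that do not match are skipped."""
    rows = []
    for line in log_text.splitlines():
        m = KERNEL_RE.search(line)
        if m is None:
            continue
        rows.append({
            "idx": int(m["idx"]),
            "name": m["name"],
            "us": float(m["us"]),          # per-kernel time (assumed meaning)
            "cum_ms": float(m["cum_ms"]),  # running total (assumed meaning)
            "gflops": float(m["gflops"]),
            "gbps": float(m["gbps"]),
        })
    return rows


def time_by_kernel(rows):
    """Sum per-kernel time by kernel name, slowest first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["name"]] += row["us"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# The first three kernel lines from the log above, plus one non-kernel line.
SAMPLE = """\
**** running real kernels 150/183 images ****
*** 0 E_32768_4_6 arg 3 sz [512, 1, 1] [64, 1, 1] OPs 0M/ 0.00G mem 0.26 GB tm 272.90us/ 0.27ms ( 2.88 GFLOPS, 28.82 GB/s)
*** 1 r_64_32_16_6_3_3_4_4_4 arg 4 sz [2, 1, 64] [8, 32, 1] OPs 234M/ 0.00G mem 0.26 GB tm 1244.16us/ 1.52ms ( 188.79 GFLOPS, 3.71 GB/s)
*** 2 r_32_16_16_4_4_3_3 arg 4 sz [2, 1, 8] [8, 16, 4] OPs 4M/ 0.24G mem 0.26 GB tm 88.83us/ 1.61ms ( 50.17 GFLOPS, 29.53 GB/s)
"""

if __name__ == "__main__":
    rows = parse_kernels(SAMPLE)
    print(f"{len(rows)} kernels, {sum(r['us'] for r in rows):.2f} us total")
    for name, us in time_by_kernel(rows):
        print(f"{us:10.2f} us  {name}")
```

To run it on the full output, save the log to a file and pass its contents to `parse_kernels` instead of `SAMPLE`. The regex tolerates variable padding between columns, since the original terminal output is likely column-aligned.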