Created September 19, 2024 01:38
tinygrad 0.7 openpilot 0.9.7 run
comma@tiny24:/data/openpilot/tinygrad_repo$ python3 openpilot/compile2.py https://github.com/commaai/openpilot/raw/v0.9.7/selfdrive/modeld/models/supercombo.onnx | |
https://github.com/commaai/openpilot/raw/v0.9.7/selfdrive/modeld/models/supercombo.onnx: 100%|███████████████████████████████████████████| 51.5M/51.5M [00:00<00:00, 88.2MB/s] | |
cache is out of date, clearing it | |
/usr/local/pyenv/versions/3.11.4/lib/python3.11/site-packages/pyopencl/__init__.py:528: CompilerWarning: Non-empty compiler output encountered. Set the environment variable PYOPENCL_COMPILER_OUTPUT=1 to see more. | |
lambda: self._prg.build(options_bytes, devices), | |
190 schedule items depend on the input, 462 don't | |
7 inputs | |
13: rewrite input, image dtype dtypes.imageh((16, 2048, 4)), (View(shape=(1, 16, 32, 64, 2), strides=(0, 8192, 256, 4, 1), offset=0, mask=None, contiguous=False), View(shape=(1, 16, 32, 128), strides=(0, 4096, 128, 1), offset=0, mask=None, contiguous=True)) | |
24: rewrite input, image dtype dtypes.imageh((8, 2048, 4)), (View(shape=(1, 8, 16, 128, 2), strides=(0, 8192, 512, 4, 1), offset=0, mask=None, contiguous=False), View(shape=(1, 8, 16, 256), strides=(0, 4096, 256, 1), offset=0, mask=None, contiguous=True)) | |
51: rewrite input, image dtype dtypes.imageh((4, 2048, 4)), (View(shape=(1, 4, 8, 256, 2), strides=(0, 8192, 1024, 4, 1), offset=0, mask=None, contiguous=False), View(shape=(1, 4, 8, 512), strides=(0, 4096, 512, 1), offset=0, mask=None, contiguous=True)) | |
62: rewrite input, image dtype dtypes.imageh((4, 4096, 4)), (View(shape=(1, 4, 8, 512, 2), strides=(0, 16384, 2048, 4, 1), offset=0, mask=None, contiguous=False), View(shape=(1, 4, 8, 1024), strides=(0, 8192, 1024, 1), offset=0, mask=None, contiguous=True)) | |
73: rewrite output, output shape 1, image dtype dtypes.imageh((1, 128, 4)) prod 512 | |
79: rewrite output, output shape 10, image dtype dtypes.imageh((10, 128, 4)) prod 5120 | |
80: rewrite output, output shape 10, image dtype dtypes.imageh((10, 128, 4)) prod 5120 | |
86: rewrite output, output shape 800, image dtype dtypes.imageh((10, 24, 4)) prod 960 | |
86: rewrite input, image dtype dtypes.imageh((10, 24, 4)), (View(shape=(1, 8, 10, 10), strides=(0, 12, 96, 1), offset=0, mask=None, contiguous=False),) | |
87: rewrite output, output shape 80, image dtype dtypes.imageh((10, 24, 4)) prod 960 | |
88: rewrite output, output shape 800, image dtype dtypes.imageh((10, 24, 4)) prod 960 | |
89: rewrite output, output shape 80, image dtype dtypes.imageh((10, 24, 4)) prod 960 | |
90: rewrite output, output shape 800, image dtype dtypes.imageh((10, 24, 4)) prod 960 | |
95: rewrite output, output shape 10, image dtype dtypes.imageh((10, 128, 4)) prod 5120 | |
96: rewrite output, output shape 10, image dtype dtypes.imageh((10, 128, 4)) prod 5120 | |
100: rewrite output, output shape 512, image dtype dtypes.imageh((10, 128, 4)) prod 5120 | |
169: rewrite output, output shape 512, image dtype dtypes.imageh((10, 128, 4)) prod 5120 | |
182: rewrite input, image dtype dtypes.imageh((1, 1239, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=0, mask=((0, 1), (0, 4955)), contiguous=False),) | |
182: rewrite input, image dtype dtypes.imageh((1, 132, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-4955, mask=((0, 1), (4955, 5483)), contiguous=False),) | |
182: rewrite input, image dtype dtypes.imageh((1, 2, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5483, mask=((0, 1), (5483, 5491)), contiguous=False),) | |
182: rewrite input, image dtype dtypes.imageh((1, 66, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5491, mask=((0, 1), (5491, 5755)), contiguous=False),) | |
182: rewrite input, image dtype dtypes.imageh((1, 26, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5755, mask=((0, 1), (5755, 5857)), contiguous=False),) | |
182: rewrite input, image dtype dtypes.imageh((1, 1, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5857, mask=((0, 1), (5857, 5860)), contiguous=False),) | |
182: rewrite input, image dtype dtypes.imageh((1, 2, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5860, mask=((0, 1), (5860, 5868)), contiguous=False),) | |
182: rewrite input, image dtype dtypes.imageh((1, 12, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5868, mask=((0, 1), (5868, 5916)), contiguous=False),) | |
182: rewrite input, image dtype dtypes.imageh((1, 8, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5916, mask=((0, 1), (5916, 5948)), contiguous=False),) | |
182: rewrite input, image dtype dtypes.imageh((1, 3, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5948, mask=((0, 1), (5948, 5960)), contiguous=False),) | |
182: rewrite input, image dtype dtypes.imageh((1, 2, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5960, mask=((0, 1), (5960, 5966)), contiguous=False),) | |
182: rewrite input, image dtype dtypes.imageh((1, 3, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5966, mask=((0, 1), (5966, 5978)), contiguous=False),) | |
182: rewrite input, image dtype dtypes.imageh((1, 3, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5978, mask=((0, 1), (5978, 5990)), contiguous=False),) | |
182: rewrite input, image dtype dtypes.imageh((1, 1, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5990, mask=((0, 1), (5990, 5992)), contiguous=False),) | |
182: rewrite input, image dtype dtypes.imageh((1, 128, 4)), (View(shape=(1, 6504), strides=(0, 1), offset=-5992, mask=((0, 1), (5992, 6504)), contiguous=False),) | |
**** running real kernels 150/183 images **** | |
*** 0 E_32768_4_6 arg 3 sz [512, 1, 1] [64, 1, 1] OPs 0M/ 0.00G mem 0.26 GB tm 272.90us/ 0.27ms ( 2.88 GFLOPS, 28.82 GB/s) | |
*** 1 r_64_32_16_6_3_3_4_4_4 arg 4 sz [2, 1, 64] [8, 32, 1] OPs 234M/ 0.00G mem 0.26 GB tm 1244.16us/ 1.52ms ( 188.79 GFLOPS, 3.71 GB/s) | |
*** 2 r_32_16_16_4_4_3_3 arg 4 sz [2, 1, 8] [8, 16, 4] OPs 4M/ 0.24G mem 0.26 GB tm 88.83us/ 1.61ms ( 50.17 GFLOPS, 29.53 GB/s) | |
*** 3 r_32_16_16_16_4_4_4 arg 4 sz [2, 2, 16] [8, 8, 2] OPs 18M/ 0.24G mem 0.26 GB tm 101.12us/ 1.71ms ( 186.65 GFLOPS, 5.27 GB/s) | |
*** 4 r_32_16_16_4_4_3_3n1 arg 4 sz [4, 4, 4] [4, 4, 8] OPs 2M/ 0.26G mem 0.26 GB tm 42.75us/ 1.75ms ( 58.25 GFLOPS, 61.35 GB/s) | |
*** 5 r_32_16_16_7_4_4_7 arg 4 sz [4, 4, 1] [4, 4, 32] OPs 12M/ 0.26G mem 0.26 GB tm 160.77us/ 1.91ms ( 80.71 GFLOPS, 81.57 GB/s) | |
*** 6 r_32_16_48_16_4_4_4 arg 4 sz [3, 4, 4] [16, 4, 8] OPs 56M/ 0.27G mem 0.26 GB tm 295.17us/ 2.21ms ( 191.83 GFLOPS, 3.64 GB/s) | |
*** 7 r_32_16_16_48_4_4_4 arg 6 sz [2, 1, 32] [8, 16, 1] OPs 50M/ 0.33G mem 0.26 GB tm 256.00us/ 2.46ms ( 198.14 GFLOPS, 5.22 GB/s) | |
*** 8 r_32_16_16_4_4_3_3n1 arg 4 sz [4, 4, 4] [4, 4, 8] OPs 2M/ 0.38G mem 0.26 GB tm 53.76us/ 2.52ms ( 46.32 GFLOPS, 48.79 GB/s) | |
*** 9 r_32_16_16_7_4_4_7 arg 4 sz [4, 4, 1] [4, 4, 32] OPs 12M/ 0.38G mem 0.26 GB tm 168.96us/ 2.68ms ( 76.80 GFLOPS, 77.61 GB/s) | |
*** 10 r_32_16_48_16_4_4_4 arg 4 sz [3, 4, 4] [16, 4, 8] OPs 56M/ 0.40G mem 0.27 GB tm 302.85us/ 2.99ms ( 186.97 GFLOPS, 3.55 GB/s) | |
*** 11 r_32_16_16_48_4_4_4 arg 6 sz [2, 1, 32] [8, 16, 1] OPs 50M/ 0.45G mem 0.27 GB tm 275.97us/ 3.26ms ( 183.81 GFLOPS, 4.84 GB/s) | |
*** 12 r_16_8_16_7_7_4_4_4 arg 3 sz [4, 1, 4] [4, 8, 4] OPs 12M/ 0.50G mem 0.27 GB tm 168.19us/ 3.43ms ( 76.37 GFLOPS, 22.36 GB/s) | |
*** 13 E_128_32_4_4n1 arg 3 sz [2, 128, 1] [16, 1, 1] OPs 0M/ 0.52G mem 0.27 GB tm 43.01us/ 3.47ms ( 1.52 GFLOPS, 9.15 GB/s) | |
*** 14 r_16_8_32_32_4_4_4 arg 4 sz [4, 1, 8] [8, 8, 2] OPs 17M/ 0.52G mem 0.27 GB tm 96.77us/ 3.57ms ( 184.21 GFLOPS, 3.05 GB/s) | |
*** 15 r_16_8_32_4_4_3_3 arg 4 sz [4, 2, 4] [8, 4, 4] OPs 1M/ 0.54G mem 0.27 GB tm 25.86us/ 3.60ms ( 48.16 GFLOPS, 50.80 GB/s) | |
*** 16 r_16_8_32_7_4_4_7 arg 4 sz [8, 1, 1] [4, 8, 16] OPs 6M/ 0.54G mem 0.27 GB tm 87.04us/ 3.68ms ( 74.54 GFLOPS, 75.44 GB/s) | |
*** 17 r_16_8_96_32_4_4_4 arg 4 sz [6, 2, 2] [16, 4, 8] OPs 53M/ 0.54G mem 0.27 GB tm 275.97us/ 3.96ms ( 193.78 GFLOPS, 2.26 GB/s) | |
*** 18 r_16_8_32_96_4_4_4 arg 6 sz [2, 1, 2] [16, 8, 8] OPs 50M/ 0.60G mem 0.27 GB tm 270.08us/ 4.23ms ( 187.09 GFLOPS, 2.79 GB/s) | |
*** 19 r_16_8_32_4_4_3_3 arg 4 sz [4, 2, 4] [8, 4, 4] OPs 1M/ 0.65G mem 0.27 GB tm 32.00us/ 4.26ms ( 38.91 GFLOPS, 41.05 GB/s) | |
*** 20 r_16_8_32_7_4_4_7 arg 4 sz [8, 1, 1] [4, 8, 16] OPs 6M/ 0.65G mem 0.27 GB tm 97.02us/ 4.36ms ( 66.87 GFLOPS, 67.68 GB/s) | |
*** 21 r_16_8_96_32_4_4_4 arg 4 sz [6, 2, 2] [16, 4, 8] OPs 53M/ 0.65G mem 0.27 GB tm 288.77us/ 4.65ms ( 185.19 GFLOPS, 2.16 GB/s) | |
*** 22 r_16_8_32_96_4_4_4 arg 6 sz [2, 1, 2] [16, 8, 8] OPs 50M/ 0.71G mem 0.27 GB tm 292.10us/ 4.94ms ( 172.99 GFLOPS, 2.58 GB/s) | |
*** 23 r_8_4_32_7_7_4_4_4 arg 3 sz [2, 4, 2] [16, 1, 4] OPs 6M/ 0.76G mem 0.27 GB tm 88.32us/ 5.03ms ( 72.72 GFLOPS, 21.72 GB/s) | |
*** 24 E_32_64_4_4 arg 3 sz [4, 32, 1] [16, 1, 1] OPs 0M/ 0.77G mem 0.27 GB tm 20.99us/ 5.05ms ( 1.56 GFLOPS, 9.41 GB/s) | |
*** 25 r_8_4_64_64_4_4_4 arg 4 sz [4, 4, 2] [16, 1, 4] OPs 17M/ 0.77G mem 0.27 GB tm 104.96us/ 5.15ms ( 164.84 GFLOPS, 2.51 GB/s) | |
*** 26 r_8_4_64_4_4_3_3 arg 4 sz [8, 1, 4] [8, 4, 2] OPs 0M/ 0.78G mem 0.27 GB tm 16.13us/ 5.17ms ( 38.60 GFLOPS, 40.98 GB/s) | |
*** 27 r_8_4_64_7_4_4_7 arg 4 sz [4, 2, 4] [16, 2, 2] OPs 3M/ 0.78G mem 0.27 GB tm 49.15us/ 5.22ms ( 66.00 GFLOPS, 67.20 GB/s) | |
*** 28 r_8_4_192_64_4_4_4 arg 4 sz [24, 1, 1] [8, 4, 8] OPs 51M/ 0.79G mem 0.27 GB tm 275.97us/ 5.50ms ( 188.08 GFLOPS, 2.39 GB/s) | |
*** 29 r_8_4_64_192_4_4_4 arg 6 sz [4, 4, 1] [16, 1, 8] OPs 50M/ 0.84G mem 0.27 GB tm 275.20us/ 5.77ms ( 183.25 GFLOPS, 2.63 GB/s) | |
*** 30 r_8_4_64_4_4_3_3 arg 4 sz [8, 1, 4] [8, 4, 2] OPs 0M/ 0.89G mem 0.27 GB tm 23.81us/ 5.79ms ( 26.15 GFLOPS, 27.76 GB/s) | |
*** 31 r_8_4_64_7_4_4_7 arg 4 sz [4, 2, 4] [16, 2, 2] OPs 3M/ 0.89G mem 0.27 GB tm 57.09us/ 5.85ms ( 56.83 GFLOPS, 57.86 GB/s) | |
*** 32 r_8_4_192_64_4_4_4 arg 4 sz [24, 1, 1] [8, 4, 8] OPs 51M/ 0.89G mem 0.27 GB tm 297.98us/ 6.15ms ( 174.19 GFLOPS, 2.21 GB/s) | |
*** 33 r_8_4_64_192_4_4_4 arg 6 sz [4, 4, 1] [16, 1, 8] OPs 50M/ 0.94G mem 0.27 GB tm 292.10us/ 6.44ms ( 172.65 GFLOPS, 2.48 GB/s) | |
*** 34 r_8_4_64_4_4_3_3 arg 4 sz [8, 1, 4] [8, 4, 2] OPs 0M/ 1.00G mem 0.27 GB tm 19.97us/ 6.46ms ( 31.18 GFLOPS, 33.10 GB/s) | |
*** 35 r_8_4_64_7_4_4_7 arg 4 sz [4, 2, 4] [16, 2, 2] OPs 3M/ 1.00G mem 0.27 GB tm 53.76us/ 6.52ms ( 60.34 GFLOPS, 61.44 GB/s) | |
*** 36 r_8_4_192_64_4_4_4 arg 4 sz [24, 1, 1] [8, 4, 8] OPs 51M/ 1.00G mem 0.27 GB tm 296.96us/ 6.81ms ( 174.79 GFLOPS, 2.22 GB/s) | |
*** 37 r_8_4_64_192_4_4_4 arg 6 sz [4, 4, 1] [16, 1, 8] OPs 50M/ 1.05G mem 0.27 GB tm 293.12us/ 7.11ms ( 172.05 GFLOPS, 2.47 GB/s) | |
*** 38 r_8_4_64_4_4_3_3 arg 4 sz [8, 1, 4] [8, 4, 2] OPs 0M/ 1.10G mem 0.27 GB tm 18.94us/ 7.12ms ( 32.86 GFLOPS, 34.89 GB/s) | |
*** 39 r_8_4_64_7_4_4_7 arg 4 sz [4, 2, 4] [16, 2, 2] OPs 3M/ 1.10G mem 0.27 GB tm 53.76us/ 7.18ms ( 60.34 GFLOPS, 61.44 GB/s) | |
*** 40 r_8_4_192_64_4_4_4 arg 4 sz [24, 1, 1] [8, 4, 8] OPs 51M/ 1.11G mem 0.27 GB tm 297.98us/ 7.48ms ( 174.19 GFLOPS, 2.21 GB/s) | |
*** 41 r_8_4_64_192_4_4_4 arg 6 sz [4, 4, 1] [16, 1, 8] OPs 50M/ 1.16G mem 0.27 GB tm 295.17us/ 7.77ms ( 170.85 GFLOPS, 2.45 GB/s) | |
*** 42 r_8_4_64_4_4_3_3 arg 4 sz [8, 1, 4] [8, 4, 2] OPs 0M/ 1.21G mem 0.27 GB tm 20.99us/ 7.79ms ( 29.66 GFLOPS, 31.49 GB/s) | |
*** 43 r_8_4_64_7_4_4_7 arg 4 sz [4, 2, 4] [16, 2, 2] OPs 3M/ 1.21G mem 0.27 GB tm 54.78us/ 7.85ms ( 59.21 GFLOPS, 60.29 GB/s) | |
*** 44 r_8_4_192_64_4_4_4 arg 4 sz [24, 1, 1] [8, 4, 8] OPs 51M/ 1.21G mem 0.27 GB tm 297.98us/ 8.15ms ( 174.19 GFLOPS, 2.21 GB/s) | |
*** 45 r_8_4_64_192_4_4_4 arg 6 sz [4, 4, 1] [16, 1, 8] OPs 50M/ 1.26G mem 0.27 GB tm 295.94us/ 8.44ms ( 170.41 GFLOPS, 2.44 GB/s) | |
*** 46 r_8_4_64_4_4_3_3 arg 4 sz [8, 1, 4] [8, 4, 2] OPs 0M/ 1.31G mem 0.27 GB tm 22.78us/ 8.46ms ( 27.33 GFLOPS, 29.01 GB/s) | |
*** 47 r_8_4_64_7_4_4_7 arg 4 sz [4, 2, 4] [16, 2, 2] OPs 3M/ 1.31G mem 0.27 GB tm 55.81us/ 8.52ms ( 58.13 GFLOPS, 59.18 GB/s) | |
*** 48 r_8_4_192_64_4_4_4 arg 4 sz [24, 1, 1] [8, 4, 8] OPs 51M/ 1.32G mem 0.27 GB tm 299.01us/ 8.82ms ( 173.59 GFLOPS, 2.20 GB/s) | |
*** 49 r_8_4_64_192_4_4_4 arg 6 sz [4, 4, 1] [16, 1, 8] OPs 50M/ 1.37G mem 0.27 GB tm 294.91us/ 9.11ms ( 171.00 GFLOPS, 2.45 GB/s) | |
*** 50 r_4_2_64_7_7_4_4_4 arg 3 sz [4, 2, 1] [16, 1, 4] OPs 3M/ 1.42G mem 0.27 GB tm 50.94us/ 9.16ms ( 63.04 GFLOPS, 20.30 GB/s) | |
*** 51 E_8_128_4_4n1 arg 3 sz [1, 8, 1] [128, 1, 1] OPs 0M/ 1.42G mem 0.27 GB tm 15.10us/ 9.18ms ( 1.08 GFLOPS, 6.64 GB/s) | |
*** 52 r_4_2_128_128_4_4_4 arg 4 sz [2, 2, 4] [64, 1, 1] OPs 17M/ 1.42G mem 0.27 GB tm 97.02us/ 9.28ms ( 175.62 GFLOPS, 6.10 GB/s) | |
*** 53 r_8_128_4_4_3_3 arg 4 sz [8, 1, 1] [16, 8, 1] OPs 0M/ 1.44G mem 0.27 GB tm 9.98us/ 9.29ms ( 31.18 GFLOPS, 33.95 GB/s) | |
*** 54 r_8_128_7_4_4_7 arg 4 sz [4, 4, 1] [32, 2, 1] OPs 1M/ 1.44G mem 0.27 GB tm 48.13us/ 9.33ms ( 33.70 GFLOPS, 35.13 GB/s) | |
*** 55 r_4_2_384_128_4_4_4 arg 4 sz [24, 1, 1] [16, 2, 4] OPs 51M/ 1.44G mem 0.27 GB tm 300.29us/ 9.64ms ( 170.23 GFLOPS, 5.69 GB/s) | |
*** 56 r_4_2_128_384_4_4_4 arg 6 sz [2, 2, 1] [64, 1, 4] OPs 50M/ 1.49G mem 0.27 GB tm 290.05us/ 9.93ms ( 173.70 GFLOPS, 6.00 GB/s) | |
*** 57 r_8_128_4_4_3_3 arg 4 sz [8, 1, 1] [16, 8, 1] OPs 0M/ 1.54G mem 0.27 GB tm 15.10us/ 9.94ms ( 20.61 GFLOPS, 22.44 GB/s) | |
*** 58 r_8_128_7_4_4_7 arg 4 sz [4, 4, 1] [32, 2, 1] OPs 1M/ 1.54G mem 0.27 GB tm 50.94us/ 9.99ms ( 31.84 GFLOPS, 33.19 GB/s) | |
*** 59 r_4_2_384_128_4_4_4 arg 4 sz [24, 1, 1] [16, 2, 4] OPs 51M/ 1.55G mem 0.27 GB tm 342.02us/ 10.33ms ( 149.46 GFLOPS, 5.00 GB/s) | |
*** 60 r_4_2_128_384_4_4_4 arg 6 sz [2, 2, 1] [64, 1, 4] OPs 50M/ 1.60G mem 0.27 GB tm 342.02us/ 10.68ms ( 147.31 GFLOPS, 5.09 GB/s) | |
*** 61 r_4_2_128_3_3_4_4_4 arg 3 sz [8, 1, 1] [16, 2, 4] OPs 1M/ 1.65G mem 0.27 GB tm 29.18us/ 10.70ms ( 40.42 GFLOPS, 20.35 GB/s) | |
*** 62 r_256_16_4_2 arg 3 sz [256, 1, 1] [4, 16, 1] OPs 0M/ 1.65G mem 0.27 GB tm 52.74us/ 10.76ms ( 1.26 GFLOPS, 2.60 GB/s) | |
*** 63 r_16_16_4_16_4 arg 4 sz [16, 1, 1] [4, 16, 1] OPs 0M/ 1.65G mem 0.27 GB tm 19.97us/ 10.78ms ( 6.57 GFLOPS, 6.69 GB/s) | |
*** 64 r_256_16_4_4 arg 4 sz [256, 1, 1] [4, 16, 1] OPs 0M/ 1.65G mem 0.27 GB tm 29.95us/ 10.81ms ( 4.55 GFLOPS, 4.59 GB/s) | |
*** 65 r_256_4_32 arg 4 sz [8, 1, 1] [32, 1, 1] OPs 0M/ 1.65G mem 0.27 GB tm 81.92us/ 10.89ms ( 7.21 GFLOPS, 1.70 GB/s) | |
*** 66 r_512_16_4_16_4 arg 4 sz [512, 1, 1] [4, 16, 1] OPs 4M/ 1.65G mem 0.27 GB tm 254.98us/ 11.14ms ( 16.46 GFLOPS, 16.51 GB/s) | |
*** 67 r_128_16_4_32_4 arg 4 sz [128, 1, 1] [4, 16, 1] OPs 2M/ 1.65G mem 0.27 GB tm 156.93us/ 11.30ms ( 13.37 GFLOPS, 13.41 GB/s) | |
*** 68 r_256_16_4_8_4 arg 4 sz [256, 1, 1] [4, 16, 1] OPs 1M/ 1.66G mem 0.27 GB tm 77.06us/ 11.38ms ( 13.63 GFLOPS, 13.70 GB/s) | |
*** 69 r_128_16_4_16_4 arg 5 sz [128, 1, 1] [4, 16, 1] OPs 1M/ 1.66G mem 0.27 GB tm 83.97us/ 11.46ms ( 12.51 GFLOPS, 12.56 GB/s) | |
*** 70 r_256_16_4_8_4 arg 4 sz [256, 1, 1] [4, 16, 1] OPs 1M/ 1.66G mem 0.27 GB tm 66.05us/ 11.53ms ( 15.91 GFLOPS, 15.98 GB/s) | |
*** 71 r_128_16_4_16_4 arg 5 sz [128, 1, 1] [4, 16, 1] OPs 1M/ 1.66G mem 0.27 GB tm 71.17us/ 11.60ms ( 14.76 GFLOPS, 14.82 GB/s) | |
*** 72 r_128_16_4_8_4 arg 3 sz [128, 1, 1] [4, 16, 1] OPs 0M/ 1.66G mem 0.27 GB tm 40.96us/ 11.64ms ( 12.80 GFLOPS, 12.85 GB/s) | |
*** 73 r_16_8_4 arg 3 sz [1, 1, 1] [16, 1, 1] OPs 0M/ 1.66G mem 0.27 GB tm 12.29us/ 11.65ms ( 0.13 GFLOPS, 0.25 GB/s) | |
*** 74 r_128_16_4_8_4n1 arg 5 sz [128, 1, 1] [4, 16, 1] OPs 0M/ 1.66G mem 0.27 GB tm 41.98us/ 11.69ms ( 12.51 GFLOPS, 12.59 GB/s) | |
*** 75 E_201_4 arg 3 sz [25.125, 1, 1] [8, 1, 1] OPs 0M/ 1.66G mem 0.27 GB tm 5.89us/ 11.70ms ( 0.14 GFLOPS, 1.37 GB/s) | |
*** 76 r_128_201_4_4 arg 3 sz [8, 1, 1] [16, 1, 1] OPs 0M/ 1.66G mem 0.27 GB tm 81.15us/ 11.78ms ( 10.15 GFLOPS, 10.18 GB/s) | |
*** 77 E_10_128_4 arg 5 sz [2, 2.5, 1] [64, 4, 1] OPs 0M/ 1.66G mem 0.27 GB tm 7.94us/ 11.79ms ( 7.10 GFLOPS, 2.06 GB/s) | |
*** 78 r_10_128_128_4_4 arg 5 sz [8, 1.25, 1] [16, 8, 1] OPs 5M/ 1.66G mem 0.27 GB tm 67.07us/ 11.86ms ( 78.40 GFLOPS, 8.46 GB/s) | |
*** 79 r_10_16_8_4 arg 2 sz [10, 1, 1] [16, 1, 1] OPs 0M/ 1.67G mem 0.27 GB tm 9.98us/ 11.87ms ( 0.51 GFLOPS, 1.03 GB/s) | |
*** 80 r_10_16_8_4n1 arg 3 sz [10, 1, 1] [16, 1, 1] OPs 0M/ 1.67G mem 0.27 GB tm 9.98us/ 11.88ms ( 1.54 GFLOPS, 1.03 GB/s) | |
*** 81 E_10_128_4n1 arg 6 sz [64, 1, 1] [2, 10, 1] OPs 0M/ 1.67G mem 0.27 GB tm 7.17us/ 11.88ms ( 2.86 GFLOPS, 3.44 GB/s) | |
*** 82 r_10_384_128_4_4 arg 3 sz [6, 1, 1] [64, 10, 1] OPs 15M/ 1.67G mem 0.27 GB tm 162.05us/ 12.05ms ( 97.06 GFLOPS, 9.96 GB/s) | |
*** 83 E_10_128_4n2 arg 3 sz [2, 10, 1] [64, 1, 1] OPs 0M/ 1.68G mem 0.27 GB tm 6.14us/ 12.05ms ( 0.83 GFLOPS, 3.67 GB/s) | |
*** 84 E_24_16_4_4 arg 3 sz [8, 1.5, 1] [2, 16, 1] OPs 0M/ 1.68G mem 0.27 GB tm 9.98us/ 12.06ms ( 0.62 GFLOPS, 4.92 GB/s) | |
*** 85 r_10_8_3_16_4_4 arg 3 sz [1.5, 8, 1] [2, 1, 10] OPs 0M/ 1.68G mem 0.27 GB tm 13.06us/ 12.07ms ( 9.41 GFLOPS, 2.02 GB/s) | |
*** 86 E_8_10_10 arg 2 sz [2.5, 2.5, 2] [4, 4, 4] OPs 0M/ 1.68G mem 0.27 GB tm 5.12us/ 12.08ms ( 0.16 GFLOPS, 1.25 GB/s) | |
*** 87 r_80_10 arg 2 sz [5, 1, 1] [16, 1, 1] OPs 0M/ 1.68G mem 0.27 GB tm 4.86us/ 12.08ms ( 0.16 GFLOPS, 0.72 GB/s) | |
*** 88 E_80_10 arg 3 sz [5, 10, 1] [2, 8, 1] OPs 0M/ 1.68G mem 0.27 GB tm 5.89us/ 12.09ms ( 0.41 GFLOPS, 1.14 GB/s) | |
*** 89 r_80_10n1 arg 2 sz [20, 1, 1] [4, 1, 1] OPs 0M/ 1.68G mem 0.27 GB tm 6.14us/ 12.10ms ( 0.13 GFLOPS, 0.57 GB/s) | |
*** 90 E_8_10_10n1 arg 3 sz [5, 2.5, 2] [2, 4, 4] OPs 0M/ 1.68G mem 0.27 GB tm 6.14us/ 12.10ms ( 0.13 GFLOPS, 1.09 GB/s) | |
*** 91 E_10_24_4 arg 2 sz [6, 1.25, 1] [4, 8, 1] OPs 0M/ 1.68G mem 0.27 GB tm 6.91us/ 12.11ms ( 0.14 GFLOPS, 0.83 GB/s) | |
*** 92 E_128_4_4_3 arg 3 sz [16, 1, 1] [8, 1, 1] OPs 0M/ 1.68G mem 0.27 GB tm 9.98us/ 12.12ms ( 0.62 GFLOPS, 4.92 GB/s) | |
*** 93 r_10_8_16_3_4_4 arg 3 sz [4, 2, 2.5] [4, 4, 4] OPs 0M/ 1.68G mem 0.27 GB tm 7.17us/ 12.13ms ( 17.14 GFLOPS, 3.41 GB/s) | |
*** 94 r_10_128_128_4_4n1 arg 5 sz [16, 1, 1] [8, 10, 1] OPs 5M/ 1.68G mem 0.27 GB tm 65.28us/ 12.19ms ( 80.47 GFLOPS, 8.53 GB/s) | |
*** 95 r_10_16_8_4 arg 2 sz [10, 1, 1] [16, 1, 1] OPs 0M/ 1.69G mem 0.27 GB tm 9.98us/ 12.20ms ( 0.51 GFLOPS, 1.03 GB/s) | |
*** 96 r_10_16_8_4n1 arg 3 sz [10, 1, 1] [16, 1, 1] OPs 0M/ 1.69G mem 0.27 GB tm 8.96us/ 12.21ms ( 1.72 GFLOPS, 1.15 GB/s) | |
*** 97 E_10_128_4n1 arg 6 sz [64, 1, 1] [2, 10, 1] OPs 0M/ 1.69G mem 0.27 GB tm 9.73us/ 12.22ms ( 2.11 GFLOPS, 2.53 GB/s) | |
*** 98 r_10_512_128_4_4 arg 4 sz [8, 1, 1] [64, 10, 1] OPs 21M/ 1.69G mem 0.27 GB tm 251.14us/ 12.47ms ( 84.08 GFLOPS, 8.59 GB/s) | |
*** 99 r_10_128_512_4_4 arg 3 sz [16, 1, 1] [8, 10, 1] OPs 20M/ 1.71G mem 0.27 GB tm 260.10us/ 12.73ms ( 80.63 GFLOPS, 8.26 GB/s) | |
*** 100 r_128_4_10 arg 4 sz [32, 1, 1] [4, 1, 1] OPs 0M/ 1.73G mem 0.27 GB tm 7.68us/ 12.74ms ( 2.00 GFLOPS, 3.20 GB/s) | |
*** 101 E_128_4 arg 2 sz [32, 1, 1] [4, 1, 1] OPs 0M/ 1.73G mem 0.27 GB tm 4.86us/ 12.74ms ( 0.00 GFLOPS, 0.63 GB/s) | |
*** 102 r_256_16_4_8_4 arg 4 sz [256, 1, 1] [4, 16, 1] OPs 1M/ 1.73G mem 0.27 GB tm 75.01us/ 12.82ms ( 14.01 GFLOPS, 14.08 GB/s) | |
*** 103 r_128_16_4_16_4n1 arg 3 sz [128, 1, 1] [4, 16, 1] OPs 1M/ 1.73G mem 0.27 GB tm 80.64us/ 12.90ms ( 13.00 GFLOPS, 13.04 GB/s) | |
*** 104 E_128_4n1 arg 4 sz [32, 1, 1] [4, 1, 1] OPs 0M/ 1.73G mem 0.27 GB tm 5.89us/ 12.91ms ( 0.26 GFLOPS, 1.04 GB/s) | |
*** 105 r_256_16_4_8_4 arg 4 sz [256, 1, 1] [4, 16, 1] OPs 1M/ 1.73G mem 0.27 GB tm 73.98us/ 12.98ms ( 14.20 GFLOPS, 14.27 GB/s) | |
*** 106 r_128_16_4_16_4n2 arg 7 sz [128, 1, 1] [4, 16, 1] OPs 1M/ 1.73G mem 0.27 GB tm 84.74us/ 13.06ms ( 12.41 GFLOPS, 12.50 GB/s) | |
*** 107 r_64_16_4_8_4 arg 4 sz [64, 1, 1] [4, 16, 1] OPs 0M/ 1.73G mem 0.27 GB tm 25.86us/ 13.09ms ( 10.16 GFLOPS, 10.24 GB/s) | |
*** 108 r_64_16_4_4_4 arg 4 sz [64, 1, 1] [4, 16, 1] OPs 0M/ 1.73G mem 0.27 GB tm 23.04us/ 13.11ms ( 5.71 GFLOPS, 5.78 GB/s) | |
*** 109 r_64_16_4_4_4n1 arg 5 sz [64, 1, 1] [4, 16, 1] OPs 0M/ 1.73G mem 0.27 GB tm 20.22us/ 13.13ms ( 6.52 GFLOPS, 6.61 GB/s) | |
*** 110 r_1239_64_4_4 arg 3 sz [19.359375, 1, 1] [64, 1, 1] OPs 2M/ 1.73G mem 0.27 GB tm 104.96us/ 13.24ms ( 24.18 GFLOPS, 24.37 GB/s) | |
*** 111 r_16_16_4_8_4 arg 4 sz [16, 1, 1] [4, 16, 1] OPs 0M/ 1.74G mem 0.27 GB tm 13.06us/ 13.25ms ( 5.03 GFLOPS, 5.13 GB/s) | |
*** 112 r_16_16_4_4 arg 4 sz [16, 1, 1] [4, 16, 1] OPs 0M/ 1.74G mem 0.27 GB tm 8.96us/ 13.26ms ( 0.93 GFLOPS, 0.97 GB/s) | |
*** 113 r_16_16_4_4n1 arg 5 sz [16, 1, 1] [4, 16, 1] OPs 0M/ 1.74G mem 0.27 GB tm 8.70us/ 13.27ms ( 0.96 GFLOPS, 1.01 GB/s) | |
*** 114 r_132_4_16_4 arg 3 sz [4, 132, 1] [16, 1, 1] OPs 0M/ 1.74G mem 0.27 GB tm 23.04us/ 13.29ms ( 2.93 GFLOPS, 3.03 GB/s) | |
*** 115 r_4_16_4_8_4 arg 4 sz [4, 1, 1] [4, 16, 1] OPs 0M/ 1.74G mem 0.27 GB tm 13.06us/ 13.31ms ( 1.26 GFLOPS, 1.34 GB/s) | |
*** 116 r_4_4_4_4 arg 4 sz [1, 1, 1] [4, 1, 1] OPs 0M/ 1.74G mem 0.27 GB tm 7.17us/ 13.31ms ( 0.08 GFLOPS, 0.09 GB/s) | |
*** 117 r_4_4_4_4n1 arg 5 sz [2, 1, 1] [2, 1, 1] OPs 0M/ 1.74G mem 0.27 GB tm 6.91us/ 13.32ms ( 0.08 GFLOPS, 0.10 GB/s) | |
*** 118 r_2_4_4_4 arg 3 sz [4, 1, 1] [1, 2, 1] OPs 0M/ 1.74G mem 0.27 GB tm 5.12us/ 13.33ms ( 0.05 GFLOPS, 0.06 GB/s) | |
*** 119 r_8_16_4_8_4 arg 4 sz [8, 1, 1] [4, 16, 1] OPs 0M/ 1.74G mem 0.27 GB tm 15.10us/ 13.34ms ( 2.17 GFLOPS, 2.25 GB/s) | |
*** 120 r_8_8_4_4 arg 4 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.74G mem 0.27 GB tm 8.19us/ 13.35ms ( 0.26 GFLOPS, 0.28 GB/s) | |
*** 121 r_8_8_4_4n1 arg 5 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.74G mem 0.27 GB tm 7.17us/ 13.36ms ( 0.30 GFLOPS, 0.33 GB/s) | |
*** 122 r_66_4_4_8 arg 3 sz [4, 2.0625, 1] [1, 32, 1] OPs 0M/ 1.74G mem 0.27 GB tm 8.19us/ 13.36ms ( 2.06 GFLOPS, 2.20 GB/s) | |
*** 123 r_16_16_4_8_4 arg 4 sz [16, 1, 1] [4, 16, 1] OPs 0M/ 1.74G mem 0.27 GB tm 14.08us/ 13.38ms ( 4.66 GFLOPS, 4.75 GB/s) | |
*** 124 r_16_16_4_4 arg 4 sz [16, 1, 1] [4, 16, 1] OPs 0M/ 1.74G mem 0.27 GB tm 9.73us/ 13.39ms ( 0.86 GFLOPS, 0.89 GB/s) | |
*** 125 r_16_16_4_4n1 arg 5 sz [16, 1, 1] [4, 16, 1] OPs 0M/ 1.74G mem 0.27 GB tm 8.70us/ 13.40ms ( 0.96 GFLOPS, 1.01 GB/s) | |
*** 126 r_26_4_16_4 arg 3 sz [4, 26, 1] [16, 1, 1] OPs 0M/ 1.74G mem 0.27 GB tm 11.78us/ 13.41ms ( 1.13 GFLOPS, 1.18 GB/s) | |
*** 127 r_4_16_4_8_4 arg 4 sz [4, 1, 1] [4, 16, 1] OPs 0M/ 1.74G mem 0.27 GB tm 12.03us/ 13.42ms ( 1.36 GFLOPS, 1.45 GB/s) | |
*** 128 r_4_4_4_4 arg 4 sz [1, 1, 1] [4, 1, 1] OPs 0M/ 1.74G mem 0.27 GB tm 8.96us/ 13.43ms ( 0.06 GFLOPS, 0.07 GB/s) | |
*** 129 r_4_4_4_4n1 arg 5 sz [2, 1, 1] [2, 1, 1] OPs 0M/ 1.74G mem 0.27 GB tm 10.24us/ 13.44ms ( 0.05 GFLOPS, 0.07 GB/s) | |
*** 130 r_4_4_4 arg 3 sz [4, 1, 1] [1, 1, 1] OPs 0M/ 1.74G mem 0.27 GB tm 5.12us/ 13.44ms ( 0.03 GFLOPS, 0.03 GB/s) | |
*** 131 r_8_16_4_8_4 arg 4 sz [8, 1, 1] [4, 16, 1] OPs 0M/ 1.74G mem 0.27 GB tm 12.03us/ 13.46ms ( 2.73 GFLOPS, 2.82 GB/s) | |
*** 132 r_8_8_4_4 arg 4 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.74G mem 0.27 GB tm 11.01us/ 13.47ms ( 0.19 GFLOPS, 0.21 GB/s) | |
*** 133 r_8_8_4_4n1 arg 5 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.74G mem 0.27 GB tm 11.01us/ 13.48ms ( 0.19 GFLOPS, 0.22 GB/s) | |
*** 134 r_2_4_4_8 arg 3 sz [2, 2, 1] [2, 1, 1] OPs 0M/ 1.74G mem 0.27 GB tm 5.89us/ 13.48ms ( 0.09 GFLOPS, 0.10 GB/s) | |
*** 135 r_128_16_4_32_4 arg 4 sz [128, 1, 1] [4, 16, 1] OPs 2M/ 1.74G mem 0.27 GB tm 153.86us/ 13.64ms ( 13.64 GFLOPS, 13.68 GB/s) | |
*** 136 r_256_16_4_8_4 arg 4 sz [256, 1, 1] [4, 16, 1] OPs 1M/ 1.74G mem 0.27 GB tm 59.90us/ 13.70ms ( 17.54 GFLOPS, 17.62 GB/s) | |
*** 137 r_128_16_4_16_4 arg 5 sz [128, 1, 1] [4, 16, 1] OPs 1M/ 1.74G mem 0.27 GB tm 66.82us/ 13.76ms ( 15.72 GFLOPS, 15.79 GB/s) | |
*** 138 r_256_16_4_8_4 arg 4 sz [256, 1, 1] [4, 16, 1] OPs 1M/ 1.74G mem 0.27 GB tm 58.11us/ 13.82ms ( 18.08 GFLOPS, 18.17 GB/s) | |
*** 139 r_128_16_4_16_4 arg 5 sz [128, 1, 1] [4, 16, 1] OPs 1M/ 1.74G mem 0.27 GB tm 67.84us/ 13.89ms ( 15.48 GFLOPS, 15.55 GB/s) | |
*** 140 r_256_16_4_8_4 arg 4 sz [256, 1, 1] [4, 16, 1] OPs 1M/ 1.74G mem 0.27 GB tm 59.14us/ 13.95ms ( 17.77 GFLOPS, 17.85 GB/s) | |
*** 141 r_128_16_4_16_4 arg 5 sz [128, 1, 1] [4, 16, 1] OPs 1M/ 1.75G mem 0.27 GB tm 67.84us/ 14.02ms ( 15.48 GFLOPS, 15.55 GB/s) | |
*** 142 r_256_16_4_8_4 arg 4 sz [256, 1, 1] [4, 16, 1] OPs 1M/ 1.75G mem 0.27 GB tm 57.86us/ 14.08ms ( 18.16 GFLOPS, 18.25 GB/s) | |
*** 143 r_128_16_4_16_4 arg 5 sz [128, 1, 1] [4, 16, 1] OPs 1M/ 1.75G mem 0.27 GB tm 68.86us/ 14.14ms ( 15.25 GFLOPS, 15.32 GB/s) | |
*** 144 r_16_16_4_8_4 arg 4 sz [16, 1, 1] [4, 16, 1] OPs 0M/ 1.75G mem 0.27 GB tm 13.06us/ 14.16ms ( 5.03 GFLOPS, 5.13 GB/s) | |
*** 145 r_16_16_4_4 arg 4 sz [16, 1, 1] [4, 16, 1] OPs 0M/ 1.75G mem 0.27 GB tm 11.01us/ 14.17ms ( 0.76 GFLOPS, 0.79 GB/s) | |
*** 146 r_16_16_4_4n1 arg 5 sz [16, 1, 1] [4, 16, 1] OPs 0M/ 1.75G mem 0.27 GB tm 9.98us/ 14.18ms ( 0.84 GFLOPS, 0.88 GB/s) | |
*** 147 r_12_4_16_4 arg 3 sz [4, 12, 1] [16, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 9.98us/ 14.19ms ( 0.62 GFLOPS, 0.65 GB/s) | |
*** 148 r_8_16_4_8_4 arg 4 sz [8, 1, 1] [4, 16, 1] OPs 0M/ 1.75G mem 0.27 GB tm 12.03us/ 14.20ms ( 2.73 GFLOPS, 2.82 GB/s) | |
*** 149 r_8_8_4_4 arg 4 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 11.01us/ 14.21ms ( 0.19 GFLOPS, 0.21 GB/s) | |
*** 150 r_8_8_4_4n1 arg 5 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 9.98us/ 14.22ms ( 0.21 GFLOPS, 0.24 GB/s) | |
*** 151 r_8_4_4_8 arg 3 sz [4, 4, 1] [1, 2, 1] OPs 0M/ 1.75G mem 0.27 GB tm 6.91us/ 14.23ms ( 0.30 GFLOPS, 0.32 GB/s) | |
*** 152 r_8_16_4_8_4 arg 4 sz [8, 1, 1] [4, 16, 1] OPs 0M/ 1.75G mem 0.27 GB tm 13.06us/ 14.24ms ( 2.51 GFLOPS, 2.60 GB/s) | |
*** 153 r_8_8_4_4 arg 4 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 11.01us/ 14.25ms ( 0.19 GFLOPS, 0.21 GB/s) | |
*** 154 r_8_8_4_4n1 arg 5 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 11.01us/ 14.26ms ( 0.19 GFLOPS, 0.22 GB/s) | |
*** 155 r_3_4_4_8 arg 3 sz [4, 3, 1] [1, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 9.22us/ 14.27ms ( 0.08 GFLOPS, 0.10 GB/s) | |
*** 156 r_8_16_4_8_4 arg 4 sz [8, 1, 1] [4, 16, 1] OPs 0M/ 1.75G mem 0.27 GB tm 12.03us/ 14.28ms ( 2.73 GFLOPS, 2.82 GB/s) | |
*** 157 r_8_8_4_4 arg 4 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 9.98us/ 14.29ms ( 0.21 GFLOPS, 0.23 GB/s) | |
*** 158 r_8_8_4_4n1 arg 5 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 11.01us/ 14.31ms ( 0.19 GFLOPS, 0.22 GB/s) | |
*** 159 r_2_4_4_8n1 arg 3 sz [4, 1, 1] [1, 2, 1] OPs 0M/ 1.75G mem 0.27 GB tm 6.14us/ 14.31ms ( 0.08 GFLOPS, 0.10 GB/s) | |
*** 160 r_8_16_4_8_4 arg 4 sz [8, 1, 1] [4, 16, 1] OPs 0M/ 1.75G mem 0.27 GB tm 11.78us/ 14.32ms ( 2.79 GFLOPS, 2.89 GB/s) | |
*** 161 r_8_8_4_4 arg 4 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 11.01us/ 14.33ms ( 0.19 GFLOPS, 0.21 GB/s) | |
*** 162 r_8_8_4_4n1 arg 5 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 11.01us/ 14.35ms ( 0.19 GFLOPS, 0.22 GB/s) | |
*** 163 r_3_4_4_8 arg 3 sz [4, 3, 1] [1, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 9.98us/ 14.36ms ( 0.08 GFLOPS, 0.09 GB/s) | |
*** 164 r_8_16_4_8_4 arg 4 sz [8, 1, 1] [4, 16, 1] OPs 0M/ 1.75G mem 0.27 GB tm 12.80us/ 14.37ms ( 2.56 GFLOPS, 2.65 GB/s) | |
*** 165 r_8_8_4_4 arg 4 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 11.01us/ 14.38ms ( 0.19 GFLOPS, 0.21 GB/s) | |
*** 166 r_8_8_4_4n1 arg 5 sz [4, 1, 1] [2, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 11.26us/ 14.39ms ( 0.19 GFLOPS, 0.21 GB/s) | |
*** 167 r_3_4_4_8 arg 3 sz [4, 3, 1] [1, 1, 1] OPs 0M/ 1.75G mem 0.27 GB tm 10.24us/ 14.40ms ( 0.07 GFLOPS, 0.09 GB/s) | |
*** 168 r_10_128_512_4_4n1 arg 3 sz [8, 1.25, 1] [16, 8, 1] OPs 20M/ 1.75G mem 0.27 GB tm 256.00us/ 14.66ms ( 81.92 GFLOPS, 8.39 GB/s) | |
*** 169 r_128_4_10n1 arg 4 sz [32, 1, 1] [4, 1, 1] OPs 0M/ 1.77G mem 0.27 GB tm 8.70us/ 14.67ms ( 1.76 GFLOPS, 2.82 GB/s) | |
*** 170 E_129_4 arg 4 sz [8.0625, 1, 1] [16, 1, 1] OPs 0M/ 1.77G mem 0.27 GB tm 5.89us/ 14.67ms ( 0.18 GFLOPS, 0.88 GB/s) | |
*** 171 r_32_129_4_4 arg 4 sz [8, 1, 1] [4, 1, 1] OPs 0M/ 1.77G mem 0.27 GB tm 49.92us/ 14.72ms ( 2.65 GFLOPS, 2.68 GB/s) | |
*** 172 r_128_16_4_2_4 arg 4 sz [128, 1, 1] [4, 16, 1] OPs 0M/ 1.77G mem 0.27 GB tm 23.04us/ 14.74ms ( 5.73 GFLOPS, 5.83 GB/s) | |
*** 173 r_32_16_4_8_4 arg 5 sz [32, 1, 1] [4, 16, 1] OPs 0M/ 1.77G mem 0.27 GB tm 19.71us/ 14.76ms ( 6.67 GFLOPS, 6.75 GB/s) | |
*** 174 r_128_16_4_2_4 arg 4 sz [128, 1, 1] [4, 16, 1] OPs 0M/ 1.77G mem 0.27 GB tm 22.02us/ 14.79ms ( 6.00 GFLOPS, 6.10 GB/s) | |
*** 175 r_32_16_4_8_4 arg 5 sz [32, 1, 1] [4, 16, 1] OPs 0M/ 1.77G mem 0.27 GB tm 16.90us/ 14.80ms ( 7.78 GFLOPS, 7.88 GB/s) | |
*** 176 r_128_16_4_2_4 arg 4 sz [128, 1, 1] [4, 16, 1] OPs 0M/ 1.77G mem 0.27 GB tm 22.02us/ 14.83ms ( 6.00 GFLOPS, 6.10 GB/s) | |
*** 177 r_32_16_4_8_4 arg 5 sz [32, 1, 1] [4, 16, 1] OPs 0M/ 1.77G mem 0.27 GB tm 18.18us/ 14.84ms ( 7.23 GFLOPS, 7.32 GB/s) | |
*** 178 r_128_16_4_2_4 arg 4 sz [128, 1, 1] [4, 16, 1] OPs 0M/ 1.77G mem 0.27 GB tm 22.27us/ 14.87ms ( 5.93 GFLOPS, 6.03 GB/s) | |
*** 179 r_32_16_4_8_4 arg 5 sz [32, 1, 1] [4, 16, 1] OPs 0M/ 1.77G mem 0.27 GB tm 17.15us/ 14.88ms ( 7.66 GFLOPS, 7.76 GB/s) | |
*** 180 r_4_16_2_4 arg 3 sz [4, 1, 1] [16, 1, 1] OPs 0M/ 1.77G mem 0.27 GB tm 6.91us/ 14.89ms ( 0.15 GFLOPS, 0.19 GB/s) | |
*** 181 E_128_4n2 arg 4 sz [16, 1, 1] [8, 1, 1] OPs 0M/ 1.77G mem 0.27 GB tm 5.89us/ 14.90ms ( 0.17 GFLOPS, 0.87 GB/s) | |
*** 182 E_1626_4 arg 31 sz [25.40625, 1, 1] [64, 1, 1] OPs 0M/ 1.77G mem 0.27 GB tm 51.97us/ 14.95ms ( 3.63 GFLOPS, 15.52 GB/s) |
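
The per-kernel lines above share a regular layout (index, kernel name, argument count, launch sizes, op count, memory, per-kernel time and running total, then GFLOPS and GB/s), so they can be summarised with a small script. The following is a minimal sketch, not part of tinygrad or openpilot: the regular expression and the helper names `parse_kernel_line` and `time_by_kind` are hypothetical and were written against the lines shown in this log only. Grouping by the first letter of the kernel name assumes that the `E_` and `r_` prefixes distinguish two kinds of kernel (apparently elementwise and reduce); treat that reading as an assumption.

```python
import re
from collections import defaultdict

# Hypothetical pattern fitted to the "*** <idx> <name> arg <n> ... tm <us>us/ <ms>ms ( <gflops> GFLOPS, <gbps> GB/s)"
# lines in this log; other tinygrad versions may print a different layout.
KERNEL_RE = re.compile(
    r"\*\*\*\s+(?P<idx>\d+)\s+(?P<name>\S+)\s+arg\s+(?P<nargs>\d+)"
    r".*?tm\s+(?P<us>[\d.]+)us/\s*(?P<cum_ms>[\d.]+)ms"
    r"\s+\(\s*(?P<gflops>[\d.]+) GFLOPS,\s*(?P<gbps>[\d.]+) GB/s\)"
)


def parse_kernel_line(line):
    """Return a dict of the numeric fields of one kernel line, or None if it is not a kernel line."""
    m = KERNEL_RE.search(line)
    if m is None:
        return None
    return {
        "idx": int(m["idx"]),
        "name": m["name"],
        "nargs": int(m["nargs"]),
        "us": float(m["us"]),          # time of this kernel, microseconds
        "cum_ms": float(m["cum_ms"]),  # running total printed by the log, milliseconds
        "gflops": float(m["gflops"]),
        "gbps": float(m["gbps"]),
    }


def time_by_kind(lines):
    """Sum per-kernel time (microseconds) grouped by the name prefix before the first underscore."""
    totals = defaultdict(float)
    for line in lines:
        rec = parse_kernel_line(line)
        if rec is not None:
            totals[rec["name"].split("_")[0]] += rec["us"]
    return dict(totals)


# Three lines copied from the log above (kernels 0, 1 and 2).
sample = [
    "*** 0 E_32768_4_6 arg 3 sz [512, 1, 1] [64, 1, 1] OPs 0M/ 0.00G mem 0.26 GB tm 272.90us/ 0.27ms ( 2.88 GFLOPS, 28.82 GB/s)",
    "*** 1 r_64_32_16_6_3_3_4_4_4 arg 4 sz [2, 1, 64] [8, 32, 1] OPs 234M/ 0.00G mem 0.26 GB tm 1244.16us/ 1.52ms ( 188.79 GFLOPS, 3.71 GB/s)",
    "*** 2 r_32_16_16_4_4_3_3 arg 4 sz [2, 1, 8] [8, 16, 4] OPs 4M/ 0.24G mem 0.26 GB tm 88.83us/ 1.61ms ( 50.17 GFLOPS, 29.53 GB/s)",
]

if __name__ == "__main__":
    for kind, us in sorted(time_by_kind(sample).items()):
        print(f"{kind}: {us:.2f} us")
```

Feeding the whole log through `time_by_kind` (for example by reading a saved copy line by line) gives a per-prefix breakdown that can be compared against the running total on the last kernel line, which reads 14.95ms in this run. Lines that are not kernel lines, such as the `rewrite input` messages, are skipped because `parse_kernel_line` returns `None` for them.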