Last active December 21, 2015 05:18
Running the ViennaCL (http://viennacl.sourceforge.net/) blas3 benchmark (https://github.com/viennacl/viennacl-dev/blob/master/examples/benchmarks/blas3.cpp; dense matrix × dense matrix) on a GeForce Titan showed that OpenCL is faster than CUDA.
----------------------------------------------
Device Info
----------------------------------------------
----------------------------------------------
----------------------------------------------
## Benchmark :: Dense Matrix-Matrix product
----------------------------------------------
-------------------------------
# benchmarking single-precision
-------------------------------
------ Benchmark 1: Matrix-Matrix product ------
- Execution time on device (no setup time included): 0.082591
- GFLOPs (counting multiply&add as separate operations): 208.011
------ Benchmark 2: Matrix-Matrix product using ranges ------
- Execution time on device (no setup time included): 0.010358
- GFLOPs (counting multiply&add as separate operations): 207.326
------ Benchmark 3: Matrix-Matrix product using slices ------
- Execution time on device (no setup time included): 0.010391
- GFLOPs (counting multiply&add as separate operations): 206.668
------ Benchmark 4: LU factorization ------
- Execution time on device (no setup time included): 0.775203
- GFLOPs (counting multiply&add as separate operations): 22.1618
-------------------------------
# benchmarking double-precision
-------------------------------
------ Benchmark 1: Matrix-Matrix product ------
- Execution time on device (no setup time included): 0.082417
- GFLOPs (counting multiply&add as separate operations): 208.451
------ Benchmark 2: Matrix-Matrix product using ranges ------
- Execution time on device (no setup time included): 0.010409
- GFLOPs (counting multiply&add as separate operations): 206.31
------ Benchmark 3: Matrix-Matrix product using slices ------
- Execution time on device (no setup time included): 0.012321
- GFLOPs (counting multiply&add as separate operations): 174.295
------ Benchmark 4: LU factorization ------
- Execution time on device (no setup time included): 0.823958
- GFLOPs (counting multiply&add as separate operations): 20.8504
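The GFLOPs figures in this log appear to follow 2·N³ / t, that is, one multiply and one add per inner-product term. The log does not print the matrix size, so N = 2048 for the full product and N = 1024 for the range product are assumptions obtained by back-solving from the reported numbers, not values taken from the benchmark source. A minimal sketch of that check:

```python
def gflops(n, seconds):
    """GFLOPs for an n x n dense matrix product, counting multiply
    and add as separate operations (2 * n^3 floating-point ops)."""
    return 2.0 * n**3 / seconds / 1e9

# Matrix sizes are assumptions inferred from the log, not printed by it.
full = gflops(2048, 0.082591)    # Benchmark 1, single precision
ranges = gflops(1024, 0.010358)  # Benchmark 2, single precision

print(f"full product:  {full:.3f} GFLOPs")
print(f"range product: {ranges:.3f} GFLOPs")
```

Under these assumed sizes, the computed values agree with the reported 208.011 and 207.326 GFLOPs to within rounding of the printed execution times.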
----------------------------------------------
Device Info
----------------------------------------------
CL Device Vendor ID: 4318
CL Device Name: GeForce GTX TITAN
CL Driver Version: 319.37
--------------------------------
CL Device Max Compute Units: 14
CL Device Max Work Group Size: 1024
CL Device Global Mem Size: 6441730048
CL Device Local Mem Size: 49152
----------------------------------------------
----------------------------------------------
## Benchmark :: Dense Matrix-Matrix product
----------------------------------------------
-------------------------------
# benchmarking single-precision
-------------------------------
------ Benchmark 1: Matrix-Matrix product ------
- Device Name: GeForce GTX TITAN
- Execution time on device (no setup time included): 0.024256
- GFLOPs (counting multiply&add as separate operations): 708.273
------ Benchmark 2: Matrix-Matrix product using ranges ------
- Device Name: GeForce GTX TITAN
- Execution time on device (no setup time included): 0.003088
- GFLOPs (counting multiply&add as separate operations): 695.429
------ Benchmark 3: Matrix-Matrix product using slices ------
- Device Name: GeForce GTX TITAN
- Execution time on device (no setup time included): 0.003666
- GFLOPs (counting multiply&add as separate operations): 585.784
------ Benchmark 4: LU factorization ------
- Device Name: GeForce GTX TITAN
- Execution time on device (no setup time included): 0.775367
- GFLOPs (counting multiply&add as separate operations): 22.1571
-------------------------------
# benchmarking double-precision
-------------------------------
------ Benchmark 1: Matrix-Matrix product ------
- Device Name: GeForce GTX TITAN
- Execution time on device (no setup time included): 0.045137
- GFLOPs (counting multiply&add as separate operations): 380.616
------ Benchmark 2: Matrix-Matrix product using ranges ------
- Device Name: GeForce GTX TITAN
- Execution time on device (no setup time included): 0.005956
- GFLOPs (counting multiply&add as separate operations): 360.558
------ Benchmark 3: Matrix-Matrix product using slices ------
- Device Name: GeForce GTX TITAN
- Execution time on device (no setup time included): 0.007147
- GFLOPs (counting multiply&add as separate operations): 300.473
------ Benchmark 4: LU factorization ------
- Device Name: GeForce GTX TITAN
- Execution time on device (no setup time included): 0.825527
- GFLOPs (counting multiply&add as separate operations): 20.8108
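To put a number on "OpenCL is faster than CUDA", the sketch below divides the OpenCL GFLOPs by the CUDA GFLOPs for a few of the benchmarks. It assumes the first log (no CL device lines) is the CUDA run and the second log is the OpenCL run; the dictionary keys are labels made up for this example, and the values are copied from the two logs.

```python
# GFLOPs copied from the two logs: (CUDA, OpenCL).
# Assumption: the first log is the CUDA backend, the second is OpenCL.
results = {
    "single GEMM": (208.011, 708.273),
    "single LU":   (22.1618, 22.1571),
    "double GEMM": (208.451, 380.616),
    "double LU":   (20.8504, 20.8108),
}

# Ratio > 1 means the OpenCL run reported more GFLOPs than the CUDA run.
speedup = {name: ocl / cuda for name, (cuda, ocl) in results.items()}

for name, ratio in speedup.items():
    print(f"{name}: OpenCL/CUDA = {ratio:.2f}x")
```

On these figures the matrix-matrix product is roughly 3.4x faster under OpenCL in single precision and roughly 1.8x faster in double precision, while LU factorization is essentially the same on both backends.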