aokomoriuta · December 21, 2015 05:18
diff --git a/cuda.log b/cuda.log
 ----------------------------------------------
               Device Info
 ----------------------------------------------

 ----------------------------------------------
 ----------------------------------------------
 ## Benchmark :: Dense Matrix-Matrix product 
 ----------------------------------------------

   -------------------------------
   # benchmarking single-precision
   -------------------------------
 ------ Benchmark 1: Matrix-Matrix product ------ 
 - Execution time on device (no setup time included): 0.082591
 - GFLOPs (counting multiply&add as separate operations): 208.011

 ------ Benchmark 2: Matrix-Matrix product using ranges ------ 
 - Execution time on device (no setup time included): 0.010358
 - GFLOPs (counting multiply&add as separate operations): 207.326

 ------ Benchmark 3: Matrix-Matrix product using slices ------ 
 - Execution time on device (no setup time included): 0.010391
 - GFLOPs (counting multiply&add as separate operations): 206.668

 ------ Benchmark 4: LU factorization ------ 
 - Execution time on device (no setup time included): 0.775203
 - GFLOPs (counting multiply&add as separate operations): 22.1618


   -------------------------------
   # benchmarking double-precision
   -------------------------------
 ------ Benchmark 1: Matrix-Matrix product ------ 
 - Execution time on device (no setup time included): 0.082417
 - GFLOPs (counting multiply&add as separate operations): 208.451

 ------ Benchmark 2: Matrix-Matrix product using ranges ------ 
 - Execution time on device (no setup time included): 0.010409
 - GFLOPs (counting multiply&add as separate operations): 206.31

 ------ Benchmark 3: Matrix-Matrix product using slices ------ 
 - Execution time on device (no setup time included): 0.012321
 - GFLOPs (counting multiply&add as separate operations): 174.295

 ------ Benchmark 4: LU factorization ------ 
 - Execution time on device (no setup time included): 0.823958
 - GFLOPs (counting multiply&add as separate operations): 20.8504

diff --git a/opencl.log b/opencl.log
 ----------------------------------------------
               Device Info
 ----------------------------------------------
 CL Device Vendor ID: 4318
 CL Device Name: GeForce GTX TITAN
 CL Driver Version: 319.37
 --------------------------------
 CL Device Max Compute Units: 14
 CL Device Max Work Group Size: 1024
 CL Device Global Mem Size: 6441730048
 CL Device Local Mem Size: 49152


 ----------------------------------------------
 ----------------------------------------------
 ## Benchmark :: Dense Matrix-Matrix product 
 ----------------------------------------------

   -------------------------------
   # benchmarking single-precision
   -------------------------------
 ------ Benchmark 1: Matrix-Matrix product ------ 
 - Device Name: GeForce GTX TITAN
 - Execution time on device (no setup time included): 0.024256
 - GFLOPs (counting multiply&add as separate operations): 708.273

 ------ Benchmark 2: Matrix-Matrix product using ranges ------ 
 - Device Name: GeForce GTX TITAN
 - Execution time on device (no setup time included): 0.003088
 - GFLOPs (counting multiply&add as separate operations): 695.429

 ------ Benchmark 3: Matrix-Matrix product using slices ------ 
 - Device Name: GeForce GTX TITAN
 - Execution time on device (no setup time included): 0.003666
 - GFLOPs (counting multiply&add as separate operations): 585.784

 ------ Benchmark 4: LU factorization ------ 
 - Device Name: GeForce GTX TITAN
 - Execution time on device (no setup time included): 0.775367
 - GFLOPs (counting multiply&add as separate operations): 22.1571


   -------------------------------
   # benchmarking double-precision
   -------------------------------
 ------ Benchmark 1: Matrix-Matrix product ------ 
 - Device Name: GeForce GTX TITAN
 - Execution time on device (no setup time included): 0.045137
 - GFLOPs (counting multiply&add as separate operations): 380.616

 ------ Benchmark 2: Matrix-Matrix product using ranges ------ 
 - Device Name: GeForce GTX TITAN
 - Execution time on device (no setup time included): 0.005956
 - GFLOPs (counting multiply&add as separate operations): 360.558

 ------ Benchmark 3: Matrix-Matrix product using slices ------ 
 - Device Name: GeForce GTX TITAN
 - Execution time on device (no setup time included): 0.007147
 - GFLOPs (counting multiply&add as separate operations): 300.473

 ------ Benchmark 4: LU factorization ------ 
 - Device Name: GeForce GTX TITAN
 - Execution time on device (no setup time included): 0.825527
 - GFLOPs (counting multiply&add as separate operations): 20.8108