Created November 19, 2012 00:50
Running ViennaCL on Amazon GPU cluster (cg1.4xlarge)
[ec2-user@ip-10-33-4-246 grub]$ lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 01)
00:02.0 VGA compatible controller: Cirrus Logic GD 5446
00:03.0 3D controller: nVidia Corporation GF100 [Tesla S2050] (rev a3)
00:04.0 3D controller: nVidia Corporation GF100 [Tesla S2050] (rev a3)
00:05.0 Unassigned class [ff80]: XenSource, Inc. Xen Platform Device (rev 01)
[ec2-user@ip-10-33-4-246 benchmarks]$ ./blas3bench
----------------------------------------------
Device Info
----------------------------------------------
CL Device Vendor ID: 4318
CL Device Name: Tesla M2050
CL Driver Version: 304.43
--------------------------------
CL Device Max Compute Units: 14
CL Device Max Work Group Size: 1024
CL Device Global Mem Size: 2817982464
CL Device Local Mem Size: 49152
----------------------------------------------
----------------------------------------------
## Benchmark :: Dense Matrix-Matrix product
----------------------------------------------
-------------------------------
# benchmarking single-precision
-------------------------------
------ Benchmark 1: Matrix-Matrix product ------
- Device Name: Tesla M2050
- Execution time on device (no setup time included): 0.006588
- GFLOPs (counting multiply&add as one operation): 162.984
- Device Name: Tesla M2050
- Execution time on device (no setup time included): 0.006632
- GFLOPs (counting multiply&add as one operation): 161.903
------ Benchmark 2: Matrix-Matrix product using ranges ------
- Device Name: Tesla M2050
- Execution time on device (no setup time included): 0.001577
- GFLOPs (counting multiply&add as one operation): 85.1095
- Device Name: Tesla M2050
- Execution time on device (no setup time included): 0.001575
- GFLOPs (counting multiply&add as one operation): 85.2176
------ Benchmark 3: Matrix-Matrix product using slices ------
- Device Name: Tesla M2050
- Execution time on device (no setup time included): 0.001717
- GFLOPs (counting multiply&add as one operation): 78.1699
- Device Name: Tesla M2050
- Execution time on device (no setup time included): 0.001669
- GFLOPs (counting multiply&add as one operation): 80.4181
-------------------------------
# benchmarking double-precision
-------------------------------
------ Benchmark 1: Matrix-Matrix product ------
- Device Name: Tesla M2050
- Execution time on device (no setup time included): 0.014737
- GFLOPs (counting multiply&add as one operation): 72.8603
- Device Name: Tesla M2050
- Execution time on device (no setup time included): 0.014698
- GFLOPs (counting multiply&add as one operation): 73.0536
------ Benchmark 2: Matrix-Matrix product using ranges ------
- Device Name: Tesla M2050
- Execution time on device (no setup time included): 0.002634
- GFLOPs (counting multiply&add as one operation): 50.9559
- Device Name: Tesla M2050
- Execution time on device (no setup time included): 0.002553
- GFLOPs (counting multiply&add as one operation): 52.5726
------ Benchmark 3: Matrix-Matrix product using slices ------
- Device Name: Tesla M2050
- Execution time on device (no setup time included): 0.002977
- GFLOPs (counting multiply&add as one operation): 45.0849
- Device Name: Tesla M2050
- Execution time on device (no setup time included): 0.002863
- GFLOPs (counting multiply&add as one operation): 46.8801
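
The "counting multiply&add as one operation" convention means an n x n by n x n product is charged n^3 operations rather than 2n^3. The log does not state the matrix sizes; the reported figures are consistent with n = 1024 for the full product and n = 512 for the range and slice variants, which is an inference from the numbers and not something the benchmark prints. A minimal Python sketch under that assumption (the function name `matmul_gflops` is made up for illustration):

```python
def matmul_gflops(n: int, seconds: float) -> float:
    """GFLOPs for an n x n by n x n product, one multiply-add pair = one operation."""
    operations = n ** 3  # n*n output entries, each needing n multiply-adds
    return operations / seconds / 1e9


# Timings copied from the single-precision log above.
# The sizes 1024 and 512 are ASSUMED (inferred from the reported GFLOPs).
samples = [
    ("full product", 1024, 0.006588),
    ("ranges", 512, 0.001577),
    ("slices", 512, 0.001717),
]

for label, n, seconds in samples:
    print(f"{label}: {matmul_gflops(n, seconds):.3f} GFLOPs")
```

To compare against figures that count multiply and add separately, double the result.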
[ec2-user@ip-10-33-4-246 benchmarks]$ ./vectorbench
----------------------------------------------
Device Info
----------------------------------------------
CL Device Vendor ID: 4318
CL Device Name: Tesla M2050
CL Driver Version: 304.43
--------------------------------
CL Device Max Compute Units: 14
CL Device Max Work Group Size: 1024
CL Device Global Mem Size: 2817982464
CL Device Local Mem Size: 49152
----------------------------------------------
----------------------------------------------
## Benchmark :: Vector
----------------------------------------------
-------------------------------
# benchmarking single-precision
-------------------------------
------- Vector inner products ----------
CPU time: 0.252503
CPU GFLOPS: 0.11881
Result:1.58445e+08
GPU time: 0.004479
GPU GFLOPS: 6.69792
Result: 1.58455e+08
------- Vector addition ----------
CPU time: 0.330705
CPU GFLOPS: 0.0907153
GPU time: 0.004666
GPU GFLOPS: 6.42949
------- Vector multiply add ----------
CPU time: 0.250788
CPU GFLOPS: 0.119623
GPU time: 0.00466
GPU GFLOPS: 6.43777
------- Vector complicated expression ----------
CPU time: 0.496616
CPU GFLOPS: 0.181227
GPU time: 0.055736
GPU GFLOPS: 1.61476
-------------------------------
# benchmarking double-precision
-------------------------------
------- Vector inner products ----------
CPU time: 0.25693
CPU GFLOPS: 0.116763
Result:2.01213e+08
GPU time: 0.007718
GPU GFLOPS: 3.88702
Result: 2.01213e+08
------- Vector addition ----------
CPU time: 0.339278
CPU GFLOPS: 0.0884231
GPU time: 0.008573
GPU GFLOPS: 3.49936
------- Vector multiply add ----------
CPU time: 0.256333
CPU GFLOPS: 0.117035
GPU time: 0.008537
GPU GFLOPS: 3.51412
------- Vector complicated expression ----------
CPU time: 0.501489
CPU GFLOPS: 0.179466
GPU time: 0.066808
GPU GFLOPS: 1.34714
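
The vector benchmark prints CPU and GPU times side by side but no ratio. A small Python sketch that turns the timings above into GPU-over-CPU speedups may make the comparison easier to read. The timings are copied from the log; the names `SINGLE`, `DOUBLE` and `speedups` are illustrative only and are not part of ViennaCL:

```python
# (CPU seconds, GPU seconds) copied from the vectorbench log above.
SINGLE = {
    "inner product": (0.252503, 0.004479),
    "addition": (0.330705, 0.004666),
    "multiply add": (0.250788, 0.00466),
    "complicated expression": (0.496616, 0.055736),
}
DOUBLE = {
    "inner product": (0.25693, 0.007718),
    "addition": (0.339278, 0.008573),
    "multiply add": (0.256333, 0.008537),
    "complicated expression": (0.501489, 0.066808),
}


def speedups(timings):
    """Map each benchmark name to CPU time divided by GPU time."""
    return {name: cpu / gpu for name, (cpu, gpu) in timings.items()}


for precision, timings in (("single", SINGLE), ("double", DOUBLE)):
    for name, ratio in speedups(timings).items():
        print(f"{precision:6s} {name:24s} {ratio:6.1f}x")
```

Under these timings the "complicated expression" case shows a much smaller ratio than the other three operations. The log alone does not say why, and it does not say how many CPU threads the reference loop used.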
[ec2-user@ip-10-33-4-246 sys]$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5866.80
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 1
cpu cores : 4
apicid : 2
initial apicid : 2
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5866.22
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 2
cpu cores : 4
apicid : 4
initial apicid : 4
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5865.16
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 3
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 6
initial apicid : 6
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5865.10
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 4
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 1
siblings : 8
core id : 0
cpu cores : 4
apicid : 16
initial apicid : 16
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5868.28
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 5
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 1
siblings : 8
core id : 1
cpu cores : 4
apicid : 18
initial apicid : 18
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5846.82
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 6
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 1
siblings : 8
core id : 2
cpu cores : 4
apicid : 20
initial apicid : 20
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5861.83
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 7
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 1
siblings : 8
core id : 3
cpu cores : 4
apicid : 22
initial apicid : 22
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5525.50
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 8
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5746.34
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 9
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 1
cpu cores : 4
apicid : 3
initial apicid : 3
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5866.53
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 10
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 2
cpu cores : 4
apicid : 5
initial apicid : 5
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5866.61
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 11
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 3
cpu cores : 4
apicid : 7
initial apicid : 7
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5866.53
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 12
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 1
siblings : 8
core id : 0
cpu cores : 4
apicid : 17
initial apicid : 17
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5869.50
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 13
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 1
siblings : 8
core id : 1
cpu cores : 4
apicid : 19
initial apicid : 19
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5866.48
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 14
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 1
siblings : 8
core id : 2
cpu cores : 4
apicid : 21
initial apicid : 21
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5869.71
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
processor : 15
vendor_id : GenuineIntel
cpu family : 6
model : 26
model name : Intel(R) Xeon(R) CPU X5570 @ 2.93GHz
stepping : 5
microcode : 0x11
cpu MHz : 2933.403
cache size : 8192 KB
physical id : 1
siblings : 8
core id : 3
cpu cores : 4
apicid : 23
initial apicid : 23
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat clflush mmx fxsr sse sse2 ht syscall nx lm constant_tsc rep_good nopl xtopology pni ssse3 cx16 sse4_1 sse4_2 popcnt hypervisor lahf_lm
bogomips : 5866.38
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:
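
The sixteen entries above are easier to read once grouped by socket and core. A short Python sketch that tallies the `processor`, `physical id` and `core id` fields of a `/proc/cpuinfo` dump could look like this. The function name `summarize_cpuinfo` and the inline sample are hypothetical and only illustrate the field layout:

```python
def summarize_cpuinfo(text: str) -> dict:
    """Count logical processors, sockets and physical cores in /proc/cpuinfo text."""
    logical = 0
    sockets = set()
    cores = set()
    physical_id = None  # socket of the entry currently being read
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without a "key : value" shape
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            physical_id = value
            sockets.add(value)
        elif key == "core id":
            # A core is unique only within its socket, so key on the pair.
            cores.add((physical_id, value))
    return {"logical": logical, "sockets": len(sockets), "cores": len(cores)}


# Tiny made-up sample: two hyperthreads on one core, plus one core on a second socket.
sample = """\
processor : 0
physical id : 0
core id : 0
processor : 1
physical id : 1
core id : 0
processor : 2
physical id : 0
core id : 0
"""
print(summarize_cpuinfo(sample))
```

Applied to the listing above, this kind of tally should report 16 logical processors on 2 sockets with 8 physical cores, i.e. two quad-core X5570 packages with two hardware threads per core as seen by the guest.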