Notes on OpenCL

Implementations

See references at

https://www.khronos.org/opencl/

Overview

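The laptop seems to be kicking the desktop's arse on the basic properties: device-to-host bandwidth (about 9.6 GB/s vs 1.4 GB/s) and clock speed (1000 MHz vs 901 MHz), even though there are only 24 compute units on the laptop, versus the graphics card's ~1000. (Note that the benchmark output further down reports only 2 compute units for the GT 730, so the unit counts are not directly comparable.)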
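As a rough check on that impression, the timings and bandwidths recorded under "Hardware support results" can be turned into ratios. This is only a sketch: the numbers are copied by hand from the benchmark.py and dump-performance.py output in these notes, the dictionary and function names are made up for illustration, and the two bandwidth figures were measured at different transfer sizes (1073741824 bytes on the laptop, 268435456 bytes on the desktop), so the comparison is indicative at best.

```python
# Figures copied by hand from the benchmark.py / dump-performance.py output
# recorded in these notes. Names are illustrative only.
laptop = {
    "cpu_time_s": 0.025046110153198242,  # "Execution time of test without OpenCL"
    "ocl_time_s": 0.00440888,            # "Execution time of test"
    "d2h_gb_s": 9.58943,                 # DeviceToHostTransfer bandwidth
}
desktop = {
    "cpu_time_s": 0.119845867157,
    "ocl_time_s": 0.00897843,
    "d2h_gb_s": 1.41215,
}


def speedup(machine):
    """How many times faster the OpenCL run was than the plain CPU run."""
    return machine["cpu_time_s"] / machine["ocl_time_s"]


def bandwidth_ratio(a, b):
    """Device-to-host bandwidth of machine a relative to machine b."""
    return a["d2h_gb_s"] / b["d2h_gb_s"]


for name, machine in (("laptop", laptop), ("desktop", desktop)):
    print("%s: OpenCL speedup over CPU %.2fx" % (name, speedup(machine)))
print("laptop/desktop device-to-host bandwidth: %.2fx"
      % bandwidth_ratio(laptop, desktop))
```

The desktop shows the larger speedup over its own CPU, but its absolute OpenCL time is still about twice the laptop's, which matches the impression above.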

Resources

http://cpuboss.com/ -> very good for basic facts, including OpenCL throughput

Setup

pyopencl https://gist.github.com/patrickmmartin/e1313dde7b908e8d009f2a13c3cd164b

tricks

rename the .icd files of broken drivers so they are no longer picked up, which avoids annoyances
sudo updatedb followed by locate is amazing for finding them
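The rename trick can be scripted. The sketch below assumes, as the note above implies, that a file is ignored once its name no longer ends in .icd. The ".disabled" suffix and the function names are arbitrary choices for illustration, and renaming files under /etc/OpenCL/vendors will normally need root.

```python
import os


def disable_icd(icd_path, suffix=".disabled"):
    """Rename an .icd file so that its name no longer ends in .icd."""
    new_path = icd_path + suffix
    os.rename(icd_path, new_path)
    return new_path


def enable_icd(disabled_path, suffix=".disabled"):
    """Undo disable_icd by stripping the suffix again."""
    if not disabled_path.endswith(suffix):
        raise ValueError("not a disabled .icd file: %s" % disabled_path)
    original_path = disabled_path[:-len(suffix)]
    os.rename(disabled_path, original_path)
    return original_path
```

For example, disable_icd("/etc/OpenCL/vendors/nvidia.icd") would leave nvidia.icd.disabled in place, ready to be restored with enable_icd once the driver is fixed.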

example - clean-ish set of .icd after Nvidia install and beignet

problem with nvidia .icd ? *

$ locate .icd
*/etc/OpenCL/vendors/intel-beignet.icd*
*/etc/OpenCL/vendors/nvidia.icd*
/home/patrick/src/C/beignet/intel-beignet.icd.in
/home/patrick/src/C/beignet/build/intel-beignet.icd

$ cat `locate .icd`
*/usr/local/lib/beignet//libcl.so*
*libnvidia-opencl.so.1*
@BEIGNET_INSTALL_DIR@/libcl.so
/usr/local/lib/beignet//libcl.so

$ cat `locate .icd`| xargs -n1 ls -larth
*-rw-r--r-- 1 root root 1.8M May 23 00:29 /usr/local/lib/beignet//libcl.so*
*ls: cannot access 'libnvidia-opencl.so.1': No such file or directory*
ls: cannot access '@BEIGNET_INSTALL_DIR@/libcl.so': No such file or directory
-rw-r--r-- 1 root root 1.8M May 23 00:29 /usr/local/lib/beignet//libcl.so
 
$ locate libnvidia-opencl
/usr/lib/i386-linux-gnu/libnvidia-opencl.so.1
/usr/lib/i386-linux-gnu/libnvidia-opencl.so.375.39
/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.375.39
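One thing worth noting about the listing above: the nvidia .icd contains a bare library name (libnvidia-opencl.so.1) rather than a path, so ls failing on it does not by itself mean the entry is broken. As far as I understand, the name is handed to the dynamic loader, which searches the usual library directories (where locate does find it). A closer check is therefore to try loading whatever each .icd names. The sketch below does that with ctypes; the function names are made up for illustration, and /etc/OpenCL/vendors is assumed to be the vendor directory, as in the locate output above.

```python
import ctypes
import glob
import os


def read_icd(path):
    """Return the library name or path stored in an .icd file (first non-empty line)."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                return line
    return None


def can_load(lib):
    """Try to load the library the same way a dlopen-based loader would."""
    try:
        ctypes.CDLL(lib)
        return True
    except OSError:
        return False


def check_vendor_dir(vendor_dir="/etc/OpenCL/vendors"):
    """Return (icd file, library entry, loadable?) for each .icd in vendor_dir."""
    results = []
    for icd in sorted(glob.glob(os.path.join(vendor_dir, "*.icd"))):
        lib = read_icd(icd)
        results.append((icd, lib, lib is not None and can_load(lib)))
    return results


if __name__ == "__main__":
    for icd, lib, ok in check_vendor_dir():
        print("%-45s %-45s %s" % (icd, lib, "OK" if ok else "BROKEN"))
```

An entry reported as BROKEN here is a candidate for the rename trick; an entry that loads fine but still misbehaves in clinfo points at the driver rather than the .icd file.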

Hardware support results

Comparisons

Dell XPS 13

i7-7500U -> OpenCL capable
Intel(R) HD Graphics Kabylake ULT GT2 -> OpenCL capable

Windows 10

opencl implementation bundled with the Windows drivers

Luxmark 3.1 passes all tests and registers the on-board GPU and CPU as rendering targets
pyopencl tests NOT RUN

Linux (Ubuntu 16.04)

Needs an opencl implementation

opencl implementation: https://cgit.freedesktop.org/beignet/tree/docs/Beignet.mdwn

installation was from source, but straightforward enough :P
Luxmark 3.1 passes only the ball tests and registers only the on-board GPU as a rendering target
pyopencl tests


$ python benchmark.py 
Execution time of test without OpenCL:  0.025046110153198242 s
===============================================================
Platform name: Intel Gen OCL Driver
Platform profile: FULL_PROFILE
Platform vendor: Intel
Platform version: OpenCL 1.2 beignet 1.4 (git-448f8f7)
---------------------------------------------------------------
Device name: Intel(R) HD Graphics Kabylake ULT GT2
Device type: GPU
Device memory:  3932 MB
Device max clock speed: 1000 MHz
Device compute units: 24
Device max work group size: 512
Device max work item sizes: [512, 512, 512]
Data points: 8388608
Workers: 256
Preferred work group size multiple: 16
Execution time of test: 0.00440888 s
Results OK

$ python dump-performance.py   
float32 add: 1828.97 GOps/s  
bandwidth @ 1073741824 bytes: 7.59742 GB/s  
DeviceToHostTransfer  
bandwidth @ 1073741824 bytes: 9.58943 GB/s  
DeviceToDeviceTransfer  
bandwidth @ 1073741824 bytes: 6.81554 GB/s
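The device properties in the benchmark.py output above can also be queried directly with pyopencl, which is handy for a quick look at what a freshly installed driver exposes. This is a minimal sketch, not the benchmark's own code: describe_device and main are names invented here, and the MB figure assumes the benchmark divides global_mem_size by 1024 * 1024. The formatting helper is kept separate from the pyopencl import so it can be exercised on a machine without a working OpenCL install.

```python
def describe_device(dev):
    """Format a few properties of a pyopencl Device (or any object with the same attributes)."""
    return [
        "Device name: %s" % dev.name,
        "Device memory: %d MB" % (dev.global_mem_size // (1024 * 1024)),
        "Device max clock speed: %d MHz" % dev.max_clock_frequency,
        "Device compute units: %d" % dev.max_compute_units,
        "Device max work group size: %d" % dev.max_work_group_size,
    ]


def main():
    try:
        import pyopencl as cl
    except ImportError:
        print("pyopencl is not installed")
        return
    # Walk every platform the ICD loader knows about, then every device on it.
    for platform in cl.get_platforms():
        print("Platform name: %s" % platform.name)
        print("Platform version: %s" % platform.version)
        for dev in platform.get_devices():
            print("\n".join(describe_device(dev)))


if __name__ == "__main__":
    main()
```

If cl.get_platforms() raises or returns nothing, the problem is usually at the .icd / driver level rather than in pyopencl itself.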

Desktop

Core(TM)2 Quad CPU Q8200 @ 2.33GHz <- NOT OpenCL capable
Nvidia GT 730 <- OpenCL capable ?

Linux (Ubuntu 16.04)

opencl implementation: https://cgit.freedesktop.org/beignet/tree/docs/Beignet.mdwn

Does not appear to work? - utest_run

opencl implementation: nvidia-340 ?

opencl implementation: nvidia-375

sudo apt-get install nvidia-375

lots of dependencies
dependencies only install with the python 2 set via update-alternatives

still no joy from clinfo -> reboot

_errors were seen from the X server (vnc4server), resulting from the moved beignet files (whoops)_

An X server seems to be needed for access to OpenCL (yes?!), so getting the X server working is the first step

reboot and local login now seems to work

clinfo works

many examples work, like mandelbrot, particles

python demo_mandelbrot.py

python gl_particle_animation.py

fixed in cleaner set up of Ubuntu 17 ? *

Unfortunately we see a lot of this: some examples don't mind, others blow up

X server found. dri2 connection failed! 
DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument  
Assuming 131072kB available aperture size.  
May lead to reduced performance or incorrect rendering.  
get chip id failed: -1 [22]  
param: 4, val: 0


$ python benchmark.py 
Execution time of test without OpenCL:  0.119845867157 s
===============================================================
Platform name: NVIDIA CUDA
Platform profile: FULL_PROFILE
Platform vendor: NVIDIA Corporation
Platform version: OpenCL 1.2 CUDA 8.0.0
---------------------------------------------------------------
Device name: GeForce GT 730
Device type: GPU
Device memory:  979 MB
Device max clock speed: 901 MHz
Device compute units: 2
Device max work group size: 1024
Device max work item sizes: [1024, 1024, 64]
Data points: 8388608
Workers: 256
Preferred work group size multiple: 32
Execution time of test: 0.00897843 s
Results OK

$ python dump-performance.py  
4194304 20171356894.8 0 
float32 add: 10085.7 GOps/s  
HostToDeviceTransfer  
latency: 3.27519e-05 s  
bandwidth @ 268435456 bytes: 1.39221 GB/s  
DeviceToHostTransfer  
latency: 3.89906e-05 s  
bandwidth @ 268435456 bytes: 1.41215 GB/s  
DeviceToDeviceTransfer  
latency: 3.98391e-05 s  
bandwidth @ 268435456 bytes: 5.3896 GB/s

but running dump-performance.py required this patch

--- a/examples/dump-performance.py  
+++ b/examples/dump-performance.py  
@@ -27,7 +27,7 @@ def main():  
   
         print("latency: %g s" % perf.transfer_latency(queue, tx_type))  
-        for i in range(6, 31, 2):  
+        for i in range(6, 30, 2):  
             bs = 1 << i  
             print("bandwidth @ %d bytes: %g GB/s" % (  
                     bs, perf.transfer_bandwidth(queue, tx_type, bs)/1e9))
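The effect of the patch is easiest to see by working out the transfer sizes each loop produces. The sketch below does the arithmetic. The explanation that the original loop fails because its largest transfer does not fit on the card is an assumption on my part (the notes do not record the actual error), though it is consistent with the "Device memory: 979 MB" reported for the GT 730 and with the largest bandwidth line in the desktop output being 268435456 bytes.

```python
def transfer_sizes(exponents):
    """Block sizes in bytes, as computed by `bs = 1 << i` in dump-performance.py."""
    return [1 << i for i in exponents]


original = transfer_sizes(range(6, 31, 2))  # exponents 6, 8, ..., 30
patched = transfer_sizes(range(6, 30, 2))   # exponents 6, 8, ..., 28

# "Device memory: 979 MB" from the benchmark output, assuming MB means 1024 * 1024 bytes.
device_memory = 979 * 1024 * 1024

for label, sizes in (("original", original), ("patched", patched)):
    largest = max(sizes)
    verdict = "fits in" if largest < device_memory else "exceeds"
    print("%s loop: largest transfer %d bytes, %s device memory (%d bytes)"
          % (label, largest, verdict, device_memory))
```

On the laptop the same script ran unpatched, which fits this reading: its GPU reports 3932 MB, and its output does include the 1073741824 byte transfers.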

LuxMark did not work

logging in via ssh

This works, but possibly because there is a functioning X server waiting for log on

TODO

remove beignet and see whether the pure Nvidia driver removes the problem with LuxMark, etc.