@patrickmmartin
Last active May 23, 2017 00:32

Notes on OpenCL

Implementations

See references at

https://www.khronos.org/opencl/

Overview

The laptop seems to be kicking the desktop's arse on the basic properties, bandwidth and clock speed (transfer bandwidth of roughly 6.8 to 9.6 GB/s versus 1.4 to 5.4 GB/s, and 1000 MHz versus 901 MHz), though there are only 24 units on the laptop, versus the graphics card's ~1000.

Resources

http://cpuboss.com/ -> very good for basic facts, including OpenCL throughput

Setup

pyopencl https://gist.github.com/patrickmmartin/e1313dde7b908e8d009f2a13c3cd164b

Tricks

  • rename the .icd files of broken drivers so the ICD loader ignores them and they stop causing annoyances
  • sudo updatedb, then locate, is amazing for tracking down .icd files and the libraries they point to
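A minimal sketch of the renaming trick. It assumes that the ICD loader only considers files ending in .icd under /etc/OpenCL/vendors. The ".disabled" suffix and the helper names are hypothetical choices, and renaming files there needs root.

```python
from pathlib import Path

# Assumption: the ICD loader only loads files in this directory whose names
# end in ".icd", so a renamed file is skipped.
VENDORS_DIR = Path("/etc/OpenCL/vendors")
DISABLED_SUFFIX = ".disabled"  # hypothetical naming convention


def disable_icd(name: str, vendors_dir: Path = VENDORS_DIR) -> Path:
    """Rename e.g. nvidia.icd to nvidia.icd.disabled so the loader skips it."""
    src = vendors_dir / name
    dst = src.with_name(src.name + DISABLED_SUFFIX)
    src.rename(dst)
    return dst


def enable_icd(name: str, vendors_dir: Path = VENDORS_DIR) -> Path:
    """Undo disable_icd(): rename nvidia.icd.disabled back to nvidia.icd."""
    src = vendors_dir / (name + DISABLED_SUFFIX)
    dst = vendors_dir / name
    src.rename(dst)
    return dst


def active_icds(vendors_dir: Path = VENDORS_DIR) -> list[str]:
    """Names of the .icd files the loader would currently consider."""
    return sorted(p.name for p in vendors_dir.glob("*.icd"))


if __name__ == "__main__":
    print(active_icds())
```

Renaming keeps the file contents intact, so re-enabling a driver is a single call rather than a reinstall.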

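example - a clean-ish set of .icd files after the NVIDIA install and beignet (the lines wrapped in * are the installed .icd files and what they point to)

  • problem with the nvidia .icd? Probably not: it names its library by bare soname (libnvidia-opencl.so.1), which ls cannot find from the current directory. However, dlopen() searches the standard library directories for bare names, and locate libnvidia-opencl (below) shows the library is installed there.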
$ locate .icd
*/etc/OpenCL/vendors/intel-beignet.icd*
*/etc/OpenCL/vendors/nvidia.icd*
/home/patrick/src/C/beignet/intel-beignet.icd.in
/home/patrick/src/C/beignet/build/intel-beignet.icd

$ cat `locate .icd`
*/usr/local/lib/beignet//libcl.so*
*libnvidia-opencl.so.1*
@BEIGNET_INSTALL_DIR@/libcl.so
/usr/local/lib/beignet//libcl.so

$ cat `locate .icd`| xargs -n1 ls -larth
*-rw-r--r-- 1 root root 1.8M May 23 00:29 /usr/local/lib/beignet//libcl.so*
*ls: cannot access 'libnvidia-opencl.so.1': No such file or directory*
ls: cannot access '@BEIGNET_INSTALL_DIR@/libcl.so': No such file or directory
-rw-r--r-- 1 root root 1.8M May 23 00:29 /usr/local/lib/beignet//libcl.so
 
$ locate libnvidia-opencl
/usr/lib/i386-linux-gnu/libnvidia-opencl.so.1
/usr/lib/i386-linux-gnu/libnvidia-opencl.so.375.39
/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.375.39
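A scripted version of the ls check above can treat bare sonames the way dlopen() does instead of reporting them as missing. This is a sketch only. It uses a hypothetical fixed list of library directories taken from the locate output and ignores LD_LIBRARY_PATH, /etc/ld.so.cache and 32/64-bit mismatches. The helper names are illustrative.

```python
import os
from pathlib import Path

# Hypothetical search list based on the locate output above; the real dynamic
# linker also consults LD_LIBRARY_PATH and /etc/ld.so.cache, ignored here.
LIB_DIRS = ["/usr/lib/x86_64-linux-gnu", "/usr/lib/i386-linux-gnu",
            "/usr/local/lib", "/usr/lib", "/lib"]


def resolve_icd_library(lib: str, lib_dirs=LIB_DIRS):
    """Return the file the loader would most likely open for `lib`, or None."""
    if "/" in lib:
        # A name containing a slash is opened as given.
        return lib if os.path.exists(lib) else None
    # A bare soname is looked up in the library directories, in order.
    for d in lib_dirs:
        candidate = os.path.join(d, lib)
        if os.path.exists(candidate):
            return candidate
    return None


def check_icds(vendors_dir="/etc/OpenCL/vendors", lib_dirs=LIB_DIRS):
    """Map each *.icd file name to its resolved library path (None if missing)."""
    return {icd.name: resolve_icd_library(icd.read_text().strip(), lib_dirs)
            for icd in sorted(Path(vendors_dir).glob("*.icd"))}


if __name__ == "__main__":
    for name, path in check_icds().items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

Because only the registered vendors directory is scanned, the unconfigured @BEIGNET_INSTALL_DIR@ template in the source tree no longer shows up as noise.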

Hardware support results

Comparisons

Dell XPS 13

  • i7-7500U -> OpenCL capable
  • Intel(R) HD Graphics Kabylake ULT GT2 -> OpenCL capable

Windows 10

OpenCL implementation bundled with the Windows drivers

  • LuxMark 3.1 passes all tests and registers the on-board GPU and CPU as rendering targets
  • pyopencl tests NOT RUN

Linux (Ubuntu 16.04)

  • Needs an OpenCL implementation; beignet was used here
  • installation was from source, but straightforward enough :P

  • LuxMark 3.1 passes only the ball tests and registers only the on-board GPU as a rendering target

  • pyopencl tests:


$ python benchmark.py 
Execution time of test without OpenCL:  0.025046110153198242 s
===============================================================
Platform name: Intel Gen OCL Driver
Platform profile: FULL_PROFILE
Platform vendor: Intel
Platform version: OpenCL 1.2 beignet 1.4 (git-448f8f7)
---------------------------------------------------------------
Device name: Intel(R) HD Graphics Kabylake ULT GT2
Device type: GPU
Device memory:  3932 MB
Device max clock speed: 1000 MHz
Device compute units: 24
Device max work group size: 512
Device max work item sizes: [512, 512, 512]
Data points: 8388608
Workers: 256
Preferred work group size multiple: 16
Execution time of test: 0.00440888 s
Results OK
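The two timings above give the speedup directly. A small sketch that extracts it from the benchmark.py output, assuming the two timing lines keep the format shown (speedup is a hypothetical helper name):

```python
import re


def speedup(output: str) -> float:
    """Ratio of the plain-Python time to the OpenCL time in benchmark.py output."""
    # First timing line: "Execution time of test without OpenCL: <t> s"
    # Second timing line: "Execution time of test: <t> s"
    cpu = re.search(r"without OpenCL:\s*([\d.eE+-]+)", output)
    gpu = re.search(r"Execution time of test:\s*([\d.eE+-]+)", output)
    if cpu is None or gpu is None:
        raise ValueError("timing lines not found")
    return float(cpu.group(1)) / float(gpu.group(1))


laptop = """Execution time of test without OpenCL:  0.025046110153198242 s
Execution time of test: 0.00440888 s"""
print(f"laptop speedup: {speedup(laptop):.1f}x")
```

For this laptop run the ratio is about 5.7x. The same calculation on the desktop figures further down gives roughly 13x.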



$ python dump-performance.py   
float32 add: 1828.97 GOps/s  
bandwidth @ 1073741824 bytes: 7.59742 GB/s  
DeviceToHostTransfer  
bandwidth @ 1073741824 bytes: 9.58943 GB/s  
DeviceToDeviceTransfer  
bandwidth @ 1073741824 bytes: 6.81554 GB/s  
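Comparing this run with the desktop run further down is easier with the numbers pulled out programmatically. The sketch below assumes the output format shown here; parse_dump_performance is a hypothetical helper, not part of pyopencl. The listing above has no HostToDeviceTransfer header or latency lines, so the first bandwidth figure is filed under "unlabelled" rather than guessed. The two machines also report their largest blocks at different sizes (1 GiB vs 256 MiB), so the figures are not strictly like-for-like.

```python
import re


def parse_dump_performance(output: str) -> dict:
    """Pull the float32 add rate and, per transfer type, the bandwidth
    reported for the largest block (the last 'bandwidth @' line)."""
    result = {}
    section = "unlabelled"  # for bandwidth lines seen before any header
    for raw in output.splitlines():
        line = raw.strip()
        if m := re.match(r"float32 add: ([\d.]+) GOps/s", line):
            result["float32 add GOps/s"] = float(m.group(1))
        elif line.endswith("Transfer"):
            section = line  # e.g. "DeviceToHostTransfer"
        elif m := re.match(r"bandwidth @ (\d+) bytes: ([\d.eE+-]+) GB/s", line):
            # Block sizes grow down the list, so later lines overwrite earlier ones.
            result[f"{section} GB/s"] = float(m.group(2))
    return result


laptop = """float32 add: 1828.97 GOps/s
bandwidth @ 1073741824 bytes: 7.59742 GB/s
DeviceToHostTransfer
bandwidth @ 1073741824 bytes: 9.58943 GB/s
DeviceToDeviceTransfer
bandwidth @ 1073741824 bytes: 6.81554 GB/s"""
for key, value in parse_dump_performance(laptop).items():
    print(f"{key}: {value}")
```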

Desktop

  • Core(TM)2 Quad CPU Q8200 @ 2.33GHz <- NOT OpenCL capable
  • NVIDIA GeForce GT 730 <- OpenCL capable?

Linux (Ubuntu 16.04)

Does not appear to work? - utest_run

OpenCL implementation: nvidia-340?
OpenCL implementation: nvidia-375

sudo apt-get install nvidia-375

  • lots of dependencies

  • the dependencies only install with python 2 selected via update-alternatives

still no joy from clinfo -> reboot

errors were seen from the X server (vnc4server), resulting from the moved beignet files (whoops)

An X server is needed for access to OpenCL (yes?!), so getting the X server working is the first step

reboot and local login now seems to work

clinfo works
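clinfo is the quickest check. A rough Python equivalent with pyopencl (assuming pyopencl is installed as in the Setup section) lists what the ICD loader exposes, in a layout loosely modelled on the benchmark.py output. describe_device and list_devices are hypothetical helper names. Memory is shown in MiB, which is an assumption about how benchmark.py computes "MB".

```python
def describe_device(name, mem_bytes, clock_mhz, compute_units, max_wg_size):
    """Format device properties in a layout similar to benchmark.py's output."""
    return "\n".join([
        f"Device name: {name}",
        f"Device memory: {mem_bytes // (1024 * 1024)} MB",
        f"Device max clock speed: {clock_mhz} MHz",
        f"Device compute units: {compute_units}",
        f"Device max work group size: {max_wg_size}",
    ])


def list_devices():
    """Print every platform and device the OpenCL ICD loader can see."""
    try:
        import pyopencl as cl
    except ImportError:
        print("pyopencl is not installed")
        return
    try:
        platforms = cl.get_platforms()
    except cl.Error as exc:  # e.g. no usable .icd registered
        print(f"no OpenCL platforms: {exc}")
        return
    for platform in platforms:
        print(f"Platform: {platform.name} ({platform.version})")
        try:
            devices = platform.get_devices()
        except cl.Error as exc:  # a broken driver can fail here
            print(f"  no devices: {exc}")
            continue
        for dev in devices:
            print(describe_device(dev.name, dev.global_mem_size,
                                  dev.max_clock_frequency,
                                  dev.max_compute_units,
                                  dev.max_work_group_size))


if __name__ == "__main__":
    list_devices()
```

Catching errors per platform means a half-working driver (such as beignet on a machine without an Intel GPU) does not hide the devices of the other platforms.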

many of the pyopencl examples work, e.g. mandelbrot and particles:

python demo_mandelbrot.py

python gl_particle_animation.py

  • fixed in a cleaner setup of Ubuntu 17?

Unfortunately we see a lot of this: some examples don't mind, others blow up (the messages look like beignet failing to find a supported Intel GPU on this machine):

X server found. dri2 connection failed! 
DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument  
Assuming 131072kB available aperture size.  
May lead to reduced performance or incorrect rendering.  
get chip id failed: -1 [22]  
param: 4, val: 0  

$ python benchmark.py 
Execution time of test without OpenCL:  0.119845867157 s
===============================================================
Platform name: NVIDIA CUDA
Platform profile: FULL_PROFILE
Platform vendor: NVIDIA Corporation
Platform version: OpenCL 1.2 CUDA 8.0.0
---------------------------------------------------------------
Device name: GeForce GT 730
Device type: GPU
Device memory:  979 MB
Device max clock speed: 901 MHz
Device compute units: 2
Device max work group size: 1024
Device max work item sizes: [1024, 1024, 64]
Data points: 8388608
Workers: 256
Preferred work group size multiple: 32
Execution time of test: 0.00897843 s
Results OK

$ python dump-performance.py  
4194304 20171356894.8 0 
float32 add: 10085.7 GOps/s  
HostToDeviceTransfer  
latency: 3.27519e-05 s  
bandwidth @ 268435456 bytes: 1.39221 GB/s  
DeviceToHostTransfer  
latency: 3.89906e-05 s  
bandwidth @ 268435456 bytes: 1.41215 GB/s  
DeviceToDeviceTransfer  
latency: 3.98391e-05 s  
bandwidth @ 268435456 bytes: 5.3896 GB/s  

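but it required this patch, which lowers the largest block tried from 1 << 30 bytes (1 GiB) to 1 << 28 bytes (256 MiB), presumably because the GT 730 has only 979 MB of device memory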

--- a/examples/dump-performance.py
+++ b/examples/dump-performance.py
@@ -27,7 +27,7 @@ def main():
 
         print("latency: %g s" % perf.transfer_latency(queue, tx_type))
-        for i in range(6, 31, 2):
+        for i in range(6, 30, 2):
             bs = 1 << i
             print("bandwidth @ %d bytes: %g GB/s" % (
                     bs, perf.transfer_bandwidth(queue, tx_type, bs)/1e9))
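The effect of the patch can be checked with a little arithmetic. range(6, 31, 2) ends at i = 30, a 1 GiB block, while range(6, 30, 2) ends at i = 28, a 256 MiB block. These match the largest sizes in the laptop and desktop outputs above. The sketch assumes that the "979 MB" printed by benchmark.py means MiB and that a device-to-device copy needs a source and a destination buffer of the block size. block_sizes and fits are hypothetical helpers, not part of pyopencl.

```python
MiB = 1 << 20


def block_sizes(stop: int, start: int = 6, step: int = 2) -> list[int]:
    """Transfer sizes produced by `for i in range(start, stop, step): bs = 1 << i`."""
    return [1 << i for i in range(start, stop, step)]


def fits(block_bytes: int, device_mem_bytes: int, buffers: int = 1) -> bool:
    """True if `buffers` buffers of `block_bytes` each fit in device memory."""
    return block_bytes * buffers <= device_mem_bytes


# "Device memory: 979 MB" from benchmark.py above, assumed to mean MiB.
gt730 = 979 * MiB
before, after = block_sizes(31)[-1], block_sizes(30)[-1]
print(before, fits(before, gt730))           # 1073741824 False
print(after, fits(after, gt730, buffers=2))  # 268435456 True
```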
  • LuxMark did not work

logging in via ssh

This works, but possibly because there is a functioning X server waiting for a login

TODO

remove beignet and see whether the NVIDIA driver on its own fixes the problems with LuxMark, etc.
