See references at
https://www.khronos.org/opencl/
The laptop seems to be kicking the desktop's arse on the basic properties (bandwidth and clock speed), though there are only 24 units on the laptop, versus the graphics card's ~1000.
http://cpuboss.com/ -> very good for basic facts, including OpenCL throughput
pyopencl https://gist.github.com/patrickmmartin/e1313dde7b908e8d009f2a13c3cd164b
- when a driver is broken, rename its .icd file to avoid annoyances
sudo updatedb and locate are amazing
- problem with the nvidia .icd?
$ locate .icd
*/etc/OpenCL/vendors/intel-beignet.icd*
*/etc/OpenCL/vendors/nvidia.icd*
/home/patrick/src/C/beignet/intel-beignet.icd.in
/home/patrick/src/C/beignet/build/intel-beignet.icd
$ cat `locate .icd`
*/usr/local/lib/beignet//libcl.so*
*libnvidia-opencl.so.1*
@BEIGNET_INSTALL_DIR@/libcl.so
/usr/local/lib/beignet//libcl.so
$ cat `locate .icd`| xargs -n1 ls -larth
*-rw-r--r-- 1 root root 1.8M May 23 00:29 /usr/local/lib/beignet//libcl.so*
*ls: cannot access 'libnvidia-opencl.so.1': No such file or directory*
ls: cannot access '@BEIGNET_INSTALL_DIR@/libcl.so': No such file or directory
-rw-r--r-- 1 root root 1.8M May 23 00:29 /usr/local/lib/beignet//libcl.so
$ locate libnvidia-opencl
/usr/lib/i386-linux-gnu/libnvidia-opencl.so.1
/usr/lib/i386-linux-gnu/libnvidia-opencl.so.375.39
/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.375.39
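The `ls` failure on libnvidia-opencl.so.1 above may not be a real problem: as far as I understand it, an .icd file holds the string the ICD loader passes to dlopen, so a bare library name is resolved through the normal library search path rather than as a file path. A rough sketch of a check that asks the dynamic loader instead of `ls` (the function names are made up for illustration, and /etc/OpenCL/vendors is assumed as the vendor directory because that is where locate found the active .icd files):

```python
import ctypes
import os


def icd_entries(vendor_dir="/etc/OpenCL/vendors"):
    """Yield (icd_file, library_string) for each .icd file in vendor_dir."""
    if not os.path.isdir(vendor_dir):
        return
    for name in sorted(os.listdir(vendor_dir)):
        if name.endswith(".icd"):
            path = os.path.join(vendor_dir, name)
            with open(path) as f:
                yield path, f.read().strip()


def can_load(lib):
    """True if the dynamic loader can open lib (absolute path or bare name)."""
    try:
        ctypes.CDLL(lib)
        return True
    except OSError:
        return False


if __name__ == "__main__":
    for path, lib in icd_entries():
        status = "ok" if can_load(lib) else "NOT LOADABLE"
        print("%s -> %s : %s" % (path, lib, status))
```

Any entry reported as NOT LOADABLE is a candidate for the rename trick mentioned at the top of these notes.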
Comparisons
- i7-7500U -> OpenCL capable
- Intel(R) HD Graphics Kabylake ULT GT2 -> OpenCL capable
- Luxmark 3.1 passes all tests and registers the on-board GPU and CPU as rendering targets
- pyopencl tests NOT RUN
Luxmark 3.1 passes all tests and registers the on-board GPU and CPU as rendering targets
- Needs an opencl implementation
opencl implementation: https://cgit.freedesktop.org/beignet/tree/docs/Beignet.mdwn
- installation was from source, but straightforward enough :P
- Luxmark 3.1 passes only the ball tests and registers only the on-board GPU as rendering targets
- pyopencl tests
$ python benchmark.py
Execution time of test without OpenCL: 0.025046110153198242 s
===============================================================
Platform name: Intel Gen OCL Driver
Platform profile: FULL_PROFILE
Platform vendor: Intel
Platform version: OpenCL 1.2 beignet 1.4 (git-448f8f7)
---------------------------------------------------------------
Device name: Intel(R) HD Graphics Kabylake ULT GT2
Device type: GPU
Device memory: 3932 MB
Device max clock speed: 1000 MHz
Device compute units: 24
Device max work group size: 512
Device max work item sizes: [512, 512, 512]
Data points: 8388608
Workers: 256
Preferred work group size multiple: 16
Execution time of test: 0.00440888 s
Results OK
$ python dump-performance.py
float32 add: 1828.97 GOps/s
bandwidth @ 1073741824 bytes: 7.59742 GB/s
DeviceToHostTransfer
bandwidth @ 1073741824 bytes: 9.58943 GB/s
DeviceToDeviceTransfer
bandwidth @ 1073741824 bytes: 6.81554 GB/s
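For reference, the device fields that benchmark.py prints above can also be pulled straight from pyopencl. This is only a sketch: `describe_device` and `main` are names invented here, the MB figure is assumed to be `global_mem_size` divided down to MiB, and only a subset of the fields is shown.

```python
def describe_device(dev, type_name):
    """Return benchmark.py-style lines for one device.

    dev only needs the attributes read below, so a real pyopencl.Device
    or a simple stand-in object both work.
    """
    return [
        "Device name: %s" % dev.name,
        "Device type: %s" % type_name,
        # Assumption: the MB figure is global_mem_size in MiB.
        "Device memory: %d MB" % (dev.global_mem_size // 1024 // 1024),
        "Device max clock speed: %d MHz" % dev.max_clock_frequency,
        "Device compute units: %d" % dev.max_compute_units,
        "Device max work group size: %d" % dev.max_work_group_size,
    ]


def main():
    # Imported here so describe_device can be used without pyopencl installed.
    import pyopencl as cl

    for platform in cl.get_platforms():
        print("Platform name: %s" % platform.name)
        print("Platform vendor: %s" % platform.vendor)
        print("Platform version: %s" % platform.version)
        for dev in platform.get_devices():
            type_name = cl.device_type.to_string(dev.type)
            print("\n".join(describe_device(dev, type_name)))
```

Calling `main()` on a machine with a working ICD should list every platform the loader can see, which is a quick way to tell whether a renamed or broken .icd file changed anything.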
- Core(TM)2 Quad CPU Q8200 @ 2.33GHz <- NOT opencl capable
- Nvidia GT 730 <- OpenCL capable?
opencl implementation: https://cgit.freedesktop.org/beignet/tree/docs/Beignet.mdwn
Does not appear to work? - utest_run
sudo apt-get install nvidia-375
- lots of dependencies
- dependencies only install with the python 2 set via update-alternatives
_errors were seen from the X server (vnc4server) resulting from the moved beignet files (whoops)_
X server is needed for access to OpenCL (yes?!), so getting the X server working is the first step
reboot and local login now seems to work
clinfo works
many examples work, like mandelbrot, particles
python demo_mandelbrot.py
python gl_particle_animation.py
- fixed in a cleaner setup of Ubuntu 17?
Unfortunately we see a lot of this: some examples don't mind, others blow up
X server found. dri2 connection failed!
DRM_IOCTL_I915_GEM_APERTURE failed: Invalid argument
Assuming 131072kB available aperture size.
May lead to reduced performance or incorrect rendering.
get chip id failed: -1 [22]
param: 4, val: 0
$ python benchmark.py
Execution time of test without OpenCL: 0.119845867157 s
===============================================================
Platform name: NVIDIA CUDA
Platform profile: FULL_PROFILE
Platform vendor: NVIDIA Corporation
Platform version: OpenCL 1.2 CUDA 8.0.0
---------------------------------------------------------------
Device name: GeForce GT 730
Device type: GPU
Device memory: 979 MB
Device max clock speed: 901 MHz
Device compute units: 2
Device max work group size: 1024
Device max work item sizes: [1024, 1024, 64]
Data points: 8388608
Workers: 256
Preferred work group size multiple: 32
Execution time of test: 0.00897843 s
Results OK
$ python dump-performance.py
4194304 20171356894.8 0
float32 add: 10085.7 GOps/s
HostToDeviceTransfer
latency: 3.27519e-05 s
bandwidth @ 268435456 bytes: 1.39221 GB/s
DeviceToHostTransfer
latency: 3.89906e-05 s
bandwidth @ 268435456 bytes: 1.41215 GB/s
DeviceToDeviceTransfer
latency: 3.98391e-05 s
bandwidth @ 268435456 bytes: 5.3896 GB/s
but required this patch
--- a/examples/dump-performance.py
+++ b/examples/dump-performance.py
@@ -27,7 +27,7 @@ def main():
             print("latency: %g s" % perf.transfer_latency(queue, tx_type))
-            for i in range(6, 31, 2):
+            for i in range(6, 30, 2):
                 bs = 1 << i
                 print("bandwidth @ %d bytes: %g GB/s" % (
                     bs, perf.transfer_bandwidth(queue, tx_type, bs)/1e9))
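A likely reason the patch was needed (my inference, the notes do not say so): the loop bound sets the largest transfer size, and the unpatched bound asks for a buffer bigger than the GT 730 reports. A small sketch of the arithmetic, assuming "Device memory: 979 MB" means 979 MiB; `transfer_sizes` is a helper made up for this illustration.

```python
def transfer_sizes(stop):
    """Block sizes in bytes tried by `for i in range(6, stop, 2)`."""
    return [1 << i for i in range(6, stop, 2)]


# Assumption: the benchmark's "979 MB" is 979 MiB.
GT730_MEMORY_BYTES = 979 * 1024 * 1024

for stop in (31, 30):
    largest = transfer_sizes(stop)[-1]
    fits = largest <= GT730_MEMORY_BYTES
    print("range(6, %d, 2): largest transfer %d bytes, fits in device memory: %s"
          % (stop, largest, fits))
```

This matches the two runs: the laptop (3932 MB) reports bandwidth at 1073741824 bytes, the patched desktop run stops at 268435456 bytes.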
- luxmark did not work
This works, but possibly because there is a functioning X server waiting for log on
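To put numbers on the laptop-versus-desktop comparison, the figures from the two benchmark.py and dump-performance.py runs above can be turned into ratios. This is a sketch only: the dictionary keys are labels invented here, and the bandwidth ratios are indicative at best because the two runs used different transfer sizes (1073741824 versus 268435456 bytes).

```python
# Figures copied from the benchmark.py and dump-performance.py output above.
laptop = {
    "plain_time_s": 0.025046110153198242,   # "without OpenCL"
    "opencl_time_s": 0.00440888,
    "float32_add_gops": 1828.97,
    "d2h_gb_s": 9.58943,                    # DeviceToHostTransfer
    "d2d_gb_s": 6.81554,                    # DeviceToDeviceTransfer
}
desktop = {
    "plain_time_s": 0.119845867157,
    "opencl_time_s": 0.00897843,
    "float32_add_gops": 10085.7,
    "d2h_gb_s": 1.41215,
    "d2d_gb_s": 5.3896,
}


def speedup(machine):
    """How much faster the OpenCL test ran than the plain one on one machine."""
    return machine["plain_time_s"] / machine["opencl_time_s"]


def ratio(key):
    """Laptop figure divided by desktop figure for the same measurement."""
    return laptop[key] / desktop[key]


print("OpenCL speedup, laptop:  %.2fx" % speedup(laptop))
print("OpenCL speedup, desktop: %.2fx" % speedup(desktop))
for key in ("opencl_time_s", "float32_add_gops", "d2h_gb_s", "d2d_gb_s"):
    print("laptop/desktop %s: %.2f" % (key, ratio(key)))
```

The ratios back up the opening remark for transfer bandwidth and raw test time, while the float32 add figure goes the other way in favour of the GT 730.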