@jamesdavidson
Last active April 15, 2020 16:37
Playing around with a Neanderthal benchmark
;; Deep Learning Base AMI (Amazon Linux 2) Version 22.0 (ami-0c9d86eac25c03fea)
;; p2.xlarge
;; $ nvidia-smi --list-gpus
;; GPU 0: Tesla K80 (UUID: GPU-3e0d6827-3d8e-5a5f-85d6-114d950f6d44)
;;
;; $ cat /etc/os-release
;; NAME="Amazon Linux"
;; VERSION="2"
;; ID="amzn"
;; ID_LIKE="centos rhel fedora"
;; VERSION_ID="2"
;; PRETTY_NAME="Amazon Linux 2"
;; ANSI_COLOR="0;33"
;; CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
;; HOME_URL="https://amazonlinux.com/"
;; g++ (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6)
;; Linux ip-10-0-1-198.ap-southeast-2.compute.internal 4.14.165-133.209.amzn2.x86_64 #1 SMP Sun Feb 9 00:21:30 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
;; $ java -version
;; openjdk version "1.8.0_242"
;; OpenJDK Runtime Environment (build 1.8.0_242-b08)
;; OpenJDK 64-Bit Server VM (build 25.242-b08, mixed mode)
;; Intel MKL l_mkl_2019.5.281
;; source ~/intel/compilers_and_libraries/linux/bin/compilervars.sh intel64
;; neanderthal at de895b702
;; four cores in /proc/cpuinfo like this:
;;
;; vendor_id : GenuineIntel
;; cpu family : 6
;; model : 79
;; model name : Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
;; stepping : 1
;; microcode : 0xb000038
;; cpu MHz : 2699.892
;; cache size : 46080 KB
(ns user
  (:require [uncomplicate.commons
             [core :refer [with-release let-release Releaseable release]]]
            [uncomplicate.clojurecuda.core :refer [with-default default-stream synchronize! current-context]]
            [uncomplicate.neanderthal
             [core :refer :all]
             [vect-math :refer [mul! sqrt! div!]]
             [native :refer [native-float]]
             [cuda :as cuda :refer [cuv with-default-engine cuge]]
             [random :refer [rand-normal! rand-uniform!]]]))
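(defn matrix-correlation!
  "Returns the correlation matrix of the rows of `a!` as a new symmetric matrix.
  Destructive: the row means are subtracted from `a!` in place."
  [a!]
  ;; work = a! * ones, i.e. the row sums of a!
  (with-release [ones (entry! (vctr a! (ncols a!)) 1.0)
                 work (mv a! ones)]
    ;; the rank-1 update subtracts each row's mean from that row
    (let [a-mean (rk! (/ -1.0 (dim ones)) work ones a!)]
      (let-release [large (and (< 512 (mrows a!)) (< 8196 (ncols a!)))
                    sigma (sy a! (mrows a!))]
        ;; sigma = a-mean * a-mean^T
        (if large
          (mmt! a-mean sigma)
          (mm! 1.0 a-mean (trans a-mean) (view-ge sigma)))
        ;; square roots of the diagonal, reusing work as scratch space
        (sqrt! (copy! (dia sigma) work))
        ;; divide sigma element-wise by the outer product of those roots
        (with-release [sigma-x*y (rk work)]
          (div! (view-ge sigma) (view-ge sigma-x*y)))
        sigma))))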
;; Note: unlike the GPU benchmark, the result of each repetition is not
;; released here.
(defn bench-neanderthal-cpu [m n factory reps]
  (with-release [x (rand-uniform! (ge factory m n))]
    (time
      (dotimes [i reps]
        (matrix-correlation! x)))))
(defn bench-neanderthal-gpu [m n factory reps]
  (with-default
    (cuda/with-engine factory default-stream
      (with-release [x (rand-uniform! (cuge m n))]
        (synchronize!)
        (time
          (do
            (dotimes [i reps]
              (release (matrix-correlation! x)))
            (synchronize!)))))))
(bench-neanderthal-cpu 1000 10000 uncomplicate.neanderthal.native/native-double 10)
; "Elapsed time: 1302.26547 msecs"
(bench-neanderthal-gpu 1000 10000 uncomplicate.neanderthal.cuda/cuda-double 10)
; "Elapsed time: 49.68471 msecs"
;; Correlation[
;; {{0.55,0.14,0.61,0.63,0.63},
;; {0.03,0.66,0.44,0.77,0.64},
;; {0.98,0.24,0.71,0.67,0.37},
;; {0.32,0.55,0.58,0.16,0.55},
;; {0.72,0.86,0.93,0.44,0.72}}] // MatrixForm
;; cupy.corrcoef(cupy.array([0.55,0.14,0.61,0.63,0.63,0.03,0.66,0.44,0.77,0.64,0.98,0.24,0.71,0.67,0.37,0.32,0.55,0.58,0.16,0.55,0.72,0.86,0.93,0.44,0.72]).reshape(5,5).transpose())
;; numpy.corrcoef(numpy.array([0.55,0.14,0.61,0.63,0.63,0.03,0.66,0.44,0.77,0.64,0.98,0.24,0.71,0.67,0.37,0.32,0.55,0.58,0.16,0.55,0.72,0.86,0.93,0.44,0.72]).reshape(5,5).transpose())
;; (def x
;; (ge uncomplicate.neanderthal.native/native-double
;; [[0.55,0.14,0.61,0.63,0.63]
;; [0.03,0.66,0.44,0.77,0.64]
;; [0.98,0.24,0.71,0.67,0.37]
;; [0.32,0.55,0.58,0.16,0.55]
;; [0.72,0.86,0.93,0.44,0.72]]))
;; (matrix-correlation! x)
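;; A pure-Clojure cross-check for the small 5x5 example above, offered as a
;; sketch rather than as part of the benchmark. The names ref-pearson and
;; ref-correlation are hypothetical helpers, not Neanderthal functions.
;; Assumption: each inner sequence passed to ref-correlation is one variable,
;; the same convention numpy.corrcoef uses for rows.
(defn ref-pearson
  "Pearson correlation coefficient of two equal-length sequences of numbers."
  [xs ys]
  (let [xs  (map double xs)
        ys  (map double ys)
        n   (count xs)
        mx  (/ (reduce + xs) n)
        my  (/ (reduce + ys) n)
        dx  (map #(- % mx) xs)            ; deviations from the mean
        dy  (map #(- % my) ys)
        sxy (reduce + (map * dx dy))
        sxx (reduce + (map * dx dx))
        syy (reduce + (map * dy dy))]
    (/ sxy (Math/sqrt (double (* sxx syy))))))

(defn ref-correlation
  "Correlation matrix (a vector of row vectors) of `variables`,
  a sequence of equal-length sequences, one per variable."
  [variables]
  (mapv (fn [a] (mapv (fn [b] (ref-pearson a b)) variables)) variables))

;; Usage, mirroring the transpose in the numpy line above (assumed here to
;; match how the nested vectors are laid out in `x`):
;; (ref-correlation
;;   (apply map vector
;;          [[0.55 0.14 0.61 0.63 0.63]
;;           [0.03 0.66 0.44 0.77 0.64]
;;           [0.98 0.24 0.71 0.67 0.37]
;;           [0.32 0.55 0.58 0.16 0.55]
;;           [0.72 0.86 0.93 0.44 0.72]]))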