A Survey of Simple Vector Addition Performance of the Numerical Computing Libraries available in Common Lisp
Help with the following systems is required: CL-BLAPACK, MAXIMA, GSLL, XECTO
Implementation | Quicklisp | Speed | |
---|---|---|---|
ARRAY-OPERATIONS | Native | ✓ | Medium |
AVM | Native, Cuda | ✗ | Medium |
CLEM | Native | ✓ | Slow |
CL-BLAPACK | Blas/Lapack | ✗ | - |
FEMLISP-MATLISP | Native | ✓ | Fast |
GSLL | Blas/Lapack | ✓ | - |
LISP-MATRIX | Switchable | ✗ | Slow |
LLA | Blas/Lapack | ✓ | Fast |
MAGICL | Blas/Lapack | ✓ | Fast |
MAXIMA | - | ✗ | - |
NUMERICALS | Native | ✗ | Fast |
NUMCL | Native | ✓ | Slow |
PETALISP | Native | ✓ | Slow |
XECTO | Native | ✗ | - |
CL-USER> (let* ((size 1000)
(a (aops:zeros* 'double-float (list size)))
(b (aops:zeros* 'double-float (list size))))
(declare (optimize speed)
(type (array double-float) a b))
(time (loop repeat 1000 do (aops:vectorize (a b) (+ a b)))))
Evaluation took:
0.073 seconds of real time
0.073204 seconds of total run time (0.073204 user, 0.000000 system)
100.00% CPU
161,620,326 processor cycles
56,099,616 bytes consed
NIL
CL-USER> (let* ((size 1000)
(a (aops:zeros* 'double-float (list size)))
(b (aops:zeros* 'double-float (list size)))
(c (aops:zeros* 'double-float (list size))))
(declare (optimize speed)
(type (array double-float) a b c))
(time (loop repeat 1000 do (aops:vectorize! c (a b) (+ a b)))))
Evaluation took:
0.082 seconds of real time
0.082054 seconds of total run time (0.082054 user, 0.000000 system)
100.00% CPU
181,161,538 processor cycles
48,070,656 bytes consed
NIL
Should be loadable with a simple clone.
Example picked up from https://github.com/takagi/avm/blob/master/samples/vector-add.lisp
(in-package :cl-user)
(defpackage avm.samples.vector-add
(:use :cl
:avm)
(:export :main))
(in-package :avm.samples.vector-add)
(defkernel vector-add (c a b)
(setf (aref c i) (the double (+ (aref a i) (aref b i)))))
(defun random-init (array n)
(dotimes (i n)
(setf (array-aref array i) (random 1.0d0))))
(defun verify-result (as bs cs n)
(dotimes (i n)
(let ((a (array-aref as i))
(b (array-aref bs i))
(c (array-aref cs i)))
(unless (= (+ a b) c)
(error "Verification failed: i=~A, a=~A, b=~A, c=~A" i a b c)))))
(defun main (n &optional dev-id)
(declare (optimize speed))
(with-cuda (dev-id)
(with-arrays ((a double n)
(b double n)
(c double n))
(random-init a n)
(random-init b n)
(time
(loop repeat 1000 do (vector-add c a b)))
(verify-result a b c n))))
CL-USER> (avm.samples.vector-add:main 1000)
Evaluation took:
0.043 seconds of real time
0.043048 seconds of total run time (0.043048 user, 0.000000 system)
100.00% CPU
95,034,454 processor cycles
88,944 bytes consed
NIL
- Requires: should work from quicklisp
- Author hopes it to make it efficient some day
CL-USER> (let* ((size 1000)
(a (clem:zero-matrix 1 size))
(b (clem:zero-matrix 1 size)))
(declare (optimize speed)
(type clem:matrix a b))
(time (loop repeat 1000 do (clem:mat-add a b))))
Evaluation took:
1.021 seconds of real time
1.022443 seconds of total run time (0.990447 user, 0.031996 system)
[ Run times consist of 0.073 seconds GC time, and 0.950 seconds non-GC time. ]
100.10% CPU
2,255,802,064 processor cycles
104,763,424 bytes consed
NIL
CL-USER> (let* ((size 1000)
(a (clem:zero-matrix 1 size))
(b (clem:zero-matrix 1 size)))
(declare (optimize speed)
(type clem:matrix a b))
(time (loop repeat 1000 do (clem:mat-add a b :in-place t))))
Evaluation took:
0.751 seconds of real time
0.003347 seconds of total run time (0.003347 user, 0.000000 system)
0.40% CPU
1 form interpreted
1,657,839,660 processor cycles
1,184,672 bytes consed
before it was aborted by a non-local transfer of control.
; Evaluation aborted on #<SIMPLE-ERROR "not yet supported" {10057CC153}>.
Requires: foreign-numeric-vector Examples: https://github.com/blindglobe/cl-blapack/blob/master/examples.lisp Related: https://stackoverflow.com/questions/48666508/why-are-there-no-blas-routines-for-addition-and-subtraction
Issues: How do I test the addition of two vectors??? PS: I am not familiar with CFFI
CL-USER> (let* ((size 1000)
(a (fl.matlisp:zeros size)) ; These actually are 2d matrices
(b (fl.matlisp:zeros size)))
(declare (optimize speed))
(time (fl.matlisp:m+ a b))
nil)
Evaluation took:
0.008 seconds of real time
0.008094 seconds of total run time (0.008090 user, 0.000004 system)
100.00% CPU
17,849,472 processor cycles
8,000,016 bytes consed
NIL
Untested due to lack of knowledge
Requires: plenty; but also cl-blapack and foreign-numeric-vector from above that are not in quicklisp.
LISP-MATRIX> (let* ((size 1000)
(a (lisp-matrix:make-vector size :implementation :lisp-array))
(b (lisp-matrix:make-vector size :implementation :lisp-array)))
(declare (optimize speed))
(time (loop repeat 1000 do (lisp-matrix:m+ a b))))
Evaluation took:
0.757 seconds of real time
0.756834 seconds of total run time (0.756834 user, 0.000000 system)
100.00% CPU
1,671,227,204 processor cycles
25,168,368 bytes consed
NIL
LISP-MATRIX> (let* ((size 1000)
(a (lisp-matrix:make-vector size :implementation :foreign-array))
(b (lisp-matrix:make-vector size :implementation :foreign-array)))
(declare (optimize speed))
(time (loop repeat 1000 do (lisp-matrix:m+ a b))))
Evaluation took:
0.664 seconds of real time
0.663699 seconds of total run time (0.644143 user, 0.019556 system)
100.00% CPU
1,465,432,156 processor cycles
17,261,808 bytes consed
NIL
CL-USER> (defparameter *lla-configuration*
'(:efficiency-warnings (:array-type :array-conversion)))
*LLA-CONFIGURATION*
CL-USER> (ql:quickload "lla")
To load "lla":
Load 1 ASDF system:
lla
; Loading "lla"
.........
("lla")
CL-USER> (let* ((size 1000)
(a (make-array size :element-type 'double-float))
(b (make-array size :element-type 'double-float)))
(declare (optimize speed)
(type (array double-float) a b))
(time (loop repeat 1000 do (lla:axpy! 1 a b))))
Evaluation took:
0.007 seconds of real time
0.006143 seconds of total run time (0.005861 user, 0.000282 system)
85.71% CPU
13,553,164 processor cycles
163,840 bytes consed
NIL
CL-USER> (ql:quickload '("magicl" "magicl/ext-blas"))
To load "magicl":
Load 1 ASDF system:
magicl
; Loading "magicl"
To load "magicl/ext-blas":
Load 1 ASDF system:
magicl/ext-blas
; Loading "magicl/ext-blas"
("magicl" "magicl/ext-blas")
CL-USER> (let* ((size 1000)
(a (magicl:zeros (list size)))
(b (magicl:zeros (list size))))
(declare (optimize speed)
(type magicl:vector/double-float a b))
(magicl.backends:with-backends (:blas)
(time (loop repeat 1000 do (magicl:.+ a b)))))
Evaluation took:
0.020 seconds of real time
0.019953 seconds of total run time (0.011981 user, 0.007972 system)
100.00% CPU
44,039,332 processor cycles
8,105,248 bytes consed
NIL
CL-USER> (let* ((size 1000)
(a (magicl:zeros (list size)))
(b (magicl:zeros (list size)))
(c (magicl:zeros (list size))))
(declare (optimize speed)
(type magicl:vector/double-float a b c))
(magicl.backends:with-backends (:blas)
(time (loop repeat 1000 do (magicl:.+ a b c)))))
Evaluation took:
0.012 seconds of real time
0.012168 seconds of total run time (0.012144 user, 0.000024 system)
100.00% CPU
26,859,506 processor cycles
65,536 bytes consed
NIL
Untested due to lack of knowledge
Requires: should work with a simple git clone and SBCL latest release
CL-USER> (let* ((size 1000)
(numericals:*type* 'double-float)
(a (numericals:zeros size))
(b (numericals:zeros size)))
(declare (optimize speed)
(type (array double-float) a b))
(time (loop repeat 1000 do (numericals:+ a b))))
; note: Unable to optimize NU:ZEROS without knowing type of SIZE at compile-time.
; note: Unable to optimize NU:ZEROS without knowing type of SIZE at compile-time.
Evaluation took:
0.008 seconds of real time
0.007786 seconds of total run time (0.007786 user, 0.000000 system)
100.00% CPU
17,174,838 processor cycles
8,439,488 bytes consed
NIL
CL-USER> (let* ((size 1000)
(numericals:*type* 'double-float)
(a (numericals:zeros size))
(b (numericals:zeros size))
(c (numericals:zeros size)))
(declare (optimize speed)
(type (array double-float) a b c))
(time (loop repeat 1000 do (numericals:+ a b :out c))))
; note: Unable to determine optimizability of call to NUMERICALS:+ because type of C (ARRAY
DOUBLE-FLOAT) is not exact
; note: Unable to optimize NU:ZEROS without knowing type of SIZE at compile-time.
; note: Unable to optimize NU:ZEROS without knowing type of SIZE at compile-time.
; note: Unable to optimize NU:ZEROS without knowing type of SIZE at compile-time.
Evaluation took:
0.002 seconds of real time
0.002089 seconds of total run time (0.002089 user, 0.000000 system)
100.00% CPU
4,600,666 processor cycles
524,000 bytes consed
NIL
CL-USER> (let* ((size 1000)
(a (numcl:zeros (list size) :type 'double-float))
(b (numcl:zeros (list size) :type 'double-float)))
(declare (optimize speed)
(type (array double-float) a b))
(time (loop repeat 1000 do (numcl:+ a b))))
; some style warnings ignored here
Evaluation took:
0.615 seconds of real time
0.615519 seconds of total run time (0.615415 user, 0.000104 system)
100.16% CPU
26 forms interpreted
2,482 lambdas converted
1,358,418,382 processor cycles
84,625,632 bytes consed
NIL
CL-USER> (let* ((size 1000)
(a (make-array size :element-type 'double-float))
(b (make-array size :element-type 'double-float)))
(declare (optimize speed)
(type (array double-float) a b))
(time (loop repeat 1000 do (petalisp:compute (petalisp:alpha #'+ a b)))))
Evaluation took:
0.252 seconds of real time
0.552736 seconds of total run time (0.383944 user, 0.168792 system)
219.44% CPU
556,173,034 processor cycles
17,688,944 bytes consed
NIL
I am not familiar with petalisp's concepts, but may be if there is something that allows lazy iteration,
loop
can be put inside thecompute
?