A Survey of Simple Vector Addition Performance of the Numerical Computing Libraries available in Common Lisp

Help is needed with the following systems, which are not yet benchmarked below: CL-BLAPACK, MAXIMA, GSLL, XECTO

Library	Implementation	Quicklisp	Speed
ARRAY-OPERATIONS	Native	✓	Medium
AVM	Native, CUDA	✗	Medium
CLEM	Native	✓	Slow
CL-BLAPACK	Blas/Lapack	✗	-
FEMLISP-MATLISP	Native	✓	Fast
GSLL	Blas/Lapack	✓	-
LISP-MATRIX	Switchable	✗	Slow
LLA	Blas/Lapack	✓	Fast
MAGICL	Blas/Lapack	✓	Fast
MAXIMA	-	✗	-
NUMERICALS	Native	✗	Fast
NUMCL	Native	✓	Slow
PETALISP	Native	✓	Slow
XECTO	Native	✗	-
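
For reference, the operation that every library below is asked to perform can be written as a hand-rolled loop. This is only a sketch: the names DVEC, VECTOR-ADD! and VECTOR-ADD are hypothetical and are not taken from any of the surveyed libraries, and no timing for it is claimed here.

```lisp
;; Hypothetical baseline, not part of any library in this survey.
(deftype dvec () '(simple-array double-float (*)))

(defun vector-add! (c a b)
  "Store the elementwise sum of A and B into C and return C."
  (declare (optimize speed)
           (type dvec c a b))
  (assert (= (length a) (length b) (length c)))
  (dotimes (i (length c) c)
    (setf (aref c i) (+ (aref a i) (aref b i)))))

(defun vector-add (a b)
  "Allocating variant: return a fresh vector holding A + B."
  (declare (type dvec a b))
  (vector-add! (make-array (length a) :element-type 'double-float) a b))
```

The two variants mirror the two kinds of call timed in the sections below: one that allocates a fresh result on every call and one that writes into a preallocated output.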

ARRAY-OPERATIONS

CL-USER> (let* ((size 1000)
                (a (aops:zeros* 'double-float (list size)))
                (b (aops:zeros* 'double-float (list size))))
           (declare (optimize speed)
                    (type (array double-float) a b))
           (time (loop repeat 1000 do (aops:vectorize (a b) (+ a b)))))
Evaluation took:
  0.073 seconds of real time
  0.073204 seconds of total run time (0.073204 user, 0.000000 system)
  100.00% CPU
  161,620,326 processor cycles
  56,099,616 bytes consed
  
NIL
CL-USER> (let* ((size 1000)
                (a (aops:zeros* 'double-float (list size)))
                (b (aops:zeros* 'double-float (list size)))
                (c (aops:zeros* 'double-float (list size))))
           (declare (optimize speed)
                    (type (array double-float) a b c))
           (time (loop repeat 1000 do (aops:vectorize! c (a b) (+ a b)))))
Evaluation took:
  0.082 seconds of real time
  0.082054 seconds of total run time (0.082054 user, 0.000000 system)
  100.00% CPU
  181,161,538 processor cycles
  48,070,656 bytes consed
  
NIL

AVM

Requires: not in Quicklisp, but should be loadable after a simple git clone.

Example adapted from https://github.com/takagi/avm/blob/master/samples/vector-add.lisp

(in-package :cl-user)
(defpackage avm.samples.vector-add
  (:use :cl
        :avm)
  (:export :main))
(in-package :avm.samples.vector-add)

(defkernel vector-add (c a b)
  (setf (aref c i) (the double (+ (aref a i) (aref b i)))))

(defun random-init (array n)
  (dotimes (i n)
    (setf (array-aref array i) (random 1.0d0))))

(defun verify-result (as bs cs n)
  (dotimes (i n)
    (let ((a (array-aref as i))
          (b (array-aref bs i))
          (c (array-aref cs i)))
      (unless (= (+ a b) c)
        (error "Verification failed: i=~A, a=~A, b=~A, c=~A" i a b c)))))

(defun main (n &optional dev-id)
  (declare (optimize speed))
  (with-cuda (dev-id)
    (with-arrays ((a double n)
                  (b double n)
                  (c double n))
      (random-init a n)
      (random-init b n)
      (time
       (loop repeat 1000 do (vector-add c a b)))
      (verify-result a b c n))))

CL-USER> (avm.samples.vector-add:main 1000)
Evaluation took:
  0.043 seconds of real time
  0.043048 seconds of total run time (0.043048 user, 0.000000 system)
  100.00% CPU
  95,034,454 processor cycles
  88,944 bytes consed
  
NIL

CLEM

Requires: nothing extra; should load from Quicklisp.
The author hopes to make it efficient some day.

CL-USER> (let* ((size 1000)
                (a (clem:zero-matrix 1 size))
                (b (clem:zero-matrix 1 size)))
           (declare (optimize speed)
                    (type clem:matrix a b))
           (time (loop repeat 1000 do (clem:mat-add a b))))
Evaluation took:
  1.021 seconds of real time
  1.022443 seconds of total run time (0.990447 user, 0.031996 system)
  [ Run times consist of 0.073 seconds GC time, and 0.950 seconds non-GC time. ]
  100.10% CPU
  2,255,802,064 processor cycles
  104,763,424 bytes consed
  
NIL
CL-USER> (let* ((size 1000)
                (a (clem:zero-matrix 1 size))
                (b (clem:zero-matrix 1 size)))
           (declare (optimize speed)
                    (type clem:matrix a b))
           (time (loop repeat 1000 do (clem:mat-add a b :in-place t))))
Evaluation took:
  0.751 seconds of real time
  0.003347 seconds of total run time (0.003347 user, 0.000000 system)
  0.40% CPU
  1 form interpreted
  1,657,839,660 processor cycles
  1,184,672 bytes consed
  
  before it was aborted by a non-local transfer of control.
  
; Evaluation aborted on #<SIMPLE-ERROR "not yet supported" {10057CC153}>.

CL-BLAPACK

Requires: foreign-numeric-vector
Examples: https://github.com/blindglobe/cl-blapack/blob/master/examples.lisp
Related: https://stackoverflow.com/questions/48666508/why-are-there-no-blas-routines-for-addition-and-subtraction

Issues: It is unclear to me how to test the addition of two vectors with this library. PS: I am not familiar with CFFI.
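
As the Stack Overflow question linked above discusses, BLAS has no dedicated addition routine; addition is normally expressed through AXPY, which computes y <- alpha*x + y, with alpha set to 1. The sketch below shows those semantics in plain Common Lisp. It does not call CL-BLAPACK, and the names REFERENCE-AXPY! and ADD-VIA-AXPY are hypothetical; how CL-BLAPACK itself exposes the routine is still the open question here.

```lisp
;; Plain Common Lisp reference for the BLAS AXPY operation, y <- alpha*x + y.
;; REFERENCE-AXPY! is a hypothetical name; nothing here calls CL-BLAPACK.
(defun reference-axpy! (alpha x y)
  "Overwrite Y with ALPHA*X + Y and return Y."
  (declare (type double-float alpha)
           (type (simple-array double-float (*)) x y))
  (assert (= (length x) (length y)))
  (dotimes (i (length y) y)
    (setf (aref y i) (+ (* alpha (aref x i)) (aref y i)))))

;; Vector addition is the special case alpha = 1.
;; B is copied first so that the caller's vector is left untouched.
(defun add-via-axpy (a b)
  "Return a fresh vector holding A + B."
  (reference-axpy! 1d0 a (copy-seq b)))
```

Because AXPY overwrites its second vector, a benchmark built on it measures an in-place update rather than an allocating addition.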

FEMLISP-MATLISP

CL-USER> (let* ((size 1000)
                (a (fl.matlisp:zeros size)) ; These are actually 2D size x size matrices, so one M+ adds 1000*1000 elements
                (b (fl.matlisp:zeros size)))
           (declare (optimize speed))
           (time (fl.matlisp:m+ a b))
           nil)
Evaluation took:
  0.008 seconds of real time
  0.008094 seconds of total run time (0.008090 user, 0.000004 system)
  100.00% CPU
  17,849,472 processor cycles
  8,000,016 bytes consed
  
NIL

GSLL

Untested: I do not know this library well enough to write the benchmark.

LISP-MATRIX

Requires: many dependencies, including cl-blapack and foreign-numeric-vector (see above), neither of which is in Quicklisp.

LISP-MATRIX> (let* ((size 1000)
                    (a (lisp-matrix:make-vector size :implementation :lisp-array))
                    (b (lisp-matrix:make-vector size :implementation :lisp-array)))
               (declare (optimize speed))
               (time (loop repeat 1000 do (lisp-matrix:m+ a b))))
Evaluation took:
  0.757 seconds of real time
  0.756834 seconds of total run time (0.756834 user, 0.000000 system)
  100.00% CPU
  1,671,227,204 processor cycles
  25,168,368 bytes consed
  
NIL
LISP-MATRIX> (let* ((size 1000)
                    (a (lisp-matrix:make-vector size :implementation :foreign-array))
                    (b (lisp-matrix:make-vector size :implementation :foreign-array)))
               (declare (optimize speed))
               (time (loop repeat 1000 do (lisp-matrix:m+ a b))))
Evaluation took:
  0.664 seconds of real time
  0.663699 seconds of total run time (0.644143 user, 0.019556 system)
  100.00% CPU
  1,465,432,156 processor cycles
  17,261,808 bytes consed
  
NIL

LLA

CL-USER> (defparameter *lla-configuration*
           '(:efficiency-warnings (:array-type :array-conversion)))
*LLA-CONFIGURATION*
CL-USER> (ql:quickload "lla")
To load "lla":
  Load 1 ASDF system:
    lla
; Loading "lla"
.........
("lla")
CL-USER> (let* ((size 1000)
                (a (make-array size :element-type 'double-float))
                (b (make-array size :element-type 'double-float)))
           (declare (optimize speed)
                    (type (array double-float) a b))
           (time (loop repeat 1000 do (lla:axpy! 1 a b))))
Evaluation took:
  0.007 seconds of real time
  0.006143 seconds of total run time (0.005861 user, 0.000282 system)
  85.71% CPU
  13,553,164 processor cycles
  163,840 bytes consed
  
NIL

MAGICL

CL-USER> (ql:quickload '("magicl" "magicl/ext-blas"))
To load "magicl":
  Load 1 ASDF system:
    magicl
; Loading "magicl"

To load "magicl/ext-blas":
  Load 1 ASDF system:
    magicl/ext-blas
; Loading "magicl/ext-blas"

("magicl" "magicl/ext-blas")
CL-USER> (let* ((size 1000)
                (a (magicl:zeros (list size)))
                (b (magicl:zeros (list size))))
           (declare (optimize speed)
                    (type magicl:vector/double-float a b))
           (magicl.backends:with-backends (:blas)
             (time (loop repeat 1000 do (magicl:.+ a b)))))
Evaluation took:
  0.020 seconds of real time
  0.019953 seconds of total run time (0.011981 user, 0.007972 system)
  100.00% CPU
  44,039,332 processor cycles
  8,105,248 bytes consed

NIL
CL-USER> (let* ((size 1000)
                (a (magicl:zeros (list size)))
                (b (magicl:zeros (list size)))
                (c (magicl:zeros (list size))))
           (declare (optimize speed)
                    (type magicl:vector/double-float a b c))
           (magicl.backends:with-backends (:blas)
             (time (loop repeat 1000 do (magicl:.+ a b c)))))
Evaluation took:
  0.012 seconds of real time
  0.012168 seconds of total run time (0.012144 user, 0.000024 system)
  100.00% CPU
  26,859,506 processor cycles
  65,536 bytes consed

NIL
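
The two MAGICL runs above differ mainly in allocation: 8,105,248 bytes consed without an output argument against 65,536 with one. A rough lower bound for the allocating case is repetitions times elements times bytes per element, which the sketch below computes. MINIMUM-BYTES-CONSED is a hypothetical helper, and the bound ignores array headers and any temporaries a library may create.

```lisp
;; Lower bound on bytes allocated when every call returns a fresh result vector.
;; Ignores array headers and library-internal temporaries.
(defun minimum-bytes-consed (repetitions size &optional (bytes-per-element 8))
  (* repetitions size bytes-per-element))

;; 1000 repetitions over 1000 double-floats of 8 bytes each.
(minimum-bytes-consed 1000 1000) ; => 8000000
```

The figure of 8,000,000 bytes is close to the 8,105,248 reported for the first run, which suggests that nearly all of its consing is the result vectors themselves.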

MAXIMA

Untested: I do not know this library well enough to write the benchmark.

NUMERICALS

Requires: not in Quicklisp; should work after a simple git clone with the latest SBCL release.

CL-USER> (let* ((size 1000)
                (numericals:*type* 'double-float)
                (a (numericals:zeros size))
                (b (numericals:zeros size)))
           (declare (optimize speed)
                    (type (array double-float) a b))
           (time (loop repeat 1000 do (numericals:+ a b))))

; note: Unable to optimize NU:ZEROS without knowing type of SIZE at compile-time.
; note: Unable to optimize NU:ZEROS without knowing type of SIZE at compile-time.
Evaluation took:
  0.008 seconds of real time
  0.007786 seconds of total run time (0.007786 user, 0.000000 system)
  100.00% CPU
  17,174,838 processor cycles
  8,439,488 bytes consed
  
NIL
CL-USER> (let* ((size 1000)
                (numericals:*type* 'double-float)
                (a (numericals:zeros size))
                (b (numericals:zeros size))
                (c (numericals:zeros size)))
           (declare (optimize speed)
                    (type (array double-float) a b c))
           (time (loop repeat 1000 do (numericals:+ a b :out c))))
; note: Unable to determine optimizability of call to NUMERICALS:+ because type of C (ARRAY
                                                                                      DOUBLE-FLOAT) is not exact
; note: Unable to optimize NU:ZEROS without knowing type of SIZE at compile-time.
; note: Unable to optimize NU:ZEROS without knowing type of SIZE at compile-time.
; note: Unable to optimize NU:ZEROS without knowing type of SIZE at compile-time.
Evaluation took:
  0.002 seconds of real time
  0.002089 seconds of total run time (0.002089 user, 0.000000 system)
  100.00% CPU
  4,600,666 processor cycles
  524,000 bytes consed
  
NIL

NUMCL

CL-USER> (let* ((size 1000)
                (a (numcl:zeros (list size) :type 'double-float))
                (b (numcl:zeros (list size) :type 'double-float)))
           (declare (optimize speed)
                    (type (array double-float) a b))
           (time (loop repeat 1000 do (numcl:+ a b))))
; some style warnings ignored here
Evaluation took:
  0.615 seconds of real time
  0.615519 seconds of total run time (0.615415 user, 0.000104 system)
  100.16% CPU
  26 forms interpreted
  2,482 lambdas converted
  1,358,418,382 processor cycles
  84,625,632 bytes consed
  
NIL

PETALISP

CL-USER> (let* ((size 1000)
                (a (make-array size :element-type 'double-float))
                (b (make-array size :element-type 'double-float)))
           (declare (optimize speed)
                    (type (array double-float) a b))
           (time (loop repeat 1000 do (petalisp:compute (petalisp:alpha #'+ a b)))))
Evaluation took:
  0.252 seconds of real time
  0.552736 seconds of total run time (0.383944 user, 0.168792 system)
  219.44% CPU
  556,173,034 processor cycles
  17,688,944 bytes consed
  
NIL

digikar99/cl-numericals-survey.org