To benchmark Python (Numpy) default procedure to find all eigenvalues and eigenvectors of a complex (hermitian) matrix and compare the results with Fortran using either MKL and OpenBLAS libraries.
import time
import numpy as np
# record start time
start = time.time()
# square matrix dimension
n = 4096
# generate our random (real and imaginary) matrix elements
print('01. Generating random numbers for hermitian matrix... ')
A = np.random.rand(n,n)
B = np.random.rand(n,n)
print('02. Building our positive definite hermitian matrix...')
# build complex matrix from real ones
C = A + 1j*B
# ensure its hermitian
C = C * C.conj().T
# positive-definite
np.fill_diagonal(C, C.diagonal() + n)
# compute eigenvalues and eigenvectors
print('03. Computing all eigenvalues+eigenvectors using numpy eigh...')
w, v = np.linalg.eigh(C)
# save end time
end = time.time()
print(f"Numpy eigh took: {end-start:8.3f} seconds")Custom build of the latest openblas with 8 byte interger support via the following:
wget https://github.com/xianyi/OpenBLAS/releases/download/v0.3.20/OpenBLAS-0.3.20.tar.gz
tar xf OpenBLAS-0.3.20.tar.gz
cd OpenBLAS-0.3.20
export USE_THREAD=1
export DYNAMIC_ARCH=0
export NO_WARMUP=1
export BUILD_RELAPACK=0
export BINARY=64
export INTERFACE64=1
export SYMBOLSUFFIX=64_
export NPY_USE_BLAS_ILP64=1
export NPY_BLAS_ILP64_ORDER=openblas64_
export NPY_LAPACK_ILP64_ORDER=openblas64_
make -j
sudo make install SYMBOLSUFFIX=64_
Configure numpy to use our custom (and latest) openblas:
wget https://raw.githubusercontent.com/numpy/numpy/main/site.cfg.example -O ~/.numpy-site.cfg
vi ~/.numpy-site.cfg
Uncomment and edit the following lines:
[openblas64_]
libraries = openblas64_
library_dirs = /opt/OpenBLAS/lib
include_dirs = /opt/OpenBLAS/include
runtime_library_dirs = /opt/OpenBLAS/lib
Now build our numpy with the built openblas:
cd $HOME
python3 -m venv openblas-numpy.env
source ~/openblas-numpy.env/bin/activate
export NPY_USE_BLAS_ILP64=1
export NPY_BLAS_ILP64_ORDER=openblas64_
export NPY_LAPACK_ILP64_ORDER=openblas64_
pip install numpy --no-binary numpy --force-reinstall
Results:
(50.134+50.600+50.139)/3 = 50.291 seconds
Intel mkl installed from oneapi repo using the following commands:
curl -Lo- https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | sudo gpg --dearmor -o /usr/share/keyrings/oneapi-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt update
sudo apt install intel-oneapi-mkl intel-oneapi-mkl-devel -y
vi ~/.numpy-site.cfg
[mkl]
library_dirs = /opt/intel/oneapi/mkl/2022.1.0/lib/intel64
include_dirs = /opt/intel/oneapi/mkl/2022.1.0/include
libraries = mkl_rt
Now build numpy with mkl:
cd $HOME
python3 -m venv mkl-numpy.env
source ~/mkl-numpy.env/bin/activate
pip install numpy --no-binary numpy --force-reinstall
Check if numpy is really using mkl:
python
>>> import numpy as np
>>> np.__config__.show()
Results:
(9.304+8.974+9.050)/3 = 9.11 seconds (x5.5 faster than openblas)
⮊ NOTE 1:
For comparison, intel fortran 2021.6 (ifort) with the same mkl took 10.8 seconds. gfortran 11 with openblas 0.3.20 took 12.6 seconds. So, in this case, Python is ~20% faster than Fortran!
⮊ NOTE 2:
You can use
w, v = scipy.linalg.eigh(C, driver='evr')with numpy+openblas in order to avoid this performance issue. In this case, openblas will be only marginally slower than mkl.
⮊ NOTE 3:
Hardware used for this comparison: Intel i5 12600k with 64GB DDR4 3200MHz memory running Debian Linux testing.