- Download the latest ATLAS source from http://sourceforge.net/projects/math-atlas/files/. I'm using version 3.10.1
- Download the latest Netlib LAPACK from http://www.netlib.org/lapack/. I'm using version 3.4.2
- Turn off CPU frequency scaling so that you get reliable timings. ATLAS tunes itself by timing candidate kernels during the build, so stable clock speeds are essential for a good ATLAS build.
sudo apt-get install cpufreq-info cpuspeed cpufrequtils sysfsutils
# set each core to the "performance" governor, so that the clock frequency doesn't go down when idle
# I have 8 cores, which is why I need to do this 8 times
sudo cpufreq-selector -c 0 -g performance
sudo cpufreq-selector -c 1 -g performance
sudo cpufreq-selector -c 2 -g performance
sudo cpufreq-selector -c 3 -g performance
sudo cpufreq-selector -c 4 -g performance
sudo cpufreq-selector -c 5 -g performance
sudo cpufreq-selector -c 6 -g performance
sudo cpufreq-selector -c 7 -g performance
# check to make sure the scaling was set correctly
sudo cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
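If you'd rather not eyeball the `cat` output for every core, the same check can be scripted. This is only a minimal sketch: it assumes the standard Linux sysfs layout used in the command above, and the function names `read_governors` and `cores_not_in_performance` are made up for illustration.

```python
import glob
import os


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return {core_name: governor} for every core that exposes cpufreq."""
    pattern = os.path.join(cpu_root, "cpu[0-9]*", "cpufreq", "scaling_governor")
    governors = {}
    for path in sorted(glob.glob(pattern)):
        # path looks like <cpu_root>/cpu3/cpufreq/scaling_governor
        core = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as handle:
            governors[core] = handle.read().strip()
    return governors


def cores_not_in_performance(governors):
    """List the cores whose governor is anything other than 'performance'."""
    return sorted(core for core, gov in governors.items() if gov != "performance")


if __name__ == "__main__":
    current = read_governors()
    offenders = cores_not_in_performance(current)
    if not current:
        print("no cpufreq information found")
    elif offenders:
        print("still scaling:", ", ".join(offenders))
    else:
        print("all %d cores use the performance governor" % len(current))
```

Reading these files does not need root, so the script can be run as a normal user right before starting the build.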
- Compile ATLAS.
tar -xjvf atlas3.10.1.tar.bz2
cd ATLAS
mkdir build
cd build
../configure -Fa alg '-fPIC' --with-netlib-lapack-tarfile=<PATH_TO_NETLIB_LAPACK_TARBALL> --prefix=$HOME/opt/atlas --shared
make
make test
make install
export LD_LIBRARY_PATH=$HOME/opt/atlas/lib:$LD_LIBRARY_PATH
In ATLAS 3.10.1, the two shared libraries that get compiled are named `libsatlas.so` and `libtatlas.so`. As configured above, they both contain a full (C)BLAS+LAPACK interface. The difference is that the first is serial and the second is threaded. If you find that confusing and want to make linking against `-latlas` or `-lcblas` possible, then go into the install directory and create some symlinks.
cd $HOME/opt/atlas/lib
ln -s libtatlas.so libatlas.so # make "libatlas" point to the threaded library
ln -s libtatlas.so libcblas.so # make "libcblas" point to the threaded atlas
Get the numpy source distribution. Move `site.cfg.example` to `site.cfg`, and set the following entries in `site.cfg`:
[DEFAULT]
library_dirs = <YOUR_HOME_DIRECTORY>/opt/atlas/lib
include_dirs = <YOUR_HOME_DIRECTORY>/opt/atlas/include
[blas_opt]
libraries = tatlas
[lapack_opt]
libraries = tatlas
Now run `python setup.py install`. Compile scipy from source as well, and it will automatically build against the same ATLAS, since it detects its build configuration using `numpy.distutils`.
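To avoid typos when filling in the home directory by hand, the `site.cfg` entries can be generated from the install prefix. This is a rough sketch using Python's standard `configparser`. The helper name `atlas_site_cfg` is hypothetical, and it assumes the `--prefix` from the configure step (`$HOME/opt/atlas`).

```python
import configparser
import io
import os


def atlas_site_cfg(prefix):
    """Build the site.cfg entries for an ATLAS install rooted at `prefix`."""
    # interpolation=None so that a '%' in a path is written out literally.
    config = configparser.ConfigParser(interpolation=None)
    config["DEFAULT"] = {
        "library_dirs": os.path.join(prefix, "lib"),
        "include_dirs": os.path.join(prefix, "include"),
    }
    # tatlas is the threaded library; swap in "satlas" for the serial one.
    config["blas_opt"] = {"libraries": "tatlas"}
    config["lapack_opt"] = {"libraries": "tatlas"}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Assumption: ATLAS was installed with --prefix=$HOME/opt/atlas.
    print(atlas_site_cfg(os.path.expanduser("~/opt/atlas")))
```

The script only prints the three sections, so you can paste them into your own `site.cfg` rather than having anything overwritten.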
Download the file `time_dgemm.c` from this gist. It does a big matrix multiply via the `cblas_dgemm` function. You can try linking the program against different BLAS implementations to compare their speed.
# link against the default system cblas.
$ gcc time_dgemm.c -lcblas && ./a.out
10.997839 s
You can see that this is linked against the system cblas/atlas, installed through the package manager.
$ ldd a.out
linux-vdso.so.1 => (0x00007fffbddff000)
libcblas.so.3gf => /usr/lib/libcblas.so.3gf (0x00007f5aa9498000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5aa90d8000)
libatlas.so.3gf => /usr/lib/libatlas.so.3gf (0x00007f5aa8ba7000)
libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f5aa8890000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f5aa8679000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f5aa845c000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f5aa8160000)
/lib64/ld-linux-x86-64.so.2 (0x00007f5aa96d1000)
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f5aa7f29000)
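When comparing several builds, it can help to pull just the BLAS-related lines out of the `ldd` listing. The following is a small sketch, not a robust parser: the function name `blas_libraries` is made up, and it simply matches library names containing "blas" or "atlas".

```python
import subprocess
import sys


def blas_libraries(ldd_output):
    """Return (name, path) pairs for BLAS/ATLAS libraries listed by ldd."""
    found = []
    for line in ldd_output.splitlines():
        if "=>" not in line:
            continue  # e.g. the dynamic loader line has no "=>"
        name, _, target = line.partition("=>")
        name = name.strip()
        if "blas" in name or "atlas" in name:
            # target looks like "/usr/lib/libcblas.so.3gf (0x...)"
            fields = target.split()
            found.append((name, fields[0] if fields else ""))
    return found


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python which_blas.py ./a.out   (the script name is arbitrary)
    result = subprocess.run(["ldd", sys.argv[1]], capture_output=True, text=True)
    for name, path in blas_libraries(result.stdout):
        print(name, "->", path)
```

For the listing above, this would report only the `libcblas.so.3gf` and `libatlas.so.3gf` entries under `/usr/lib`.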
Linking the same code against our threaded BLAS, I get a ~8x speedup.
$ gcc time_dgemm.c -L$HOME/opt/atlas/lib -ltatlas && ./a.out
1.302789 s
The single-threaded optimized version still gets a ~3x speedup over the system library.
$ gcc time_dgemm.c -L$HOME/opt/atlas/lib -lsatlas && ./a.out
3.237809 s
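The speedup factors follow directly from the three timings above. Here is a quick sketch of the arithmetic, using my numbers as the input (the labels and the `speedups` helper are just for illustration):

```python
# Wall-clock times in seconds, copied from the three runs above.
TIMINGS = {
    "system cblas": 10.997839,
    "libtatlas (threaded)": 1.302789,
    "libsatlas (serial)": 3.237809,
}


def speedups(timings, baseline):
    """Return how many times faster each entry is than the baseline entry."""
    base = timings[baseline]
    return {name: base / seconds for name, seconds in timings.items()}


if __name__ == "__main__":
    for name, factor in speedups(TIMINGS, "system cblas").items():
        print("%-22s %5.2fx" % (name, factor))
```

On these numbers the threaded build comes out roughly 8.4x faster than the system cblas, and the serial build roughly 3.4x. Your own ratios will depend on your core count and on how the system package was built.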