- Download the latest ATLAS source from http://sourceforge.net/projects/math-atlas/files/. I'm using version 3.10.1.
- Download the latest Netlib LAPACK from http://www.netlib.org/lapack/. I'm using version 3.4.2.
- Turn off CPU frequency scaling so that you get reliable timings. This is essential for a good ATLAS build, because ATLAS tunes itself by timing candidate kernels during the build.
sudo apt-get install cpufreq-info cpuspeed cpufrequtils sysfsutils
# set each core to the "performance" governor, so that the clock frequency doesn't go down when idle
# I have 8 cores, which is why I need to do this 8 times
sudo cpufreq-selector -c 0 -g performance
sudo cpufreq-selector -c 1 -g performance
sudo cpufreq-selector -c 2 -g performance
sudo cpufreq-selector -c 3 -g performance
sudo cpufreq-selector -c 4 -g performance
sudo cpufreq-selector -c 5 -g performance
sudo cpufreq-selector -c 6 -g performance
sudo cpufreq-selector -c 7 -g performance
# check to make sure the scaling was set correctly
sudo cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
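The eight per-core commands and the final check can also be written as loops. The following is a minimal sketch, not part of the original instructions: `governor_commands` and `check_governors` are hypothetical helper names, and it assumes the cores are numbered contiguously from 0 and that `nproc` from coreutils is available. The first helper only prints the `cpufreq-selector` commands, so they can be reviewed before anything is run with sudo.

```shell
#!/bin/sh
# Hypothetical helpers; names and layout are illustrative assumptions.

# Print one cpufreq-selector command per core instead of typing each by hand.
# $1: number of cores (defaults to the count reported by nproc).
governor_commands() {
    ncores="${1:-$(nproc)}"
    core=0
    while [ "$core" -lt "$ncores" ]; do
        echo "sudo cpufreq-selector -c $core -g performance"
        core=$((core + 1))
    done
}

# Report every core whose governor is not "performance".
# $1: sysfs CPU directory (defaults to the path used in the check above).
# Returns non-zero if at least one core uses a different governor.
check_governors() {
    root="${1:-/sys/devices/system/cpu}"
    rc=0
    for f in "$root"/cpu*/cpufreq/scaling_governor; do
        [ -e "$f" ] || continue   # glob did not match: no cpufreq files here
        gov=$(cat "$f")
        if [ "$gov" != "performance" ]; then
            echo "$f: $gov"
            rc=1
        fi
    done
    return "$rc"
}

# Print the commands for an 8-core machine; pipe to sh to actually apply them:
#   governor_commands 8 | sh
governor_commands 8
```

After applying the commands, `check_governors` should print nothing and return 0. Any line it does print names a core that is still on another governor.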
- Compile ATLAS.
tar -xjvf atlas3.10.1.tar.bz2
cd ATLAS
mkdir build
cd build
../configure -Fa alg '-fPIC' --with-netlib-lapack-tarfile=<PATH_TO_NETLIB_LAPACK_TARBALL> --prefix=$HOME/opt/atlas --shared
make
make test
make install
export LD_LIBRARY_PATH=$HOME/opt/atlas/lib:$LD_LIBRARY_PATH
In ATLAS 3.10.1, the two shared libraries that get compiled are named libsatlas.so and libtatlas.so. As configured above, they both contain a full (C)BLAS+LAPACK interface. The difference is that the first is serial and the second is threaded. If you find that confusing and want to make linking against -latlas or -lcblas possible, then go into the install directory and create some symlinks.
cd $HOME/opt/atlas/lib
ln -s libtatlas.so libatlas.so # make "libatlas" point to the threaded library
ln -s libtatlas.so libcblas.so # make "libcblas" point to the threaded atlas
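Plain `ln -s` fails if the links already exist, for example after a rebuild. A minimal sketch of a re-runnable variant follows; `link_atlas_aliases` is a hypothetical helper name, and it assumes an `ln` that supports `-f` and `-n` (GNU coreutils and the BSDs do).

```shell
#!/bin/sh
# Hypothetical helper: create the two alias symlinks and show where they point.
# $1: ATLAS lib directory (defaults to the install prefix used above).
link_atlas_aliases() {
    libdir="${1:-$HOME/opt/atlas/lib}"
    if [ ! -e "$libdir/libtatlas.so" ]; then
        echo "libtatlas.so not found in $libdir" >&2
        return 1
    fi
    for name in libatlas.so libcblas.so; do
        # -f replaces an existing link, -n keeps ln from following it
        ln -sfn libtatlas.so "$libdir/$name"
        echo "$name -> $(readlink "$libdir/$name")"
    done
}

# Usage, once ATLAS is installed:
#   link_atlas_aliases "$HOME/opt/atlas/lib"
```

The links are relative (`libtatlas.so`, not an absolute path), so they keep working if the whole install directory is moved.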
Get the numpy source distribution. Move site.cfg.example to site.cfg, and set the following entries in site.cfg:
[DEFAULT]
library_dirs = <YOUR_HOME_DIRECTORY>/opt/atlas/lib
include_dirs = <YOUR_HOME_DIRECTORY>/opt/atlas/include
[blas_opt]
libraries = tatlas
[lapack_opt]
libraries = tatlas
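To avoid typing the home directory by hand, the entries above can be generated from the install prefix. This is a minimal sketch with a hypothetical helper, `site_cfg_entries`. It assumes a site.cfg containing only these three sections is enough for your build, which differs from editing the copy of site.cfg.example as described above.

```shell
#!/bin/sh
# Hypothetical helper: print the site.cfg entries for a given ATLAS prefix.
# $1: ATLAS install prefix (the value passed to --prefix at configure time).
site_cfg_entries() {
    prefix="$1"
    cat <<EOF
[DEFAULT]
library_dirs = $prefix/lib
include_dirs = $prefix/include

[blas_opt]
libraries = tatlas

[lapack_opt]
libraries = tatlas
EOF
}

# Print the entries for the prefix used in this guide. To write the file,
# redirect instead:  site_cfg_entries "$HOME/opt/atlas" > site.cfg
site_cfg_entries "$HOME/opt/atlas"
```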
Now run python setup.py install. Compile scipy from source as well; it will automatically build against the same ATLAS, since it detects its build configuration using numpy.distutils.
Download the file time_dgemm.c from this gist. It does a big matrix multiply via the cblas_dgemm function. You can try linking the program against different BLAS implementations to test your speed.
# link against the default system cblas.
$ gcc time_dgemm.c -lcblas && ./a.out
10.997839 s
You can see that this is linked against the system cblas/atlas, installed through the package manager.
$ ldd a.out
linux-vdso.so.1 => (0x00007fffbddff000)
libcblas.so.3gf => /usr/lib/libcblas.so.3gf (0x00007f5aa9498000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f5aa90d8000)
libatlas.so.3gf => /usr/lib/libatlas.so.3gf (0x00007f5aa8ba7000)
libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f5aa8890000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f5aa8679000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f5aa845c000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f5aa8160000)
/lib64/ld-linux-x86-64.so.2 (0x00007f5aa96d1000)
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f5aa7f29000)
Linking the same code against our threaded BLAS, I get a roughly 8x speedup.
$ gcc time_dgemm.c -L$HOME/opt/atlas/lib -ltatlas && ./a.out
1.302789 s
The single-threaded optimized version is still about 3.4x faster than the system library.
$ gcc time_dgemm.c -L$HOME/opt/atlas/lib -lsatlas && ./a.out
3.237809 s
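The speedups follow directly from the three timings above. A small sketch of the arithmetic, using a hypothetical `speedup` helper built on `awk`:

```shell
#!/bin/sh
# Hypothetical helper: ratio of a baseline time to an optimized time.
# $1: baseline seconds, $2: optimized seconds.
speedup() {
    awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.2f\n", base / opt }'
}

speedup 10.997839 1.302789   # system cblas vs. threaded ATLAS -> 8.44
speedup 10.997839 3.237809   # system cblas vs. serial ATLAS   -> 3.40
```

These are single runs on one machine, so treat the ratios as indicative only.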