@rgommers
Last active October 13, 2023 18:08
Notes on BLAS/LAPACK library details and conventions

Conda-forge library names and pkg-config output

(lp64) $ ls lib/libopenblas*
lib/libopenblas.a  lib/libopenblasp-r0.3.21.a  lib/libopenblasp-r0.3.21.so  lib/libopenblas.so  lib/libopenblas.so.0
(lp64) $ ls include/
cblas.h  f77blas.h  lapacke_config.h  lapacke.h  lapacke_mangling.h  lapacke_utils.h  lapack.h  openblas_config.h

(ilp64) $ ls lib/libopenblas*
lib/libopenblas64_.a  lib/libopenblas64_p-r0.3.21.a  lib/libopenblas64_p-r0.3.21.so  lib/libopenblas64_.so  lib/libopenblas64_.so.0
(ilp64) $ ls include/
cblas.h  f77blas.h  lapacke_config.h  lapacke.h  lapacke_mangling.h  lapacke_utils.h  lapack.h  openblas_config.h

(lp64) $ pkg-config --cflags openblas
-I/path/to/env/include
(lp64) $ pkg-config --libs openblas
-L/path/to/env/lib -lopenblas
(ilp64) $ pkg-config --cflags openblas
-I/path/to/env/include
(ilp64) $ pkg-config --libs openblas
-L/path/to/env/lib -lopenblas

LP64 vs ILP64 interface and symbol names

Relevant discussions:

  • For OpenBLAS, see OpenMathLib/OpenBLAS#646 for standardized agreement on shared library name and symbol suffix.
  • PRs that added support for ILP64 to numpy.distutils:
  • PR that added support for ILP64 to SciPy: scipy/scipy#11302
  • Note to self: when SciPy uses ILP64, it also still requires LP64, because not all submodules support ILP64. Not for the same extensions though.

$ objdump -t lp64/lib/libopenblas.so | grep -E "scopy*"  # output cleaned up
cblas_scopy
scopy_

$ objdump -t ilp64/lib/libopenblas64_.so | grep -E "scopy*"
cblas_scopy64_
scopy_64_
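The suffix convention visible in this output can be summarised in a few lines of Python. This is only an illustrative sketch: `openblas_symbol` is a hypothetical helper, not part of OpenBLAS or of any build tool, and it encodes just the two patterns shown here (CBLAS names get `64_` appended directly, Fortran names get `64_` after the trailing underscore).

```python
def openblas_symbol(routine, ilp64=False, cblas=False):
    """Return the exported symbol name for a BLAS routine such as 'scopy'.

    Hypothetical helper that mirrors the names seen in the objdump output.
    """
    # CBLAS symbols carry a 'cblas_' prefix; Fortran symbols a trailing underscore.
    base = f"cblas_{routine}" if cblas else f"{routine}_"
    # ILP64 builds of OpenBLAS append '64_' to the symbol name.
    return base + "64_" if ilp64 else base


for ilp64 in (False, True):
    print(openblas_symbol("scopy", ilp64, cblas=True), openblas_symbol("scopy", ilp64))
```

The loop prints the CBLAS and Fortran names for LP64 first and for ILP64 second, which can be compared against the two `objdump` listings.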

What should be implemented in Meson, and what should be left to users? Thoughts:

  1. Meson should support a keyword to select the desired interface (interface : 'ilp64'), defaulting to 'lp64' because that is what reference BLAS/LAPACK provide and what is typically expected.

  2. The dependency object returned by dependency('openblas') or similar should be query-able for what the interface is.

  3. For OpenBLAS, Meson should look for libopenblas64_.so for ILP64.

  4. For Fortran, should Meson automatically set the required compile flag -fdefault-integer-8?

    • Note: this flag is specific to gfortran and Flang; for Intel compilers it is -integer-size 64 (Linux/macOS) or /integer-size:64 (Windows).
    • This is almost always the right thing to do. However, integer variables that are passed to an external non-BLAS/LAPACK interface, and therefore must not be changed to 64-bit, should then be declared explicitly as integer*4 in the user code.
  5. Users are responsible for implementing name mangling, that is, appending 64_ to BLAS/LAPACK symbols when they request ILP64. They also need to use portable integer types in their code if they want to be able to build with both LP64 and ILP64. This typically looks something like:

    #ifdef HAVE_CBLAS64_
    #define CBLAS_FUNC(name) name ## 64_
    #else
    #define CBLAS_FUNC(name) name
    #endif
    
    CBLAS_FUNC(cblas_dgemm)(...);
  6. Users are responsible for implementing a build option (e.g., in meson_options.txt) if they want to allow switching between LP64 and ILP64.

  7. Meson doesn't know anything about f2py; users have to instruct f2py to use 64-bit integers with --f2cmap or similar. See get_f2py_int64_options in SciPy for details.

  8. When mixing C and Fortran code, the C code typically needs mangling because Fortran expects a trailing underscore. This is up to the user to implement.

  9. TBD: numpy.distutils does a symbol prefix/suffix check and provides the result to its users; it could be helpful if Meson did this too. See https://github.com/numpy/numpy/blob/6094eff9/numpy/distutils/system_info.py#L2271-L2278.
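To make item 9 a little more concrete, the simplest conceivable suffix check looks at the names a library exports. The sketch below is an assumption-laden illustration, not a description of what numpy.distutils does and not an existing Meson feature: `symbol_names` and `detect_symbol_suffix` are hypothetical helpers, and the listing is made-up text in `objdump -t` style rather than real tool output.

```python
def symbol_names(listing):
    """Collect symbol names (the last field of each line) from nm/objdump-style text."""
    return {line.split()[-1] for line in listing.splitlines() if line.strip()}


def detect_symbol_suffix(names, routine="scopy"):
    """Return '64_' if the ILP64-suffixed Fortran symbol exists, '' if only the
    plain one does, and None if the routine is not present at all."""
    if f"{routine}_64_" in names:
        return "64_"
    if f"{routine}_" in names:
        return ""
    return None


# Illustrative listing only; addresses and sizes are placeholders.
listing = """\
0000000000000000 g     F .text  0000000000000000              cblas_scopy64_
0000000000000000 g     F .text  0000000000000000              scopy_64_
"""
print(repr(detect_symbol_suffix(symbol_names(listing))))
```

A real check would also have to skip undefined (`*UND*`) entries and cope with libraries such as MKL that export several aliases for the same routine, in which case the result only says that a suffixed name exists.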

CBLAS

Initial rough notes:

  • Not all implementations provide CBLAS.
  • The header is typically named cblas.h; MKL, however, calls it mkl_cblas.h.
  • OpenBLAS can be built without a Fortran compiler; in that case it provides CBLAS plus an f2c'd LAPACK.
  • It would be useful if the dependency objects that Meson returns could be queried for whether CBLAS is present.
  • numpy.distutils detects CBLAS and defines HAVE_CBLAS if it is found.
  • BLIS doesn't build the CBLAS interface by default. To build it, define BLIS_ENABLE_CBLAS.
  • NumPy requires CBLAS; it is not optional.

MKL

$ cd /opt/intel/oneapi/mkl/latest/lib  # from recommended Intel installer
$ ls pkgconfig/
mkl-dynamic-ilp64-iomp.pc  mkl-dynamic-lp64-iomp.pc  mkl-static-ilp64-iomp.pc  mkl-static-lp64-iomp.pc
mkl-dynamic-ilp64-seq.pc   mkl-dynamic-lp64-seq.pc   mkl-static-ilp64-seq.pc   mkl-static-lp64-seq.pc

$ pkg-config --libs mkl-dynamic-ilp64-seq
-L/opt/intel/oneapi/mkl/latest/lib/pkgconfig/../../lib/intel64 -lmkl_intel_ilp64 -lmkl_sequential -lmkl_core -lpthread -lm -ldl
$ pkg-config --cflags mkl-dynamic-ilp64-seq
-DMKL_ILP64 -I/opt/intel/oneapi/mkl/latest/lib/pkgconfig/../../include
$ ls intel64/libmkl*
intel64/libmkl_avx2.so.2                  intel64/libmkl_cdft_core.so.2    intel64/libmkl_intel_lp64.so.2       intel64/libmkl_sequential.a
intel64/libmkl_avx512.so.2                intel64/libmkl_core.a            intel64/libmkl_intel_thread.a        intel64/libmkl_sequential.so
intel64/libmkl_avx.so.2                   intel64/libmkl_core.so           intel64/libmkl_intel_thread.so       intel64/libmkl_sequential.so.2
intel64/libmkl_blacs_intelmpi_ilp64.a     intel64/libmkl_core.so.2         intel64/libmkl_intel_thread.so.2     intel64/libmkl_sycl.a
intel64/libmkl_blacs_intelmpi_ilp64.so    intel64/libmkl_def.so.2          intel64/libmkl_lapack95_ilp64.a      intel64/libmkl_sycl.so
intel64/libmkl_blacs_intelmpi_ilp64.so.2  intel64/libmkl_gf_ilp64.a        intel64/libmkl_lapack95_lp64.a       intel64/libmkl_sycl.so.2
intel64/libmkl_blacs_intelmpi_lp64.a      intel64/libmkl_gf_ilp64.so       intel64/libmkl_mc3.so.2              intel64/libmkl_tbb_thread.a
intel64/libmkl_blacs_intelmpi_lp64.so     intel64/libmkl_gf_ilp64.so.2     intel64/libmkl_mc.so.2               intel64/libmkl_tbb_thread.so
intel64/libmkl_blacs_intelmpi_lp64.so.2   intel64/libmkl_gf_lp64.a         intel64/libmkl_pgi_thread.a          intel64/libmkl_tbb_thread.so.2
intel64/libmkl_blacs_openmpi_ilp64.a      intel64/libmkl_gf_lp64.so        intel64/libmkl_pgi_thread.so         intel64/libmkl_vml_avx2.so.2
intel64/libmkl_blacs_openmpi_ilp64.so     intel64/libmkl_gf_lp64.so.2      intel64/libmkl_pgi_thread.so.2       intel64/libmkl_vml_avx512.so.2
intel64/libmkl_blacs_openmpi_ilp64.so.2   intel64/libmkl_gnu_thread.a      intel64/libmkl_rt.so                 intel64/libmkl_vml_avx.so.2
intel64/libmkl_blacs_openmpi_lp64.a       intel64/libmkl_gnu_thread.so     intel64/libmkl_rt.so.2               intel64/libmkl_vml_cmpt.so.2
intel64/libmkl_blacs_openmpi_lp64.so      intel64/libmkl_gnu_thread.so.2   intel64/libmkl_scalapack_ilp64.a     intel64/libmkl_vml_def.so.2
intel64/libmkl_blacs_openmpi_lp64.so.2    intel64/libmkl_intel_ilp64.a     intel64/libmkl_scalapack_ilp64.so    intel64/libmkl_vml_mc2.so.2
intel64/libmkl_blas95_ilp64.a             intel64/libmkl_intel_ilp64.so    intel64/libmkl_scalapack_ilp64.so.2  intel64/libmkl_vml_mc3.so.2
intel64/libmkl_blas95_lp64.a              intel64/libmkl_intel_ilp64.so.2  intel64/libmkl_scalapack_lp64.a      intel64/libmkl_vml_mc.so.2
intel64/libmkl_cdft_core.a                intel64/libmkl_intel_lp64.a      intel64/libmkl_scalapack_lp64.so
intel64/libmkl_cdft_core.so               intel64/libmkl_intel_lp64.so     intel64/libmkl_scalapack_lp64.so.2
$ objdump -t intel64/libmkl_intel_ilp64.so | grep scopy  # cleaned up output:
0000000000000000         *UND*  0000000000000000              mkl_blas_scopy
0000000000323000 g     F .text  0000000000000030              cblas_scopy_64
00000000002cecf0 g     F .text  0000000000000030              cblas_scopy
000000000025aca0 g     F .text  00000000000001d0              mkl_blas__scopy
000000000025aca0 g     F .text  00000000000001d0              scopy_64_
000000000025aca0 g     F .text  00000000000001d0              scopy_64
000000000025aca0 g     F .text  00000000000001d0              scopy_
000000000025aca0 g     F .text  00000000000001d0              scopy

tl;dr: for MKL there are 8 pkg-config files, so you choose LP64/ILP64, dynamic/static, and sequential/OpenMP (seq/iomp); after that you can pick whichever symbol names you like, since they all exist and are aliases.
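The three choices map directly onto the file names listed by `ls pkgconfig/`. As a small sketch (the helper `mkl_pkgconfig_name` and the constant names are hypothetical, introduced only for illustration), the module name can be composed and validated like this:

```python
from itertools import product

INTERFACES = ("lp64", "ilp64")
LINKING = ("dynamic", "static")
THREADING = ("seq", "iomp")  # sequential or Intel OpenMP


def mkl_pkgconfig_name(interface="lp64", linking="dynamic", threading="seq"):
    """Compose an MKL pkg-config module name such as 'mkl-dynamic-ilp64-seq'."""
    choices = ((interface, INTERFACES), (linking, LINKING), (threading, THREADING))
    for value, allowed in choices:
        if value not in allowed:
            raise ValueError(f"{value!r} is not one of {allowed}")
    return f"mkl-{linking}-{interface}-{threading}"


# Enumerate every combination; there should be 2 * 2 * 2 of them.
all_names = sorted(
    mkl_pkgconfig_name(i, l, t) for i, l, t in product(INTERFACES, LINKING, THREADING)
)
print(len(all_names))
```

The resulting string is what gets passed to `pkg-config --libs` or `pkg-config --cflags`.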

A test with SciPy & MKL:

$ # No pkg-config files for MKL in conda-forge yet, so use the Intel installer:
$ export PKG_CONFIG_PATH=/opt/intel/oneapi/mkl/latest/lib/pkgconfig/
$ meson setup build --prefix=$PWD/build-install -Dblas=mkl-dynamic-ilp64-seq -Dlapack=mkl-dynamic-ilp64-seq
$ python dev.py build
$ ldd build/scipy/linalg/_flapack.cpython-310-x86_64-linux-gnu.so
        linux-vdso.so.1 (0x00007ffe2cd14000)
        libmkl_intel_ilp64.so.2 => /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_intel_ilp64.so.2 (0x00007f75a5000000)
        libmkl_sequential.so.2 => /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_sequential.so.2 (0x00007f75a3400000)
        libmkl_core.so.2 => /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_core.so.2 (0x00007f759f000000)
        libm.so.6 => /usr/lib/libm.so.6 (0x00007f75a62e2000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007f759ee19000)
        /usr/lib64/ld-linux-x86-64.so.2 (0x00007f75a63e8000)
        libdl.so.2 => /usr/lib/libdl.so.2 (0x00007f75a62db000)
        libpthread.so.0 => /usr/lib/libpthread.so.0 (0x00007f75a62d6000)
$ # Due to RPATH stripping on install, this doesn't actually work unless we put MKL into our conda env:
$ ldd build-install/lib/python3.10/site-packages/scipy/linalg/_flapack.cpython-310-x86_64-linux-gnu.so
        linux-vdso.so.1 (0x00007ffdf6bc6000)
        libmkl_intel_ilp64.so.2 => not found
        libmkl_sequential.so.2 => not found
        libmkl_core.so.2 => not found
        libm.so.6 => /usr/lib/libm.so.6 (0x00007f7e35318000)
        libc.so.6 => /usr/lib/libc.so.6 (0x00007f7e35131000)
        /usr/lib64/ld-linux-x86-64.so.2 (0x00007f7e356e5000)

Also remember that MKL uses the g77 ABI (as does Accelerate), while OpenBLAS and most other BLAS libraries use the gfortran ABI. See the use-g77-abi option in SciPy's meson_options.txt.

ArmPL

Pkg-config file names for ArmPL to use (from spack/spack#34979 (comment)):

    armpl-dynamic-ilp64-omp             armpl-Fortran-static-ilp64-omp
    armpl-dynamic-ilp64-omp.pc          armpl-Fortran-static-ilp64-omp.pc
    armpl-dynamic-ilp64-seq             armpl-Fortran-static-ilp64-seq
    armpl-dynamic-ilp64-seq.pc          armpl-Fortran-static-ilp64-seq.pc
    armpl-dynamic-lp64-omp              armpl-Fortran-static-lp64-omp
    armpl-dynamic-lp64-omp.pc           armpl-Fortran-static-lp64-omp.pc
    armpl-dynamic-lp64-seq              armpl-Fortran-static-lp64-seq
    armpl-dynamic-lp64-seq.pc           armpl-Fortran-static-lp64-seq.pc
    armpl-Fortran-dynamic-ilp64-omp     armpl-static-ilp64-omp
    armpl-Fortran-dynamic-ilp64-omp.pc  armpl-static-ilp64-omp.pc
    armpl-Fortran-dynamic-ilp64-seq     armpl-static-ilp64-seq
    armpl-Fortran-dynamic-ilp64-seq.pc  armpl-static-ilp64-seq.pc
    armpl-Fortran-dynamic-lp64-omp      armpl-static-lp64-omp
    armpl-Fortran-dynamic-lp64-omp.pc   armpl-static-lp64-omp.pc
    armpl-Fortran-dynamic-lp64-seq      armpl-static-lp64-seq
    armpl-Fortran-dynamic-lp64-seq.pc   armpl-static-lp64-seq.pc

Note that as of Jan'23, ArmPL ships the files without a .pc extension (that will hopefully be fixed); Spack therefore adds copies of the original files with a .pc extension.

Apple Accelerate

In macOS >=13.3, two LP64 builds and one ILP64 build of vecLib are shipped. For backward compatibility, the legacy interfaces (providing LAPACK 3.2.1) are used by default. To use the new interfaces (providing LAPACK 3.9.1), including ILP64, it is necessary to set some #defines before including the Accelerate / vecLib headers:

  • -DACCELERATE_NEW_LAPACK: use the new interfaces
  • -DACCELERATE_LAPACK_ILP64: use new ILP64 interfaces (note this requires -DACCELERATE_NEW_LAPACK to be set as well)

The normal F77 symbols will remain as the legacy implementation. The newer interfaces have separate symbols with suffixes $NEWLAPACK or $NEWLAPACK$ILP64.

Example binary symbols:

  • _dgemm_: this is the legacy implementation
  • _dgemm$NEWLAPACK: this is the new implementation
  • _dgemm$NEWLAPACK$ILP64: this is the new ILP64 implementation

If you use Accelerate / vecLib headers with the above defines, you don't need to worry about the symbol names. They'll get aliased correctly.
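For tooling that inspects binaries instead of going through the headers, the relationship between the defines and the symbol names listed above can be written down as a short sketch. Both functions are hypothetical helpers introduced here for illustration; they only encode the two defines and the three example symbol patterns from these notes.

```python
def accelerate_defines(new_lapack=False, ilp64=False):
    """Return the preprocessor flags that select an Accelerate interface."""
    if ilp64 and not new_lapack:
        # The ILP64 define requires the new-LAPACK define to be set as well.
        raise ValueError("ILP64 requires the new LAPACK interfaces")
    flags = []
    if new_lapack:
        flags.append("-DACCELERATE_NEW_LAPACK")
    if ilp64:
        flags.append("-DACCELERATE_LAPACK_ILP64")
    return flags


def accelerate_symbol(routine, new_lapack=False, ilp64=False):
    """Return the binary symbol name for a routine such as 'dgemm'."""
    if ilp64 and not new_lapack:
        raise ValueError("ILP64 requires the new LAPACK interfaces")
    if not new_lapack:
        return f"_{routine}_"  # legacy implementation
    suffix = "$NEWLAPACK$ILP64" if ilp64 else "$NEWLAPACK"
    return f"_{routine}{suffix}"


print(accelerate_defines(new_lapack=True, ilp64=True))
print(accelerate_symbol("dgemm", new_lapack=True, ilp64=True))
```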

For headers and linker flags, check if these directories exist before using them:

  1. -I/System/Library/Frameworks/vecLib.framework/Headers, flags: ['-Wl,-framework', '-Wl,Accelerate']
  2. -I/System/Library/Frameworks/vecLib.framework/Headers, flags: ['-Wl,-framework', '-Wl,vecLib']
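The existence check can be sketched as follows. This is a minimal illustration under stated assumptions: the helper `accelerate_flags` is hypothetical, and the directory tested for each entry is assumed to be the corresponding framework directory (`Accelerate.framework`, then `vecLib.framework`), which the list above does not spell out.

```python
import os

FRAMEWORKS_DIR = "/System/Library/Frameworks"


def accelerate_flags(isdir=os.path.isdir):
    """Return compile and link arguments for the first framework found, else None."""
    # Both entries in the list use the vecLib headers.
    compile_args = [f"-I{FRAMEWORKS_DIR}/vecLib.framework/Headers"]
    for framework in ("Accelerate", "vecLib"):
        # Assumption: the directory to test is the framework directory itself.
        if isdir(f"{FRAMEWORKS_DIR}/{framework}.framework"):
            return {
                "compile_args": compile_args,
                "link_args": ["-Wl,-framework", f"-Wl,{framework}"],
            }
    return None


print(accelerate_flags())
```

The `isdir` parameter exists only so the lookup can be exercised on machines without these frameworks; on a system where neither directory exists the function returns None.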

Note that the dylibs are no longer physically present on disk; they are provided through the shared linker cache (the dyld shared cache).
