@MarkDana
Last active October 16, 2024 20:05
Install NumPy on M1 Max

How to install numpy on M1 Max with the best acceleration (Apple's vecLib)? Here's the answer as of Dec 6, 2021.


Steps

I. Install miniforge

So that your Python runs natively on arm64 rather than being translated via Rosetta (a quick way to check is shown at the end of this step).

  1. Download Miniforge3-MacOSX-arm64.sh, then
  2. Run the script, then open another shell
$ bash Miniforge3-MacOSX-arm64.sh
  3. Create an environment (here I use the name np_veclib)
$ conda create -n np_veclib python=3.9
$ conda activate np_veclib
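
Optionally, confirm that the interpreter in this environment is a native arm64 build rather than an Intel build translated by Rosetta. A minimal standard-library check (nothing here is specific to this setup):

>>> import platform
>>> platform.machine()  # 'arm64' for a native build; 'x86_64' means Rosetta translation
'arm64'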

II. Install Numpy with BLAS interface specified as vecLib

  1. To compile numpy, first install cython and pybind11:
$ conda install cython pybind11
  2. Compile numpy as follows (thanks to @Marijn's answer) - don't use conda install!
$ pip install --no-binary :all: --no-use-pep517 numpy
  3. An alternative to step 2 is to build from source
$ git clone https://github.com/numpy/numpy
$ cd numpy
$ cp site.cfg.example site.cfg
$ nano site.cfg

Edit the copied site.cfg: add the following lines:

[accelerate]
libraries = Accelerate, vecLib

Then build and install:

$ NPY_LAPACK_ORDER=accelerate python setup.py build
$ python setup.py install
  4. After either step 2 or step 3, test whether numpy is using vecLib:
>>> import numpy
>>> numpy.show_config()

Then, info like /System/Library/Frameworks/vecLib.framework/Headers should be printed.
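
If you prefer a scripted check rather than reading the output by eye, here is a minimal sketch that captures the show_config() text and searches it for Accelerate/vecLib keywords. The exact wording of the output differs between NumPy's legacy distutils builds and the Meson builds in 1.26+, so treat the keyword list as an assumption:

import io
import contextlib
import numpy as np

# Capture the text that numpy.show_config() prints and scan it for
# Accelerate/vecLib markers instead of inspecting it manually.
buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    np.show_config()
output = buf.getvalue().lower()

if 'accelerate' in output or 'veclib' in output:
    print('NumPy appears to be linked against Accelerate/vecLib.')
else:
    print('No Accelerate/vecLib markers found - check the full output manually.')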

III. For installing further packages using conda

Make conda recognize packages installed by pip

conda config --set pip_interop_enabled true

This step is necessary; otherwise, running e.g. conda install pandas would list numpy under The following packages will be installed and install it again, and that newly installed copy from the conda-forge channel is slow.


Comparisons to other installations:

1. Competitors:

Besides the optimal installation above, I also tried several other installations:

  • A. np_default: conda create -n np_default python=3.9 numpy
  • B. np_openblas: conda create -n np_openblas python=3.9 numpy blas=*=*openblas*
  • C. np_netlib: conda create -n np_netlib python=3.9 numpy blas=*=*netlib*

Options A, B, and C above are installed directly from the conda-forge channel; numpy.show_config() shows identical results for all of them. To see the difference, inspect conda list - e.g. the openblas packages are installed in B. Note that mkl and blis are not supported on arm64.

  • D. np_openblas_source: First install openblas with brew install openblas, then add [openblas] path /opt/homebrew/opt/openblas to site.cfg and build Numpy from source.
  • For reference: M1 and i9–9880H timings as reported in this post.
  • For reference: my old 2-core i5-6360U (MacBook Pro 2016, 13-inch).

2. Benchmarks:

Here I use two benchmarks:

  1. mysvd.py: my SVD benchmark
import time
import numpy as np
np.random.seed(42)
a = np.random.uniform(size=(300, 300))
runtimes = 10

timecosts = []
for _ in range(runtimes):
    s_time = time.time()
    for i in range(100):
        a += 1
        np.linalg.svd(a)
    timecosts.append(time.time() - s_time)

print(f'mean of {runtimes} runs: {np.mean(timecosts):.5f}s')
  2. dario.py: A benchmark script by Dario Radečić from the post above (a rough sketch of the operations it times follows below).
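
The dario.py script itself is not reproduced in this gist. As a rough stand-in, the sketch below runs the same operations whose timings appear in the outputs quoted in the comments (matrix and vector dot products, SVD, Cholesky, eigendecomposition), with sizes taken from those outputs; the structure, seed, and seconds-only formatting are my own assumptions, so treat it as an approximation rather than the original benchmark.

import time
import numpy as np

def timed(label, fn):
    # Time a single call and print it in the same style as the quoted outputs.
    start = time.time()
    fn()
    print(f'{label} in {time.time() - start:.2f} s.')

rng = np.random.default_rng(42)
A = rng.random((4096, 4096))
B = rng.random((4096, 4096))
v = rng.random(524288)
C = rng.random((2048, 1024))
D = rng.random((2048, 2048))
E = D @ D.T + 2048 * np.eye(2048)  # symmetric positive definite, so Cholesky is valid

timed('Dotted two 4096x4096 matrices', lambda: A @ B)
timed('Dotted two vectors of length 524288', lambda: v @ v)
timed('SVD of a 2048x1024 matrix', lambda: np.linalg.svd(C, full_matrices=False))
timed('Cholesky decomposition of a 2048x2048 matrix', lambda: np.linalg.cholesky(E))
timed('Eigendecomposition of a 2048x2048 matrix', lambda: np.linalg.eig(D))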

3. Results:

+-------+-----------+------------+-------------+-----------+--------------------+----+----------+----------+
|  sec  | np_veclib | np_default | np_openblas | np_netlib | np_openblas_source | M1 | i9–9880H | i5-6360U |
+-------+-----------+------------+-------------+-----------+--------------------+----+----------+----------+
| mysvd |  1.02300  |   4.29386  |   4.13854   |  4.75812  |      12.57879      |  / |     /    |  2.39917 |
+-------+-----------+------------+-------------+-----------+--------------------+----+----------+----------+
| dario |     21    |     41     |      39     |    323    |         40         | 33 |    23    |    78    |
+-------+-----------+------------+-------------+-----------+--------------------+----+----------+----------+
d13g0 commented Sep 22, 2022

I tried to get numpy to link to Accelerate following the steps here, but when I checked with np.show_config() it was showing me OpenBLAS!

The only difference is that I am using miniconda instead of miniforge (which at this point should be OK?)

Out of curiosity, I managed to compile numpy against a copy of openblas downloaded with Homebrew and compared the performance of the default install (simply conda install numpy) to the alternative (Numpy source code + brewed openblas)

using this benchmark

I got these numbers:

# Option 1: with the 'included' openblas from conda install
# --------------------------------------------------------
❯ python numpy_benchmark.py
Dotted two 4096x4096 matrices in 0.57 s.
Dotted two vectors of length 524288 in 0.25 ms.
SVD of a 2048x1024 matrix in 1.01 s.
Cholesky decomposition of a 2048x2048 matrix in 0.09 s.
Eigendecomposition of a 2048x2048 matrix in 7.20 s.


# Option 2: brew openblas + local compile
# ----------------------------------------
❯ python numpy_benchmark.py
Dotted two 4096x4096 matrices in 0.55 s.
Dotted two vectors of length 524288 in 0.25 ms.
SVD of a 2048x1024 matrix in 2.35 s.
Cholesky decomposition of a 2048x2048 matrix in 0.08 s.
Eigendecomposition of a 2048x2048 matrix in 8.41 s.

So what I am wondering is whether this whole issue has already been resolved (it seems so) and a simple conda install suffices. I guess I just want to make sure I get all the juice I can from the M1!

Cheers

RoyiAvital commented Nov 4, 2022

@placeless, I tried your code; while it installed some Accelerate-related files, I can't replicate the results above.
Is there any way to verify that my NumPy uses Accelerate?

Could it be that it is now disabled because of the issues and bugs reported above?

Related issue: conda-forge/numpy-feedstock#253.

placeless commented Nov 5, 2022

@d13g0 This is what I got on an M1 Max chip with numpy 1.23.4, python 3.10:

Dotted two 4096x4096 matrices in 0.29 s.
Dotted two vectors of length 524288 in 0.11 ms.
SVD of a 2048x1024 matrix in 0.43 s.
Cholesky decomposition of a 2048x2048 matrix in 0.07 s.
Eigendecomposition of a 2048x2048 matrix in 3.93 s.

@RoyiAvital Here are my *blas libs; you can see the _accelerate suffix:

# micromamba list | grep blas

libblas                        3.9.0         15_osxarm64_accelerate  conda-forge
libcblas                       3.9.0         15_osxarm64_accelerate  conda-forge

I always update numpy separately and keep an eye on *blas changes pulled in by other libs. That's painful.

Edit:

I just found a way to keep the BLAS implementation when updating:

# add the following line to ~/micromamba/envs/<YOUR_ENV_NAME>/conda-meta/pinned

libblas=*=*accelerate

d13g0 commented Nov 6, 2022 via email

@placeless

@d13g0 ☝️ Here is how I installed numpy with Apple's Accelerate framework.

And the output of np.__config__.show() is as follows:

blas_info:
    libraries = ['cblas', 'blas', 'cblas', 'blas']
    library_dirs = ['~/micromamba/envs/<ENV_NAMA>/lib']
    include_dirs = ['~/micromamba/envs/<ENV_NAMA>/include']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]
    libraries = ['cblas', 'blas', 'cblas', 'blas']
    library_dirs = ['~/micromamba/envs/<ENV_NAMA>/lib']
    include_dirs = ['~/micromamba/envs/<ENV_NAMA>/include']
    language = c
lapack_info:
    libraries = ['lapack', 'blas', 'lapack', 'blas']
    library_dirs = ['~/micromamba/envs/<ENV_NAMA>/lib']
    language = f77
lapack_opt_info:
    libraries = ['lapack', 'blas', 'lapack', 'blas', 'cblas', 'blas', 'cblas', 'blas']
    library_dirs = ['~/micromamba/envs/<ENV_NAMA>/lib']
    language = c
    define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]
    include_dirs = ['~/micromamba/envs/<ENV_NAMA>/include']
Supported SIMD extensions in this NumPy install:
    baseline = NEON,NEON_FP16,NEON_VFPV4,ASIMD
    found = ASIMDHP,ASIMDDP
    not found = ASIMDFHM

@RoyiAvital

@placeless, can you try it with Python 3.11?

Also, I don't get it. The NumPy team said it won't support Accelerate, as in https://gist.github.com/MarkDana/a9481b8134cf38a556cf23e1e815dafb?permalink_comment_id=4311279#gistcomment-4311279. How come it suddenly works?

@placeless

@RoyiAvital

  • mkdir py311
  • cd py311
  • vim config.yml
name: py311
channels:
  - conda-forge
dependencies:
  - python=3.11
  - libblas=*=*accelerate
  - numpy
  • micromamba create -f config.yml
  Updating specs:

   - python=3.11
   - libblas=*[build=*accelerate]
   - numpy


  Package              Version  Build                   Channel                     Size
──────────────────────────────────────────────────────────────────────────────────────────
  Install:
──────────────────────────────────────────────────────────────────────────────────────────

  + bzip2                1.0.8  h3422bc3_4              conda-forge/osx-arm64     Cached
  + ca-certificates  2022.9.24  h4653dfc_0              conda-forge/osx-arm64     Cached
  + libblas              3.9.0  16_osxarm64_accelerate  conda-forge/osx-arm64        3MB
  + libcblas             3.9.0  16_osxarm64_accelerate  conda-forge/osx-arm64       13kB
  + libcxx              14.0.6  h2692d47_0              conda-forge/osx-arm64     Cached
  + libffi               3.4.2  h3422bc3_5              conda-forge/osx-arm64     Cached
  + libgfortran          5.0.0  11_3_0_hd922786_25      conda-forge/osx-arm64     Cached
  + libgfortran5        11.3.0  hdaf2cc0_25             conda-forge/osx-arm64     Cached
  + liblapack            3.9.0  16_osxarm64_accelerate  conda-forge/osx-arm64       13kB
  + libsqlite           3.39.4  h76d750c_0              conda-forge/osx-arm64     Cached
  + libzlib             1.2.13  h03a7124_4              conda-forge/osx-arm64     Cached
  + llvm-openmp         14.0.4  hd125106_0              conda-forge/osx-arm64     Cached
  + ncurses                6.3  h07bb92c_1              conda-forge/osx-arm64     Cached
  + numpy               1.23.4  py311ha92fb03_1         conda-forge/osx-arm64        7MB
  + openssl              3.0.7  h03a7124_0              conda-forge/osx-arm64     Cached
  + python              3.11.0  h93c2e33_0_cpython      conda-forge/osx-arm64       16MB
  + python_abi            3.11  2_cp311                 conda-forge/osx-arm64        5kB
  + readline             8.1.2  h46ed386_0              conda-forge/osx-arm64     Cached
  + tk                  8.6.12  he1e0b03_0              conda-forge/osx-arm64     Cached
  + tzdata               2022f  h191b570_0              conda-forge/noarch        Cached
  + xz                   5.2.6  h57fd34a_0              conda-forge/osx-arm64     Cached

  Summary:

  Install: 21 packages
  • python bench.py
Dotted two 4096x4096 matrices in 0.29 s.
Dotted two vectors of length 524288 in 0.11 ms.
SVD of a 2048x1024 matrix in 0.42 s.
Cholesky decomposition of a 2048x2048 matrix in 0.07 s.
Eigendecomposition of a 2048x2048 matrix in 4.06 s.

RoyiAvital commented Nov 7, 2022

OK, I got:

Dotted two 4096x4096 matrices in 0.29 s.
Dotted two vectors of length 524288 in 0.11 ms.
SVD of a 2048x1024 matrix in 0.44 s.
Cholesky decomposition of a 2048x2048 matrix in 0.08 s.
Eigendecomposition of a 2048x2048 matrix in 3.90 s.

Which is similar to yours. The question is: is it better than the default?

@placeless, do you have numbers for OpenBLAS?

alexshmmy commented Nov 12, 2022

Thank you for the tips. I followed them, and on my brand-new Mac M1 Max I did the following:

  1. I installed Miniforge3 (bash Miniforge3-MacOSX-arm64.sh)
  2. Initialized a conda base environment (conda init) with Python 3.10
  3. Installed numpy as:
    conda install numpy "libblas=*=*accelerate"

And then the suggested benchmarks:

  1. The script mysvd.py reports mean of 10 runs: 1.08088s

  2. The script dario.py gives:

Dotted two 4096x4096 matrices in 0.28 s.
Dotted two vectors of length 524288 in 0.11 ms.
SVD of a 2048x1024 matrix in 0.44 s.
Cholesky decomposition of a 2048x2048 matrix in 0.07 s.
Eigendecomposition of a 2048x2048 matrix in 3.83 s.

TOTAL TIME = 19 seconds

alexshmmy commented Nov 12, 2022

@placeless @RoyiAvital @MarkDana what about the packages related to numpy, i.e. scipy, pandas, scikit-learn? Do they also need a specific conda installation with "libblas=*=*accelerate" to work efficiently?

@placeless

@RoyiAvital, OpenBLAS results:

Dotted two 4096x4096 matrices in 0.38 s.
Dotted two vectors of length 524288 in 0.07 ms.
SVD of a 2048x1024 matrix in 1.67 s.
Cholesky decomposition of a 2048x2048 matrix in 0.07 s.
Eigendecomposition of a 2048x2048 matrix in 9.43 s.

@alexshmmy, I've never heard of such a switch for scipy/pandas/scikit-learn; installing numpy first would be a good choice, I think.

@alexshmmy

Thanks @placeless! After conda install numpy "libblas=*=*accelerate", I have now installed scipy and pandas in the same environment. Let me know if there is any benchmark I can use to check whether scipy and pandas also work efficiently.

vlebert commented Nov 18, 2022

So, when can we hope that a simple conda install numpy will do the job for M1 chips?
Do you know what is blocking it?

@QueryType

As of Jan 2023, is it possible to install numpy natively on an M1 Mac mini and get it to use the GPU? I am curious since I plan to purchase one and use it for vector maths and machine learning algorithms. Thanks.

maguzj commented Feb 4, 2023

I've just followed the steps on an M1 machine and it worked perfectly: my code runs 60 times faster.
I tried the same on an M2 machine and the gain is a bit smaller: a 20x improvement.

Any ideas on how to translate/update this info for M2 MacBook?

fmigas commented Sep 19, 2023

It looks like pip install --no-binary :all: --no-use-pep517 numpy does not work anymore.
It returns an error:
ERROR: Disabling PEP 517 processing is invalid: project specifies a build backend of mesonpy in pyproject.toml

What can be done to repair it?

@by-justin

It looks like pip install --no-binary :all: --no-use-pep517 numpy does not work anymore. It returns an error: ERROR: Disabling PEP 517 processing is invalid: project specifies a build backend of mesonpy in pyproject.toml

What can be done to repair it?

Same issue here.

@CoryKornowicz

You can omit the --no-use-pep517 flag altogether, which should still work.

fmigas commented Oct 28, 2023

It looks like the solution is very simple. Numpy 1.26 does not accept this argument, but numpy 1.25.2 does.
You need to add ==1.25.2 at the end and it will work smoothly.
