Largely based on the TensorFlow 1.6 gist, this should hopefully simplify things a bit. Mixing Homebrew python2/python3 with pip ends up being a mess, so here's an approach that sidesteps Homebrew Python (the configure log further down uses Anaconda's Python 3.6).
- NVIDIA Web-Drivers 378.05.05 for 10.12.6
- CUDA 9.0 Toolkit
- cuDNN 7.0.5 (latest release for macOS)
- Python 3.6
- Xcode 8.3.2
- bazel 0.10.0 (0.12.0 does not work in my environment)
- Tensorflow 1.7
Download and install the NVIDIA Web Drivers from http://www.nvidia.com/download/
I was able to compile all of it on Xcode 9, but TensorFlow promptly segfaults if you actually try to do anything on the GPU. You may need a developer account to grab the old version from https://developer.apple.com/download/more/
If you have a newer Xcode installed, rename Xcode.app to something like Xcode9.app. Then unpack Xcode 8.3.2 and switch the tool chain over to it:
sudo xcode-select -s /Applications/Xcode.app
Download the bazel 0.10.0 installer binary (bazel-0.10.0-installer-darwin-x86_64.sh), then make it executable and run it:
chmod 755 bazel-0.10.0-installer-darwin-x86_64.sh
./bazel-0.10.0-installer-darwin-x86_64.sh
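Since 0.12.0 did not work for me, it is worth confirming which bazel actually ended up on your PATH. Here is a minimal Python sketch, assuming `bazel version` prints a `Build label:` line. The helper names `bazel_build_label` and `is_expected_bazel` are my own and not part of bazel.

```python
import re


def bazel_build_label(version_output):
    """Return the version on the 'Build label:' line of `bazel version`, or None."""
    match = re.search(r"^Build label:\s*(\S+)", version_output, re.MULTILINE)
    return match.group(1) if match else None


def is_expected_bazel(version_output, expected="0.10.0"):
    """True when the reported build label equals the version this guide uses."""
    return bazel_build_label(version_output) == expected


# Usage (assumes bazel is on the PATH):
#   import subprocess
#   out = subprocess.check_output(["bazel", "version"]).decode()
#   print(bazel_build_label(out), is_expected_bazel(out))
```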
The CUDA 9.0 Toolkit download should be something along the lines of cuda_9.0.176_mac.dmg
Edit ~/.bash_profile and add the following:
export CUDA_HOME=/usr/local/cuda
export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:/usr/local/cuda/extras/CUPTI/lib
export LD_LIBRARY_PATH=$DYLD_LIBRARY_PATH
export PATH=$DYLD_LIBRARY_PATH:$PATH:/Developer/NVIDIA/CUDA-9.0/bin
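After opening a new terminal, a quick way to catch a typo in the exports above is to check that every directory they name exists. This is only a sketch: `missing_cuda_dirs` is a hypothetical helper, and it looks only at `CUDA_HOME` and `DYLD_LIBRARY_PATH`.

```python
import os


def missing_cuda_dirs(env=None):
    """Return (variable, path) pairs whose path is unset or not a directory."""
    env = os.environ if env is None else env
    missing = []
    # CUDA_HOME holds a single directory.
    cuda_home = env.get("CUDA_HOME", "")
    if not cuda_home or not os.path.isdir(cuda_home):
        missing.append(("CUDA_HOME", cuda_home))
    # DYLD_LIBRARY_PATH is a colon-separated list of directories.
    for entry in env.get("DYLD_LIBRARY_PATH", "").split(":"):
        if not entry or not os.path.isdir(entry):
            missing.append(("DYLD_LIBRARY_PATH", entry))
    return missing


# Usage: an empty list means every directory was found.
#   for name, path in missing_cuda_dirs():
#       print("%s: %r is unset or not a directory" % (name, path))
```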
Compile one of the CUDA samples to check that the GPU is correctly recognized and supported.
cd /Developer/NVIDIA/CUDA-9.0/samples
chown -R YOURUSERNAMEHERE *
make -C 1_Utilities/deviceQuery
./bin/x86_64/darwin/release/deviceQuery
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1060 6GB"
CUDA Driver Version / Runtime Version 9.0 / 9.0
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 6144 MBytes (6442254336 bytes)
(10) Multiprocessors, (128) CUDA Cores/MP: 1280 CUDA Cores
GPU Max Clock rate: 1709 MHz (1.71 GHz)
Memory Clock rate: 4004 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 195 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
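The one value from this output that you need later is the compute capability, which ./configure asks for. If you have several cards, a small Python sketch like the following can pull it out of saved deviceQuery output. `compute_capabilities` is a hypothetical helper, and it assumes the "CUDA Capability Major/Minor version number" line looks like the one above.

```python
import re

_CAPABILITY = re.compile(
    r"CUDA Capability Major/Minor version number:\s*(\d+\.\d+)"
)


def compute_capabilities(device_query_output):
    """Return the distinct capabilities as a comma-separated string.

    Values keep the order in which deviceQuery listed them, which is the
    form the ./configure prompt accepts (for example "3.5,5.2").
    """
    found = []
    for value in _CAPABILITY.findall(device_query_output):
        if value not in found:
            found.append(value)
    return ",".join(found)


# Usage, with the output saved to a file first:
#   print(compute_capabilities(open("devicequery.txt").read()))
```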
If you have not already done so, register at https://developer.nvidia.com/cudnn and download cuDNN 7.0.5.
Change into your download directory and follow the post-installation steps.
tar -xzvf cudnn-9.0-osx-x64-v7-ga.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib/libcudnn* /usr/local/cuda/lib
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib/libcudnn*
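To confirm the header you just copied is the version you expect, you can read the version macros out of it. This sketch assumes cudnn.h defines `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` as plain integers, as the cuDNN 7 headers I have seen do. `cudnn_version` is a made-up helper name.

```python
import re


def cudnn_version(header_text):
    """Return (major, minor, patch) parsed from the text of cudnn.h, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as "#define CUDNN_MAJOR 7".
        match = re.search(
            r"^\s*#define\s+%s\s+(\d+)" % name, header_text, re.MULTILINE
        )
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


# Usage:
#   with open("/usr/local/cuda/include/cudnn.h") as header:
#       print(cudnn_version(header.read()))
```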
git clone https://github.com/tensorflow/tensorflow
cd tensorflow
git checkout -b 1.7 v1.7.0
Apply the following patch to fix a couple of build issues:
git apply xtensorflow17macos.patch
Apart from CUDA support, the CUDA SDK version, and the CUDA compute capabilities, I left the other settings untouched.
./configure
You have bazel 0.10.0 installed.
Please specify the location of python. [Default is /Users/hongta/anaconda3/bin/python]:
Found possible Python library paths:
/Users/hongta/anaconda3/lib/python3.6/site-packages
Please input the desired Python library path to use. Default is [/Users/hongta/anaconda3/lib/python3.6/site-packages]
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: y
Google Cloud Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: y
Hadoop File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: y
Amazon S3 File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Apache Kafka Platform support? [y/N]: n
No Apache Kafka Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with XLA JIT support? [y/N]: y
XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with GDR support? [y/N]: n
No GDR support will be enabled for TensorFlow.
Do you wish to build TensorFlow with VERBS support? [y/N]: n
No VERBS support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]:
Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]:
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2]6.1
Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
Configuration finished
Takes about 20-160 minutes on my machine
bazel build --config=cuda --config=opt --action_env PATH --action_env LD_LIBRARY_PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package ~/
pip install ~/tensorflow-1.7.0-cp36-cp36m-macosx_10_7_x86_64.whl
It's useful to leave the .whl file lying around in case you want to install it for another environment.
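If you do reuse the .whl elsewhere, keep in mind that pip only accepts it when the interpreter matches the tags in the file name. Here is a rough Python sketch of that check, assuming the standard wheel naming scheme (name-version-python-abi-platform). `wheel_tags` and `running_python_tag` are hypothetical helpers, and pip's real compatibility check is more involved than comparing one tag.

```python
import sys


def wheel_tags(filename):
    """Split a wheel file name into its name, version and tag components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    # Five parts normally, six when an optional build tag follows the version.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel name: %r" % filename)
    return {
        "name": parts[0],
        "version": parts[1],
        "python": parts[-3],
        "abi": parts[-2],
        "platform": parts[-1],
    }


def running_python_tag():
    """CPython tag for the current interpreter, e.g. cp36 on Python 3.6."""
    return "cp%d%d" % sys.version_info[:2]


# Usage:
#   tags = wheel_tags("tensorflow-1.7.0-cp36-cp36m-macosx_10_7_x86_64.whl")
#   print(tags["python"] == running_python_tag())
```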
See if everything got linked correctly.
cd ~
python
>>> import tensorflow as tf
>>> tf.Session()
2018-04-15 18:26:22.166565: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
2018-04-15 18:26:22.166762: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:01:00.0
totalMemory: 11.00GiB freeMemory: 8.74GiB
2018-04-15 18:26:22.386106: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
2018-04-15 18:26:22.386245: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 1 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.721
pciBusID: 0000:02:00.0
totalMemory: 11.00GiB freeMemory: 10.79GiB
2018-04-15 18:26:22.397532: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0, 1
2018-04-15 18:26:23.133126: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-15 18:26:23.133155: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0 1
2018-04-15 18:26:23.133161: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N N
2018-04-15 18:26:23.133165: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 1: N N
2018-04-15 18:26:23.149619: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8452 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2018-04-15 18:26:23.359183: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:1 with 10441 MB memory) -> physical GPU (device: 1, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0, compute capability: 6.1)
<tensorflow.python.client.session.Session object at 0x101853940>
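If you would rather not eyeball the log, a more scriptable check is to ask TensorFlow which devices it sees. This sketch uses `device_lib.list_local_devices()` from the TensorFlow 1.x Python client. The two wrapper functions are my own names, and the import is kept inside the function so the filtering logic can be read (and run) without TensorFlow installed.

```python
def gpu_names(devices):
    """Keep only the names of entries whose device_type is "GPU"."""
    return [d.name for d in devices if d.device_type == "GPU"]


def local_gpu_names():
    """List the GPU devices that the installed TensorFlow can see."""
    # Imported lazily: this only works where the freshly built wheel is installed.
    from tensorflow.python.client import device_lib

    return gpu_names(device_lib.list_local_devices())


# Usage, in the environment where the wheel was installed:
#   print(local_gpu_names())
# An empty list would suggest the build is not seeing the GPU.
```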
pip install keras
git clone https://github.com/fchollet/keras.git
cd keras/examples
python mnist_cnn.py
Using TensorFlow backend.
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
2018-04-05 22:38:30.156464: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
2018-04-05 22:38:30.156645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.392
pciBusID: 0000:05:00.0
totalMemory: 4.00GiB freeMemory: 2.98GiB
2018-04-05 22:38:30.156672: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-05 22:38:30.519346: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-05 22:38:30.519376: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
2018-04-05 22:38:30.519383: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
2018-04-05 22:38:30.519499: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2697 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:05:00.0, compute capability: 6.1)
2018-04-05 22:38:30.649987: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
2018-04-05 22:38:30.693399: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
2018-04-05 22:38:30.761824: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
59648/60000 [============================>.] - ETA: 0s - loss: 0.2698 - acc: 0.9168
2018-04-05 22:38:42.071923: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
You can use cuda-smi to watch GPU memory usage. In the case of the mnist example in keras, you should see the free memory drop to maybe 2% and the fans spin up. I'm not quite sure what the grappler/clusters/utils.cc:127 message is, however.
pmalik@MacPro:~/cuda-smi$ ./cuda-smi
Device 0 [PCIe 0:5:0.0]: GeForce GTX 1050 Ti (CC 6.1): 2901.6 of 4095.8 MB (i.e. 70.8%) Free
pmalik@MacPro:~/cuda-smi$ ./cuda-smi
Device 0 [PCIe 0:5:0.0]: GeForce GTX 1050 Ti (CC 6.1): 2893.1 of 4095.8 MB (i.e. 70.6%) Free
pmalik@MacPro:~/cuda-smi$ ./cuda-smi
Device 0 [PCIe 0:5:0.0]: GeForce GTX 1050 Ti (CC 6.1): 223.86 of 4095.8 MB (i.e. 5.47%) Free
pmalik@MacPro:~/cuda-smi$ ./cuda-smi
Device 0 [PCIe 0:5:0.0]: GeForce GTX 1050 Ti (CC 6.1): 97.852 of 4095.8 MB (i.e. 2.39%) Free
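If you want to log the numbers instead of re-running cuda-smi by hand, the lines above are easy to parse. This is a sketch only: it assumes the output format shown here, and `used_megabytes` is a hypothetical helper.

```python
import re

# Matches e.g. "Device 0 [...]: NAME (CC 6.1): 2901.6 of 4095.8 MB (i.e. 70.8%) Free"
_SMI_LINE = re.compile(r"Device (\d+) .*?: ([\d.]+) of ([\d.]+) MB \(i\.e\. [\d.]+%\) Free")


def used_megabytes(smi_output):
    """Map device index to used memory in MB (total minus free)."""
    usage = {}
    for index, free, total in _SMI_LINE.findall(smi_output):
        usage[int(index)] = round(float(total) - float(free), 1)
    return usage


# Usage, with the output captured to a string first:
#   import subprocess
#   print(used_megabytes(subprocess.check_output(["./cuda-smi"]).decode()))
```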
Tested on a Mac Pro (Mid 2010) running macOS 10.13.3 (17D47), with 2 x 2.93 GHz 6-Core Intel Xeon CPUs and an NVIDIA GeForce GTX 1050 Ti 4 GB.
If you'd like to build TensorFlow with OpenMP (multi-CPU support), grab the OpenMP library via Homebrew:
brew install cliutils/apple/libomp
and uncomment the -lgomp line in third_party/gpus/cuda/BUILD.tpl (relative to the tensorflow checkout).
You can also build the binary for your specific CPU architecture by adding --copt=-march=native to the build command:
bazel build --config=cuda --config=opt --copt=-march=native --action_env PATH --action_env LD_LIBRARY_PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package
You can run this command to see which instruction sets are getting built:
echo | clang -E - -march=native -###
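The output of that command is one very long line, so a short Python sketch can make it readable. It assumes clang prints the enabled instruction sets as `"-target-feature" "+name"` pairs, and that you captured stderr, which is where `-###` writes. `enabled_features` is a hypothetical helper.

```python
import re


def enabled_features(clang_output):
    """Return the sorted feature names that clang marks as enabled ("+name")."""
    # Disabled features appear as "-name" and are skipped on purpose.
    names = re.findall(r'"-target-feature" "\+([^"]+)"', clang_output)
    return sorted(set(names))


# Usage, after saving the driver output with:
#   echo | clang -E - -march=native -### 2> clang.txt
#
#   print(enabled_features(open("clang.txt").read()))
```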