These instructions explain how to install TensorFlow on a Mac with CUDA-enabled GPU support. I assume you know what TensorFlow is and why you would want a deep learning framework running on your computer.
Make sure to update your Homebrew formulas:
brew update
Install coreutils and SWIG for macOS:
brew install coreutils swig
Install the CUDA libraries for macOS. You can install CUDA from Homebrew using Cask:
brew cask install cuda
Make sure that the installed CUDA version is 7.5. You can check the version with:
brew cask info cuda
cuda: 7.5.20
Nvidia CUDA
https://developer.nvidia.com/cuda-zone
Not installed
https://github.com/caskroom/homebrew-cask/blob/master/Casks/cuda.rb
No Artifact Info
If you don't see 7.5, upgrade your brew formulas:
brew update
brew upgrade cuda
You need NVIDIA's CUDA Deep Neural Network library, cuDNN. You have to register and download it from the website: https://developer.nvidia.com/cudnn.
(Note: from version 0.8, TensorFlow supports cuDNN v5; versions 0.7 and 0.7.1 support v4.)
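The compatibility note above can be written down as a small lookup table. This is just an illustrative sketch; the version pairs come from the note, and the helper name is my own:

```python
# cuDNN major version required by each TensorFlow release, per the note above.
CUDNN_FOR_TF = {"0.7": "4", "0.7.1": "4", "0.8": "5"}

def required_cudnn(tf_version):
    """Return the cuDNN major version a TensorFlow release expects, or None if unknown."""
    return CUDNN_FOR_TF.get(tf_version)

print(required_cudnn("0.8"))  # prints 5
```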
Download the file cudnn-7.5-osx-x64-v5.0-rc.tgz
Once downloaded, you need to manually copy the files into the /usr/local/cuda/ directory:
tar xzvf ~/Downloads/cudnn-7.5-osx-x64-v5.0-rc.tgz
sudo mv -v cuda/lib/libcudnn* /usr/local/cuda/lib
sudo mv -v cuda/include/cudnn.h /usr/local/cuda/include
Add a reference to /usr/local/cuda/lib in your ~/.bash_profile. You will need it to run the Python scripts:
export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:$DYLD_LIBRARY_PATH
Now let's make sure that we are able to compile CUDA programs. If you have the latest Xcode installed (7.3 at the time of this post), nvcc will not work and will give an error like:
nvcc fatal : The version ('70300') of the host compiler ('Apple clang') is not supported
In order to fix this you need to:
- download Xcode 7.2 from the Apple developer website
- create a new directory
/Applications/XCode7.2/
- copy the entire Xcode.app inside
/Applications/XCode7.2
- run
sudo xcode-select -s /Applications/XCode7.2/Xcode.app/
You should now be able to compile the deviceQuery utility found in the CUDA samples. Let's compile it to figure out the CUDA compute capability supported by our graphics card.
cd /usr/local/cuda/samples
sudo make -C 1_Utilities/deviceQuery
And now we run it:
cd /usr/local/cuda/samples/
./bin/x86_64/darwin/release/deviceQuery
The output will look like:
./bin/x86_64/darwin/release/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GT 650M"
CUDA Driver Version / Runtime Version 7.5 / 7.5
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 1024 MBytes (1073414144 bytes)
( 2) Multiprocessors, (192) CUDA Cores/MP: 384 CUDA Cores
GPU Max Clock rate: 900 MHz (0.90 GHz)
Memory Clock rate: 2508 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 262144 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = GeForce GT 650M
Result = PASS
Here you can confirm that the driver version is 7.5, and you can also find the CUDA compute capability of your GPU: CUDA Capability Major/Minor version number: 3.0 in my case. We will set this property when we configure TensorFlow.
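If you want to extract the capability programmatically rather than eyeballing the output, a minimal sketch like the following could pull it out of the deviceQuery text (the helper name and regex are my own, not part of the CUDA toolkit):

```python
import re

# One line taken from the deviceQuery output shown above.
sample = "CUDA Capability Major/Minor version number:    3.0"

def parse_capability(output):
    """Pull the compute capability string (e.g. '3.0') out of deviceQuery output."""
    m = re.search(r"CUDA Capability Major/Minor version number:\s*([\d.]+)", output)
    return m.group(1) if m else None

print(parse_capability(sample))  # prints 3.0
```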
Next, install Bazel. Use Homebrew:
brew install bazel
or install it manually from source:
git clone https://github.com/bazelbuild/bazel.git
cd bazel
git checkout tags/0.2.1
./compile.sh
sudo cp output/bazel /usr/local/bin
Make sure you have the right version of Bazel, at least 0.2.1:
$ bazel version
Build label: 0.2.1-homebrew
Build target: bazel-out/local_darwin-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Fri Apr 1 00:35:17 2016 (1459470917)
Build timestamp: 1459470917
Build timestamp as int: 1459470917
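A build label like 0.2.1-homebrew can be compared against the minimum version with a small helper. This is an illustrative sketch of the comparison, not anything Bazel provides:

```python
def at_least(version, minimum):
    """True if a dotted version string (suffixes like '-homebrew' ignored) meets a minimum."""
    parse = lambda v: [int(p) for p in v.split("-")[0].split(".")]
    return parse(version) >= parse(minimum)

print(at_least("0.2.1-homebrew", "0.2.1"))  # True
print(at_least("0.1.5", "0.2.1"))           # False
```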
As of the end of April 2016, the build system changes are merged into the main development line! Clone the repository:
git clone --recurse-submodules https://github.com/tensorflow/tensorflow
cd tensorflow
git checkout master
Then we need to configure it. I use Anaconda as the Python distribution. Notice that you need to set the right TF_CUDA_COMPUTE_CAPABILITIES value from the previous deviceQuery step:
PYTHON_BIN_PATH=$HOME/anaconda/bin/python CUDA_TOOLKIT_PATH="/usr/local/cuda" CUDNN_INSTALL_PATH="/usr/local/cuda" TF_UNOFFICIAL_SETTING=1 TF_NEED_CUDA=1 TF_CUDA_COMPUTE_CAPABILITIES="3.0" TF_CUDNN_VERSION="5" TF_CUDA_VERSION="7.5" TF_CUDA_VERSION_TOOLKIT=7.5 ./configure
Now we are ready to build the TensorFlow pip package. This may take a while:
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install --upgrade /tmp/tensorflow_pkg/tensorflow-0.8.0rc0-py2-none-any.whl
If you are using Anaconda like me, you want to add `--ignore-installed`:
pip install --upgrade --ignore-installed /tmp/tensorflow_pkg/tensorflow-0.8.0rc0-py2-none-any.whl
Now move to another directory and run a test script:
import tensorflow as tf
# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))
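As a sanity check, the product that the script computes can be verified with plain Python, independent of TensorFlow:

```python
# The same 2x3 @ 3x2 matrix product, computed without TensorFlow.
a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # shape [2, 3]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # shape [3, 2]

c = [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(2)]
     for i in range(2)]
print(c)  # [[22.0, 28.0], [49.0, 64.0]]
```

The result matches the matrix printed at the end of the sample output below.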
You should now see the output from the sample program. If not, check the Caveats section.
deeplearning$ python test_install.py
I tensorflow/stream_executor/dso_loader.cc:107] successfully opened CUDA library libcublas.7.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:107] successfully opened CUDA library libcudnn.6.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:107] successfully opened CUDA library libcufft.7.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:107] successfully opened CUDA library libcuda.dylib locally
I tensorflow/stream_executor/dso_loader.cc:107] successfully opened CUDA library libcurand.7.5.dylib locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:103] Found device 0 with properties:
name: GeForce GT 650M
major: 3 minor: 0 memoryClockRate (GHz) 0.9
pciBusID 0000:01:00.0
Total memory: 1023.69MiB
Free memory: 19.40MiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:127] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:137] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:703] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:42] Allocating 19.40MiB bytes.
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:52] GPU 0 memory begins at 0x700a80000 extends to 0x701de6000
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 32.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 64.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 128.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 256.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 512.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 1.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 2.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 4.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 8.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 16.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:66] Creating bin of max chunk size 32.00MiB
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0
I tensorflow/core/common_runtime/direct_session.cc:137] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0
b: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:304] b: /job:localhost/replica:0/task:0/gpu:0
a: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:304] a: /job:localhost/replica:0/task:0/gpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:304] MatMul: /job:localhost/replica:0/task:0/gpu:0
[[ 22. 28.]
[ 49. 64.]]
You can read more about using GPUs in TensorFlow in the official GPU documentation.
If you run into this error:
ImportError: No module named core.framework.graph_pb2
you are running the script from inside the tensorflow source directory, and Python is picking up the local directory as the module. Change to another directory (see the related Stack Overflow question).
If you get this error:
: Library not loaded: @rpath/libcudart.7.5.dylib
Referenced from: ~/anaconda/envs/tensorflow/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so
Reason: image not found
it is because Python is not able to find the CUDA library. Make sure to set the environment variable:
export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:$DYLD_LIBRARY_PATH
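To check from Python whether the variable actually contains the CUDA lib directory, a small illustrative helper (my own, not part of TensorFlow) can split the path list:

```python
def cuda_lib_on_path(dyld_value, cuda_lib="/usr/local/cuda/lib"):
    """True if the CUDA lib directory appears in a DYLD_LIBRARY_PATH value."""
    return cuda_lib in (dyld_value or "").split(":")

# In a live session you would pass os.environ.get("DYLD_LIBRARY_PATH").
print(cuda_lib_on_path("/usr/local/cuda/lib:/usr/lib"))  # True
print(cuda_lib_on_path(None))                            # False
```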
If you see
Ignoring gpu device (device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
recompile the library with the right compute capability (3.0 in my case). When configuring, answer the compute-capability prompt accordingly:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 3.0
Setting up Cuda include
Setting up Cuda lib
Setting up Cuda bin
Setting up Cuda nvvm
Configuration finished
Note that this started as a pull request, so it is not yet officially supported. I hope that with this tutorial more OS X developers can try the patch, report any errors, and confirm that it is a good patch to merge into the main repository.
Stay hungry and have fun.