These instructions are based on Mistobaan's gist but expanded and updated to work with the latest tensorflow OS X CUDA PR.
I tested these instructions on OS X v10.10.5. They will probably work on OS X v10.11 (El Capitan), too.
These instructions assume you have Xcode installed and your machine is already set up to compile C/C++ code. If not, simply type gcc into a terminal and it will prompt you to download and install the Xcode Command-Line Tools.
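If you'd rather not wait for the prompt, you can trigger the same installer directly (the dialog wording may vary slightly between OS X versions):

```shell
# Install the Xcode Command-Line Tools without opening Xcode.
xcode-select --install

# Afterwards, verify a compiler is available.
gcc --version
```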
To compile tensorflow on OS X, you need several dependent libraries. The easiest way to get them is to install them with the homebrew package manager.
If you don't already have brew installed, you can install it like this:
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
If you don't want to blindly run a ruby script loaded from the internet, they have alternate install options.
First, make sure brew is up to date with the latest available packages:
brew update
brew upgrade
Then install these tools:
brew install coreutils
brew install swig
brew install bazel
Check the version to make sure you installed bazel 0.1.4 or greater. bazel 0.1.3 or below will fail when building tensorflow.
$ bazel version
Build label: 0.1.4-homebrew
Next, install CUDA, also from brew:
brew cask install cuda
Check the version to make sure you installed CUDA 7.5. Older versions will fail.
$ brew cask info cuda
cuda: 7.5.20
Nvidia CUDA
NVIDIA requires you to sign up and be approved before you can download this.
First, go sign up here:
https://developer.nvidia.com/accelerated-computing-developer
When you sign up, make sure you provide accurate information. A human at NVIDIA will review your application. If it's a business day, hopefully you'll get approved quickly.
Then go here to download cuDNN:
https://developer.nvidia.com/cudnn
Click 'Download' to fill out their survey and agree to their Terms. Finally, you'll see the download options.
However, you'll only see download options for cuDNN v4 and cuDNN v3. You'll want to scroll to the very bottom and click "Archived cuDNN Releases".
This will take you to this page where you can download cuDNN v2:
https://developer.nvidia.com/rdp/cudnn-archive
On that page, download "cuDNN v2 Library for OSX".
Next, you need to install it manually by copying over some files:
tar zxvf ~/Downloads/cudnn-6.5-osx-v2.tar.gz
sudo cp ./cudnn-6.5-osx-v2/cudnn.h /usr/local/cuda/include/
sudo cp ./cudnn-6.5-osx-v2/libcudnn* /usr/local/cuda/lib/
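To confirm the copy worked, you can check that the header and libraries landed where the CUDA toolchain expects them (these paths assume the default /usr/local/cuda install location used above):

```shell
# Both commands should list files rather than report "No such file or directory".
ls /usr/local/cuda/include/cudnn.h
ls /usr/local/cuda/lib/libcudnn*
```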
Finally, you need to make sure the library is in your library load path.
Edit your ~/.bash_profile file and add this line at the bottom:
export DYLD_LIBRARY_PATH="/usr/local/cuda/lib":$DYLD_LIBRARY_PATH
After that, close and reopen your terminal window to apply the change.
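If you'd rather not reopen the terminal, sourcing the file applies the change to your current shell session as well:

```shell
source ~/.bash_profile

# The CUDA lib directory should now appear at the front of the path.
echo $DYLD_LIBRARY_PATH
```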
Since OS X CUDA support is still an unmerged pull request (#664), you need to check out that specific branch:
git clone --recurse-submodules https://github.com/tensorflow/tensorflow
cd tensorflow
git fetch origin pull/664/head:cuda_osx
git checkout cuda_osx
Before you start, open up System Report in OS X:
Apple Menu > About this Mac > System Report...
In System Report, click on "Graphics/Displays" and find out the exact model NVIDIA card you have:
NVIDIA GeForce GT 650M:
Chipset Model: NVIDIA GeForce GT 650M
Then go to https://developer.nvidia.com/cuda-gpus and find that exact model name in the list:
CUDA-Enabled GeForce Products > GeForce GT 650M
There it will list the Compute Capability for your card. For the GeForce GT 650M used in mid-2012 Retina MacBook Pros, it is 3.0. Write this number down; it's critical to have it for the next step.
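If you'd rather query the card directly instead of looking it up, the CUDA toolkit ships a deviceQuery sample you can compile and run. The samples path below is the 7.5 installer's default on OS X and may differ on your machine:

```shell
# Build and run NVIDIA's deviceQuery sample to print the card's capabilities.
cd /Developer/NVIDIA/CUDA-7.5/samples/1_Utilities/deviceQuery
make
./deviceQuery | grep "CUDA Capability"
```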
You will first need to configure the tensorflow build options:
TF_UNOFFICIAL_SETTING=1 ./configure
During the config process, it will ask you a bunch of questions. You can use the answers below except make sure to use the Compute Capability for your NVIDIA card you looked up in the previous step:
WARNING: You are configuring unofficial settings in TensorFlow. Because some external libraries are not backward compatible, these settings are largely untested and unsupported.
Please specify the location of python. [Default is /usr/bin/python]:
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify the Cuda SDK version you want to use. [Default is 7.0]: 7.5
Please specify the location where CUDA 7.5 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the Cudnn version you want to use. [Default is 6.5]:
Please specify the location where cuDNN 6.5 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 3.0
Setting up Cuda include
Setting up Cuda lib
Setting up Cuda bin
Setting up Cuda nvvm
Configuration finished
Now you can actually build and install tensorflow!
bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-0.6.0-py2-none-any.whl
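A quick smoke test that the wheel installed correctly (run it outside the tensorflow source tree, or Python will pick up the local source package instead of the installed one):

```shell
cd ~
python -c "import tensorflow as tf; print(tf.__version__)"
```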
You need to exit the tensorflow build folder to test your installation.
cd ~
Now, run python and paste in this test script:
import tensorflow as tf
# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print sess.run(c)
You should get output that looks something like this:
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.7.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.6.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.7.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.dylib locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.7.5.dylib locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:892] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_init.cc:102] Found device 0 with properties:
name: GeForce GT 650M
major: 3 minor: 0 memoryClockRate (GHz) 0.9
pciBusID 0000:01:00.0
Total memory: 1023.69MiB
Free memory: 452.21MiB
I tensorflow/core/common_runtime/gpu/gpu_init.cc:126] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_init.cc:136] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:705] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0)
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 32.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 64.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 128.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 256.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 512.0KiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 1.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 2.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 4.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 8.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 16.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 32.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 64.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 128.00MiB
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:51] Creating bin of max chunk size 256.00MiB
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0
I tensorflow/core/common_runtime/direct_session.cc:142] Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0
b: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:304] b: /job:localhost/replica:0/task:0/gpu:0
a: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:304] a: /job:localhost/replica:0/task:0/gpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/simple_placer.cc:304] MatMul: /job:localhost/replica:0/task:0/gpu:0
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:73] Allocating 252.21MiB bytes.
I tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:83] GPU 0 memory begins at 0x700a80000 extends to 0x7106b6000
[[ 22. 28.]
[ 49. 64.]]
Yay! Now you can train your models using a GPU!
If you are using a Retina MacBook Pro with only a 1GB GeForce GT 650M, you will probably run into out-of-memory errors with medium to large models. But at least it will make small-scale experimentation faster.
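One way to soften those out-of-memory errors is to cap how much GPU memory TensorFlow grabs up front, instead of letting it claim nearly all of it at startup. This is a sketch using the per_process_gpu_memory_fraction option in GPUOptions; the 0.5 fraction is just an example value to tune for your card:

```python
import tensorflow as tf

# Ask TensorFlow to claim at most ~50% of GPU memory,
# leaving headroom for the OS and display on a 1GB card.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
```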