Largely based on the Tensorflow 1.6 gist, and Tensorflow 1.7 gist for xcode and Tensorflow 1.7 gist for eGPU, this should hopefully simplify things a bit.
- NVIDIA Web-Drivers 387.10.10.10.30.106 for 10.13.4 (17E199) (w/o Security Update)
- CUDA-Drivers 387.128
- CUDA 9.1 Toolkit
- cuDNN 7.0.5 (latest for macOS)
- NCCL 2.1.15 (latest for macOS)
- Python 2.7
- XCode 8.2
- bazel stable 0.13.0 (latest on HomeBrew)
- Tensorflow 1.8 Source Code
If you don't know how to setup eGPU on Mac checkout these step. Make sure you have eGPU working before installation. (You sould see your specific graphic card name in Apple > About this Mac > System Report ... > Graphics/Displays)
The rest steps are the same as normal GPU setup.
If you are like me using MacBook Pro (15-inch, 2016) runing 10.13.4 (17E199) and eGPU: NVIDIA GeForce GTX 1080 Ti 11 GiB (or any 6.1 compatible version in nvidia page).
You could, at your own risk, skip the Prepare
and Compile
steps below,
download .whl from here and install it:
pip install tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl
And be sure to test after installation. But remember this is not safe.
For package management, ignore if you have your own python
, wget
or you want to download manually.
/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
brew install wget
Download and install from http://www.nvidia.com/download/driverResults.aspx/130460/en-us
Download and install from http://www.nvidia.com/object/macosx-cuda-387.178-driver.html
Download and from XCode_8.2.xip.
Or Find XCode 8.2
on https://developer.apple.com/download/more/
Unarchive and rename XCode.app
to Xcode8.2.app
in case you want to build and use it next time.
If you have Homebrew installed
brew install bazel
or Download the binary here
chmod 755 bazel-0.10.0-installer-darwin-x86_64.sh
./bazel-0.10.0-installer-darwin-x86_64.sh
It should be something along the lines of cuda_9.1.128_mac.dmg
Download NCCL 2.1.15 O/S agnostic and CUDA 9
from NVdia.
Unarchive it and move to a permanant place e.g. /usr/local/nccl
.
sudo mkdir -p /usr/local/nccl
cd nccl_2.1.15-1+cuda9.1_x86_64
sudo mv * /usr/local/nccl
sudo mkdir -p /usr/local/include/third_party/nccl
sudo ln -s /usr/local/nccl/include/nccl.h /usr/local/include/third_party/nccl
Edit ~/.bash_profile
and add the following:
export CUDA_HOME=/usr/local/cuda
export DYLD_LIBRARY_PATH=/usr/local/cuda/lib:/usr/local/cuda/extras/CUPTI/lib
export LD_LIBRARY_PATH=$DYLD_LIBRARY_PATH
export PATH=$DYLD_LIBRARY_PATH:$PATH:/Developer/NVIDIA/CUDA-9.1/bin
We want to compile some CUDA sample to check if the GPU is correctly recognized and supported.
cd /Developer/NVIDIA/CUDA-9.1/samples
chown -R $(whoami) *
make -C 1_Utilities/deviceQuery
./bin/x86_64/darwin/release/deviceQuery
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1080 Ti"
CUDA Driver Version / Runtime Version 9.1 / 9.1
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 11264 MBytes (11810963456 bytes)
(28) Multiprocessors, (128) CUDA Cores/MP: 3584 CUDA Cores
GPU Max Clock rate: 1645 MHz (1.64 GHz)
Memory Clock rate: 5505 Mhz
Memory Bus Width: 352-bit
L2 Cache Size: 2883584 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 196 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.1, CUDA Runtime Version = 9.1, NumDevs = 1
Result = PASS
If not already done, register at https://developer.nvidia.com/cudnn Download cuDNN 7.0.5
Change into your download directory and follow the post installation steps.
tar -xzvf cudnn-9.1-osx-x64-v7-ga.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib/libcudnn* /usr/local/cuda/lib
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib/libcudnn*
Skip if you have your own idea of which python/pip to use:
$ which python
/usr/local/bin/python
$ which pip
/usr/local/bin/pip
Or Download get-pip and run it in python. More info here
python get-pip.py
pip will automatically install the tensorflow dependencies (wheel, six etc), if don't you could install them manually.
cd /tmp
git clone https://github.com/tensorflow/tensorflow
cd tensorflow
git checkout v1.8.0
Apply the following patch to fix a couple build issues:
wget https://gist.githubusercontent.com/Willian-Zhang/a3bd10da2d8b343875f3862b2a62eb3b/raw/xtensorflow18macos.patch
git apply xtensorflow18macos.patch
Except CUDA support, CUDA SDK version and Cuda compute capabilities, I left the other settings untouched.
Pay attension to Cuda compute capabilities
, you might want to find your own according to guide.
./configure
You have bazel 0.10.0 installed.
Please specify the location of python. [Default is /usr/bin/python]:
Found possible Python library paths:
/Library/Python/2.7/site-packages
Please input the desired Python library path to use. Default is [/Library/Python/2.7/site-packages]
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]:
No Google Cloud Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Hadoop File System support? [Y/n]:
No Hadoop File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]:
No Amazon S3 File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Apache Kafka Platform support? [y/N]:
No Apache Kafka Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with XLA JIT support? [y/N]:
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with GDR support? [y/N]:
No GDR support will be enabled for TensorFlow.
Do you wish to build TensorFlow with VERBS support? [y/N]:
No VERBS support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]:
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 9.0]: 9.1
Please specify the location where CUDA 9.1 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]:
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2] (type your own, check on https://developer.nvidia.com/cuda-gpus, mine is 6.1 for GTX 1080 Ti)
Do you want to use clang as CUDA compiler? [y/N]:
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]:
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
Configuration finished
Takes about 47 minutes on my machine.
bazel clean
bazel build --config=cuda --config=opt --action_env PATH --action_env LD_LIBRARY_PATH --action_env DYLD_LIBRARY_PATH //tensorflow/tools/pip_package:build_pip_package
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
ls /tmp/tensorflow_pkg
tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl
If you want to use virtualenv or something, now is the time. Or just:
pip install /tmp/tensorflow_pkg/tensorflow-1.8.0-cp36-cp36m-macosx_10_13_x86_64.whl
Files in /tmp
would be cleaned after reboot.
cp /tmp/tensorflow_pkg/*.whl ~/
It's useful to leave the .whl file lying around in case you want to install it for another environment.
See if everything got linked correctly
cd ~
python
>>> import tensorflow as tf
>>> tf.Session()
2018-04-08 03:25:15.740635: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
2018-04-08 03:25:15.741260: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1344] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.645
pciBusID: 0000:c4:00.0
totalMemory: 11.00GiB freeMemory: 10.18GiB
2018-04-08 03:25:15.741288: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1423] Adding visible gpu devices: 0
2018-04-08 03:25:16.157590: I tensorflow/core/common_runtime/gpu/gpu_device.cc:911] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-04-08 03:25:16.157614: I tensorflow/core/common_runtime/gpu/gpu_device.cc:917] 0
2018-04-08 03:25:16.157620: I tensorflow/core/common_runtime/gpu/gpu_device.cc:930] 0: N
2018-04-08 03:25:16.157753: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1041] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9849 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:c4:00.0, compute capability: 6.1)
<tensorflow.python.client.session.Session object at 0x10968ef60>
python
import tensorflow as tf
tf.enable_eager_execution()
tf.executing_eagerly() # => True
x = [[2.]]
m = tf.matmul(x, x)
print("hello, {}".format(m)) # => "hello, [[4.]]"
pip install keras
wget https://gist.githubusercontent.com/Willian-Zhang/290dceb96679c8f413e42491c92722b0/raw/mnist-cnn.py
python mnist_cnn.py
/usr/local/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
from ._conv import register_converters as _register_converters
Using TensorFlow backend.
x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples
Train on 60000 samples, validate on 10000 samples
Epoch 1/12
2018-05-11 04:51:10.335377: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:859] OS X does not support NUMA - returning NUMA node zero
2018-05-11 04:51:10.336052: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.645
pciBusID: 0000:c4:00.0
totalMemory: 11.00GiB freeMemory: 9.37GiB
2018-05-11 04:51:10.336075: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-05-11 04:51:11.063831: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-05-11 04:51:11.063856: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-05-11 04:51:11.063864: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-05-11 04:51:11.064768: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9065 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:c4:00.0, compute capability: 6.1)
2018-05-11 04:51:11.534095: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
2018-05-11 04:51:11.579370: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
2018-05-11 04:51:11.644835: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
59264/60000 [============================>.] - ETA: 0s - loss: 0.2604 - acc: 0.92082018-05-11 04:51:19.228205: E tensorflow/core/grappler/clusters/utils.cc:127] Not found: TF GPU device with id 0 was not registered
60000/60000 [==============================] - 10s 159us/step - loss: 0.2588 - acc: 0.9213 - val_loss: 0.0561 - val_acc: 0.9829
Epoch 2/12
60000/60000 [==============================] - 4s 66us/step - loss: 0.0875 - acc: 0.9742 - val_loss: 0.0427 - val_acc: 0.9857
Epoch 3/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0662 - acc: 0.9803 - val_loss: 0.0356 - val_acc: 0.9875
Epoch 4/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0549 - acc: 0.9839 - val_loss: 0.0325 - val_acc: 0.9896
Epoch 5/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0471 - acc: 0.9859 - val_loss: 0.0309 - val_acc: 0.9901
Epoch 6/12
60000/60000 [==============================] - 4s 68us/step - loss: 0.0421 - acc: 0.9873 - val_loss: 0.0297 - val_acc: 0.9903
Epoch 7/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0377 - acc: 0.9884 - val_loss: 0.0259 - val_acc: 0.9908
Epoch 8/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0357 - acc: 0.9883 - val_loss: 0.0285 - val_acc: 0.9908
Epoch 9/12
60000/60000 [==============================] - 4s 68us/step - loss: 0.0315 - acc: 0.9904 - val_loss: 0.0327 - val_acc: 0.9901
Epoch 10/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0288 - acc: 0.9910 - val_loss: 0.0272 - val_acc: 0.9911
Epoch 11/12
60000/60000 [==============================] - 4s 67us/step - loss: 0.0282 - acc: 0.9912 - val_loss: 0.0248 - val_acc: 0.9920
Epoch 12/12
60000/60000 [==============================] - 4s 66us/step - loss: 0.0255 - acc: 0.9923 - val_loss: 0.0283 - val_acc: 0.9912
Test loss: 0.028254894825743667
Test accuracy: 0.9912
You can use cuda-smi to watch the GPU memory usages. In case the of the mnist example in keras, you should see the free memory drop down to maybe 2% and the fans spin up. Not quite sure what the grappler/clusters/utils.cc:127 warning is, however.
$ cuda-smi
Device 0 [PCIe 0:196:0.0]: GeForce GTX 1080 Ti (CC 6.1): 10350 of 11264 MB (i.e. 91.9%) Free
# when GPU
$ cuda-smi
Device 0 [PCIe 0:196:0.0]: GeForce GTX 1080 Ti (CC 6.1): 1181.1 of 11264 MB (i.e. 10.5%) Free
Tested on a MacBook Pro (15-inch, 2016) 10.13.4 (17E199) 2.7 GHz Intel Core i7 and NVIDIA GeForce GTX 1080 Ti 11 GiB