
@dandanwei
Last active February 16, 2023 10:22
Making an Nvidia eGPU work on macOS with PyTorch and fast.ai

[Updated on 2018.11.14] I finally got my GTX 1070 working with my MBP for PyTorch and fast.ai. The steps are below:

Environment

  • MacBook Pro (15-inch, 2016) with touch bar
  • macOS version: 10.13.6 (Mojave may not work as of this writing)
  • eGPU: Razer Core X + GTX 1070 (MSI)

Step 1: Install Nvidia Web Driver

10.13.6 + 17G65

Follow this if your system is 10.13.6 build 17G65 (you can check the build number by clicking "Version 10.13.6" in "About This Mac"). If it is 17G3025 or later, jump to the matching section below, starting with "10.13.6 + 17G3025".

Use the great tool macOS-eGPU to install the Nvidia web driver. Just follow its guide and install with "> macos-egpu".

Although it also offers to install CUDA for you, DO NOT use that option: it automatically installs the latest CUDA version, which does not seem to work with PyTorch yet. Install only the Nvidia web driver.

After the installation, my web driver version is: 387.10.10.10.40.105. Make sure you have this version if your OSX version is 10.13.6 + 17G65.

10.13.6 + 17G3025 [Added on 2018.11.14]

A new security update for High Sierra, 10.13.6 build 17G3025 (you can check the build number by clicking "Version 10.13.6" in "About This Mac"), was released at the beginning of November. macOS-eGPU has not been updated for this build yet (as of 2018.11.14), so I suggest using another tool instead: purge-wrangler. It is as good or better (personal opinion); just follow its guide and select "Enable NVIDIA eGPUs".

After the installation, the web driver version will be: 387.10.10.10.40.108.

10.13.6 + 17G5019 [Added on 2019.02.27]

Same as before, just use purge-wrangler to apply the patch. If you already patched the system with purge-wrangler before, simply upgrade the Nvidia web driver. After the restart, purge-wrangler will prompt you to re-patch the system. Easy as pie.

After the installation, the web driver version will be: 387.10.10.10.40.118.
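
The build-to-driver pairs from the three sections above can be collected into a small lookup. This is only a sketch of the information in this guide: the function name `recommended_setup` and the dictionary layout are made up for illustration, and the build string is assumed to be what you read from "About This Mac".

```python
# Lookup of the macOS 10.13.6 builds covered in this guide.
# KNOWN_BUILDS and recommended_setup are illustrative names, not part of any tool.
KNOWN_BUILDS = {
    "17G65": ("macOS-eGPU", "387.10.10.10.40.105"),
    "17G3025": ("purge-wrangler", "387.10.10.10.40.108"),
    "17G5019": ("purge-wrangler", "387.10.10.10.40.118"),
}


def recommended_setup(build):
    """Return (tool, expected web driver version), or None for an unknown build."""
    return KNOWN_BUILDS.get(build.strip())


print(recommended_setup("17G3025"))
```

A result of `None` means the build is not covered here, in which case it is safer to look for a matching web driver before patching anything.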

Verify

If the driver is installed successfully, once you plug in your eGPU you should see your GTX 1070 in "About This Mac -> System Report... -> Graphics/Displays" and in "Activity Monitor -> Window -> GPU History". Or you can simply plug an external monitor into the eGPU to see if it works.

NOTE: Hot-unplugging the eGPU is not supported yet, so the suggested approach is to "reboot and unplug the moment the eGPU power shuts down" (if this is not done properly, a kernel panic will happen). However, my Razer Core X does not power down during a restart: its fan keeps spinning, and the fan on the GTX 1070 does not spin at all, probably because the GPU temperature is low. So for me there is no way to tell the right moment from the eGPU itself. After some experimenting, I found it seems safe to unplug at the moment the keyboard backlight turns off during the restart.

Step 2: Install CUDA driver, toolkit

PyTorch works with CUDA 9.2; it doesn't support the latest CUDA 10.0 yet. So I downloaded the CUDA 9.2 installation image from Nvidia. It includes the CUDA driver, toolkit and samples. Install all of them; we will need the samples later on. CUDA Toolkit 9.2 also has a patch, which you can download from the same place; install it as well.

Follow the installation guide here

Make sure the deviceQuery and bandwidthTest from samples work after installation.

After the installation, my CUDA driver version is 396.148. You can get this information with the command "> macos-egpu -C".
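
To double-check that the toolkit on your PATH is the 9.2 release rather than a newer one, you can parse the text printed by `nvcc --version`. The sketch below is a minimal example: the helper name `cuda_release` is hypothetical, and the sample line is only an assumption about the shape of nvcc's output, not a capture from my machine.

```python
import re


def cuda_release(nvcc_output):
    """Extract the toolkit release (for example "9.2") from `nvcc --version` text."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


# Assumed sample line; replace it with the real output of `nvcc --version`.
sample = "Cuda compilation tools, release 9.2, V9.2.148"
print(cuda_release(sample) == "9.2")
```

In practice you could feed it the decoded result of `subprocess.check_output(["nvcc", "--version"])` and stop before compiling if the release is not "9.2".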

Step 3: Install CUDNN

Go to this page to download the installation image (requires registration): "https://developer.nvidia.com/rdp/cudnn-archive" -> click "cuDNN v7.1.4 Library for OSX".

Make sure to use cuDNN v7.1.4.

Follow this guide for the installation.

Step 4: Compile and install Pytorch

I followed this guide. It was mostly correct for me, but not entirely, so I would like to write down the steps that worked for me.

  • Create a conda environment
conda create --name ptc python=3.6 pip
  • With ptc active (> source activate ptc)
export CMAKE_PREFIX_PATH=[anaconda root directory]
# for me, the anaconda root directory is "/Users/<your_user_name>/anaconda3"

#Install optional dependencies
conda install numpy pyyaml mkl mkl-include setuptools cmake cffi typing
  • After the above step, unset CMAKE_PREFIX_PATH or simply open a new terminal and activate ptc. This is very important: leaving CMAKE_PREFIX_PATH set while compiling PyTorch causes problems (and it did cause problems for me).

  • Get the PyTorch source

git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
  • Switch to v0.4.1 and initialize the submodules
git checkout tags/v0.4.1
git checkout -b v0.4.1
git submodule update --init
  • Before we go ahead to start compiling, make sure we have everything correctly:
    • The following are my environment variables in ~/.bash_profile. CUDA_HOME and CUDA_NVCC_EXECUTABLE may not be needed; they are there because I tried to compile TensorFlow previously. The last PATH line (PATH=/usr/local/cuda/bin:$PATH) could probably be removed as well. But to be safe, you can keep them the same as mine.
    export CUDA_HOME=/usr/local/cuda
    export CUDA_NVCC_EXECUTABLE=/usr/local/cuda/bin/nvcc
    export DYLD_LIBRARY_PATH="$CUDA_HOME/extras/CUPTI/lib:/Developer/NVIDIA/CUDA-9.2/lib:$DYLD_LIBRARY_PATH"
    export LD_LIBRARY_PATH=$DYLD_LIBRARY_PATH
    export PATH=/Developer/NVIDIA/CUDA-9.2/bin${PATH:+:${PATH}}
    export PATH=/usr/local/cuda/bin:$PATH
    
    • The clang version should look something like the output below after step 2.
    $ clang -v
    Apple LLVM version 9.0.0 (clang-900.0.39.2)
    Target: x86_64-apple-darwin17.7.0
    Thread model: posix
    InstalledDir: /Applications/Xcode_9.2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
    Found CUDA installation: /usr/local/cuda, version unknown
    
  • Build and install Pytorch. It will take a while (like 30 minutes to an hour). Just be patient.
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install
  • After it is done, verify it with "> pip list". You should see "torch" with version "0.5.0a0+a24163a". With the eGPU connected, you can also run the following check. Make sure is_available() returns True.
cd <any directory except pytorch!>
python
>>> import torch
>>> torch.cuda.is_available()
True
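
Beyond `is_available()`, it can help to confirm that a computation really runs on the eGPU. The following is a minimal sketch, not part of the original build steps: `cuda_report` is a hypothetical helper, and it assumes a PyTorch build that accepts the `device="cuda"` argument (as the 0.4 series does). Run it from any directory except the pytorch source tree, as above.

```python
def cuda_report():
    """Summarize whether PyTorch is importable and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in the active environment.
        return {"torch": None, "cuda": False, "device": None}

    available = torch.cuda.is_available()
    report = {"torch": torch.__version__, "cuda": available, "device": None}
    if available:
        report["device"] = torch.cuda.get_device_name(0)
        # Run a tiny computation on the GPU to confirm it actually executes.
        x = torch.ones(3, device="cuda")
        report["sum"] = float((x * 2).sum())
    return report


print(cuda_report())
```

If "cuda" comes back False while the eGPU shows up in System Report, the web driver and CUDA driver versions from steps 1 and 2 are the first things to re-check.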

Step 5: Install fastai

pip install fastai

This will install torchvision-nightly for you, which is needed by fastai.

Double-check "pip list" and make sure Pillow is installed correctly, with only one entry: "Pillow" and "pillow" should not both appear.
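
Spotting two entries that differ only by letter case is easy to get wrong by eye in a long list. The sketch below shows one way to check; `case_duplicates` is a hypothetical helper and the package list is made up for illustration, so substitute the names from your own "pip list" output.

```python
from collections import defaultdict


def case_duplicates(names):
    """Group package names that differ only by letter case."""
    groups = defaultdict(set)
    for name in names:
        groups[name.lower()].add(name)
    return {key: sorted(found) for key, found in groups.items() if len(found) > 1}


# Made-up package list; in practice, pass the names shown by `pip list`.
print(case_duplicates(["Pillow", "pillow", "numpy", "torch"]))
```

An empty dictionary means there are no case-only duplicates.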

With torchvision-nightly installed, we can verify that PyTorch is installed correctly. Download the PyTorch examples and compare the time required with and without CUDA.

git clone https://github.com/pytorch/examples
cd examples/mnist
time python main.py >/dev/null
real 1m38.430s
user 2m6.163s
sys 0m7.762s
time python main.py --no-cuda >/dev/null
real 5m47.750s
user 37m22.609s
sys 1m23.813s
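
To put a number on the difference, the two "real" times above can be converted to seconds and divided. This is just arithmetic on the figures already shown; `to_seconds` is a hypothetical helper name.

```python
import re


def to_seconds(stamp):
    """Convert a `time` value such as "1m38.430s" to seconds."""
    minutes, seconds = re.fullmatch(r"(\d+)m([\d.]+)s", stamp).groups()
    return int(minutes) * 60 + float(seconds)


gpu_real = to_seconds("1m38.430s")  # wall-clock time with CUDA
cpu_real = to_seconds("5m47.750s")  # wall-clock time with --no-cuda
speedup = cpu_real / gpu_real
print(f"{speedup:.2f}x")
```

On my numbers this works out to roughly a 3.5x wall-clock speedup for the MNIST example.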

There are a couple of other packages that should be installed for fastai as well.

pip install bcolz
pip install opencv-python
pip install seaborn
pip install graphviz
pip install sklearn_pandas
pip install isoweek
pip install pandas_summary
pip install torchtext

Maybe there are more, and maybe the best way is to install them from the fastai repo. But these work for me.
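
Before opening the notebook, you can check which of these packages are still missing from the active environment. The sketch below uses the standard library's `importlib.util.find_spec`; the `FASTAI_EXTRAS` list and the `missing_modules` helper are illustrative names, and the list uses import names (opencv-python is imported as `cv2`).

```python
import importlib.util

# Import names for the packages listed above; opencv-python is imported as cv2.
FASTAI_EXTRAS = [
    "bcolz", "cv2", "seaborn", "graphviz",
    "sklearn_pandas", "isoweek", "pandas_summary", "torchtext",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print(missing_modules(FASTAI_EXTRAS))
```

Anything printed in the list still needs a "pip install"; an empty list means all of them were found.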

Let's test.

git clone https://github.com/fastai/fastai
cd fastai
jupyter-notebook

In Jupyter Notebook, open courses/dl1/lesson1.ipynb. Run the first few code blocks, especially the imports, and see if there are any errors. If there is an error about a missing package, just pip install it.

Then you can go ahead with the lesson, test and enjoy your eGPU!

BTW, You can open "GPU History" from "Activity Monitor" and monitor your eGPU's load while testing.

If this gist helped you, please leave a star ;-) I will be very happy to see that it helped.

Final Note

Be VERY VERY CAREFUL about installing macOS security patches / updates.

I installed the latest update for High Sierra, which updated the macOS build to 17G3025 (still version 10.13.6). macos-egpu did not support this new build, and the eGPU was no longer recognized. Luckily I found purge-wrangler, which saved my life. I also like the way it is designed and explained.

Every macOS update rewrites kernel extensions (including security updates). This means that all patches installed using purge-wrangler.sh are reset. With V5.0.0 or later, the system will notify you if this has happened, and allow you to re-patch immediately.

I recommend making a Time Machine backup before every system update, because there always seems to be a gap between the release of a macOS update and the release of the corresponding Nvidia web driver. If you apply the system update before the new web driver is available, you will be stuck. If you still want to use the system, you have to either roll back with Time Machine or use Web-Driver-Toolkit, as suggested here, to patch NVDAStartup (I only read that post and didn't try it myself).

@dandanwei
Author

This is great! I think I'm getting close... but, I'm still having problems compiling pytorch... I'm pretty much matching exactly what you did above. 10.13.6, CUDA Driver Version: 410.130, GPU Driver Version: 387.10.10.10.40.105 and NVIDIA GeForce GTX 1080 Ti is showing up in system profiler.

clang is same...
Apple LLVM version 9.0.0 (clang-900.0.39.2)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
Found CUDA installation: /usr/local/cuda, version unknown

Here is where it fails...
`CMake Warning at cmake/Dependencies.cmake:257 (message):
NUMA is currently only supported under Linux.
Call Stack (most recent call first):
CMakeLists.txt:183 (include)

-- Found system Eigen at /anaconda3/envs/ptc/include/eigen3
-- Could NOT find pybind11 (missing: pybind11_INCLUDE_DIR)
-- Found CUDA: /usr/local/cuda (found suitable version "9.0", minimum required is "7.0")
-- Caffe2: CUDA detected: 9.0
-- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda
-- Caffe2: Header version is: 9.0
-- Found CUDNN: /usr/local/cuda/include
-- Found cuDNN: v7.1.4 (include: /usr/local/cuda/include, library: /usr/local/cuda/lib/libcudnn.7.dylib)`

Any thoughts?

I didn't have this problem. Did you switch to tags/v0.4.1 and download/update the submodules?

cd pytorch
git checkout tags/v0.4.1
git checkout -b v0.4.1
git submodule update --init
