I. Clean Python setup from scratch. (~1h) Skip if you already have a python environment setup or want to use your own python virtualenv setup
sudo apt-get update;
sudo apt-get install python3-pip python3-dev
sudo apt-get install make build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm \
libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev libffi-dev liblzma-dev
python3 -m pip install --upgrade pip
curl https://pyenv.run | bash
Add the following to your ~/.bashrc:
#pyenv
export PYENV_ROOT="$HOME/.pyenv"
export PATH="/home/$USER/.pyenv/bin:$PATH"
eval "$(pyenv init -)"
eval "$(pyenv virtualenv-init -)"
And add this to ~/.profile
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/bin:$PATH"
eval "$(pyenv init --path)"
Reload session:
source ~/.bashrc # reload bash session
Install latest python and make it default:
pyenv install 3.9.5
pyenv global 3.9.5
python -V && which python
should return:
Python 3.9.5
/home/$USER/.pyenv/shims/python
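Optionally, as a quick sanity check that pyenv and its virtualenv plugin are on your PATH (output will vary with your install):
pyenv --version # prints the installed pyenv version
pyenv commands | grep virtualenv # should list the commands added by the pyenv-virtualenv plugin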
2. pipx: Install and Run Python Applications in Isolated Environments without ruining your global environment
python -m pip install --user pipx
If pipx is not found (not in $PATH) then run:
python -m pipx ensurepath
Now use pipx instead of pip to install/run standalone Python applications and git repos (as opposed to Python packages/libraries).
-> Avoid installing packages globally: there is a high chance of breaking everything on updates/installs.
pipx install notebook
pipx install jupyter --include-deps
pipx install jupyterlab
To make your future pyenv-virtualenv available with jupyter, use pyenv-jupyter-kernel plugin:
git clone https://github.com/aiguofer/pyenv-jupyter-kernel $(pyenv root)/plugins/pyenv-jupyter-kernel
pipx install poetry
Verify install:
pipx list
which jupyter-lab
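Optionally, you can also check that pipx itself works by running a throwaway tool in a temporary environment without installing it (pycowsay is just an arbitrary example package):
pipx run pycowsay moo # runs pycowsay in an isolated, temporary environment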
II. Install CUDA, NVIDIA drivers, libcudnn (/!\ Updated installation instructions are always at https://www.tensorflow.org/install/gpu )
Officially tested and compatible GPU configurations for each TensorFlow and CUDA/cuDNN version can be found in the tested build configurations table. Please adapt the following instructions to that table, as it contains the latest working configurations.
Check nvidia driver installation (>450.80.02 or your current version)
nvidia-smi
should print GPU info (the CUDA version printed by nvidia-smi is the highest version the driver supports, not necessarily the one installed)
- Check CUDA install:
nvcc -V
should print:
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
or similar output for your current CUDA version.
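Optionally, you can also check whether cuDNN is already installed through apt (prints nothing if no libcudnn package is present):
dpkg -l | grep -i cudnn # lists any installed libcudnn packages and their versions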
Now: install each individual missing package from this step below; skip otherwise.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin &&
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600 &&
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub &&
sudo add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /" ;
sudo apt-get update;
If you notice problems with GPG keys when running above commands, try this: (from https://github.com/NVIDIA/nvidia-docker/issues/1632#issuecomment-1112770026 and https://github.com/NVIDIA/nvidia-docker/issues/1632#issuecomment-1125739652)
sudo apt-key del 7fa2af80
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64/7fa2af80.pub
sudo add-apt-repository "deb http://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/ /" ;
sudo apt-get update;
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64/nvidia-machine-learning-repo-ubuntu2004_1.0.0-1_amd64.deb ;
sudo dpkg -i nvidia-machine-learning-repo-ubuntu2004_1.0.0-1_amd64.deb ;
sudo apt-get update;
Note: the latest links/packages can be found with Ctrl+F in the official NVIDIA CUDA Ubuntu 20.04 repo and the NVIDIA machine-learning Ubuntu 20.04 repo.
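Before installing anything, you can verify that apt now sees the NVIDIA repositories (the package name below is just an example; adapt it to the CUDA version you target):
apt-cache policy cuda-11-2 # candidate versions should come from developer.download.nvidia.com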
sudo ubuntu-drivers devices
should return a list of compatible/recommended drivers (e.g. driver : nvidia-driver-510 - third-party free recommended)
- If the driver version ends with -open (e.g. nvidia-driver-525-open), do NOT install it (there may be issues matching CUDA version dependencies); pick another driver version XXX for which no nvidia-driver-XXX-open is listed.
- Else: pick the recommended version.
- sudo apt-get install nvidia-driver-{#RECOMMENDED-VERSION-NUMBER}
- If you encounter package issues/conflicts, try to resolve them with aptitude instead of apt-get:
sudo apt-get install aptitude
sudo aptitude install -f nvidia-driver-{#RECOMMENDED-VERSION-NUMBER}
- Try to figure out which proposed solution would resolve the conflicts/dependencies (could be old driver versions, a previous CUDA install, ...)
sudo reboot
Continue if nvidia-smi returns a valid output.
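As an extra post-reboot check (optional), you can ask the driver directly for its version and the detected GPU; the exact values depend on your hardware:
nvidia-smi --query-gpu=driver_version,name --format=csv # prints the driver version and GPU model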
To get the latest/appropriate cuDNN version, browse the CUDA Ubuntu 20.04 repo, search (Ctrl+F) for libcudnn8_*.deb and libcudnn8-dev_*.deb, copy the two URLs, download both .deb files, and install them.
sudo apt-get install --no-install-recommends \
cuda-11-2;
sudo apt-get autoremove ;
cd ~/Downloads &&
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcudnn8_8.1.0.77-1+cuda11.2_amd64.deb ;
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libcudnn8-dev_8.1.0.77-1+cuda11.2_amd64.deb ;
sudo dpkg -i libcudnn8_8.1.0.77-1+cuda11.2_amd64.deb ;
sudo dpkg -i libcudnn8-dev_8.1.0.77-1+cuda11.2_amd64.deb
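To double-check that both cuDNN packages were installed correctly (optional; the versions shown will match the .deb files you downloaded):
dpkg -s libcudnn8 libcudnn8-dev | grep -E 'Package|Status|Version' # both should report "install ok installed"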
Note: If apt-get install cuda-11-2 fails, then try either:
- sudo aptitude install cuda-11-2 and then try to solve the dependency issues.
- sudo apt-get install cuda-toolkit-11-2, which installs CUDA in /usr/local/cuda/bin/, and then install the 2 other required packages (libcudnn8 and libcudnn8-dev above).
Add this to your ~/.bashrc (from docs.nvidia.com):
# NVIDIA CUDA 11.x
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11/lib64
export CUDA_HOME=/usr/local/cuda-11/
export PATH="/usr/local/cuda-11/bin:$PATH"
source ~/.bashrc # Reload session
Continue if nvcc -V returns a valid output.
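You can also quickly confirm that the environment variables from ~/.bashrc were picked up (optional; the paths below assume the /usr/local/cuda-11/ layout used above):
echo $CUDA_HOME # should print /usr/local/cuda-11/
echo $LD_LIBRARY_PATH | tr ':' '\n' | grep cuda # the cuda-11 lib64 directory should appear here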
sudo reboot
Source: https://chrisalbon.com/code/deep_learning/setup/prevent_nvidia_drivers_from_upgrading/
Since TensorFlow/PyTorch must match one specific CUDA version (e.g. 11.0 != 11.1), we must freeze CUDA updates using apt:
sudo apt-mark hold libcudnn8 libcudnn8-dev # Prevent package updates / freeze versions
dpkg-query -W --showformat='${Package} ${Status}\n' | grep -v deinstall | awk '{ print $1 }' | \
grep -E 'nvidia.*-[0-9]+$' | \
xargs -r -L 1 sudo apt-mark hold
To unfreeze:
sudo apt-mark unhold <package-name>
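To list everything that is currently frozen and confirm the NVIDIA/cuDNN packages are held (optional):
apt-mark showhold # prints every package excluded from upgrades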
/!\ Please don't install tensorflow globally with pip/pipx...
If you use pyenv & jupyter and have already created virtualenvs, you can register all of your pyenv virtualenvs in jupyter with:
pyenv versions --bare | grep -v "/" | xargs -L 1 pyenv register-kernel
Create a virtualenv from version 3.9.5:
pyenv virtualenv 3.9.5 mygputest
or pyenv virtualenv mygputest if 3.9.5 is the global Python version
pyenv virtualenvs # list all virtualenvs
pyenv activate mygputest
Deactivating:
pyenv deactivate
With your virtualenv activated:
python -m pip install tensorflow
Should install 2.5.x or the current release.
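As a quick optional check that this wheel was built with CUDA support (this does not yet prove a GPU is visible; see the GPU checks further below):
python -c "import tensorflow as tf; print(tf.test.is_built_with_cuda())" # should print True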
3. Install PyTorch 1.10.2 & PyTorch Lightning & Lightning Flash. /!\ The latest installation instructions are always at https://pytorch.org/get-started/locally/ and the PyTorch wheel list is at https://download.pytorch.org/whl/torch_stable.html
python -m pip install torch==1.10.2+cu111 torchaudio==0.10.2+cu111 torchvision==0.11.3+cu111 -f https://download.pytorch.org/whl/torch_stable.html
python -m pip install pytorch-lightning lightning-flash
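Similarly, you can optionally check that the installed PyTorch wheel matches the expected CUDA build and that the GPU is reachable; expected output is along the lines of 1.10.2+cu111 11.1 True:
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"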
- Common:
python -m pip install scikit-learn pandas matplotlib seaborn bokeh
python -m pip install botorch # bayesian optimization on pytorch
python -m pip install opencv-python
- Audio (pyaudio):
sudo apt-get install libjack-jackd2-dev portaudio19-dev
then python -m pip install pyaudio
- Meta-opt:
python -m pip install keras-tuner
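Optionally, a quick import smoke test for the "Common" packages installed above (assuming you installed them all; note that opencv-python imports as cv2):
python -c "import sklearn, pandas, matplotlib, cv2; print('common packages ok')"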
You may keep this running in a side terminal:
watch -d -n 2 nvidia-smi # live GPU usage / memory monitor, refreshed every 2 seconds
With your virtualenv activated:
python -c "import tensorflow as tf;print(tf.__version__); print(tf.config.list_physical_devices('GPU'))"
Should return:
- current tensorflow version
- last line should be:
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
With your virtualenv activated:
python -c 'import torch; print(torch.rand(2,3).cuda())'
It should return a random tensor with device cuda:0 such as:
tensor([[0.2551, 0.1373, 0.3072],[0.9524, 0.2616, 0.5635]], device='cuda:0')
Official TensorFlow Keras MNIST tutorial | Official TensorFlow advanced MNIST tutorial
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import datasets, layers, models
import numpy as np
# prepare data
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 256.0, x_test / 256.0
x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)
# create model
model = keras.Sequential(
[
keras.Input(shape=(28, 28, 1)),
layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
layers.MaxPooling2D(pool_size=(2, 2)),
layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
layers.MaxPooling2D(pool_size=(2, 2)),
layers.Flatten(),
layers.Dropout(0.5),
layers.Dense(10, activation="softmax"),
]
)
# train and test model
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, validation_data=(x_test, y_test))
model.evaluate(x_test, y_test)
You should expect about 98% test accuracy.
PyTorch 1.X.X & PyTorch Lightning & Lightning-Flash
import flash
from torch import nn, optim
from torch.utils.data import DataLoader, random_split
from torchvision import transforms, datasets
# model
model = nn.Sequential(
nn.Conv2d(1, 32, kernel_size=3, stride=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2, stride=2),
nn.Conv2d(32, 64, kernel_size=3, stride=1),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2, stride=2),
nn.Flatten(),
nn.Dropout(0.5),
nn.Linear(5 * 5 * 64, 10)
)
# data
#dataset = datasets.MNIST('./data_folder', download=True, transform=transforms.ToTensor())
tr = datasets.MNIST('./data_folder', train=True, download=True, transform=transforms.ToTensor())
te = datasets.MNIST('./data_folder', train=False, transform=transforms.ToTensor())
part_tr = random_split(tr, [1875, len(tr)-1875])[0]
part_te = random_split(te, [313, len(te)-313])[0]
# task
classifier = flash.Task(model, loss_fn=nn.functional.cross_entropy, optimizer=optim.Adam)
# train
flash.Trainer(max_epochs=10, accelerator='gpu', devices=1).fit(classifier, DataLoader(part_tr, num_workers=32), DataLoader(part_te, num_workers=32))
Suppose you will run jupyter on port 8888 (server) and forward it to your own (local) port 8888. (The reference command is: ssh -L $client_port:localhost:$server_port login@remote_server)
- Connect to your server via ssh:
ssh -L 8888:localhost:8888 your_login@remote_server
- Start the jupyter server on the remote machine:
jupyter-lab # (by default on port 8888)
- Then just copy-paste the prompted URL into your local browser (e.g.: http://localhost:8888/?token=2b58c8deb1cb467c6b0491504c0e0a1593cd7923af077606).
- Finally, in the JupyterLab browser window, create a new notebook with your selected virtualenv kernel.
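If local port 8888 is already taken, the same pattern works with any free local port (9999 below is just an example; the remote jupyter port stays 8888):
ssh -L 9999:localhost:8888 your_login@remote_server # then open http://localhost:9999/?token=... in your local browser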
For SSH tunneling with PuTTY, you can find quick instructions here
Source: DigitalOcean