This document describes how to start a Jupyter session on a GPU node of a SLURM cluster and, optionally, use it from Google Colab.
Using Google Colab is optional and can pose serious security risks. Please carefully read the Google local runtime documentation and ask your system administrator for permission before connecting Google Colab to a local server.
Start a tmux session on the login node:
ssh <username>@<slurm-login-node>
tmux new -s notebook
Request a GPU node in an interactive session:
srun -c 4 --gres=gpu:1 --pty bash
nvidia-smi
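Before installing anything, it can help to confirm that the shell really is running inside a SLURM allocation and not still on the login node. The following is a minimal sketch: `in_slurm_job` and `describe_session` are hypothetical helper names, and the check relies only on the `SLURM_JOB_ID` variable that SLURM sets inside a job.

```shell
# Hypothetical helper: succeeds only when SLURM_JOB_ID is set, which SLURM
# does in the environment of a running job.
in_slurm_job() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

# Hypothetical helper: print a one-line summary of where this shell runs.
describe_session() {
    if in_slurm_job; then
        echo "inside SLURM job ${SLURM_JOB_ID} on $(hostname)"
    else
        echo "not inside a SLURM job (probably still on the login node)"
    fi
}
```

Calling `describe_session` right after `srun` returns a prompt shows whether the allocation was granted.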
Install Miniconda and the fastai environment (~10 min on slow disks):
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod u+x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda
eval "$($HOME/miniconda/bin/conda shell.bash hook)"
conda init bash
conda config --set auto_activate_base false
conda create -n fastai -c fastai python=3.7 fastai jupyter -y
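Since the installation is slow, it may be worth checking whether it has already been done before repeating it. The sketch below assumes the `$HOME/miniconda` prefix passed to the installer with `-p`; `has_conda` and `install_hint` are hypothetical helper names, not part of conda.

```shell
# Hypothetical helper: succeeds if an executable conda binary exists under
# the given prefix (defaults to $HOME/miniconda, the prefix passed with -p).
has_conda() {
    [ -x "${1:-$HOME/miniconda}/bin/conda" ]
}

# Hypothetical helper: say whether the installer step can be skipped.
install_hint() {
    if has_conda "$1"; then
        echo "conda found, installer can be skipped"
    else
        echo "conda not found, run the installer"
    fi
}
```

For example, `install_hint "$HOME/miniconda"` reports whether the installer still needs to run.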
Install jupyter_http_over_ws (optional, only needed for Google Colab compatibility):
conda activate fastai
pip install jupyter_http_over_ws
jupyter serverextension enable --py jupyter_http_over_ws
Clone the fastai course repository:
cd ~
mkdir -p fastai
cd fastai
git clone https://github.com/fastai/course-v3.git
Exit the interactive session:
exit
# Check that all interactive sessions are terminated
squeue -u <username>
Request a GPU node again in a new interactive session:
srun -c 4 --gres=gpu:1 --pty bash
nvidia-smi
Take note of the machine hostname:
hostname
Activate the conda environment and start Jupyter Notebook:
conda activate fastai
cd ~/fastai/course-v3
# When using Colab (warning: this could be unsafe; Google and the notebooks must be trusted)
jupyter notebook \
--NotebookApp.allow_origin='https://colab.research.google.com' \
--ip 0.0.0.0 \
--port=8888 \
--NotebookApp.port_retries=0
# Without Colab (safe if the local machine is trusted):
jupyter notebook --port=8888 --ip 0.0.0.0
Copy the displayed connection token
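If the token has scrolled out of view, it can usually be recovered with `jupyter notebook list`, which prints the URL of each running server. The sketch below assumes the URL contains a `token=` query parameter; `extract_token` is a hypothetical helper name.

```shell
# Hypothetical helper: read text on stdin and print the value of every
# "token=" query parameter found, one per line. Lines without a token
# (such as a header line) are dropped.
extract_token() {
    sed -n 's/.*[?&]token=\([0-9A-Za-z]*\).*/\1/p'
}

# Usage, from a shell where the fastai environment is active and the
# server's runtime files are visible:
#   jupyter notebook list | extract_token
```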
Detach from the tmux session (without stopping the Jupyter server) using Ctrl+b, d, then exit ssh.
Open a new terminal and forward the notebook port to localhost:
ssh -L 8888:localhost:8888 <username>@<job-hostname>
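On some clusters the compute nodes are only reachable through the login node. In that case, the OpenSSH `-J` (ProxyJump) option can be combined with the port forwarding. The sketch below only builds and prints the command so it can be reviewed before running; `tunnel_cmd` is a hypothetical helper name, and whether a jump host is needed depends on the cluster's network setup.

```shell
# Hypothetical helper: print the ssh port-forwarding command.
# Arguments: <username> <job-hostname> [port, default 8888] [login node, optional]
tunnel_cmd() {
    user=$1
    node=$2
    port=${3:-8888}
    login=${4:-}
    if [ -n "$login" ]; then
        # Assumption: the compute node is only reachable via the login node.
        echo "ssh -J ${user}@${login} -L ${port}:localhost:${port} ${user}@${node}"
    else
        echo "ssh -L ${port}:localhost:${port} ${user}@${node}"
    fi
}
```

For example, `tunnel_cmd <username> <job-hostname> 8888 <slurm-login-node>` prints the jump-host variant of the command.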
Open the notebook at http://localhost:8888 and paste the connection token (important: this step is required even if Colab is used).
To connect from Google Colab (optional), see step 4 in: https://research.google.com/colaboratory/local-runtimes.html
As noted at the top, connecting Google Colab to a local server can pose serious security risks; read the Google local runtime documentation and ask your system administrator for permission first.
Important: the Jupyter server job should be terminated when the GPU is not in use for prolonged periods.
Connect to the login node and re-attach the tmux session:
ssh <username>@<slurm-login-node>
tmux a -t notebook
Exit Jupyter by pressing Ctrl+C twice, then type exit to terminate the job.
From the login node, check that all interactive sessions are terminated:
squeue -u <username>
To manually kill a job, use:
scancel <job-id>
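When several sessions were left running, the job IDs to cancel can be picked out of the `squeue` output. The sketch below assumes the interactive jobs are named `bash`, which is what `srun ... --pty bash` typically produces when no job name is given; `interactive_job_ids` is a hypothetical helper name.

```shell
# Hypothetical helper: read "<job-id> <job-name>" lines on stdin and print
# the IDs of jobs whose name is exactly "bash".
interactive_job_ids() {
    awk '$2 == "bash" { print $1 }'
}

# Usage on the login node (-h drops the header, %i is the job ID, %j the name):
#   squeue -h -u <username> -o '%i %j' | interactive_job_ids
# Each printed ID can then be passed to: scancel <job-id>
```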