You ssh into the systems using:
ssh <username>@<login node>
E.g., to log in to Niagara as user example:
ssh [email protected]
It is convenient to set up keys for password-less entry. On the local machine:
$ ssh-keygen
# Copy the public key to the cluster, using your own <username>@<host>
$ ssh-copy-id [email protected]
You only need to do this once.
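If you connect often, you can also add a host alias to ~/.ssh/config on your local machine (the alias and username below are illustrative):
# ~/.ssh/config on your local machine
Host niagara
    HostName niagara.scinet.utoronto.ca
    User example
You can then log in with just ssh niagara.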
- Intro to Linux
- What is a scheduler
- Webinars for using all Compute Canada systems
- Setting up virtual environments for Python
- Installing TensorFlow
- Installing PyTorch
- Running Jobs
- Using GPUs
- Technical Support
- Cedar
  - Login node: cedar.computecanada.ca
  - Quickstart
- Graham
  - Login node: graham.computecanada.ca
  - Quickstart
- Niagara
  - Login node: niagara.scinet.utoronto.ca
  - Quickstart
- Beluga
  - Login node: beluga.computecanada.ca
  - Quickstart
The following commands will set up a Python virtualenv and install Python modules in it. Note that you should create the virtual environment in your $HOME directory.
# First, log in to the cluster (ssh <username>@<login node>)
$ ssh [email protected]
# Load the version of Python you want. You can check which versions are available with module avail python
~ $ module load python/3.7.0
# Create the virtual environment at path ENV
~ $ ENV=path/to/env/to/create
~ $ virtualenv --no-download --python=python3.7 $ENV
# Activate it
~ $ source $ENV/bin/activate
# Make sure pip is up to date
(ENV) ~ $ pip install --upgrade pip
# Next, install any requirements with pip install, e.g.,
(ENV) ~ $ pip install numpy --no-index
Whenever possible, you should install the Python wheel for your package provided by Compute Canada with pip install package_name --no-index. See here for a list of available wheels. If you don't see an available wheel for your package, you can send a request for it to be added to [email protected].
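You can also query the available wheels from a login node with the avail_wheels utility documented for Compute Canada systems (assuming it is on your PATH):
# List the pre-built wheels matching a package name
$ avail_wheels numpy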
In your job scripts (e.g., train.sh) make sure you call module load python/<VERSION> and source <ENV>/bin/activate. Example:
#!/bin/bash
# The following three commands allow us to take advantage of whole-node
# scheduling
#SBATCH --nodes=1
#SBATCH --cpus-per-task=80
#SBATCH --mem=0
# Wall time
#SBATCH --time=12:00:00
#SBATCH --job-name=example
# Note: environment variables such as $SCRATCH are not expanded in #SBATCH
# directives, so use a literal path (relative paths are resolved against
# the directory you submit from)
#SBATCH --output=output/example_jobid_%j.txt
# Emails me when job starts, ends or fails
#SBATCH [email protected]
#SBATCH --mail-type=ALL
ENV=path/to/my/env
# load any required modules
module load python/3.7.0
# activate the virtual environment
source $ENV/bin/activate
# run a training session
srun python example.py
Jobs are submitted with sbatch <jobscript.sh>, e.g., sbatch train.sh
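Most #SBATCH directives can also be given on the sbatch command line, where they override the values in the script, e.g.:
# Request a shorter wall time than the one in train.sh
sbatch --time=01:00:00 train.sh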
Before submitting a job to the queue, it's useful to test that your submission script works. This can be done by requesting an interactive job.
For example, to request an interactive job with 48 CPUs for 30 minutes:
salloc --time=00:30:00 --nodes=1 --mem=0 --cpus-per-task=48
In general, the arguments to salloc
are the same as those you supply in your job scripts.
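Once the allocation starts, you get a shell on the compute node where you can run your script's steps by hand. A minimal sketch, assuming the environment and script from the example above:
# Load modules and activate the environment, as in train.sh
$ module load python/3.7.0
$ source $ENV/bin/activate
# Try a short run interactively
(ENV) $ python example.py
# Give up the allocation when done
(ENV) $ exit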
Alternatively, on Niagara, you can use the command
debugjob 1
to request an interactive job on 1 node.
To check the status of all your submitted jobs:
squeue -u $USER
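squeue can also filter by job state; for example, to list only your running jobs:
squeue -u $USER -t RUNNING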
To kill a job (with -i, scancel asks for confirmation first):
scancel -i JOBID
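To cancel all of your jobs at once:
scancel -u $USER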
Useful command for monitoring CPU and GPU usage of a job:
srun --jobid JOBID --pty tmux new-session -d 'htop -u $USER' \; split-window -h 'watch nvidia-smi' \; attach
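To stop monitoring without affecting the job, detach from tmux with Ctrl+b then d; the srun step exits while the job itself keeps running.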