- A simple note for how to start multi-node-training on slurm scheduler with PyTorch.
- Useful especially when scheduler is too busy that you cannot get multiple GPUs allocated, or you need more than 4 GPUs for a single job.
- Requirement: Have to use PyTorch DistributedDataParallel(DDP) for this purpose.
- Warning: might need to re-factor your own code.
- Warning: might be secretly condemned by your colleagues because using too many GPUs.
install packages first:
yay -S xl2tpd strongswan networkmanager-l2tp
ref: https://wiki.archlinux.org/index.php/Openswan_L2TP/IPsec_VPN_client_setup
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
from __future__ import print_function | |
import torch | |
import torch.nn as nn | |
import torch.nn.functional as F | |
from torch.autograd import Variable | |
def sample_gumbel(shape, eps=1e-20): | |
U = torch.rand(shape).cuda() | |
return -Variable(torch.log(-torch.log(U + eps) + eps)) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import matplotlib | |
import matplotlib.cm | |
import tensorflow as tf | |
def colorize(value, vmin=None, vmax=None, cmap=None): | |
""" | |
A utility function for TensorFlow that maps a grayscale image to a matplotlib | |
colormap for use with TensorBoard image summaries. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
"""Simple example on how to log scalars and images to tensorboard without tensor ops. | |
License: BSD License 2.0 | |
""" | |
__author__ = "Michael Gygli" | |
import tensorflow as tf | |
from StringIO import StringIO | |
import matplotlib.pyplot as plt | |
import numpy as np |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{0: 'tench, Tinca tinca', | |
1: 'goldfish, Carassius auratus', | |
2: 'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias', | |
3: 'tiger shark, Galeocerdo cuvieri', | |
4: 'hammerhead, hammerhead shark', | |
5: 'electric ray, crampfish, numbfish, torpedo', | |
6: 'stingray', | |
7: 'cock', | |
8: 'hen', | |
9: 'ostrich, Struthio camelus', |
NewerOlder