- A simple note on how to start multi-node training with PyTorch on the Slurm scheduler.
- Especially useful when the scheduler is so busy that you cannot get multiple GPUs allocated, or when you need more than 4 GPUs for a single job.
- Requirement: you must use PyTorch DistributedDataParallel (DDP) for this purpose.
- Warning: you might need to refactor your own code.
- Warning: you might be secretly condemned by your colleagues for using too many GPUs.
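As a rough illustration of the kind of refactoring involved, the sketch below reads the rank information that Slurm exports to each task. It assumes the job is launched with `srun` using one task per GPU; the helper name `slurm_dist_env` is hypothetical and not part of the note itself. `SLURM_PROCID`, `SLURM_NTASKS`, and `SLURM_LOCALID` are standard Slurm environment variables.

```python
import os


def slurm_dist_env(environ=None):
    """Collect DDP rank information from Slurm environment variables.

    Hypothetical helper: assumes one Slurm task per GPU, started via srun.
    Falls back to single-process defaults when the variables are absent.
    """
    env = os.environ if environ is None else environ
    return {
        # Global index of this task across all nodes.
        "rank": int(env.get("SLURM_PROCID", 0)),
        # Total number of tasks in the job.
        "world_size": int(env.get("SLURM_NTASKS", 1)),
        # Index of this task on its own node, usable as the GPU index.
        "local_rank": int(env.get("SLURM_LOCALID", 0)),
    }


# Example with a fake environment: task 5 of 8, second task on its node.
info = slurm_dist_env(
    {"SLURM_PROCID": "5", "SLURM_NTASKS": "8", "SLURM_LOCALID": "1"}
)
# The rank and world_size values would typically be handed to
# torch.distributed.init_process_group(...) before wrapping the model in DDP.
```

Taking the environment as a parameter keeps the helper easy to try outside a Slurm allocation.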
from torch import FloatTensor, LongTensor, Tensor, Size, lerp, zeros_like
from torch.linalg import norm
# adapted to PyTorch from:
# https://gist.github.com/dvschultz/3af50c40df002da3b751efab1daddf2c
# most of the extra complexity is to support:
# - many-dimensional vectors
# - v0 or v1 with last dim all zeroes, or v0 ~colinear with v1
# - falls back to lerp()
# - conditional logic implemented with parallelism rather than Python loops
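The fallback behaviour described in the comments above can be sketched in plain Python for a single pair of vectors. This is only an illustration under stated assumptions, not the gist's tensor implementation: it uses Python lists instead of PyTorch tensors, branches with `if` rather than parallel masking, and the `dot_threshold` default of 0.9995 is an assumed cutoff for "nearly colinear".

```python
import math


def slerp(v0, v1, t, dot_threshold=0.9995):
    """Spherical interpolation between two vectors, with a lerp fallback.

    Illustrative scalar version; dot_threshold is an assumed cutoff.
    """
    # Plain linear interpolation, used whenever slerp is ill-defined.
    linear = [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]

    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    if norm0 == 0.0 or norm1 == 0.0:
        return linear  # a zero vector has no direction

    # Cosine of the angle between the two vectors, clamped for acos.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))
    if abs(dot) > dot_threshold:
        return linear  # nearly colinear: sin(theta) would be close to 0

    theta = math.acos(dot)
    sin_theta = math.sin(theta)
    s0 = math.sin((1.0 - t) * theta) / sin_theta
    s1 = math.sin(t * theta) / sin_theta
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]


halfway = slerp([1.0, 0.0], [0.0, 1.0], 0.5)      # stays on the unit circle
zero_case = slerp([0.0, 0.0], [1.0, 0.0], 0.5)    # falls back to lerp
colinear = slerp([1.0, 0.0], [2.0, 0.0], 0.5)     # falls back to lerp
```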
Iee9keaYk+mfs+S5kAp8fG11c2ljLjE2My5jb20KQEAqLm11c2ljLjEyNi5uZXQKCiFRUemfs+S5kAp8
fHkucXEuY29tXgp8fGkueS5xcS5jb20vdjgvcGxheXNvbmcuaHRtbAp8fGMueS5xcS5jb20vdjgvZmNn
LWJpbi9mY2dfcGxheV9zaW5nbGVfc29uZy5mY2cKQEBkbC5zdHJlYW0ucXFtdXNpYy5xcS5jb20KCiHp
hbfni5fpn7PkuZAKfHxrdWdvdS5jb21eCnx8aXAua3Vnb3UuY29tL2NoZWNrL2lzY24KQEBmcy5vcGVu
Lmt1Z291LmNvbQoKIemFt+aIkemfs+S5kAp8fGt1d28uY25eCnx8aXBjaGVjay5rdXdvLmNuL2lwX2No
ZWNrLmt1d28KQEBzeWNkbi5rdXdvLmNuXgoKIeeZvuW6pumfs+S5kAp8fG11c2ljLmJhaWR1LmNvbS9k
YXRhL3VzZXIvbG9jYXRpb24KQEB5aW55dWVzaGl0aW5nLmJhaWR1LmNvbQo=
'''This script goes along the blog post
"Building powerful image classification models using very little data"
from blog.keras.io.
It uses data that can be downloaded at:
https://www.kaggle.com/c/dogs-vs-cats/data
In our setup, we:
- created a data/ folder
- created train/ and validation/ subfolders inside data/
- created cats/ and dogs/ subfolders inside train/ and validation/
- put the cat pictures index 0-999 in data/train/cats
from keras import backend as K  # K is used below for the input placeholder
from keras.models import Sequential
from keras.layers import Convolution2D, ZeroPadding2D, MaxPooling2D
img_width, img_height = 128, 128
# this will contain our generated images
input_img = K.placeholder((1, 3, img_width, img_height))
# build the VGG16 network with our input_img as input
first_layer = ZeroPadding2D((1, 1), input_shape=(3, img_width, img_height))
## VGG16 model for Keras
This is the Keras model of the 16-layer network used by the VGG team in the ILSVRC-2014 competition.
It has been obtained by directly converting the Caffe model provided by the authors.
Details about the network architecture can be found in the following arXiv paper:
Very Deep Convolutional Networks for Large-Scale Image Recognition
K. Simonyan, A. Zisserman
name: "nin_imagenet"
input: "data"
input_shape {
dim: 10
dim: 3
dim: 224
dim: 224
}
layers {
bottom: "data"