This script simulates a potential race condition in SaveModelCallback when every='improvement' is specified.
When launched in PyTorch's DistributedDataParallel (DDP) mode on a single host with multiple GPUs:
python3 -m torch.distributed.launch --nproc_per_node=3 test_barrier_load.py
The master process (rank 0) sleeps for a few seconds in on_epoch_end() before saving the model after the last epoch.
The other processes attempt to load the model in on_train_end(). Without synchronization, the script would crash.
When properly synchronized, the other processes wait for the master process to arrive at the post-write barrier before they read the file, as in the following run on a single host with 3 GPUs and 3 epochs:
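Separately from that run, the post-write barrier idea can be sketched with the standard library alone. This is a minimal illustration, not fastai's or PyTorch's implementation: threads stand in for the DDP processes, `threading.Barrier` stands in for `torch.distributed.barrier()`, and the function name `run_ranks` and the file name `model.pth` are hypothetical.

```python
import os
import tempfile
import threading
import time


def run_ranks(world_size: int = 3, delay: float = 0.2) -> list:
    """Simulate ranks as threads: rank 0 writes a file late, every rank then reads it."""
    # Stand-in for torch.distributed.barrier(); an assumption made for this sketch.
    barrier = threading.Barrier(world_size)
    results = [None] * world_size

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "model.pth")  # hypothetical checkpoint name

        def worker(rank: int) -> None:
            if rank == 0:
                time.sleep(delay)  # the master is slow to save
                with open(path, "w") as f:
                    f.write("weights")
            # Post-write barrier: no rank proceeds to read until rank 0 has
            # finished writing and arrived here as well.
            barrier.wait()
            with open(path) as f:
                results[rank] = f.read()

        threads = [threading.Thread(target=worker, args=(r,)) for r in range(world_size)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    return results
```

Removing the `barrier.wait()` call would let the non-master threads reach `open(path)` before the file exists, which mirrors the crash described above.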
#!/usr/bin/env python3
# Run this script as:
# (Yes, even with nproc_per_node=1, it'll trigger the bug)
# python -m torch.distributed.launch --nproc_per_node=1 wgan_ddp.py
#
import argparse
from fastai.vision import *
from fastai.vision.gan import *
from fastai.distributed import *
#!/usr/bin/env python3
import fastai
from fastai.text import *
from fastai.distributed import *
import torch
import argparse, os
def train(local_rank:int=None, epochs:int=1):
    if local_rank is not None:
        torch.cuda.set_device(local_rank)
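The preview above stops before showing how `local_rank` reaches `train()`. Since `torch.distributed.launch` passes a `--local_rank=<n>` argument to each worker process, one plausible wiring is an `argparse` parser like the sketch below. The helper name `parse_args` and the `--epochs` flag are assumptions made for illustration and are not taken from the original script.

```python
import argparse


def parse_args(argv=None):
    """Parse the arguments a launcher-spawned worker would typically receive."""
    parser = argparse.ArgumentParser()
    # torch.distributed.launch supplies --local_rank=<n> to each worker process.
    parser.add_argument("--local_rank", type=int, default=None)
    # Hypothetical flag mirroring the epochs parameter of train().
    parser.add_argument("--epochs", type=int, default=1)
    return parser.parse_args(argv)
```

With this in place, a script could call `args = parse_args()` and then `train(args.local_rank, args.epochs)`; when run without the launcher, `local_rank` stays `None` and the device selection is skipped.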
# This script is adapted from the OpenWrt wiki page on creating a VPN server.
# A VPN client can access the outside world as if its traffic originated from the OpenWrt router.
#
# Prerequisites
# 1. opkg update && opkg install openvpn-openssl openvpn-easy-rsa
# 2. Get a public DDNS domain name or a static IP for the VPN server, and put it into ddns_name="" near the bottom of the script.
# 3. Customize the parameters (server/client service name, subnet, server port, output dir, etc.) in the same bottom section.
#
# USAGE:
# 1. sh ./ovpn_owrt.sh <pki directory> [optional dh.pem file]