
TengdaHan / ddp_notes.md (last active April 21, 2025)

Multi-node-training on slurm with PyTorch

What's this?

  • A simple note on how to start multi-node training on the Slurm scheduler with PyTorch.
  • Especially useful when the scheduler is so busy that you cannot get multiple GPUs allocated together, or when you need more than 4 GPUs for a single job.
  • Requirement: you have to use PyTorch DistributedDataParallel (DDP) for this purpose (a minimal setup sketch follows this list).
  • Warning: you might need to refactor your own code.
  • Warning: you might be secretly condemned by your colleagues for using too many GPUs.
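
A minimal sketch of the idea, assuming one Slurm task per GPU launched with `srun`: each process reads its global rank, world size, and local rank from the standard Slurm environment variables and initializes a `torch.distributed` process group before wrapping the model in DDP. The helper name `setup_distributed` and the port number below are illustrative, not part of the original note.

```python
import os
import subprocess

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def setup_distributed(backend="nccl", port="29500"):
    """Initialize torch.distributed from standard Slurm environment variables.

    Assumes the job was launched with `srun` so that SLURM_PROCID,
    SLURM_NTASKS, and SLURM_LOCALID are set, with one task per GPU.
    """
    rank = int(os.environ["SLURM_PROCID"])         # global rank across all nodes
    world_size = int(os.environ["SLURM_NTASKS"])   # total number of processes
    local_rank = int(os.environ["SLURM_LOCALID"])  # rank within this node

    # Expand the (possibly compressed) node list, e.g. "node[01-04]",
    # and use the first node as the rendezvous host.
    hostnames = subprocess.run(
        ["scontrol", "show", "hostnames", os.environ["SLURM_NODELIST"]],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    os.environ.setdefault("MASTER_ADDR", hostnames[0])
    os.environ.setdefault("MASTER_PORT", port)

    dist.init_process_group(backend, rank=rank, world_size=world_size)
    torch.cuda.set_device(local_rank)
    return rank, local_rank, world_size


if __name__ == "__main__":
    rank, local_rank, world_size = setup_distributed()
    # Wrap any model in DDP; gradients are synchronized across all processes.
    model = torch.nn.Linear(10, 10).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
```

Such a script would typically be launched from an sbatch file requesting, for example, `--nodes=2 --ntasks-per-node=4 --gres=gpu:4`, with `srun python train.py` as the job step, so that Slurm starts one process per GPU on every node.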
rxwei / ad-manifesto.md (last active December 6, 2024)
First-Class Automatic Differentiation in Swift: A Manifesto
repeatedly / fizzbuzz.d (created August 9, 2012)
FizzBuzz based on tanakh's Haskell
import std.algorithm, std.conv, std.range, std.stdio;
// FizzBuzz from http://ideone.com/ciKtm
// I cannot port 'f <> b <|> n'
void main()
{
    auto fizz = cycle([null, null, "Fizz"]);
    auto buzz = cycle([null, null, null, null, "Buzz"]);
    auto nums = map!(to!string)(iota(1, 101));