git clone [email protected]:fairinternal/fairseq-py.git && cd fairseq-py && git checkout stable-emb
- if you don't have the fairseq conda env, follow these instructions
pip install numpy==1.20
pip install fairscale  (optional, but some people needed this; should be > 0.3.7, as of writing)
- on FAIR cluster:
pip install -i https://test.pypi.org/simple/ bitsandbytes-cuda110 -U
- OR on AWS:
pip install -i https://test.pypi.org/simple/ bitsandbytes-cuda111 -U
Edit the launcher function below as needed:
launch_optim_experiment () {
./fb_sweep/benchmark_lm.py \
-g 8 -t 1 -n 1 \
--ddp no_c10d \
--dl 12 \
--embed-dim 2048 \
--bs 4 --li 50 \
--epg 0 \
--checkpoints-dir . \
--constraint volta32gb \
--partition learnfair \
--resume-failed --no-save --mu 7200 "$@"
}
# Quiet NCCL logging down to warnings for all launched jobs.
export NCCL_DEBUG="WARN"
# Adam variants: fp16 optimizer state, full-precision, and 8-bit
# (8-bit needs --stable; presumably the stable-embedding branch — confirm).
launch_optim_experiment -p opt_exp --opt adam16
launch_optim_experiment -p opt_exp --opt adam
launch_optim_experiment -p opt_exp --opt adam8bit --stable
# Adafactor with and without momentum; distinct -p prefixes keep the
# two runs' checkpoints/logs separate.
launch_optim_experiment -p opt_exp.no_mo --opt adafactor
launch_optim_experiment -p opt_exp.yes_mo --opt adafactor --adafactor-use-momentum
- Note: for some hparams, you must manually edit fb_sweep/benchmark_lm.py.
The 8-bit LM command is:
launch_optim_experiment -p opt_exp --opt adam8bit --stable