Sam Shleifer (sshleifer)
@sshleifer
sshleifer / optim_cmds.md
Last active July 22, 2021 23:39
gshard optimizer experiment cmds

Setup

  • git clone [email protected]:fairinternal/fairseq-py.git && cd fairseq-py && git checkout stable-emb
  • if you don't have the fairseq conda env, follow these instructions
  • pip install numpy==1.20 (optional, but some people needed this)
  • pip install fairscale (should be > 0.3.7, as of writing)
  • on FAIR cluster: pip install -i https://test.pypi.org/simple/ bitsandbytes-cuda110 -U
  • OR on AWS: pip install -i https://test.pypi.org/simple/ bitsandbytes-cuda111 -U

Common Logic for all commands

Edit this as needed

@sshleifer
sshleifer / sharded_data_doc.md
Last active April 15, 2021 09:11
Construct+Use sharded dataset in fairseq

Constructing a sharded dataset

  • cat all your raw text into one huge file in /scratch/
  • run your favorite bpe on that file (20mins for 160GB with 20 workers), writing the result to /scratch.

Then filter out extra blank lines (the second grep strips the -- group separators that grep -A1 inserts):

grep -A1 . /scratch/rc_train_big.bpe | grep -v "^--$" > /scratch/rc.filtered.train.bpe
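The pipeline above keeps every non-empty line plus at most one blank line after it, collapsing longer runs of blank lines. A rough pure-Python equivalent, as a sketch (the function name is an assumption, not part of the gist):

```python
def collapse_blank_lines(lines):
    """Keep non-empty lines, and at most one blank line after each one.

    Mirrors: grep -A1 . in.bpe | grep -v "^--$" > out.bpe
    """
    prev_nonempty = False
    out = []
    for line in lines:
        if line.strip():
            out.append(line)
            prev_nonempty = True
        elif prev_nonempty:
            # first blank line after a non-empty line survives
            out.append(line)
            prev_nonempty = False
        # further consecutive blank lines are dropped
    return out
```

In practice grep is much faster on a 160GB file; this is only to document the behavior.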
@sshleifer
sshleifer / anki_setup.md
Created March 6, 2021 19:27
Anki Setup
@sshleifer
sshleifer / time_dbart_generate.py
Created October 26, 2020 17:29
Timing Generate
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import time
from tqdm import tqdm
from pathlib import Path
import pandas as pd
models = ['sshleifer/distilbart-cnn-12-3',
'sshleifer/distilbart-cnn-12-6',
'sshleifer/distilbart-cnn-6-6',
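The preview above cuts off mid-list, but the timing pattern itself is simple: call the function several times and keep the best wall-clock time. A minimal, model-agnostic sketch (this helper is an assumption; `sorted` stands in for the real `model.generate` call):

```python
import time

def time_fn(fn, *args, n_trials=3, **kwargs):
    """Run fn n_trials times; return (last result, best seconds per call)."""
    timings = []
    for _ in range(n_trials):
        start = time.time()
        result = fn(*args, **kwargs)
        timings.append(time.time() - start)
    return result, min(timings)

# Stand-in workload; with transformers this would be
# time_fn(model.generate, **tokenized_batch)
out, secs = time_fn(sorted, range(100, 0, -1))
```

Taking the minimum over trials reduces noise from warmup and other processes.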
@sshleifer
sshleifer / latex_style.md
Created October 19, 2020 15:19
Sasha's latex style rules

Avoid:

  • [!h] for figures/tables.
  • two datasets in one plot
  • NameError: introducing terms that haven't been defined.
@sshleifer
sshleifer / download_summ_data.py
Created October 7, 2020 19:19
Fetching summarization datasets
from pathlib import Path
import fire
from tqdm import tqdm
DS_TO_KEY = {
'gigaword': ('document', 'summary'),
'xsum': ('document', 'summary'),
'aeslc': ('email_body', 'subject_line'),
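Given `DS_TO_KEY`, the download script presumably writes one `source`/`target` line pair per example. A hedged sketch of that writing step (the `write_split` helper and its file layout are assumptions, not the gist's actual code):

```python
from pathlib import Path

DS_TO_KEY = {
    'gigaword': ('document', 'summary'),
    'xsum': ('document', 'summary'),
    'aeslc': ('email_body', 'subject_line'),
}

def write_split(records, src_key, tgt_key, save_dir, split='train'):
    """Write one line per example to {split}.source / {split}.target."""
    save_dir = Path(save_dir)
    save_dir.mkdir(parents=True, exist_ok=True)
    with open(save_dir / f'{split}.source', 'w') as src_f, \
         open(save_dir / f'{split}.target', 'w') as tgt_f:
        for rec in records:
            # newlines inside a document would break the line-per-example format
            src_f.write(rec[src_key].replace('\n', ' ') + '\n')
            tgt_f.write(rec[tgt_key].replace('\n', ' ') + '\n')
```

The real gist feeds this from `datasets`-style examples keyed by the tuples in `DS_TO_KEY`.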

How BartConfig controls when LayerNorm is applied

6 groups of models inherit from BartForConditionalGeneration. The major differences between them are:

  • pretraining objective & data
  • finetuning objective & data
  • number of layers and dimension of each layer
  • when layernorm is applied

This document focuses on layernorm timing.
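As a rough illustration of the two placements such a config flag toggles in a residual block: `normalize_before` is fairseq's name for the flag, and this pure-Python sketch is a stand-in, not the transformers implementation.

```python
def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / (var + eps) ** 0.5 for v in x]

def residual_block(x, sublayer, normalize_before):
    if normalize_before:
        # pre-norm: normalize, apply sublayer, then add the residual
        return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]
    # post-norm (original Transformer): sublayer, residual, then normalize
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])
```

Pre-norm leaves the residual path un-normalized, which is why the two variants produce different activations from identical weights.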

export b="s3://models.huggingface.co/bert"
stas_to_fb () {
src=$1
shift
aws s3 sync $b/stas/$src $b/facebook/$src $@
}
stas_to_allenai () {
src=$1
shift
aws s3 sync $b/stas/$src $b/allenai/$src $@
}
@sshleifer
sshleifer / dynb.md
Last active September 9, 2020 19:25

Problem:

  • In WMT datasets, there is wide variation in the length of examples. Some are one sentence. Some are 10 sentences.
  • The max batch size that can fit on a v100 is roughly (4, 512)
  • You end up with lots of batches of shape (4, 12) or (4, small_int) that don't fully utilize the GPU.

Dynamic batch size: organize batches to hold roughly 4*512 = 2048 tokens, so one batch might be shaped (4, 512) and another (32, 64).
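A minimal sketch of this idea (not fairseq's actual implementation): sort examples by length, then greedily pack them so the padded batch never exceeds `max_tokens`.

```python
def batch_by_tokens(lengths, max_tokens=2048):
    """Group example indices so padded size (n_examples * max_len) <= max_tokens."""
    # sorting by length keeps similar lengths together, minimizing padding
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches, cur = [], []
    for i in order:
        max_len = max(lengths[j] for j in cur + [i])
        if cur and (len(cur) + 1) * max_len > max_tokens:
            batches.append(cur)
            cur = []
        cur.append(i)
    if cur:
        batches.append(cur)
    return batches
```

With 4 examples of length 512 and 32 of length 64, this yields exactly the (32, 64) and (4, 512) batches described above.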

Details of Fairseq Solution: