Sam Shleifer (sshleifer)
| model | batch_size | sequence_length | MB |
| --- | --- | --- | --- |
| t5-large | 1 | 128 | 6558 |
| t5-large | 1 | 512 | 9568 |
| t5-large | 1 | 1024 | 23124 |
| facebook/bart-large | 1 | 128 | 3758 |
| facebook/bart-large | 1 | 512 | 4670 |
| facebook/bart-large | 1 | 1024 | 8888 |
| t5-base | 1 | 128 | 2242 |
| t5-base | 1 | 512 | 3776 |
| t5-base | 1 | 1024 | 9056 |
@sshleifer
sshleifer / misc_ideas.md
Last active June 8, 2020 20:00
Misc Transformers ideas

  • Workflow: git pre-commit hook to check style
  • Cleanup: GPT and GPT-2 attention are very similar besides caching
      • fine to just add caching to GPT

Infra: test coverage for t5 causal mask

Infra: add test to ModelTesterMixin that loss doesn't change if pad tokens are introduced
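The invariant that test would assert can be illustrated with a toy masked loss in plain Python (illustrative only, not the actual `ModelTesterMixin` API): a loss averaged over non-pad positions is unchanged when pad tokens are appended.

```python
def masked_nll(token_losses, attention_mask):
    """Mean loss over real (non-pad) tokens; positions with mask 0 contribute nothing."""
    total = sum(loss for loss, mask in zip(token_losses, attention_mask) if mask)
    count = sum(attention_mask)
    return total / count

losses = [0.5, 1.0, 1.5]
base = masked_nll(losses, [1, 1, 1])
# Append two pad positions (mask 0); their per-token losses must be ignored.
padded = masked_nll(losses + [9.9, 9.9], [1, 1, 1, 0, 0])
assert base == padded  # introducing pad tokens leaves the loss unchanged
```

A model that fails this check is leaking pad positions into its loss, usually via a missing or misapplied attention mask.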

save_pretrained: should mkdir if save_path doesn't exist

sshleifer / copy_subset.py
Last active June 5, 2020 15:41
Copy a subset of bart layers into a smaller model
from transformers import BartConfig, BartForConditionalGeneration, BartTokenizer
from torch import nn
from typing import List

layers_to_copy = {  # maps number of layers in the student -> which teacher layers to copy
    1: [11],
    2: [0, 11],
    3: [0, 6, 11],
    4: [0, 4, 8, 11],
    6: [0, 2, 4, 7, 9, 11],
    9: [0, 1, 2, 4, 5, 7, 9, 10, 11],
}
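In the full gist these indices select BART encoder/decoder layers out of a 12-layer teacher via `nn.ModuleList`; a dependency-free sketch of the selection logic (names here are illustrative, lists stand in for layer modules):

```python
layers_to_copy = {1: [11], 3: [0, 6, 11], 6: [0, 2, 4, 7, 9, 11]}  # abridged

def pick_teacher_layers(teacher_layers, n_student):
    """Return the teacher layers a student of depth n_student should copy."""
    return [teacher_layers[i] for i in layers_to_copy[n_student]]

teacher = [f"teacher_layer_{i}" for i in range(12)]
student = pick_teacher_layers(teacher, 3)  # first, middle, and last teacher layers
```

Note that every mapping keeps layer 11, so the student always inherits the teacher's final layer.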
sshleifer / s3_suggestions.md
Created May 19, 2020 13:41
S3 Suggestions
  • everything must go under bert besides datasets
  • Put random models under your own namespace, like sshleifer/bart-tiny-random
  • use the --dry-run command line arg
  • [FIXME] You can log in to the portal and do things manually at this URL with your Kibana creds (??)
sshleifer / PreTweet Checklist
Created May 15, 2020 16:02
Before you tweet checklist
- for mention in tweet.grep('@'): assert twitter.get(mention) == expected_person
- assert photo has tags
- if thread: numbers make sense or down emoji
- all links work
- read it over once more
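The first checklist item — confirming every @-mention resolves to the intended account — could start from a simple mention extractor like this (a hypothetical helper; the account lookup itself would need the Twitter API):

```python
import re

def extract_mentions(tweet: str) -> list:
    """Return the @-mention handles in a tweet (letters, digits, underscores)."""
    return re.findall(r"@(\w+)", tweet)

tweet = "Trained by @jorgtiedemann, converted by @sam_shleifer."
extract_mentions(tweet)  # ['jorgtiedemann', 'sam_shleifer']
```

Each extracted handle would then be checked against the expected person before hitting send.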
sshleifer / tweet.md
Last active July 15, 2020 13:36
Tweet Translations using Marian MTModels

See Model List, Docs

en_tweet = ["Today we are releasing 1,008 machine translation models, covering combinations of 140 different languages.",
            "They were trained by @jorgtiedemann with @marian, and converted by @sam_shleifer.", 
            "Find your language here"]

en-da: I dag frigiver vi 1.008 maskinoversættelsesmodeller, der dækker kombinationer af 140 forskellige sprog. De blev uddannet af @jorgtiedemann med @marian, og konverteret af @sam_shleifer. Find dit sprog her:

sshleifer / multilingual_groups.py
Last active May 29, 2020 06:33
Multilingual Group Members Mapping
GROUP_MEMBERS = {
    'ZH': ['cmn', 'cn', 'yue', 'ze_zh', 'zh_cn', 'zh_CN', 'zh_HK', 'zh_tw', 'zh_TW', 'zh_yue', 'zhs', 'zht', 'zh'],
    'ROMANCE': ['fr', 'fr_BE', 'fr_CA', 'fr_FR', 'wa', 'frp', 'oc', 'ca', 'rm', 'lld', 'fur', 'lij', 'lmo',
                'es', 'es_AR', 'es_CL', 'es_CO', 'es_CR', 'es_DO', 'es_EC', 'es_ES', 'es_GT', 'es_HN', 'es_MX', 'es_NI', 'es_PA', 'es_PE', 'es_PR', 'es_SV', 'es_UY', 'es_VE',
                'pt', 'pt_br', 'pt_BR', 'pt_PT', 'gl', 'lad', 'an', 'mwl', 'it', 'it_IT', 'co', 'nap', 'scn', 'vec', 'sc', 'ro', 'la'],
    'NORTH_EU': ['de', 'nl', 'fy', 'af', 'da', 'fo', 'is', 'no', 'nb', 'nn', 'sv'],
    'SCANDINAVIA': ['da', 'fo', 'is', 'no', 'nb', 'nn', 'sv'],
    'SAMI': ['se', 'sma', 'smj', 'smn', 'sms'],
    'NORWAY': ['nb_NO', 'nb', 'nn_NO', 'nn', 'nog', 'no_nb', 'no'],
    'CELTIC': ['ga', 'cy', 'br', 'gd', 'kw', 'gv'],
}

MarianMTModel Best Practices:

  • split source language documents into sentences before passing them through the model.
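A minimal sketch of that pre-processing step, using a naive regex splitter (a real pipeline should use a proper sentence segmenter, e.g. nltk's punkt):

```python
import re

def split_sentences(document: str) -> list:
    """Naively split a document on sentence-ending punctuation followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]

doc = "Today we are releasing 1,008 models. They cover 140 languages!"
split_sentences(doc)  # ['Today we are releasing 1,008 models.', 'They cover 140 languages!']
```

Each sentence is then translated separately and the outputs rejoined, which keeps inputs short and avoids degraded translations on long documents.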

Model Naming

Model names use the format Helsinki-NLP/opus-mt-{src}-{tgt}. Fully capitalized values for src and tgt refer to group names in the GROUP_MEMBERS lookup above.
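A small helper illustrating the convention (abridged copy of the GROUP_MEMBERS lookup above; whether a given group model actually exists must be checked against the model hub):

```python
GROUP_MEMBERS = {  # abridged
    "ZH": ["cmn", "cn", "yue", "zh_cn", "zh"],
    "ROMANCE": ["fr", "es", "pt", "it", "ro", "la"],
}

def resolve_group(code: str) -> str:
    """Map a language code to its capitalized group name, or return it unchanged."""
    for group, members in GROUP_MEMBERS.items():
        if code in members:
            return group
    return code

def marian_group_model_name(src: str, tgt: str) -> str:
    """Build the name of the multilingual group model covering src."""
    return f"Helsinki-NLP/opus-mt-{resolve_group(src)}-{tgt}"

marian_group_model_name("fr", "en")  # 'Helsinki-NLP/opus-mt-ROMANCE-en'
```

So a French input can be served either by a dedicated fr-en model or by the capitalized ROMANCE-en group model, which handles all member codes.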

sshleifer / mismatch.sh
Created May 5, 2020 17:04
Fr-en file on web different than replication
MD=$HOME/transformers_fork/marian_ckpt/fr-en/
vpath=$MD/opus.spm32k-spm32k.vocab.yml
build/marian-decoder \
-m $MD/opus.spm32k-spm32k.transformer-align.model1.npz.best-perplexity.npz \
-v $vpath $vpath <<< "▁Le ▁distributeur ▁de ▁billets ▁est ▁en ▁panne ▁."
EN_DE_CONFIG = {
    "bert-train-type-embeddings": "true",
    "bert-type-vocab-size": "2",
    "dec-cell": "gru",
    "dec-cell-base-depth": "2",
    "dec-cell-high-depth": "1",
    "dec-depth": 6,
    "dim-emb": "512",
    "dim-rnn": "1024",  # IGNORE