Workflow: git pre-commit hook to check style
Cleanup: GPT and GPT2 attention are very similar besides caching
- fine to just add caching to GPT
Infra: test coverage for t5 causal mask
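A minimal sketch of the property such a test could assert, using a toy lower-triangular mask built in plain Python (the helper name `causal_mask` is hypothetical, not the T5 implementation):

```python
def causal_mask(seq_len):
    """Boolean causal mask: position i may attend to positions j <= i only."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

# Every position sees itself and everything before it, nothing after.
mask = causal_mask(4)
```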
Infra: add test to ModelTesterMixin that loss doesn't change if pad tokens are introduced
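The invariance the mixin test would check: if the loss is averaged over non-pad positions only, appending pad tokens cannot move it. A toy masked-loss sketch (the helper `masked_mean_loss` is hypothetical, standing in for the model's loss):

```python
def masked_mean_loss(per_token_loss, attention_mask):
    """Average loss over non-pad positions; pads (mask == 0) contribute nothing."""
    total = sum(l * m for l, m in zip(per_token_loss, attention_mask))
    return total / sum(attention_mask)

# Appending pad positions must leave the loss unchanged.
base = masked_mean_loss([0.5, 1.5], [1, 1])
padded = masked_mean_loss([0.5, 1.5, 9.9, 9.9], [1, 1, 0, 0])
```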
save_pretrained: should mkdir if save_path doesn't exist
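A minimal sketch of the fix, assuming the standard pattern of writing a weights file into `save_path` (the function body and filename here are illustrative, not the real `save_pretrained`):

```python
import os
import tempfile

def save_pretrained(model_state, save_path):
    """Hypothetical sketch: create the directory before writing instead of crashing."""
    os.makedirs(save_path, exist_ok=True)  # no-op when save_path already exists
    target = os.path.join(save_path, "pytorch_model.bin")
    with open(target, "wb") as f:
        f.write(model_state)
    return target

# Works even when the nested directory doesn't exist yet.
ckpt_dir = os.path.join(tempfile.mkdtemp(), "checkpoints", "step-500")
written = save_pretrained(b"fake-weights", ckpt_dir)
```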
What does
stream_handler = logging.StreamHandler(sys.stdout)
logger.addHandler(stream_handler)
do? It is all over the place. Can it be deleted? Thom said yes on Slack.
TranslationPipeline: an abstraction that works for Marian,
e.g. you call
pipeline(task='translation', src_lang='en', tgt_lang='fr')(['I went to the bakery'])
Check in the fastai sortish sampler; add a --sortish-sampler CLI arg for the examples
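A minimal sketch of the "sortish" idea (shuffle, then sort by length within large chunks, so each batch sees similar lengths without a fully deterministic order); the function name, chunk multiplier, and signature are assumptions, not the fastai code:

```python
import random

def sortish_indices(lengths, batch_size, seed=0):
    """Shuffle indices, then sort by length (desc) inside chunks of batch_size * 50."""
    rng = random.Random(seed)
    idxs = list(range(len(lengths)))
    rng.shuffle(idxs)  # noise: different epochs see different chunk contents
    chunk = batch_size * 50
    out = []
    for start in range(0, len(idxs), chunk):
        piece = idxs[start:start + chunk]
        piece.sort(key=lambda i: lengths[i], reverse=True)  # similar lengths batch together
        out.extend(piece)
    return out

order = sortish_indices([5, 3, 9, 1, 7, 2, 8, 4], batch_size=2)
```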
Cleanup: there are many nearly identical Attention implementations. Can they be consolidated?
Easy win: go back through a few shleifer cleanup PRs and do the same thing in TF.
Easy win: go through the examples tests and switch from distilbert to tiny models, e.g. sshleifer/tiny-bart-random
Harder: use profiling tools to figure out why training summarization is 10x more memory-intensive than inference.
Test coverage for CTRL and XLM generation: does use_cache=True change results?
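The test would generate with and without caching and assert the tokens match. A toy decoder illustrating the property (the "model" here is a dummy running-sum function, purely illustrative, not CTRL or XLM):

```python
def next_token(prefix):
    """Toy 'model': the next token is the prefix sum mod 7 (stands in for a forward pass)."""
    return sum(prefix) % 7

def generate(prompt, steps, use_cache=False):
    tokens = list(prompt)
    cache = sum(prompt)  # cached running state instead of re-reading the prefix
    for _ in range(steps):
        if use_cache:
            nxt = cache % 7  # reuse cached state
            cache += nxt
        else:
            nxt = next_token(tokens)  # recompute from the full prefix each step
        tokens.append(nxt)
    return tokens
```

The real test would replace the dummy with `model.generate(..., use_cache=...)` and compare outputs the same way.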
Medium cleanup: refactor determine_archive_file; very similar logic is duplicated all over