May 6 - June 15, 2021
Once a large pre-trained language model is published, it is a snapshot of language at the time its corpus was collected. What are ways to update models to support new or newly-frequent terms (BIPOC), phrasing (social distancing), or people and events (Fyre Festival)? What are reliable, low-cost ways to test and benchmark these methods of updating?
Moving to participate in the Modeling / Retrieval Working Group; if you have resources about model updatability, feel free to join that group, contact Nick Doiron on Slack, and/or paste links to papers below.
My goal would be a benchmark to compare approaches that move/insert embeddings (CPU-only) against a short burst of fine-tuning (GPU). Terms would come from news articles, Reddit comments, and/or fictional events where we can show the models have no prior knowledge.
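A minimal sketch of the CPU-side "insert embeddings" option, assuming a plain NumPy embedding matrix: append a row for the new term, initialized as the mean of the embeddings of the subword pieces the tokenizer currently splits it into. The helper name and the mean-of-subwords heuristic are illustrative choices, not something settled in this thread; with HuggingFace transformers the equivalent steps would roughly be `tokenizer.add_tokens` followed by `model.resize_token_embeddings`.

```python
import numpy as np

def add_term_embedding(emb: np.ndarray, subword_rows: list) -> np.ndarray:
    """Append one row for a new term (hypothetical helper).

    The new row is initialized as the mean of the rows for the subword
    pieces the term currently tokenizes into -- a common warm-start
    heuristic before any fine-tuning.
    """
    new_row = emb[subword_rows].mean(axis=0, keepdims=True)
    return np.vstack([emb, new_row])

# Toy vocabulary: 4 tokens with 3-dimensional embeddings.
emb = np.arange(12, dtype=float).reshape(4, 3)

# Suppose "social distancing" currently tokenizes into rows 1 and 3;
# insert one new embedding for the joined term.
expanded = add_term_embedding(emb, [1, 3])
print(expanded.shape)  # (5, 3)
print(expanded[-1])    # mean of rows 1 and 3
```

The appeal of this route for the benchmark is that it needs no gradient steps at all, so it can run on CPU and be compared directly against the short-burst-of-training (GPU) alternative.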
KD: I feel this is a very important task and one that most language models struggle with. There has been some interesting work on dynamic evaluation, which attempts to fit models to recent history: https://arxiv.org/pdf/1904.08378.pdf https://www.aclweb.org/anthology/2021.eacl-main.6.pdf
ND: ^^ thanks, these are good resources and also remind me to include FB's dynabench.org in our brainstorming.