May 6 - June 15, 2021
Once a large pre-trained language model is published, it is a snapshot of language at the time its corpus was collected. What are ways to update models to support new or newly-frequent terms (BIPOC), phrasing (social distancing), or people and events (Fyre Festival)? What are reliable, low-cost ways to test and benchmark these methods of updating?
Moving to participate in the Modeling / Retrieval Working Group; if you have resources about model updatability, feel free to join that group, contact Nick Doiron on Slack, and/or paste links to papers below.
My goal would be a benchmark to compare approaches that move/insert embeddings (CPU-only) against a short burst of fine-tuning (GPU). Terms would come from news articles, Reddit comments, and/or fictional events where we can show the models have no prior knowledge.
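A minimal sketch of the CPU-side "insert embeddings" option, assuming a plain NumPy embedding matrix: append a row for the new term, initialized as the mean of the embeddings of the subword pieces the tokenizer currently splits it into. The helper name and the mean-of-subwords heuristic are illustrative choices, not something settled in this thread; with HuggingFace transformers the equivalent steps would roughly be `tokenizer.add_tokens` followed by `model.resize_token_embeddings`.

```python
import numpy as np

def add_term_embedding(emb: np.ndarray, subword_rows: list) -> np.ndarray:
    """Append one row for a new term (hypothetical helper).

    The new row is initialized as the mean of the rows for the subword
    pieces the term currently tokenizes into -- a common warm-start
    heuristic before any fine-tuning.
    """
    new_row = emb[subword_rows].mean(axis=0, keepdims=True)
    return np.vstack([emb, new_row])

# Toy vocabulary: 4 tokens with 3-dimensional embeddings.
emb = np.arange(12, dtype=float).reshape(4, 3)

# Suppose "social distancing" currently tokenizes into rows 1 and 3;
# insert one new embedding for the joined term.
expanded = add_term_embedding(emb, [1, 3])
print(expanded.shape)  # (5, 3)
print(expanded[-1])    # mean of rows 1 and 3
```

The appeal of this route for the benchmark is that it needs no gradient steps at all, so it can run on CPU and be compared directly against the short-burst-of-training (GPU) alternative.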
KD: I feel this is a very important task and one that most language models struggle with. There has been some interesting work on dynamic evaluation, which attempts to fit models to recent history: https://arxiv.org/pdf/1904.08378.pdf https://www.aclweb.org/anthology/2021.eacl-main.6.pdf
ND: ^^ thanks, these are good resources and also remind me to include FB's dynabench.org in our brainstorming.