Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models. Continue this conversation at https://readfm.vercel.app?gist=74feb6f45f195a6d1f3adeaa6f39a969

[arxiv] Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models

Source

Authors: Kaiser Sun, Mark Dredze

Abstract: The development of large language models leads to the formation of a pre-train-then-align paradigm, in which the model is typically pre-trained on a large text corpus and undergoes a tuning stage to align the model with human preference or downstream tasks. In this work, we investigate the relationship between pre-training and fine-tuning by fine-tuning multiple intermediate pre-trained model checkpoints. Our results on 18 datasets suggest that i) continual pre-training improves the model in a latent way that is only revealed after fine-tuning; ii) with extra fine-tuning, the datasets on which the model does not demonstrate capability gain much more than those on which the model already performs well during the pre-training stage; iii) although the model benefits significantly from supervised fine-tuning, it may forget previously known domain knowledge and tasks that are not seen during fine-tuning; iv) the model exhibits high sensitivity to evaluation prompts after supervised fine-tuning, but this sensitivity can be alleviated by more pre-training.

URL: https://huggingface.co/papers/2408.06663

Conversation

▲ Please explain the paper to a reader with no relevant technical or research background; be clear and concise.

This paper explores the relationship between two key stages in training large language models (LLMs), which are like advanced AI systems that can understand and generate human-like text.

Imagine training a dog. First, you teach it basic commands like "sit" and "stay" (pre-training). Then, you teach it specific tricks like "roll over" or "fetch" (fine-tuning). This paper investigates how these two stages interact.

The researchers found that:

  • Pre-training alone can teach LLMs some tasks, but not others. Some tasks are like basic commands that the model learns during pre-training. Others are like complex tricks that require additional fine-tuning.
  • Fine-tuning can unlock hidden abilities learned during pre-training. The model might learn something useful during pre-training, but it can't use that knowledge until it's fine-tuned for a specific task.
  • Fine-tuning can make the model forget things. Just like a dog might forget an old trick if it learns a new one, fine-tuning can cause the model to lose abilities it had before.
  • More pre-training can make the model less sensitive to how tasks are worded. The more the model is pre-trained, the better it can understand different ways of asking the same question.

This research helps us understand how LLMs learn and how we can train them more effectively. It also highlights the importance of releasing pre-training checkpoints, which are like snapshots of the model's progress, so that other researchers can study them and improve LLM training even further.

▲ Why "The model might learn something useful during pre-training, but it can't use that knowledge until it's fine-tuned for a specific task."

That's a great question! It's like having a toolbox full of tools, but not knowing how to use them.

During pre-training, the LLM is exposed to a massive amount of text data. It learns patterns and relationships within the language, but it doesn't necessarily understand how to apply that knowledge to specific tasks.

Think of it like this:

  • Pre-training: The LLM learns the vocabulary, grammar, and common phrases of a language. It's like learning the names and functions of all the tools in a toolbox.
  • Fine-tuning: The LLM learns how to use those tools to perform specific tasks. It's like learning how to use a hammer to build a house or a screwdriver to fix a chair.

Without fine-tuning, the LLM might have the potential to perform a task, but it lacks the specific instructions or guidance to do so effectively. Fine-tuning provides that guidance, allowing the model to apply its pre-trained knowledge to solve the task at hand.

▲ Please explain the reason of "Fine-tuning can make the model forget things"

You're right to be curious about that! It's a bit counterintuitive, but it's a common phenomenon in machine learning, especially with large models like LLMs. Here's why fine-tuning can lead to forgetting:

1. Overfitting: When you fine-tune a model on a specific task, it becomes very good at that task. However, it might also become overly specialized, focusing too much on the specific patterns and examples in the fine-tuning data. This can lead to the model forgetting general knowledge or abilities that were learned during pre-training.

2. Parameter Updates: Fine-tuning involves adjusting the model's parameters to improve performance on the specific task. These adjustments can sometimes overwrite or weaken the connections and representations that were learned during pre-training. It's like changing the settings on a tool to make it better for one job, but accidentally making it less effective for other jobs.

3. Catastrophic Forgetting: In some cases, fine-tuning can lead to a phenomenon called "catastrophic forgetting," where the model completely forgets previously learned information. This is more likely to happen when the fine-tuning task is very different from the pre-training data.

Think of it like this:

  • Pre-training: The model learns a wide range of general knowledge, like a student learning a broad curriculum.
  • Fine-tuning: The model focuses on a specific subject, like a student preparing for a test in history.
  • Forgetting: The student might forget some of the general knowledge they learned in other subjects while focusing on history.

To mitigate forgetting (see the sketch after this list):

  • Regularization: Techniques like weight decay or dropout can help prevent overfitting during fine-tuning.
  • Multi-task Learning: Training the model on multiple tasks simultaneously can help preserve general knowledge.
  • Incremental Learning: Techniques like "knowledge distillation" can help transfer knowledge from the pre-trained model to the fine-tuned model.
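To make the first two ideas a bit more concrete, here is a minimal, hypothetical sketch (the model and datasets are placeholders, not from the paper) of fine-tuning with weight decay as a regularizer while mixing a few general examples back into the fine-tuning data:

```python
# Hypothetical sketch: fine-tuning with weight decay (regularization) while
# mixing a small slice of general data back in (a simple multi-task flavor).
# `model` is assumed to expose a Hugging Face-style causal-LM interface, and
# batches are assumed to contain input_ids and labels.
import torch
from torch.utils.data import ConcatDataset, DataLoader

def finetune(model, task_dataset, general_dataset, epochs=3, lr=2e-5):
    mixed = ConcatDataset([task_dataset, general_dataset])  # keep some general data
    loader = DataLoader(mixed, batch_size=8, shuffle=True)
    # weight_decay adds an L2 penalty that discourages over-specializing
    # on the small fine-tuning set.
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.01)
    model.train()
    for _ in range(epochs):
        for batch in loader:
            loss = model(**batch).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model
```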

It's a complex issue, but understanding the reasons behind forgetting helps researchers develop better methods for training and fine-tuning LLMs.

Data Source Content

# Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models

Kaiser Sun Mark Dredze
Johns Hopkins University
Baltimore, MD USA
{hsun74,mdredze}@cs.jhu.edu

Abstract

The development of large language models leads to the formation of a pre-train-then-align paradigm, in which the model is typically pre-trained on a large text corpus and undergoes a tuning stage to align the model with human preference or downstream tasks. In this work, we investigate the relationship between pre-training and fine-tuning by fine-tuning multiple intermediate pre-trained model checkpoints. Our results on 18 datasets suggest that i) continual pre-training improves the model in a latent way that is only revealed after fine-tuning; ii) with extra fine-tuning, the datasets on which the model does not demonstrate capability gain much more than those on which the model already performs well during the pre-training stage; iii) although the model benefits significantly from supervised fine-tuning, it may forget previously known domain knowledge and tasks that are not seen during fine-tuning; iv) the model exhibits high sensitivity to evaluation prompts after supervised fine-tuning, but this sensitivity can be alleviated by more pre-training.

Code, results, and data to reproduce the experiments are available at https://anonymous.4open.science/r/AmuroCharRelease-DEC5. All the model checkpoints resulting from this work are available at https://huggingface.co/KaiserWhoLearns/PTvsSFT_OLMo1b


1 Introduction

The rise of large language models (LLMs) as a general-purpose tool for a diverse range of natural language processing tasks has dramatically transformed the field, introducing new paradigms for data collection and model training (Brown et al., 2020, Biderman et al., 2023, Touvron et al., 2023, Jiang et al., 2023, Chowdhery et al., 2023, Groeneveld et al., 2024, Wang et al., 2024, inter alia). Numerous models, training methods, datasets, and evaluation methods continue to be developed on an ongoing basis. Nevertheless, a unified paradigm has emerged for training LLMs: pre-train on an enormous corpus of diverse documents, ranging from 250B Biderman et al. (2023) to 15T AI@Meta (2024) tokens, followed by an alignment stage to make the model more useful and performative for various tasks.

Based on this paradigm, work has focused on improving these two stages. Work to improve pre-trained models includes larger training sets Hoffmann et al. (2022); AI@Meta (2024); Touvron et al. (2023), different data selection mechanisms Xia et al. (2024), higher quality data Zhou et al. (2024), and various model architectures Su et al. (2024); Touvron et al. (2023). Meanwhile, research on model alignment includes different training objectives Rafailov et al. (2024); Schulman et al. (2017), new datasets Narayanan and Aepli (2024), more efficient training Hu et al. (2021); Dettmers et al. (2024) and safety tuning Bianchi et al. (2023). The alignment stage usually involves either supervised fine-tuning for specific tasks or instruction fine-tuning for general-purpose usage. Regardless, fine-tuning (almost always) comes at the end of pre-training and yields remarkable improvements on downstream tasks Touvron et al. (2023); Groeneveld et al. (2024). Consequently, the benefits of each stage are largely explored independently, with improvements to pretraining being orthogonal to benefits from model alignment.

Rather than explore these two training regimes independently, we question: how do model pretraining and fine-tuning interact to affect the resulting model? Does more pre-training hinder better fine-tuning results? What does the model learn and forget during pre-training as well as fine-tuning? Answering these questions requires us to examine how models learn during pre-training and how this affects fine-tuning. Therefore, we fine-tune multiple pre-training checkpoints of a large language model (Figure 1), evaluating each checkpoint and its fine-tuned version on downstream evaluation sets. We track model abilities during pre-training and compare them to improvements achievable after fine-tuning at the corresponding pre-training step. We explore both supervised and instruction fine-tuning, testing the models’ memorization and forgetting when learning specific tasks and serving as general-purpose language-AI tools. To the best of our knowledge, we are the first to explore fine-tuning intermediate model checkpoints.

Our experiments yield insights into LLM training. We find that (1) continued pre-training can improve a model in ways that are only revealed after fine-tuning (§5); (2) tasks for which the model already performs well during pre-training benefit much less from fine-tuning than those where the model does not demonstrate capabilities (§4, §5); (3) although supervised fine-tuning can improve performance on in-distribution tasks, it can also cause the model to forget domain knowledge or tasks that it was previously capable of solving (§6); (4) fine-tuned models show high sensitivity to evaluation prompts, but this sensitivity can be alleviated by more pre-training (§6). Our findings provide insights into model training and can inform methods for both pre-training and fine-tuning. Furthermore, our work shows the value of analyzing the training dynamics, in addition to analyzing the final LLM, as an aspect of interpretability, and we encourage model developers to release these checkpoints to aid future studies.

2 Background: Model Training

We begin with a brief survey of the core components of LLM training: pre-training, fine-tuning, and instruction fine-tuning. We also discuss the related topic of in-context learning as well as different efficient fine-tuning strategies.

We use “model alignment” as a general term for techniques that align a model with a desired behavior, which can be accomplished by fine-tuning models after pretraining. The term is also associated with other definitions Shen et al. (2024). We also note several related studies that explore training dynamics to understand model behavior Tirumala et al. (2022); Chen et al. (2023); Tian et al. (2023). With this in mind, we conduct an empirical study on how the amount of pre-training affects the effectiveness of fine-tuning.

Pre-training

The first step of training an LLM is pre-training on a massive text corpus Achiam et al. (2023); Touvron et al. (2023); Groeneveld et al. (2024). For decoder-only models in the GPT family, the subject of our paper, work since the introduction of GPT-2 Radford et al. (2019) has focused on scaling up model training. Initial work increased model size to hundreds of billions of parameters Brown et al. (2020); Rae et al. (2021); Chowdhery et al. (2023), along with explorations in model size, training corpus size, and training data characteristics Hoffmann et al. (2022); Gururangan et al. (2020). Since the push towards large models, work has shifted to increasing the amount of pre-training data Computer (2023); Soldaini et al. (2024), with new models now reaching 15 trillion tokens AI@Meta (2024). Studies of model performance on various tasks at different model sizes introduced the idea of emergent model abilities Wei et al. (2022), with new model abilities being revealed as model training grows.

We also recognize a particularly important trend for this paper: model openness. Early LLMs were proprietary models accessible only through an API. The first large open model, Bloom Bloom Ström et al. (2023), allowed widespread LLM evaluation. Subsequent open models, such as OPT Zhang et al. (2022), LLaMA Touvron et al. (2023); Keles and Bayraklı (2024) and others Biderman et al. (2023); Gururangan et al. (2023); Almazrouei et al. (2023), have become the norm. In this paper, we study OLMo Groeneveld et al. (2024), one of the only models to release individual pre-training checkpoints.

Fine-Tuning

Early work on instruction fine-tuning using reinforcement learning with human feedback (RLHF) Ziegler et al. (2019); Stiennon et al. (2020); Ouyang et al. (2022) demonstrates the dramatic effect that model alignment can have on a pre-trained model. When a specific task of interest has been identified, supervised fine-tuning can improve a pre-trained model. Task-agnostic tuning became popularized with the advent of T5 models (Raffel et al., 2020), where a pre-trained LLM is tuned using a general text-to-text solution. When multiple tasks are given to the model, the model is commonly given a task-specific prefix or an instruction along with the task input, leading to the development of various methods of prefix tuning Li and Liang (2021) and instruction tuning Wei et al. (2021); Mishra et al. (2022); Victor et al. (2022).

Instruction Fine-Tuning

Instruction fine-tuning is preferred when more general model behaviors are desired. Popularized through reinforcement-learning with human feedback (RLHF) Christiano et al. (2017); Ziegler et al. (2019); Stiennon et al. (2020); Ouyang et al. (2022) and reinforcement-learning with AI feedback (RLAIF) Lee et al. (2023), these methods utilize a reward model to simulate human feedback. Others explore human preference tuning without a reward model Rafailov et al. (2024); Song et al. (2024); Xu et al. (2024), or study the effects of these tuning methods (Shen et al., 2024; Perez et al., 2023). Sharma et al. (2024) show that supervised fine-tuning can lead to similar performance as RLAIF.

In-Context Learning

While not the subject of this paper since it does not change model parameters, in-context learning (ICL) utilizes a small amount of supervised data to improve model performance. ICL, also called few-shot learning, is also used as an evaluation strategy in which the model is given a prompt composed of examples of the tasks it is expected to solve, and the underlying model is evaluated based on its response to the input. ICL can benefit from a larger context window that accommodates more examples, which has spurred work on model quantization techniques Dettmers et al. (2022) and on alleviating hardware constraints Brown et al. (2020); Xie et al. (2021); Min et al. (2022).

Fine-Tuning Techniques

While model pre-training can be done by a few groups with large resources interested in developing new models, fine-tuning depends on the task and is of broad interest. Therefore, many techniques facilitate time-, memory-, and data-efficient model training through parameter-efficient fine-tuning (PEFT) Hu et al. (2021); Liu et al. (2021, 2023), quantization Jacob et al. (2018); Dettmers et al. (2022, 2024), and specialized data filtering Xia et al. (2024); Zhou et al. (2024); Attendu and Corbeil (2023). This paper focuses specifically on full-parameter fine-tuning, while our findings suggest the potential for data-efficient and budget-friendly training by understanding the critical turning point of model training. Our findings are closely related to the recent study on phase transition of model training Olsson et al. (2022); Wei et al. (2022); Chen et al. (2023).

3 Experimental Setup

In this section, we describe the model and datasets used. The hyperparameter tuning procedure and setup for each fine-tuning setting can be found in Appendix A.

3.1 Model Choice

Our paper considers OLMo-1B Groeneveld et al. (2024), a high-performing open-source large language model. Ideally, we would evaluate multiple models, but OLMo is the only model to release intermediate pre-training checkpoints (https://github.com/allenai/OLMo/tree/main/checkpoints), and thus the only model that supports our analysis. We also experimented with RedPajama-INCITE (https://www.together.ai/blog/redpajama-models-v1), which is the only other model to release checkpoints; after extensive experiments, we found it performed worse than OLMo, given the training data available, and did not support our analysis. Several other models claim to release training checkpoints but have not done so. Despite being the only open model with training checkpoints, OLMo fortunately has several desirable properties. First, the model is fully open, including the training details, pre-training data, and fine-tuning data. Second, the smaller model size allows us to train a model efficiently on a single A100 GPU. While evaluating a larger model would be desirable, we limit our study to the 1B model given the much larger computational demand of multi-GPU training. Our detailed analysis required significant GPU resources, which would have been prohibitive with a larger model. We also note that OLMo-1B compares very favorably to the larger version, and recent work has shown that small models can compete with larger ones Riviere et al. (2024).

We select model pre-training checkpoints uniformly from the pre-training history along with the first and the final checkpoints.
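One way this selection rule could be implemented is sketched below; the checkpoint count and step list are illustrative assumptions, not the paper's exact choices:

```python
# Hypothetical sketch: pick checkpoints spaced uniformly over the pre-training
# history, always keeping the first and final checkpoints.
def select_checkpoints(all_steps, n_total=10):
    all_steps = sorted(all_steps)
    idx = [round(i * (len(all_steps) - 1) / (n_total - 1)) for i in range(n_total)]
    return [all_steps[i] for i in sorted(set(idx))]

# Example: select_checkpoints(available_steps) returns roughly n_total evenly
# spaced pre-training steps, including the first and the last.
```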

3.2 Training Procedure

We fine-tune each of the selected model checkpoints using two different procedures to create fine-tuned models: supervised fine-tuning and instruction tuning. The supervised fine-tuning is conducted separately for each model checkpoint and dataset, while the instruction fine-tuning is done once using the instruction dataset. The instruction-tuned model is evaluated on a suite of LLM benchmarks.

Supervised Fine-tuning

We adapt the dataset choice from Yang et al., 2024 for supervised fine-tuning. For each in-domain dataset, one to two cross-domain evaluation datasets are supplied. Each pre-training checkpoint is fully fine-tuned for 3 epochs with a batch size of 8 and learning rates resulting from minimal hyperparameter tuning. Each task is formatted using a default prompt-completion format (Table 4).

Instruction Fine-Tuning

We instruction-tune the model on TÜLU Ivison et al. (2023), following the decision of Groeneveld et al., 2024. Each model checkpoint is fully fine-tuned for 5 epochs with a batch size of 8 and a learning rate of 2 × 10⁻⁶.
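As a rough illustration of this setup, the sketch below shows how one intermediate checkpoint might be fully fine-tuned with the stated hyperparameters using the Hugging Face Trainer; the model ID, revision string, and dataset are assumptions, and the paper's actual training scripts may differ:

```python
# Hypothetical sketch of fine-tuning one intermediate OLMo-1B checkpoint.
# `revision` names a pre-training checkpoint branch and `train_dataset` is a
# pre-tokenized dataset (e.g. the supervised task data or TÜLU), both supplied
# by the caller.
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

def finetune_checkpoint(revision, train_dataset, epochs, lr, out_dir="olmo1b-tuned"):
    model = AutoModelForCausalLM.from_pretrained(
        "allenai/OLMo-1B", revision=revision, trust_remote_code=True
    )
    args = TrainingArguments(
        output_dir=out_dir,
        num_train_epochs=epochs,          # 3 for supervised FT, 5 for instruction FT
        per_device_train_batch_size=8,    # batch size 8 in both settings
        learning_rate=lr,                 # e.g. 2e-6 for instruction tuning
    )
    trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
    trainer.train()
    return model
```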

3.3 Evaluation

The evaluation challenge is to select a representative number of datasets for different types of tasks to test model abilities, recognizing that each dataset requires evaluating each model checkpoint and its fine-tuned counterparts. We also select datasets based on the availability of in-domain and out-of-domain samples.

Datasets

The datasets are summarized in Table 1. We evaluate the model with an in-domain test set and one or two out-of-domain test sets for each of the supervised fine-tuning tasks. We conduct experiments on the tasks of summary generation Narayan et al. (2018); Hasan et al. (2021); Hermann et al. (2015), question generation Sap et al. (2019); Xiong et al. (2019); Welbl et al. (2017), natural language inference Williams et al. (2018); Wang et al. (2018); Dagan et al. (2006); Bar Haim et al. (2006); Giampiccolo et al. (2007); Bentivogli et al. (2009), and paraphrase detection Zhang et al. (2019); Wang et al. (2018); Agirre et al. (2007). Each training set is sub-sampled to a size of 6,000 for fair comparisons.

In instruction fine-tuning, we base our downstream evaluation settings on Groeneveld et al., 2024, as OLMo is found to have stable performance on these datasets. The instruction-tuned models are evaluated on ARC (both arc easy and arc challenge) Clark et al. (2018), OpenbookQA Mihaylov et al. (2018), Hellaswag Zellers et al. (2019), BoolQ Clark et al. (2019), and SciQ Welbl et al. (2017).

Metrics

We use accuracy Pedregosa et al. (2011) for classification tasks and Rouge-L Lin (2004) for generation tasks. We set the maximum number of newly generated tokens to 5 for classification tasks and 60 for generation tasks. Outputs are generated with greedy decoding. For classification tasks, we experiment with both constrained decoding and logit-based predictions. We find the best performance by selecting the label with the highest logit of its first subtoken (Appendix B).
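A minimal sketch of that logit-based prediction rule, assuming a Hugging Face-style model and tokenizer (the paper's exact implementation may differ):

```python
# Hypothetical sketch: score each candidate label by the next-token logit of
# its first subtoken and predict the highest-scoring label.
import torch

def predict_label(model, tokenizer, prompt, labels):
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]  # logits for the next position
    scores = {}
    for label in labels:
        first_id = tokenizer(label, add_special_tokens=False)["input_ids"][0]
        scores[label] = next_token_logits[first_id].item()  # logit of the label's first subtoken
    return max(scores, key=scores.get)
```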

4 How does the model change across pre-training?

We begin our evaluation by considering how additional pre-training changes the BASE model. Typically, researchers track the value of the training or held-out loss during training. However, performance improvements on downstream tasks do not always follow the same trend as the loss curves Groeneveld et al. (2024).

We evaluate the pre-trained checkpoints with few-shot examples, as models without alignment tend to do poorly in a zero-shot context. Four shots are randomly sampled from the datasets; this number is selected based on the best-performing shot count reported in Yang et al., 2024. The model’s performance at each pre-training step is reported in Figure 2.
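The few-shot prompt construction can be sketched as follows; the template and separators are assumptions for illustration, not the paper's exact format:

```python
# Hypothetical sketch: sample four demonstrations and prepend them to the
# test input to form a few-shot evaluation prompt.
import random

def build_few_shot_prompt(train_examples, test_input, n_shots=4, seed=0):
    rng = random.Random(seed)
    shots = rng.sample(train_examples, n_shots)
    demos = "\n\n".join(f"{ex['input']}\n{ex['output']}" for ex in shots)
    return f"{demos}\n\n{test_input}\n"
```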

Broadly speaking, our results suggest that all datasets fall into one of two groups. For the first group of datasets (Figure 2(a)), although the model shows clear improvement during the early stages of pre-training, performance levels off fairly early on and remains consistent. The dramatic improvements in the early stages of pre-training may result from larger steps in early optimization. We find improvements stop increasing past step 342,000. The second group (Figure 2(b)) contains tasks that are never learned during pre-training: performance remains constant throughout the whole pre-training process. These datasets include MNLI, XSum, and BoolQ, and we found no difference between zero-shot and few-shot evaluations. A natural hypothesis for this finding is potential data contamination in the pre-training data. However, the evaluation datasets are selected based on the popularity of the task and the content of the pre-training data. None of the datasets that experience improvement appear in the model’s pre-training data Soldaini et al. (2024), while the datasets more likely to have leaked (MNLI, XSum) never gain an improvement during the pre-training process.

Overall, these results reveal an interesting dichotomy. Some tasks can be learned during pre-training, while others are not. Next, we explore what exactly the model is learning about this second group of datasets during pre-training by examining the fine-tuned models.

5 Does more pre-training improve fine-tuning?

Groeneveld et al., 2024 compares OLMo’s performance on several tasks before and after fine-tuning the final checkpoint and finds that fine-tuning enables the model to do well on tasks for which the unaligned model does poorly. We observe (§4) that while some datasets improved during pre-training, there is a group of datasets for which a pre-trained model does poorly. Does the model learn anything that helps solve these tasks, and is fine-tuning required to do well on them? Alternatively, does the model learn useful information for these tasks but cannot express it without fine-tuning? In this section, we further explore this dataset dichotomy by examining fine-tuned checkpoints for each of the datasets.

Our results appear in Figure 3 and Figure 4. First, we consider those datasets where the pre-trained models do well (Figure 2(a)). These datasets do not improve with fine-tuning, suggesting that whatever fine-tuning teaches (which we discuss below), the model already acquires during pre-training. This effect is observed at all checkpoints; fine-tuning simply does not help.

However, a different story is observed for datasets that are not learned during pre-training. For these, fine-tuning yields significant improvements at every model checkpoint, with Figure 4 showing the magnitude of improvement on these datasets compared to no improvement to the datasets already learned during pre-training. Moreover, earlier checkpoints obtain more substantial gains from fine-tuning than later checkpoints. The benefit of fine-tuning continues to increase until a certain threshold in pre-training steps is reached (approximately 424,000).

Figure 3 shows representative plots comparing the performance of a pre-trained versus fine-tuned model at different checkpoints for two datasets (full list in Appendix E). For Hellaswag (learned during pre-training), fine-tuning does not benefit the model, even during early checkpoints when the model performs poorly on the task. Nevertheless, for MNLI (not learned during pre-training), fine-tuning dramatically improves the model. Interestingly, later checkpoints achieve better results after fine-tuning, even when the performance of the pre-trained model is unchanged. This suggests that the model is, in fact, learning important information during pre-training, but it cannot express that information without fine-tuning.

Our findings suggest that early stopping in pre-training will not be detrimental to downstream fine-tuning performance, and the benefits of fine-tuning an LLM could exceed the benefits of continued pre-training, which sheds light on the potential of a cost-effective training paradigm with less pre-training. However, it is difficult to identify such a stopping criterion directly without fine-tuning intermediate checkpoints; the improvement trend is invisible before the checkpoints are fine-tuned. Future work may reveal other signals of pre-training behavior that correlate with downstream task performance after fine-tuning. Overall, when resource-intensive pre-trained LLMs are not available, fine-tuning a model with less pre-training may be a reasonable practical choice for obtaining a high-quality model.

6 Supervised Fine-Tuning: What does the model learn and forget?

What exactly is the model learning during fine-tuning such that it shows abilities in pre-trained models for some tasks but provides no benefit for other tasks? We analyze the supervised fine-tuning process to understand what is learned and what is forgotten. Specifically, we explore three dimensions: task format, task transfer, and domain knowledge.

6.1 Task Format

Sclar et al., 2023 show that LLMs are extremely sensitive to prompt perturbation in few-shot settings. More broadly, extensive work on prompt engineering reveals the sensitivity of models to task format. We hypothesize that fine-tuning fits the model to a specific task format, resulting in higher performance when the evaluation set matches this format. To test this hypothesis, we vary the task format to either match the training format, use a different format, or rely on instructions. We carefully construct three different prompt formats for the following settings. 1) Default is the same format used for training, where we expect the model to benefit from learning the task format; 2) In contrast, IO format reflects a common way of performing supervised fine-tuning by incorporating only unprocessed input and output; 3) Instruct uses a human-readable instruction template to format the input. Table 4 shows an example of each format. Checkpoint performance before and after fine-tuning is shown in Figure 5.
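To illustrate the contrast, the templates below are invented stand-ins for an NLI-style input; the paper's actual formats are given in its Table 4:

```python
# Hypothetical prompt templates for the three settings above (illustrative
# only; not the paper's exact Table 4 formats).
def default_format(premise, hypothesis):
    # Matches the prompt-completion pattern used during fine-tuning.
    return f"premise: {premise} hypothesis: {hypothesis} label:"

def io_format(premise, hypothesis):
    # Unprocessed input only, no task-specific scaffolding.
    return f"{premise} {hypothesis}"

def instruct_format(premise, hypothesis):
    # Human-readable instruction wrapping the same input.
    return (
        "Decide whether the hypothesis is entailed by, contradicted by, or "
        "neutral with respect to the premise.\n"
        f"Premise: {premise}\nHypothesis: {hypothesis}\nAnswer:"
    )
```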

In the early pre-training steps, aligning the task format with the fine-tuning data seems to play a crucial role. The model does not yet have enough information to overcome the differences between the training and test formats. However, when fine-tuned from later pre-training checkpoints, the model gradually becomes more flexible across task formats, suggesting that the observed sensitivity to prompt formatting may be resolvable with more pre-training and a fine-tuning stage. In this view, fine-tuning teaches the model how to format a response for the task.

6.2 Task Transfer

Numerous studies examine model forgetting, where further model training causes improvements on some tasks but degradation on others Mehta et al. (2023). We evaluate model forgetfulness by examining whether the model does worse on some tasks after fine-tuning for other tasks. Specifically, we divide our tasks into two types: classification and generation. We denote the training datasets as D_T and the evaluation datasets as D_E. We write the performance of a pre-trained model (BASE) at checkpoint i on an evaluation dataset d ∈ D_E as Perf_BASE^i(d), and the performance of the i-th checkpoint fine-tuned on dataset t ∈ D_T as Perf_t^i(d). To normalize the effect caused by uneven performance across different datasets, we compute the mean ratio of change (MRC) in performance for each checkpoint as follows.

MRC = (1 / |D_E \ {t}|) · Σ_{d ∈ D_E \ {t}} [Perf_t^i(d) − Perf_BASE^i(d)] / Perf_BASE^i(d)
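A small sketch of computing this quantity, under the assumption that the average runs over all evaluation datasets except the fine-tuning dataset t (names below are illustrative):

```python
# Hypothetical sketch: mean ratio of change (MRC) between a fine-tuned
# checkpoint and its BASE counterpart, averaged over held-out datasets.
def mean_ratio_of_change(perf_base, perf_tuned, t):
    """perf_base / perf_tuned map evaluation dataset name -> score."""
    held_out = [d for d in perf_tuned if d != t]
    ratios = [(perf_tuned[d] - perf_base[d]) / perf_base[d] for d in held_out]
    return sum(ratios) / len(ratios)
```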

Models fine-tuned on classification tasks and evaluated on generation tasks decrease by 61.4% on average compared to models that are never fine-tuned. In contrast, models fine-tuned on generation tasks can still perform the same as the BASE model on classification tasks, with a 0.3% MRC, which is not statistically significantly different from a 0% change. Our findings on all pre-training checkpoints align with the findings of Yang et al. (2024) on the final checkpoint of LLaMA-7B.

Regardless of the pre-training stage, a model can maintain classification abilities when trained for generation, but it loses its generation abilities when trained for classification. This is perhaps not surprising given that classification tasks can be seen as a subset of generation, while the reverse is not true. The model follows a simplicity bias and thus is more likely to memorize simple classification tasks than generation tasks with an exponentially larger search space. Additionally, since we evaluate the classification tasks based on the output logits and the base model performs randomly on the classification tasks, it is much easier for the models to maintain the same performance as the BASE models. Fine-tuning can cause a model to lose abilities when the desired fine-tuning behavior does not support those abilities.

6.3 Domain Knowledge

Finally, we explore how a model’s generalization ability is affected by fine-tuning by inspecting whether the model forgets the domain knowledge it had before fine-tuning due to learning other abilities. An example of OOD model performance is shown in Figure 6, and the mean change ratio by datasets is presented in Figure 7.

The model does not benefit equally from the in-domain fine-tuning: all NLI datasets experience a boost when fine-tuning on MNLI, while fine-tuning on Paws is detrimental to other paraphrase detection datasets. This implies that both forgetting and learning are happening: the model learns to perform the task with in-domain knowledge, but it may, in turn, forget information more distant from what is learned in fine-tuning. Questions remain, however, about whether there are different stages of learning and forgetting during fine-tuning and whether the model picks up different tasks in various stages, which requires further study of fine-tuning dynamics.

Overall, across these three lenses, we find that fine-tuning, although it teaches a model how to perform a task, can sacrifice generalization abilities that are not needed for the fine-tuned task. For some datasets learned with pre-training alone, the model can easily understand the task format, and the nature of the task is probably supported by the pre-training objective. For tasks that can only be learned with subsequent fine-tuning, the model may require additional examples to adapt to different task formats, or the task itself may be inconsistent with the pre-training objective.

7 Discussion

Our study uses fine-tuning of pre-training model checkpoints to understand the dynamics of pre-training and fine-tuning on model performance. While our insights suggest directions for future work, we note important limitations inherent in our experiments. This study considered a single, relatively small LLM on less than a dozen datasets, and still consumed thousands of hours of GPU training time at significant expense. Future work needs to confront these issues on larger models and more datasets. We believe our experiments can focus future work on specific experiments with larger models.

Some datasets can be learned without fine-tuning. We discover a dichotomy between datasets. Some are learned during model pre-training, while others show no improvements during pre-training. Furthermore, the datasets learned during pre-training do not benefit from fine-tuning. This observation, combined with our study of what is learned during fine-tuning (Section 6), suggests that some tasks are presented in a manner that aligns with what the model sees during pre-training, and thus fine-tuning provides no additional information. While we could not identify what it is about the tasks that places them in the learned or not-learnable-during-pre-training group, it may be possible to format tasks in a manner that better aligns with pre-training and makes them learnable.

Models can improve during pre-training in ways that are undetectable without fine-tuning. Some datasets are not learnable during pre-training but benefit significantly from fine-tuning (§4). However, these datasets still benefited from additional pre-training, even though those benefits were not revealed without fine-tuning (§5). Clearly, the model is learning important information about the task, even though it cannot express that information. A measure, available during pre-training, that correlates with post-fine-tuning task performance could be used to guide pre-training and produce models that do better after fine-tuning. Perhaps there is a way in which information about these tasks can be included in pre-training, allowing the model to better utilize the massive amount of pre-training data. For example, early stopping during pre-training could lead to better utilization of limited training resources if we knew when to stop.

Fine-tuning teaches task format but leads to forgetting unused abilities. Our results show that fine-tuning guides the model to understand the format and complete a given task. As the model acquires this information, its overall ability improves. However, fine-tuning comes at the expense of other model abilities, such as the capability of performing on tasks or domains that are unrelated to the fine-tuning task. This insight can be helpful in our understanding of the multitask abilities of LLMs, where certain tasks can introduce conflicts during multi-task training Mueller et al. (2022).

8 Conclusion

Our experiments explore the relationship between pre-training and fine-tuning of LLMs. Our findings span from the latent benefits of pre-training to model learning and forgetting during fine-tuning. Our results show that, with only a small amount of supervision, the model can rapidly pick up datasets that it could not solve during pre-training. In the meantime, we identify the aspects that an LLM learns and forgets during supervised fine-tuning: task format, task solution, and domain knowledge. Overall, our results demonstrate the value of analyzing language model training dynamics, and we call for the release of pre-training checkpoints to aid future studies.

Limitations

We discuss the weaknesses and limitations in the following section.

Computing Resource

Due to computational constraints, we can only conduct experiments on a 1B model and a limited number of datasets. The GPU hours spent on each experiment in this study are listed in Table 3.

Availability of Pre-training Checkpoints

This study would benefit significantly from including a broader spectrum of models, but public pre-training checkpoint releases are limited. Open-source LLMs with intermediate checkpoint releases include OLMo Groeneveld et al. (2024), TinyLLAMA, RedPajama-Incite, OpenLM, and Pythia. After a series of preliminary experiments, we selected the best-performing and most robust of these model families.

Scaling Law

Recent research shows that a model may exhibit emergent capabilities Wei et al. (2022) when scaled to a certain size. Comparatively, Hassid et al., 2024 find that a smaller model can outperform its larger variant when the computing resources are controlled. To avoid potential confounding factors caused by quantization, our experiments are only conducted on the one-billion-parameter model, which may therefore conceal the emergent capabilities brought by larger models, while at least giving insights into the potential of small models.

Analysis Protocol

Wu et al., 2023 show that evaluation results may be affected by samples that the model memorized during training rather than revealing its reasoning capability. The only analysis protocol used here is the downstream performance of a trained model. More investigation should be done into model internals during pre-training and how they relate to the effects of fine-tuning.

Training Paradigm

Although multiple tuning strategies exist, models in this work are fine-tuned for a fixed number of epochs to create a fair comparison in which all checkpoints receive the same amount of training. At different pre-training stages, the model may converge at different speeds. Further study could examine the effect of pre-training on different fine-tuning methods, or the fine-tuning dynamics at different pre-training stages. We only explored full-parameter fine-tuning; whether parameter-efficient fine-tuning or human preference tuning would lead to a different conclusion also remains an open question.

Randomness

In this study, we only assess uncertainty with bootstrap resampling during evaluation. However, uncertainty also arises during training, for example from optimizer initialization and data ordering. Due to computational constraints, we cannot reduce the randomness from this angle.

Acknowledgments

The authors thank Saleh Soltan, Niyati Bafna, Fan Bai, Miriam Wanner, Xinbo Wu, Carlos Aguirre for their helpful feedback.

References

  • Achiam et al. (2023)Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023.Gpt-4 technical report.arXiv preprint arXiv:2303.08774.
  • Agirre et al. (2007)Eneko Agirre, Lluís Màrquez, and Richard Wicentowski, editors. 2007.Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007).Association for Computational Linguistics, Prague, Czech Republic.
  • AI@Meta (2024)AI@Meta. 2024.Llama 3 model card.
  • Almazrouei et al. (2023)Ebtesam Almazrouei, Hamza Alobeidli, Abdulaziz Alshamsi, Alessandro Cappelli, Ruxandra Cojocaru, Mérouane Debbah, Étienne Goffinet, Daniel Hesslow, Julien Launay, Quentin Malartic, et al. 2023.The falcon series of open language models.arXiv preprint arXiv:2311.16867.
  • Attendu and Corbeil (2023)Jean-michel Attendu and Jean-philippe Corbeil. 2023.NLU on data diets: Dynamic data subset selection for NLP classification tasks.In Proceedings of The Fourth Workshop on Simple and Efficient Natural Language Processing (SustaiNLP), pages 129–146, Toronto, Canada (Hybrid). Association for Computational Linguistics.
  • Bar Haim et al. (2006)Roy Bar Haim, Ido Dagan, Bill Dolan, Lisa Ferro, Danilo Giampiccolo, Bernardo Magnini, and Idan Szpektor. 2006.The second PASCAL recognising textual entailment challenge.
  • Batsuren et al. (2024)Khuyagbaatar Batsuren, Ekaterina Vylomova, Verna Dankers, Tsetsuukhei Delgerbaatar, Omri Uzan, Yuval Pinter, and Gábor Bella. 2024.Evaluating subword tokenization: Alien subword composition and oov generalization challenge.arXiv preprint arXiv:2404.13292.
  • Bentivogli et al. (2009)Luisa Bentivogli, Ido Dagan, Hoa Trang Dang, Danilo Giampiccolo, and Bernardo Magnini. 2009.The fifth PASCAL recognizing textual entailment challenge.
  • Bianchi et al. (2023)Federico Bianchi, Mirac Suzgun, Giuseppe Attanasio, Paul Röttger, Dan Jurafsky, Tatsunori Hashimoto, and James Zou. 2023.Safety-tuned llamas: Lessons from improving the safety of large language models that follow instructions.arXiv preprint arXiv:2309.07875.
  • Biderman et al. (2023)Stella Biderman, Hailey Schoelkopf, Quentin Gregory Anthony, Herbie Bradley, Kyle O’Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, et al. 2023.Pythia: A suite for analyzing large language models across training and scaling.In International Conference on Machine Learning, pages 2397–2430. PMLR.
  • Bloom Ström et al. (2023)Eva-Marie Bloom Ström, Onelisa Slater, Aron Zahran, Aleksandrs Berdicevskis, and Anne Schumacher. 2023.Preparing a corpus of spoken Xhosa.In Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD), pages 62–67, Gothenburg, Sweden. Association for Computational Linguistics.
  • Brown et al. (2020)Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020.Language models are few-shot learners.Advances in neural information processing systems, 33:1877–1901.
  • Chen et al. (2023)Angelica Chen, Ravid Schwartz-Ziv, Kyunghyun Cho, Matthew L Leavitt, and Naomi Saphra. 2023.Sudden drops in the loss: Syntax acquisition, phase transitions, and simplicity bias in mlms.arXiv preprint arXiv:2309.07311.
  • Chowdhery et al. (2023)Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. 2023.Palm: Scaling language modeling with pathways.Journal of Machine Learning Research, 24(240):1–113.
  • Christiano et al. (2017)Paul F Christiano, Jan Leike, Tom Brown, Miljan Martic, Shane Legg, and Dario Amodei. 2017.Deep reinforcement learning from human preferences.Advances in neural information processing systems, 30.
  • Clark et al. (2019)Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, and Kristina Toutanova. 2019.BoolQ: Exploring the surprising difficulty of natural yes/no questions.In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 2924–2936, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Clark et al. (2018)Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. 2018.Think you have solved question answering? try arc, the ai2 reasoning challenge.arXiv preprint arXiv:1803.05457.
  • Computer (2023)Together Computer. 2023.Redpajama: an open dataset for training large language models.
  • Dagan et al. (2006)Ido Dagan, Oren Glickman, and Bernardo Magnini. 2006.The PASCAL recognising textual entailment challenge.In Machine learning challenges. evaluating predictive uncertainty, visual object classification, and recognising tectual entailment, pages 177–190. Springer.
  • Dettmers et al. (2022)Tim Dettmers, Mike Lewis, Younes Belkada, and Luke Zettlemoyer. 2022.GPT3.int8(): 8-bit matrix multiplication for transformers at scale.Advances in Neural Information Processing Systems, 35:30318–30332.
  • Dettmers et al. (2024)Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. 2024.Qlora: Efficient finetuning of quantized llms.Advances in Neural Information Processing Systems, 36.
  • Giampiccolo et al. (2007)Danilo Giampiccolo, Bernardo Magnini, Ido Dagan, and Bill Dolan. 2007.The third PASCAL recognizing textual entailment challenge.In Proceedings of the ACL-PASCAL workshop on textual entailment and paraphrasing, pages 1–9. Association for Computational Linguistics.
  • Groeneveld et al. (2024)Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, et al. 2024.Olmo: Accelerating the science of language models.arXiv preprint arXiv:2402.00838.
  • Gururangan et al. (2020)Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, and Noah A. Smith. 2020.Don’t stop pretraining: Adapt language models to domains and tasks.In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8342–8360, Online. Association for Computational Linguistics.
  • Gururangan et al. (2023)Suchin Gururangan, Mitchell Wortsman, Samir Yitzhak Gadre, Achal Dave, Maciej Kilian, Weijia Shi, Jean Mercat, Georgios Smyrnis, Gabriel Ilharco, Matt Jordan, Reinhard Heckel, Alex Dimakis, Ali Farhadi, Vaishaal Shankar, and Ludwig Schmidt. 2023.open_lm: a minimal but performative language modeling (lm) repository.GitHub repository.
  • Hasan et al. (2021)Tahmid Hasan, Abhik Bhattacharjee, Md. Saiful Islam, Kazi Mubasshir, Yuan-Fang Li, Yong-Bin Kang, M. Sohel Rahman, and Rifat Shahriyar. 2021.XL-sum: Large-scale multilingual abstractive summarization for 44 languages.In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 4693–4703, Online. Association for Computational Linguistics.
  • Hassid et al. (2024)Michael Hassid, Tal Remez, Jonas Gehring, Roy Schwartz, and Yossi Adi. 2024.The larger the better? improved llm code-generation via budget reallocation.arXiv preprint arXiv:2404.00725.
  • Hermann et al. (2015)Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. 2015.Teaching machines to read and comprehend.Advances in neural information processing systems, 28.
  • Hoffmann et al. (2022)Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. 2022.Training compute-optimal large language models.arXiv preprint arXiv:2203.15556.
  • Hu et al. (2021)Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021.Lora: Low-rank adaptation of large language models.arXiv preprint arXiv:2106.09685.
  • Hupkes et al. (2023)Dieuwke Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, Tiago Pimentel, Christos Christodoulopoulos, Karim Lasri, Naomi Saphra, Arabella Sinclair, et al. 2023.A taxonomy and review of generalization research in nlp.Nature Machine Intelligence, 5(10):1161–1174.
  • Ivison et al. (2023)Hamish Ivison, Yizhong Wang, Valentina Pyatkin, Nathan Lambert, Matthew Peters, Pradeep Dasigi, Joel Jang, David Wadden, Noah A Smith, Iz Beltagy, et al. 2023.Camels in a changing climate: Enhancing lm adaptation with tulu 2.arXiv preprint arXiv:2311.10702.
  • Jacob et al. (2018)Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko. 2018.Quantization and training of neural networks for efficient integer-arithmetic-only inference.In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2704–2713.
  • Jiang et al. (2023)Albert Q Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, et al. 2023.Mistral 7b.arXiv preprint arXiv:2310.06825.
  • Keles and Bayraklı (2024)Onur Keles and Omer Turan Bayraklı. 2024.LLaMA-2-econ: Enhancing title generation, abstract classification, and academic Q&A in economic research.In Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing @ LREC-COLING 2024, pages 212–218, Torino, Italia. ELRA and ICCL.
  • Lee et al. (2023)Harrison Lee, Samrat Phatale, Hassan Mansoor, Kellie Lu, Thomas Mesnard, Colton Bishop, Victor Carbune, and Abhinav Rastogi. 2023.Rlaif: Scaling reinforcement learning from human feedback with ai feedback.arXiv preprint arXiv:2309.00267.
  • Li and Liang (2021)Xiang Lisa Li and Percy Liang. 2021.Prefix-tuning: Optimizing continuous prompts for generation.In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 4582–4597, Online. Association for Computational Linguistics.
  • Lin (2004)Chin-Yew Lin. 2004.ROUGE: A package for automatic evaluation of summaries.In Text Summarization Branches Out, pages 74–81, Barcelona, Spain. Association for Computational Linguistics.
  • Liu et al. (2021)Xiao Liu, Kaixuan Ji, Yicheng Fu, Weng Lam Tam, Zhengxiao Du, Zhilin Yang, and Jie Tang. 2021.P-tuning v2: Prompt tuning can be comparable to fine-tuning universally across scales and tasks.arXiv preprint arXiv:2110.07602.
  • Liu et al. (2023)Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, and Jie Tang. 2023.Gpt understands, too.AI Open.
  • Mehta et al. (2023)Sanket Vaibhav Mehta, Darshan Patil, Sarath Chandar, and Emma Strubell. 2023.An empirical investigation of the role of pre-training in lifelong learning.Journal of Machine Learning Research, 24(214):1–50.
  • Mihaylov et al. (2018)Todor Mihaylov, Peter Clark, Tushar Khot, and Ashish Sabharwal. 2018.Can a suit of armor conduct electricity? a new dataset for open book question answering.In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2381–2391, Brussels, Belgium. Association for Computational Linguistics.
  • Min et al. (2022)Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer. 2022.Rethinking the role of demonstrations: What makes in-context learning work?In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 11048–11064, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
  • Mishra et al. (2022)Swaroop Mishra, Daniel Khashabi, Chitta Baral, and Hannaneh Hajishirzi. 2022.Cross-task generalization via natural language crowdsourcing instructions.In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 3470–3487, Dublin, Ireland. Association for Computational Linguistics.
  • Mueller et al. (2022)David Mueller, Nicholas Andrews, and Mark Dredze. 2022.Do text-to-text multi-task learners suffer from task conflict?In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 2843–2858, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
  • Narayan et al. (2018)Shashi Narayan, Shay B. Cohen, and Mirella Lapata. 2018.Don’t give me the details, just the summary! topic-aware convolutional neural networks for extreme summarization.In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1797–1807, Brussels, Belgium. Association for Computational Linguistics.
  • Narayanan and Aepli (2024)Manu Narayanan and Noëmi Aepli. 2024.A Tulu resource for machine translation.In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 1756–1767, Torino, Italia. ELRA and ICCL.
  • Olsson et al. (2022)Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, et al. 2022.In-context learning and induction heads.arXiv preprint arXiv:2209.11895.
  • Ouyang et al. (2022)Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. 2022.Training language models to follow instructions with human feedback.Advances in neural information processing systems, 35:27730–27744.
  • Pedregosa et al. (2011)F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011.Scikit-learn: Machine learning in Python.Journal of Machine Learning Research, 12:2825–2830.
  • Perez et al. (2023)Ethan Perez, Sam Ringer, Kamile Lukosiute, Karina Nguyen, Edwin Chen, Scott Heiner, Craig Pettit, Catherine Olsson, Sandipan Kundu, Saurav Kadavath, Andy Jones, Anna Chen, Benjamin Mann, Brian Israel, Bryan Seethor, Cameron McKinnon, Christopher Olah, Da Yan, Daniela Amodei, Dario Amodei, Dawn Drain, Dustin Li, Eli Tran-Johnson, Guro Khundadze, Jackson Kernion, James Landis, Jamie Kerr, Jared Mueller, Jeeyoon Hyun, Joshua Landau, Kamal Ndousse, Landon Goldberg, Liane Lovitt, Martin Lucas, Michael Sellitto, Miranda Zhang, Neerav Kingsland, Nelson Elhage, Nicholas Joseph, Noemi Mercado, Nova DasSarma, Oliver Rausch, Robin Larson, Sam McCandlish, Scott Johnston, Shauna Kravec, Sheer El Showk, Tamera Lanham, Timothy Telleen-Lawton, Tom Brown, Tom Henighan, Tristan Hume, Yuntao Bai, Zac Hatfield-Dodds, Jack Clark, Samuel R. Bowman, Amanda Askell, Roger Grosse, Danny Hernandez, Deep Ganguli, Evan Hubinger, Nicholas Schiefer, and Jared Kaplan. 2023.Discovering language model behaviors with model-written evaluations.In Findings of the Association for Computational Linguistics: ACL 2023, pages 13387–13434, Toronto, Canada. Association for Computational Linguistics.
  • Radford et al. (2019) Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.
  • Rae et al. (2021) Jack W Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, et al. 2021. Scaling language models: Methods, analysis & insights from training Gopher. arXiv preprint arXiv:2112.11446.
  • Rafailov et al. (2024) Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D Manning, Stefano Ermon, and Chelsea Finn. 2024. Direct preference optimization: Your language model is secretly a reward model. Advances in Neural Information Processing Systems, 36.
  • Raffel et al. (2020) Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140):1–67.
  • Riviere et al. (2024) Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, et al. 2024. Gemma 2: Improving open language models at a practical size. arXiv preprint arXiv:2408.00118.
  • Sap et al. (2019) Maarten Sap, Hannah Rashkin, Derek Chen, Ronan Le Bras, and Yejin Choi. 2019. Social IQa: Commonsense reasoning about social interactions. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 4463–4473, Hong Kong, China. Association for Computational Linguistics.
  • Schulman et al. (2017) John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
  • Sclar et al. (2023) Melanie Sclar, Yejin Choi, Yulia Tsvetkov, and Alane Suhr. 2023. Quantifying language models’ sensitivity to spurious features in prompt design or: How I learned to start worrying about prompt formatting. arXiv preprint arXiv:2310.11324.
  • Sharma et al. (2024) Archit Sharma, Sedrick Keh, Eric Mitchell, Chelsea Finn, Kushal Arora, and Thomas Kollar. 2024. A critical evaluation of AI feedback for aligning large language models. arXiv preprint arXiv:2402.12366.
  • Shen et al. (2024) Hua Shen, Tiffany Knearem, Reshmi Ghosh, Kenan Alkiek, Kundan Krishna, Yachuan Liu, Ziqiao Ma, Savvas Petridis, Yi-Hao Peng, Li Qiwei, Sushrita Rakshit, Chenglei Si, Yutong Xie, Jeffrey P. Bigham, Frank Bentley, Joyce Chai, Zachary Lipton, Qiaozhu Mei, Rada Mihalcea, Michael Terry, Diyi Yang, Meredith Ringel Morris, Paul Resnick, and David Jurgens. 2024. Towards bidirectional human-AI alignment: A systematic review for clarifications, framework, and future directions. arXiv preprint arXiv:2406.09264.
  • Singh and Strouse (2024) Aaditya K Singh and DJ Strouse. 2024. Tokenization counts: The impact of tokenization on arithmetic in frontier LLMs. arXiv preprint arXiv:2402.14903.
  • Soldaini et al. (2024) Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, et al. 2024. Dolma: An open corpus of three trillion tokens for language model pretraining research. arXiv preprint arXiv:2402.00159.
  • Song et al. (2024) Feifan Song, Bowen Yu, Minghao Li, Haiyang Yu, Fei Huang, Yongbin Li, and Houfeng Wang. 2024. Preference ranking optimization for human alignment. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 38, pages 18990–18998.
  • Stiennon et al. (2020) Nisan Stiennon, Long Ouyang, Jeffrey Wu, Daniel Ziegler, Ryan Lowe, Chelsea Voss, Alec Radford, Dario Amodei, and Paul F Christiano. 2020. Learning to summarize with human feedback. Advances in Neural Information Processing Systems, 33:3008–3021.
  • Su et al. (2024) Jianlin Su, Murtadha Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. 2024. RoFormer: Enhanced transformer with rotary position embedding. Neurocomputing, 568:127063.
  • Sun et al. (2023) Kaiser Sun, Peng Qi, Yuhao Zhang, Lan Liu, William Wang, and Zhiheng Huang. 2023. Tokenization consistency matters for generative models on extractive NLP tasks. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 13300–13310, Singapore. Association for Computational Linguistics.
  • Tian et al. (2023) Yuandong Tian, Yiping Wang, Beidi Chen, and Simon S Du. 2023. Scan and snap: Understanding training dynamics and token composition in 1-layer transformer. Advances in Neural Information Processing Systems, 36:71911–71947.
  • Tirumala et al. (2022) Kushal Tirumala, Aram Markosyan, Luke Zettlemoyer, and Armen Aghajanyan. 2022. Memorization without overfitting: Analyzing the training dynamics of large language models. Advances in Neural Information Processing Systems, 35:38274–38290.
  • Touvron et al. (2023) Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
  • Victor et al. (2022) Sanh Victor, Webson Albert, Raffel Colin, Bach Stephen, Sutawika Lintang, Alyafeai Zaid, Chaffin Antoine, Stiegler Arnaud, Raja Arun, Dey Manan, et al. 2022. Multitask prompted training enables zero-shot task generalization. In International Conference on Learning Representations.
  • Wang et al. (2018) Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. 2018. GLUE: A multi-task benchmark and analysis platform for natural language understanding. In Proceedings of the 2018 EMNLP Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pages 353–355, Brussels, Belgium. Association for Computational Linguistics.
  • Wang et al. (2024) Zhilin Wang, Yi Dong, Olivier Delalleau, Jiaqi Zeng, Gerald Shen, Daniel Egert, Jimmy J. Zhang, Makesh Narsimhan Sreedhar, and Oleksii Kuchaiev. 2024. HelpSteer2: Open-source dataset for training top-performing reward models. arXiv preprint arXiv:2406.08673.
  • Wei et al. (2021) Jason Wei, Maarten Bosma, Vincent Y Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M Dai, and Quoc V Le. 2021. Finetuned language models are zero-shot learners. arXiv preprint arXiv:2109.01652.
  • Wei et al. (2022) Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, et al. 2022. Emergent abilities of large language models. arXiv preprint arXiv:2206.07682.
  • Welbl et al. (2017) Johannes Welbl, Nelson F Liu, and Matt Gardner. 2017. Crowdsourcing multiple choice science questions. In Proceedings of the 3rd Workshop on Noisy User-generated Text, pages 94–106.
  • Williams et al. (2018) Adina Williams, Nikita Nangia, and Samuel Bowman. 2018. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1112–1122, New Orleans, Louisiana. Association for Computational Linguistics.
  • Wu et al. (2023) Zhaofeng Wu, Linlu Qiu, Alexis Ross, Ekin Akyürek, Boyuan Chen, Bailin Wang, Najoung Kim, Jacob Andreas, and Yoon Kim. 2023. Reasoning or reciting? Exploring the capabilities and limitations of language models through counterfactual tasks. arXiv preprint arXiv:2307.02477.
  • Xia et al. (2024) Mengzhou Xia, Sadhika Malladi, Suchin Gururangan, Sanjeev Arora, and Danqi Chen. 2024. LESS: Selecting influential data for targeted instruction tuning. arXiv preprint arXiv:2402.04333.
  • Xie et al. (2021) Sang Michael Xie, Aditi Raghunathan, Percy Liang, and Tengyu Ma. 2021. An explanation of in-context learning as implicit Bayesian inference. arXiv preprint arXiv:2111.02080.
  • Xiong et al. (2019) Wenhan Xiong, Jiawei Wu, Hong Wang, Vivek Kulkarni, Mo Yu, Shiyu Chang, Xiaoxiao Guo, and William Yang Wang. 2019. TWEETQA: A social media focused question answering dataset. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5020–5031, Florence, Italy. Association for Computational Linguistics.
  • Xu et al. (2024) Haoran Xu, Amr Sharaf, Yunmo Chen, Weiting Tan, Lingfeng Shen, Benjamin Van Durme, Kenton Murray, and Young Jin Kim. 2024. Contrastive preference optimization: Pushing the boundaries of LLM performance in machine translation. arXiv preprint arXiv:2401.08417.
  • Yang et al. (2024) Haoran Yang, Yumeng Zhang, Jiaqi Xu, Hongyuan Lu, Pheng Ann Heng, and Wai Lam. 2024. Unveiling the generalization power of fine-tuned large language models. arXiv preprint arXiv:2403.09162.
  • Zellers et al. (2019) Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, and Yejin Choi. 2019. HellaSwag: Can a machine really finish your sentence? In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4791–4800, Florence, Italy. Association for Computational Linguistics.
  • Zhang et al. (2022) Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, et al. 2022. OPT: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068.
  • Zhang et al. (2019) Yuan Zhang, Jason Baldridge, and Luheng He. 2019. PAWS: Paraphrase adversaries from word scrambling. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1298–1308, Minneapolis, Minnesota. Association for Computational Linguistics.
  • Zhou et al. (2024) Chunting Zhou, Pengfei Liu, Puxin Xu, Srinivasan Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, et al. 2024. LIMA: Less is more for alignment. Advances in Neural Information Processing Systems, 36.
  • Ziegler et al. (2019) Daniel M Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B Brown, Alec Radford, Dario Amodei, Paul Christiano, and Geoffrey Irving. 2019. Fine-tuning language models from human preferences. arXiv preprint arXiv:1909.08593.

Appendix A Hyperparameter Tuning

For both supervised fine-tuning and instruction tuning, we pre-set the effective batch size to 8 and tune the learning rate within {2×10⁻⁵, 2×10⁻⁶, 2×10⁻⁷}. Each model is fine-tuned for 3 epochs on the supervised fine-tuning tasks and 5 epochs on Tulu for instruction tuning. In both settings, we adopt an AdamW optimizer with a linear learning rate scheduler. The optimizer is warmed up for the first 3% of the training time.
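For concreteness, the setup above corresponds roughly to the following configuration. This is a minimal sketch assuming the Hugging Face `TrainingArguments` API; the output path is a placeholder and this is not the authors' released training code.

```python
# Minimal sketch of the fine-tuning configuration described above, assuming
# the Hugging Face Trainer API; the output path is a placeholder and this is
# not the authors' actual training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="sft-checkpoint",        # placeholder output path
    per_device_train_batch_size=8,      # effective batch size of 8
    learning_rate=2e-5,                 # tuned over {2e-5, 2e-6, 2e-7}
    num_train_epochs=3,                 # 3 epochs for SFT (5 for Tulu instruction tuning)
    optim="adamw_torch",                # AdamW optimizer
    lr_scheduler_type="linear",         # linear learning-rate schedule
    warmup_ratio=0.03,                  # warm up for the first 3% of training
)
```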

Appendix B Prediction Generation Method

For classification tasks, we examine three different prediction generation methods: Free Generation (Free), Constrained Generation (Constrained), and Token Probability (TokenProb); the results are shown in Table 2. In Constrained, we force the output to include at least one label from the acceptable label set. In TokenProb, we compare the logits of the acceptable labels and select the label with the highest score as the final output. This ablation study is conducted only on the BASE and fine-tuned versions of the final checkpoint of the pre-trained model. We find that, although the choice of prediction generation method has little effect on the evaluation results of a fine-tuned model, BASE variants suffer much more from not knowing the desired output format. Therefore, we run all classification experiments with TokenProb.
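As an illustration of the TokenProb idea, the sketch below scores each acceptable label by the next-token logit assigned to its first token and returns the highest-scoring label. The model name, prompt, and label strings are placeholders, not the paper's exact implementation.

```python
# Sketch of the TokenProb prediction method: compare the next-token logits
# assigned to the acceptable labels and pick the highest-scoring one.
# Model, prompt, and labels are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def token_prob_predict(prompt: str, labels: list[str]) -> str:
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        next_token_logits = model(**inputs).logits[0, -1]
    # Score each label by the logit of its first token at the next position.
    scores = {
        label: next_token_logits[
            tokenizer(label, add_special_tokens=False).input_ids[0]
        ].item()
        for label in labels
    }
    return max(scores, key=scores.get)

print(token_prob_predict("Are these two sentences paraphrases? Answer:", [" yes", " no"]))
```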

B.1 Label and Tokenizations

Depending on the tokenizer variant, the label text may be tokenized differently, leading to unreliable evaluation. For example, in paraphrase detection, the model could assign probability to both “yes” and “ yes” (the same label with a prefix space). This behavior is reported and explored in related work Sun et al. (2023); Batsuren et al. (2024); Singh and Strouse (2024). In this study, we leniently regard any individual token that contains the whole label, or part of the label along with special characters that do not affect the semantics, as an acceptable target label.
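A small illustration of the issue, using GPT-2's BPE tokenizer as a stand-in (the exact token ids depend on the tokenizer):

```python
# Why lenient label matching is needed: "yes" with and without a leading
# space maps to different tokens under a BPE tokenizer. GPT-2's tokenizer
# is used here only as a stand-in; exact ids depend on the tokenizer.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
print(tok("yes", add_special_tokens=False).input_ids)   # one token id
print(tok(" yes", add_special_tokens=False).input_ids)  # a different id for the space-prefixed form
print(tok("Yes.", add_special_tokens=False).input_ids)  # capitalization and punctuation change it again
```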

Appendix C Task Format

We adopt the task format from Yang et al. (2024), with an additional input-output task format. The format used for each dataset can be found in Table 4.
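As a purely hypothetical illustration (not the exact templates from Yang et al. (2024) or Table 4), the same classification example can be rendered either as a bare input-output pair or with an instruction-style wrapper:

```python
# Hypothetical illustration of two prompt renderings of one NLI example;
# the paper's actual templates are listed in Table 4.
example = {
    "premise": "A man is playing a guitar.",
    "hypothesis": "A person is making music.",
    "label": "entailment",
}

# Input-output format: raw fields in, label text out.
input_output_prompt = f"{example['premise']} {example['hypothesis']}"

# Instruction-style format: the task is described in natural language.
instruction_prompt = (
    "Does the premise entail the hypothesis? "
    "Answer entailment, neutral, or contradiction.\n"
    f"Premise: {example['premise']}\n"
    f"Hypothesis: {example['hypothesis']}\n"
    "Answer:"
)

print(input_output_prompt)
print(instruction_prompt)
print("Target:", example["label"])
```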

Appendix D GPU Hours per-Experiment

We show a table of GPU hours spent on each experiment in Table 3. The total number of GPU hours spent on this project is approximately 1067 A100 hours. We lost track of the GPU hours spent on preliminary experiments, so a lower-bound estimate is reported.

Appendix E Per-dataset Figures

We show the model performance on each dataset after supervised fine-tuning and instruction tuning in Figure 9 and Figure 8, respectively. The datasets that already show improvement during pre-training do not benefit from fine-tuning, while performance improves drastically on the datasets that the model never learns during pre-training.

Out-of-domain Generalization

The out-of-domain performance for each dataset with respect to pre-training steps is shown in Figure 10. Overall, the model generalizes well after fine-tuning on NLI tasks, while its performance deteriorates when evaluated on out-of-domain paraphrase detection tasks.

Cross-task Generalization

The cross-task performance for each dataset with respect to pre-training steps is shown in Figure 11 and Figure 12.

Task Format

The performance of models on evaluation sets formatted with different prompt formatting methods is shown in Figure 13.

Appendix F License of Artifacts

We include the licenses of the artifacts used in this paper in Table 5.

Appendix G Performance Difference Numbers

The average performance change before and after fine-tuning for each checkpoint is shown in Table 6. The data in this table is used to create Figure 4.

Appendix H Full Performance Table

Due to space constraints and the number of fine-tuned checkpoints, we do not display all exact metric values in the paper. The performance of each fine-tuned variant on each dataset can be found in the CSV file in the code base.

Appendix I Generalization Taxonomy

Following the generalization taxonomy of Hupkes et al. (2023), the evaluation card is included in Table I.

The evaluation card covers the following axes:

  • Motivation: Practical, Cognitive, Intrinsic, Fairness
  • Generalisation type: Compositional, Structural, Cross Task, Cross Language, Cross Domain, Robustness
  • Shift type: Covariate, Label, Full, Assumed
  • Shift source: Naturally occurring, Partitioned natural, Generated shift, Fully generated
  • Shift locus: Train–test, Finetune train–test, Pretrain–train, Pretrain–test