I Learn Better If You Speak My Language: Understanding the Superior Performance of Fine-Tuning Large Language Models with LLM-Generated Responses
This paper addresses a practical problem for AI researchers and practitioners by offering an incremental insight into fine-tuning efficiency for LLMs.
The paper investigates why fine-tuning large language models (LLMs) with LLM-generated responses outperforms fine-tuning with human-generated responses in reasoning tasks. It finds that LLMs are more 'familiar' with such content, as shown by lower perplexity before fine-tuning, and that this familiarity improves learning performance while helping the model maintain its capabilities in other reasoning tasks.
This paper explores an intriguing observation: fine-tuning a large language model (LLM) with responses generated by an LLM often yields better results than using responses generated by humans, particularly in reasoning tasks. We conduct an in-depth investigation to understand why this occurs. Contrary to the common belief that this advantage is due to the more detailed nature of LLM-generated content, our study identifies another contributing factor: an LLM is inherently more "familiar" with LLM-generated responses. This familiarity is evidenced by lower perplexity before fine-tuning. We design a series of experiments to understand the impact of this "familiarity", and our results show that it significantly impacts learning performance. Training with LLM-generated responses not only enhances performance but also helps maintain the model's capabilities in other reasoning tasks after fine-tuning on a specific task.
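To make the perplexity-based notion of "familiarity" concrete, the following is a minimal sketch of how perplexity can be computed from per-token log-probabilities. It uses the standard definition (the exponential of the negative mean token log-probability) and is not taken from the paper's code. The function name `perplexity` and the two lists of log-probabilities are hypothetical, illustrative values, not measurements from the study; in practice the log-probabilities would come from scoring each candidate response with the model before fine-tuning.

```python
import math

def perplexity(token_logprobs):
    """Perplexity as exp of the negative mean per-token log-probability (natural log)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token log-probabilities that a model might assign to two
# responses to the same question. These numbers are made up for illustration.
llm_style_logprobs = [-0.2, -0.5, -0.1, -0.4]    # assumed: LLM-generated response
human_style_logprobs = [-1.2, -2.0, -0.8, -1.6]  # assumed: human-written response

ppl_llm = perplexity(llm_style_logprobs)
ppl_human = perplexity(human_style_logprobs)

# Under the abstract's framing, the lower-perplexity response is the one the
# model is more "familiar" with before fine-tuning.
more_familiar = "LLM-generated" if ppl_llm < ppl_human else "human-written"
print(f"LLM-style perplexity:   {ppl_llm:.3f}")
print(f"human-style perplexity: {ppl_human:.3f}")
print(f"more familiar response: {more_familiar}")
```

Averaging over tokens keeps the comparison from being dominated by response length, which matters here because LLM-generated responses are often longer than human-written ones.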