LG · AI · CL · Jul 15, 2024

Learning Dynamics of LLM Finetuning

arXiv:2407.10490v4 · 93 citations · h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses the challenge of interpreting and optimizing LLM finetuning for researchers and practitioners, though it is incremental as it builds on existing learning dynamics concepts.

The paper studies how large language models (LLMs) behave during finetuning by analyzing learning dynamics, that is, how training on one example influences the model's predictions on others. It proposes a framework that explains phenomena such as the strengthening of hallucinations and a 'squeezing effect' in DPO, and that motivates a method for improving alignment performance.

Learning dynamics, which describes how the learning of specific training examples influences the model's predictions on other examples, gives us a powerful tool for understanding the behavior of deep learning systems. We study the learning dynamics of large language models during different types of finetuning, by analyzing the step-wise decomposition of how influence accumulates among different potential responses. Our framework allows a uniform interpretation of many interesting observations about the training of popular algorithms for both instruction tuning and preference tuning. In particular, we propose a hypothetical explanation of why specific types of hallucination are strengthened after finetuning, e.g., the model might use phrases or facts in the response for question B to answer question A, or the model might keep repeating similar simple phrases when generating responses. We also extend our framework and highlight a unique "squeezing effect" to explain a previously observed phenomenon in off-policy direct preference optimization (DPO), where running DPO for too long makes even the desired outputs less likely. This framework also provides insights into where the benefits of on-policy DPO and other variants come from. The analysis not only provides a novel perspective of understanding LLM's finetuning but also inspires a simple, effective method to improve alignment performance.
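To make the notion of learning dynamics concrete, here is a minimal sketch of the basic measurement: take one gradient step on a single training example and record how the log-probabilities change on a different example. It assumes a toy linear softmax classifier rather than an LLM, and it does not reproduce the paper's step-wise decomposition; the names `sgd_step` and `influence`, the learning rate, and the input vectors are all hypothetical choices made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_of(W, x):
    """Logits of a linear model: one dot product per class row of W."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def log_probs(W, x):
    """Log-probabilities the model assigns to each class for input x."""
    return [math.log(p) for p in softmax(logits_of(W, x))]


def sgd_step(W, x, y, lr):
    """One gradient step on the cross-entropy loss of the single example (x, y).

    For a linear softmax model the gradient w.r.t. row k of W is
    (p_k - 1[k == y]) * x.
    """
    p = softmax(logits_of(W, x))
    new_W = []
    for k, row in enumerate(W):
        g = p[k] - (1.0 if k == y else 0.0)
        new_W.append([w - lr * g * xi for w, xi in zip(row, x)])
    return new_W


def influence(W, x_update, y_update, x_observe, lr=0.1):
    """Change in log p(. | x_observe) caused by one step on (x_update, y_update)."""
    before = log_probs(W, x_observe)
    after = log_probs(sgd_step(W, x_update, y_update, lr), x_observe)
    return [a - b for a, b in zip(after, before)]


# Toy setup (assumed): 3 classes, 2 input features, zero-initialised weights.
W0 = [[0.0, 0.0] for _ in range(3)]

# Learning class 0 on x_update also raises class 0 on a similar input ...
delta_similar = influence(W0, [1.0, 0.0], 0, [0.9, 0.1])
# ... but leaves an orthogonal input unchanged in this linear model.
delta_orthogonal = influence(W0, [1.0, 0.0], 0, [0.0, 1.0])

print("similar input:   ", delta_similar)
print("orthogonal input:", delta_orthogonal)
```

In this toy model the influence of one example on another is governed by the inner product of their inputs, which is why the orthogonal input is unaffected. In an LLM the corresponding similarity is far less transparent, which is part of what makes the analysis non-trivial.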

Code Implementations: 1 repo
