LG AI Feb 9

Linearization Explains Fine-Tuning in Large Language Models

Zahra Rahimi Afzal, Tara Esmaeilbeig, Mojtaba Soltanalian, Mesrob I. Ohannessian

arXiv:2602.08239v14.94 citations h-index: 12

Originality: Incremental advance

AI Analysis

This work provides theoretical insights into fine-tuning mechanisms for researchers and practitioners in machine learning, potentially enhancing PEFT techniques, though it is incremental as it builds on existing linearization and NTK concepts.

The paper studies the mechanisms behind Parameter-Efficient Fine-Tuning (PEFT) in large language models by analyzing fine-tuning dynamics through linearization. It shows that, with an explicit Euclidean-distance regularizer in parameter space, fine-tuning becomes equivalent to learning with the neural tangent kernel (NTK). It also reports a strong correlation between the NTK's eigenvalue spectrum and adaptation performance, with empirical validation on LoRA.

Parameter-Efficient Fine-Tuning (PEFT) is a popular class of techniques that strive to adapt large models in a scalable and resource-efficient manner. Yet the mechanisms underlying their training performance and generalization remain underexplored. In this paper, we provide several insights into such fine-tuning through the lens of linearization. Fine-tuned models are often implicitly encouraged to remain close to the pretrained model. By making this explicit, using a Euclidean distance inductive bias in parameter space, we show that fine-tuning dynamics become equivalent to learning with the positive-definite neural tangent kernel (NTK). We specifically analyze how close the fully linear and the linearized fine-tuning optimizations are, based on the strength of the regularization. This allows us to be pragmatic about how good a model linearization is when fine-tuning large language models (LLMs). When linearization is a good model, our findings reveal a strong correlation between the eigenvalue spectrum of the NTK and the performance of model adaptation. Motivated by this, we give spectral perturbation bounds on the NTK induced by the choice of layers selected for fine-tuning. We empirically validate our theory on Low-Rank Adaptation (LoRA) on LLMs. These insights not only characterize fine-tuning but also have the potential to enhance PEFT techniques, paving the way to better-informed and more nimble adaptation in LLMs.
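The link between a Euclidean penalty in parameter space and kernel learning with the NTK can be illustrated with a toy sketch. The code below is not the paper's method or experimental setup: it assumes a squared loss, uses a random matrix `J` as a stand-in for the Jacobian of the model outputs with respect to the fine-tuned parameters at the pretrained point, and the names `J`, `residual`, and `lam` are hypothetical. Under those assumptions, the regularized linearized problem and kernel ridge regression with the empirical NTK `K = J Jᵀ` give the same parameter update, and the eigenvalues of `K` are the spectrum the abstract refers to.

```python
import numpy as np

def linearized_ridge_update(J, residual, lam):
    """Parameter-space solve of min_d ||J d - residual||^2 + lam * ||d||^2."""
    p = J.shape[1]
    return np.linalg.solve(J.T @ J + lam * np.eye(p), J.T @ residual)

def ntk_ridge_update(J, residual, lam):
    """Same update obtained in kernel space with the empirical NTK K = J J^T."""
    K = J @ J.T
    alpha = np.linalg.solve(K + lam * np.eye(K.shape[0]), residual)
    return J.T @ alpha, K

# Toy stand-ins (assumptions): 8 training examples, 50 trainable parameters.
rng = np.random.default_rng(0)
n, p = 8, 50
J = rng.normal(size=(n, p))      # hypothetical Jacobian at the pretrained weights
residual = rng.normal(size=n)    # targets minus pretrained-model outputs
lam = 0.1                        # strength of the Euclidean regularizer

delta_param = linearized_ridge_update(J, residual, lam)
delta_kernel, K = ntk_ridge_update(J, residual, lam)

# Eigenvalue spectrum of the (symmetric) empirical NTK, in ascending order.
eigvals = np.linalg.eigvalsh(K)
```

The kernel-space solve works on an `n × n` system instead of a `p × p` one, which is the usual reason to prefer it when the number of trainable parameters far exceeds the number of examples.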
