High-Dimensional Theory of LoRA Fine-Tuning in a Solvable Attention Model

arXiv:2606.0589946.6
AI Analysis

The paper provides a theoretical framework for understanding LoRA fine-tuning in attention models and offers practitioners guidance on optimal pre-training and on active fine-tuning.

The paper develops a high-dimensional statistical theory for LoRA fine-tuning in attention models, providing sharp asymptotic predictions for test errors and representation alignment. It shows that pre-training effects reduce to an effective noise term and identifies regimes where test error and representation quality diverge.

We develop a high-dimensional statistical theory of low-rank adaptation (LoRA) in attention models, capturing the interplay between pre-training and fine-tuning. We introduce a solvable framework in which a single-head attention layer is first pre-trained on a data-abundant task and subsequently adapted via a rank-one LoRA update on limited data. In the high-dimensional limit, both stages admit a sharp asymptotic characterization in terms of a finite set of order parameters, yielding explicit predictions for test errors and representation alignment. Our analysis shows that the impact of pre-training on LoRA is summarized by an effective noise term, from which we derive prescriptions for the optimal pre-training procedure. We also demonstrate a regime with a mismatch between the value of the test error and representation quality, and propose an application of our theory to active fine-tuning.
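The two-stage setup in the abstract can be illustrated with a small NumPy sketch: a pre-trained single-head attention layer whose weights are frozen, plus a rank-one LoRA update. This is a hypothetical illustration, not the paper's solvable model. The choice to adapt only the query matrix `W_q`, the 1/sqrt(d) initializations, and the helper names `attention` and `lora_rank_one` are all assumptions. The sketch shows that the fine-tuned weight differs from the frozen one by a rank-one matrix `u v^T`, which has only 2d trainable parameters instead of the d^2 needed for full fine-tuning.

```python
import numpy as np

def attention(X, W_q, W_k, W_v):
    """Single-head softmax self-attention on tokens X of shape (L, d)."""
    d = X.shape[1]
    scores = (X @ W_q) @ (X @ W_k).T / np.sqrt(d)   # (L, L) attention logits
    scores -= scores.max(axis=1, keepdims=True)     # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)               # row-wise softmax
    return A @ (X @ W_v)                            # (L, d) outputs

def lora_rank_one(W_pre, u, v):
    """Hypothetical helper: frozen pre-trained weight plus rank-one update u v^T."""
    return W_pre + np.outer(u, v)

rng = np.random.default_rng(0)
d, L = 64, 8  # embedding dimension and sequence length (arbitrary choices)

# Stand-ins for pre-trained weights (assumption: random, not actually trained).
W_q, W_k, W_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

# LoRA factors: the only parameters that would be trained during fine-tuning.
u = rng.standard_normal(d) / np.sqrt(d)
v = rng.standard_normal(d) / np.sqrt(d)

# Assumption: the rank-one update is applied to the query matrix only.
W_q_ft = lora_rank_one(W_q, u, v)

X = rng.standard_normal((L, d))
out_pre = attention(X, W_q, W_k, W_v)
out_ft = attention(X, W_q_ft, W_k, W_v)

print("rank of update:", np.linalg.matrix_rank(W_q_ft - W_q))
print("trainable LoRA params:", u.size + v.size, "vs full fine-tuning:", W_q.size)
```

In a fine-tuning loop, `W_q`, `W_k`, and `W_v` would stay fixed and only `u` and `v` would receive gradients. The paper's model, training objective, and the placement of the low-rank update may differ from this sketch.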
