When pre-training hurts LoRA fine-tuning: a dynamical analysis via single-index models

arXiv:2602.02855v1
Originality: Incremental advance
AI Analysis

This work addresses a counterintuitive limitation of fine-tuning that is relevant to machine learning practitioners: strong pre-training can hinder convergence rather than help it.

By analyzing convergence rates in single-index models, the paper demonstrates that excessive pre-training can slow down LoRA fine-tuning optimization, even when the pre-training and downstream tasks are well aligned.

Pre-training on a source task is usually expected to facilitate fine-tuning on similar downstream problems. In this work, we mathematically show that this naive intuition is not always true: excessive pre-training can computationally slow down fine-tuning optimization. We study this phenomenon for low-rank adaptation (LoRA) fine-tuning on single-index models trained under one-pass SGD. Leveraging a summary statistics description of the fine-tuning dynamics, we precisely characterize how the convergence rate depends on the initial fine-tuning alignment and the degree of non-linearity of the target task. The key takeaway is that even when the pre-training and downstream tasks are well aligned, strong pre-training can induce a prolonged search phase and hinder convergence. Our theory thus provides a unified picture of how pre-training strength and task difficulty jointly shape the dynamics and limitations of LoRA fine-tuning in a nontrivial tractable model.
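To make the setting concrete, here is a minimal sketch of one-pass SGD on a single-index model in which a single summary statistic, the overlap between the learned direction and the target direction, is tracked over time. It is an illustration under stated assumptions, not the paper's LoRA parametrization or analysis: the quadratic Hermite link, the squared loss, the spherical (normalized) update, the step-size scaling `lr / d`, and the names `one_pass_sgd`, `link`, and `m0` are all hypothetical choices made for this example. The initial overlap `m0` only stands in for the alignment at the start of fine-tuning.

```python
import numpy as np


def link(z):
    # Illustrative link function (an assumption): He_2(z) = z^2 - 1.
    return z**2 - 1.0


def dlink(z):
    # Derivative of the illustrative link function.
    return 2.0 * z


def one_pass_sgd(d=100, steps=2000, lr=0.5, m0=0.3, seed=0):
    """Run one-pass SGD on y = link(<w_star, x>) and return the overlap trajectory.

    Each step draws a fresh Gaussian sample (one pass: no sample is reused).
    The weight vector is renormalized after every step, so the overlap
    <w, w_star> always lies in [-1, 1].
    """
    rng = np.random.default_rng(seed)

    # Target direction: first basis vector (unit norm).
    w_star = np.zeros(d)
    w_star[0] = 1.0

    # Build a unit-norm start with overlap exactly m0 with w_star.
    u = rng.standard_normal(d)
    u[0] = 0.0
    u /= np.linalg.norm(u)
    w = m0 * w_star + np.sqrt(1.0 - m0**2) * u

    overlaps = [w @ w_star]
    for _ in range(steps):
        x = rng.standard_normal(d)          # fresh sample each step
        y = link(x @ w_star)                # noiseless target label
        z = x @ w
        # Gradient of the squared loss 0.5 * (link(z) - y)^2 with respect to w.
        grad = (link(z) - y) * dlink(z) * x
        w = w - (lr / d) * grad
        w /= np.linalg.norm(w)              # project back onto the unit sphere
        overlaps.append(w @ w_star)
    return np.array(overlaps)


if __name__ == "__main__":
    trajectory = one_pass_sgd()
    print("initial overlap:", trajectory[0])
    print("final overlap:", trajectory[-1])
```

Running the function for several values of `m0` and plotting the returned trajectories is one way to see how long the overlap stays near its starting value before moving. Whether this toy setup shows the slowdown described in the abstract is not something the sketch guarantees.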
