LG AIMay 5

Rethinking the Rank Threshold for LoRA Fine-Tuning

arXiv:2605.037242.3

Predicted impact top 62% in LG · last 90 daysOriginality Incremental advance

AI Analysis

This work provides theoretical and empirical justification for using rank-1 LoRA in binary classification, potentially reducing computational costs for practitioners.

The paper shows that for binary classification in the neural tangent kernel regime, the LoRA rank threshold can be reduced from r≥12 to r=1, with empirical validation on GLUE-style tasks. For multi-class tasks, the optimal rank remains higher.

A recent landscape analysis of LoRA fine-tuning in the neural tangent kernel regime establishes a sufficient condition $r(r+1)/2 > KN$ on the LoRA rank $r$ for the absence of spurious local minima under squared-error loss, prescribing $r \geq 12$ on canonical few-shot RoBERTa setups. The condition is stated for general output dimension $K$, so its sharpness in any particular regime, and its practical implication for the cross-entropy loss actually used in fine-tuning, are open. We give three results that together reduce the prescribed rank to $r = 1$ for binary classification in this regime. First, replacing the symmetric Sard-form count with the non-symmetric LoRA manifold dimension yields a strictly weaker capacity requirement, $r(m+n) - r^2 > C^* \cdot KN$ with $C^* \approx 1.35$ under Gaussian-iid features, satisfied at $r = 1$ on canonical setups. Second, in the cross-entropy setting the Polyak--Łojasiewicz inequality removes the rank threshold entirely. Third, a Rademacher-complexity bound predicts rank-one variance optimality precisely when the bias term is saturated, which is the case for binary classification but not for $K > 2$. Empirically, across four GLUE-style binary tasks, three encoder architectures, and at scale on RoBERTa-large, rank one is competitive with the existing prescription $r = 12$; on multi-class MNLI the optimal rank shifts above one, also as predicted. The binary-regime guarantees are conditional on standard NTK assumptions; the multi-class extension is left to future work.

View on arXiv PDF

Similar