Not How Many, But Which: Parameter Placement in Low-Rank Adaptation

arXiv:2605.1220788.6

AI Analysis

For practitioners using LoRA with reinforcement learning (GRPO), this work provides a gradient-informed method for selecting which parameters to train. Random placement under GRPO fails to improve over the base model; informed placement recovers standard LoRA accuracy.

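The paper studies the parameter placement problem in LoRA adapters: with a fixed budget of trainable entries in the B matrix, does it matter which entries are chosen? Under SFT, random and informed subsets perform comparably. Under GRPO on base models, random placement fails to improve over the base model, while gradient-informed placement recovers standard LoRA accuracy. The scoring procedure identifies the critical parameters in under 10 seconds, at less than 0.5% of training cost.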

We study the \textit{parameter placement problem}: given a fixed budget of $k$ trainable entries within the $B$ matrix of a LoRA adapter ($A$ frozen), does the choice of which $k$ entries matter? Under supervised fine-tuning (SFT), random and informed subsets achieve comparable performance. Under GRPO on base models, random placement fails to improve over the base model, while gradient-informed placement recovers standard LoRA accuracy. This regime dependence traces to gradient structure: SFT gradients are low-rank and directionally stable, so any subset accumulates coherent updates; GRPO gradients are high-rank and near-orthogonal across steps, so only elements with consistently signed gradients retain the learning signal. Our scoring procedure identifies these critical parameters in under 10 seconds at less than 0.5% of training cost. Selected parameters concentrate on residual-stream-writing projections (V, O, Down), and this pattern is stable across model families and scales (1.5B to 8B).
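The idea of keeping only entries with consistently signed gradients can be illustrated with a small NumPy sketch. The abstract does not give the paper's exact scoring rule, so the score used here (the absolute value of the gradient summed over steps) is an assumption chosen for illustration, and the function names `sign_consistency_scores`, `top_k_mask`, and `masked_step` are hypothetical.

```python
import numpy as np


def sign_consistency_scores(grads):
    """Score each entry of B by |sum of its gradients over steps|.

    grads has shape (steps, d_out, r): one gradient of B per step.
    Entries whose gradient sign flips between steps cancel out and
    score low; consistently signed entries accumulate and score high.
    This scoring rule is an assumption, not the paper's stated one.
    """
    grads = np.asarray(grads, dtype=np.float64)
    return np.abs(grads.sum(axis=0))


def top_k_mask(scores, k):
    """Boolean mask selecting the k highest-scoring entries."""
    flat = scores.ravel()
    if not 0 < k <= flat.size:
        raise ValueError("k must be in 1..scores.size")
    idx = np.argpartition(flat, -k)[-k:]
    mask = np.zeros(flat.shape, dtype=bool)
    mask[idx] = True
    return mask.reshape(scores.shape)


def masked_step(B, grad, mask, lr):
    """One plain gradient step that updates only the selected entries."""
    return B - lr * grad * mask


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic gradients for a 16 x 4 matrix B over 8 steps.
    grads = rng.standard_normal((8, 16, 4))
    mask = top_k_mask(sign_consistency_scores(grads), k=10)
    B = np.zeros((16, 4))  # LoRA conventionally initialises B to zero
    B = masked_step(B, grads[0], mask, lr=1e-2)
    print("trainable entries:", int(mask.sum()))
```

A random-placement baseline would replace `top_k_mask` with a uniformly sampled mask of the same size `k`; the abstract's claim is that this substitution is harmless under SFT but not under GRPO.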
