Steering LLM Reasoning Through Bias-Only Adaptation
This work reduces the cost and complexity of fine-tuning for reasoning tasks, although the contribution is incremental: it builds on existing adapter and RL-tuning methods.
The paper tackles the problem of efficiently fine-tuning large language models for mathematical reasoning. It trains a single steering vector per layer with reinforcement learning while keeping the base weights frozen, and reports accuracy matching that of fully RL-tuned models with only about 0.0016% additional parameters on an 8-billion-parameter model.
We show that training a single $d$-dimensional steering vector per layer with reinforcement learning, while freezing all base weights, matches the accuracy of fully RL-tuned reasoning models on mathematical-reasoning tasks. On an 8-billion-parameter model this adds only $\approx 0.0016\%$ additional parameters, and the result holds across a range of base models and mathematical-reasoning benchmarks. These results tighten the upper bound on the parameter budget required for high-level chain-of-thought reasoning, indicating that millions of adapter weights are unnecessary. The minimal trainable footprint reduces optimizer memory and inter-GPU communication, lowering the overall cost of fine-tuning. Moreover, a logit-lens analysis shows that the learned vectors amplify coherent token directions, providing clearer insight into the model's internal computations.
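To make the parameter budget concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes the steering vector is added to each layer's output hidden state (the abstract does not say exactly where it is injected), and it uses a hypothetical 8B-class configuration with hidden size $d = 4096$, 32 layers, and $8 \times 10^9$ base parameters. All function and variable names are illustrative.

```python
from typing import Callable, List

Vector = List[float]


def steered_forward(
    hidden: Vector,
    frozen_layers: List[Callable[[Vector], Vector]],
    steering: List[Vector],
) -> Vector:
    """Run frozen layers, adding one trainable d-dimensional vector after each.

    Assumption: the vector is added to the layer output; the actual
    injection point used in the paper may differ.
    """
    for layer, vec in zip(frozen_layers, steering):
        hidden = layer(hidden)                          # frozen base computation
        hidden = [h + b for h, b in zip(hidden, vec)]   # the only trainable part
    return hidden


def trainable_fraction(d_model: int, n_layers: int, n_base_params: float) -> float:
    """Fraction of extra parameters: one d_model-sized vector per layer."""
    return d_model * n_layers / n_base_params


# Hypothetical 8B-class configuration (assumed, not taken from the paper).
D_MODEL, N_LAYERS, N_BASE = 4096, 32, 8.0e9

n_steering = D_MODEL * N_LAYERS                          # 131072 trainable values
pct = 100 * trainable_fraction(D_MODEL, N_LAYERS, N_BASE)

print(n_steering)       # 131072
print(f"{pct:.4f}%")    # 0.0016%
```

Under these assumed dimensions the count is $4096 \times 32 = 131{,}072$ trainable values, which is consistent with the $\approx 0.0016\%$ figure quoted above; a model with a different depth or width would give a slightly different percentage.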