LG | Feb 4

Learning to Reason in 13 Parameters

John X. Morris, Niloofar Mireshghallah, Mark Ibrahim, Saeed Mahloujifar

arXiv:2602.04118v1 | 9.08 citations | h-index: 16

Originality: Incremental advance

AI Analysis

This work addresses parameter-efficient training for reasoning tasks in language models. It is an incremental advance, since it builds on existing low-rank adapter methods.

The authors tackle the problem of scaling low-rank adapters for reasoning in language models down to very small sizes. They reach 91% accuracy on GSM8K with only 13 trained parameters, and recover 90% of the performance improvements with 1000x fewer trained parameters on harder benchmarks such as AIME, AMC, and MATH500.

Recent research has shown that language models can learn to reason, often via reinforcement learning. Some work even trains low-rank parameterizations for reasoning, but conventional LoRA cannot scale below the model dimension. We question whether even rank=1 LoRA is necessary for learning to reason and propose TinyLoRA, a method for scaling low-rank adapters to sizes as small as one parameter. Within our new parameterization, we are able to train the 8B parameter size of Qwen2.5 to 91% accuracy on GSM8K with only 13 trained parameters in bf16 (26 total bytes). We find this trend holds in general: we are able to recover 90% of performance improvements while training 1000x fewer parameters across a suite of more difficult learning-to-reason benchmarks such as AIME, AMC, and MATH500. Notably, we are only able to achieve such strong performance with RL: models trained using SFT require 100-1000x larger updates to reach the same performance.
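
The size claims in the abstract can be checked with simple arithmetic. The sketch below assumes the standard LoRA factorization, in which a weight update is the product of a (d_out x rank) matrix and a (rank x d_in) matrix, so even rank 1 needs d_in + d_out trainable values per adapted matrix. The hidden size of 4096 and the helper names are hypothetical, chosen only for illustration. The code does not implement TinyLoRA, whose parameterization is not described on this page.

```python
# Bytes per bfloat16 value (16 bits).
BF16_BYTES = 2


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in a standard LoRA update B @ A,
    where A has shape (rank, d_in) and B has shape (d_out, rank)."""
    return rank * (d_in + d_out)


def storage_bytes(num_params: int, bytes_per_param: int = BF16_BYTES) -> int:
    """Storage needed for num_params values at the given width."""
    return num_params * bytes_per_param


# Hypothetical hidden size, used only for illustration.
d_model = 4096

# Rank-1 LoRA on one square projection still scales with the model dimension.
rank1_params = lora_param_count(d_model, d_model, rank=1)
print(rank1_params)        # 8192

# The abstract's figure: 13 parameters in bf16.
print(storage_bytes(13))   # 26
```

Under these assumptions, a single rank-1 adapter on one square matrix already has thousands of trainable values, which is why a 13-parameter update requires a different parameterization than conventional LoRA.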
