LGDec 29, 2025

Evaluating Parameter Efficient Methods for RLVR

Qingyu Yin, Yulun Wu, Zhennan Shen, Sunbowen Li, Zhilin Wang, Yanshu Li, Chak Tou Leong, Jiale Kang, Jinjin Gu

arXiv:2512.23165v216.98 citationsh-index: 10

Originality Incremental advance

AI Analysis

This work provides practical guidance for researchers and practitioners using RLVR to enhance language model reasoning capabilities, though it is incremental in evaluating existing PEFT methods rather than introducing new ones.

The authors systematically evaluated over 12 Parameter-Efficient Fine-Tuning (PEFT) methods for Reinforcement Learning with Verifiable Rewards (RLVR) on mathematical reasoning benchmarks, finding that structural variants like DoRA, AdaLoRA, and MiSS consistently outperform standard LoRA while identifying failures in SVD-informed methods and bottlenecks in extreme parameter reduction approaches.

We systematically evaluate Parameter-Efficient Fine-Tuning (PEFT) methods under the paradigm of Reinforcement Learning with Verifiable Rewards (RLVR). RLVR incentivizes language models to enhance their reasoning capabilities through verifiable feedback; however, while methods like LoRA are commonly used, the optimal PEFT architecture for RLVR remains unidentified. In this work, we conduct the first comprehensive evaluation of over 12 PEFT methodologies across the DeepSeek-R1-Distill families on mathematical reasoning benchmarks. Our empirical results challenge the default adoption of standard LoRA with three main findings. First, we demonstrate that structural variants, such as DoRA, AdaLoRA, and MiSS, consistently outperform LoRA. Second, we uncover a spectral collapse phenomenon in SVD-informed initialization strategies (\textit{e.g.,} PiSSA, MiLoRA), attributing their failure to a fundamental misalignment between principal-component updates and RL optimization. Furthermore, our ablations reveal that extreme parameter reduction (\textit{e.g.,} VeRA, Rank-1) severely bottlenecks reasoning capacity. We further conduct ablation studies and scaling experiments to validate our findings. This work provides a definitive guide for advocating for more exploration for parameter-efficient RL methods.

View on arXiv PDF

Similar