LG AI | Jan 14

GeoRA: Geometry-Aware Low-Rank Adaptation for RLVR

Jiaying Zhang, Lei Shi, Jiguo Li, Jun Xu, Jiuchong Gao, Jinghua Hao, Renqing He

arXiv:2601.09361v1 | 1.4 | h-index: 9

Originality: Incremental advance

AI Analysis

This work addresses optimization challenges in RLVR for large-scale reasoning models, offering a method intended to improve performance and generalization. It is rated an incremental advance because it builds on existing low-rank adaptation techniques.

The paper addresses parameter-efficient fine-tuning for Reinforcement Learning with Verifiable Rewards (RLVR), where applying existing SFT-oriented methods leads to spectral collapse and optimization instability. It proposes GeoRA, a geometry-aware low-rank adaptation method, and reports state-of-the-art results on mathematical benchmarks with Qwen and Llama models.

Reinforcement Learning with Verifiable Rewards (RLVR) is crucial for advancing large-scale reasoning models. However, existing parameter-efficient methods, such as PiSSA and MiLoRA, are designed for Supervised Fine-Tuning (SFT) and do not account for the distinct optimization dynamics and geometric structures of RLVR. Applying these methods directly leads to spectral collapse and optimization instability, which severely limit model performance. Meanwhile, alternative approaches that leverage update sparsity encounter significant efficiency bottlenecks on modern hardware due to unstructured computations. To address these challenges, we propose GeoRA (Geometry-Aware Low-Rank Adaptation), which exploits the anisotropic and compressible nature of RL update subspaces. GeoRA initializes adapters by extracting principal directions via Singular Value Decomposition (SVD) within a geometrically constrained subspace while freezing the residual components. This method preserves the pre-trained geometric structure and enables efficient GPU computation through dense operators. Experiments on Qwen and Llama demonstrate that GeoRA mitigates optimization bottlenecks caused by geometric misalignment. It consistently outperforms established low-rank baselines on key mathematical benchmarks, achieving state-of-the-art (SOTA) results. Moreover, GeoRA shows superior generalization and resilience to catastrophic forgetting in out-of-domain tasks.
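The abstract describes initializing adapters from SVD principal directions while freezing the residual. As a rough illustration only, the sketch below shows a generic SVD-based split of a weight matrix into a trainable low-rank pair and a frozen residual, in the style of principal-component initializations such as PiSSA. It does not implement GeoRA's geometrically constrained subspace, which the abstract does not specify. The function name `svd_adapter_init`, the `rank` parameter, and the matrix sizes are hypothetical choices made for this example, and NumPy is assumed to be available.

```python
import numpy as np

def svd_adapter_init(weight, rank):
    """Split `weight` into a trainable rank-`rank` adapter and a frozen residual.

    Generic illustration only: this is NOT GeoRA's subspace selection,
    which the abstract does not describe in detail.
    """
    # Thin SVD: weight = U @ diag(S) @ Vt, singular values sorted descending.
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])
    # Trainable factors built from the top-`rank` principal directions.
    a = u[:, :rank] * sqrt_s            # shape (out_features, rank)
    b = sqrt_s[:, None] * vt[:rank, :]  # shape (rank, in_features)
    # The residual holds everything outside the chosen subspace; it stays frozen.
    residual = weight - a @ b
    return a, b, residual

# Hypothetical 64x32 layer with a rank-4 adapter.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32))
a, b, w_res = svd_adapter_init(w, rank=4)

# At initialization the effective weight is unchanged: w_res + a @ b equals w.
print(np.allclose(w_res + a @ b, w))
```

Because both factors are dense matrices, the forward pass `x @ (w_res + a @ b).T` uses only dense matrix products, which is the hardware-efficiency property the abstract contrasts with sparsity-based approaches.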

