LG · AI · Jan 14

GeoRA: Geometry-Aware Low-Rank Adaptation for RLVR

arXiv:2601.09361v1 · h-index: 9
Originality: Incremental advance
AI Analysis

This paper addresses optimization challenges in RLVR for large-scale reasoning models. It offers a new method that improves performance and generalization, although the contribution appears incremental because it builds on existing low-rank adaptation techniques.

The paper tackles parameter-efficient fine-tuning for Reinforcement Learning with Verifiable Rewards (RLVR), where existing methods cause spectral collapse and training instability. It proposes GeoRA, a geometry-aware low-rank adaptation method that achieves state-of-the-art results on mathematical benchmarks with Qwen and Llama models.

Reinforcement Learning with Verifiable Rewards (RLVR) is crucial for advancing large-scale reasoning models. However, existing parameter-efficient methods, such as PiSSA and MiLoRA, are designed for Supervised Fine-Tuning (SFT) and do not account for the distinct optimization dynamics and geometric structures of RLVR. Applying these methods directly leads to spectral collapse and optimization instability, which severely limit model performance. Meanwhile, alternative approaches that leverage update sparsity encounter significant efficiency bottlenecks on modern hardware due to unstructured computations. To address these challenges, we propose GeoRA (Geometry-Aware Low-Rank Adaptation), which exploits the anisotropic and compressible nature of RL update subspaces. GeoRA initializes adapters by extracting principal directions via Singular Value Decomposition (SVD) within a geometrically constrained subspace while freezing the residual components. This method preserves the pre-trained geometric structure and enables efficient GPU computation through dense operators. Experiments on Qwen and Llama demonstrate that GeoRA mitigates optimization bottlenecks caused by geometric misalignment. It consistently outperforms established low-rank baselines on key mathematical benchmarks, achieving state-of-the-art (SOTA) results. Moreover, GeoRA shows superior generalization and resilience to catastrophic forgetting in out-of-domain tasks.
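The initialization step described above follows a familiar pattern: take an SVD of the pretrained weight, turn the leading singular directions into a trainable rank-r adapter, and freeze the remainder. The NumPy code below is a minimal sketch of that pattern, assuming the PiSSA-style factorization (one of the baselines the abstract names). It does not reproduce GeoRA's "geometrically constrained subspace," which the abstract does not specify. Instead, it uses the top-r singular directions of the full weight. The function names `svd_adapter_init` and `adapted_forward` are hypothetical. The forward pass shows why this family of methods stays GPU-friendly: every operation is a dense matrix multiply, with no unstructured sparse indexing.

```python
import numpy as np


def svd_adapter_init(W: np.ndarray, r: int):
    """PiSSA-style SVD initialization (an assumed stand-in, not GeoRA's exact scheme).

    W has shape (d_out, d_in). Returns a trainable rank-r pair (A, B) built
    from the top-r singular directions and a frozen residual W_res such that
    W == W_res + A @ B. GeoRA's geometric subspace constraint is NOT modeled.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    sqrt_s = np.sqrt(S[:r])
    A = U[:, :r] * sqrt_s           # (d_out, r): column j scaled by sqrt(sigma_j)
    B = sqrt_s[:, None] * Vt[:r]    # (r, d_in):  row j scaled by sqrt(sigma_j)
    W_res = W - A @ B               # frozen residual holding the minor components
    return A, B, W_res


def adapted_forward(x: np.ndarray, A: np.ndarray, B: np.ndarray, W_res: np.ndarray):
    """Compute y = x @ W.T as frozen residual path + low-rank path, using dense matmuls only."""
    return x @ W_res.T + (x @ B.T) @ A.T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((64, 32))   # toy "pretrained" weight
    A, B, W_res = svd_adapter_init(W, r=8)
    x = rng.standard_normal((4, 32))    # toy batch of inputs
    # At initialization the adapted layer reproduces the original layer.
    print(np.allclose(adapted_forward(x, A, B, W_res), x @ W.T))
```

Because `W_res + A @ B` equals `W` at initialization, the adapted layer starts out identical to the pretrained one. In this sketch, only `A` and `B` would be updated during RL training, while `W_res` stays frozen. MiLoRA inverts the choice and places the adapter on the minor singular components instead.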

