Parameter-Efficient Fine-Tuning of LLMs with Mixture of Space Experts

Buze Zhang, Jinkai Tao, Zilang Zeng, Neil He, Ali Maatouk, Menglin Yang, Rex Ying

arXiv:2602.14490v1 1.4 h-index: 2

Originality: Highly original

AI Analysis

This paper addresses the limited expressiveness of existing methods for fine-tuning LLMs on downstream tasks. The approach is incremental but delivers strong gains on specific benchmarks.

Existing Parameter-Efficient Fine-Tuning (PEFT) methods operate mainly in Euclidean space. The paper proposes a unified framework, Mixture of Space (MoS), that uses multiple geometric spaces to capture complex structures in language data, and reports improvements of up to 5.6% on MATH500 and 15.9% on MAWPS.

Large Language Models (LLMs) have achieved remarkable progress, with Parameter-Efficient Fine-Tuning (PEFT) emerging as a key technique for downstream task adaptation. However, existing PEFT methods mainly operate in Euclidean space, fundamentally limiting their capacity to capture complex geometric structures inherent in language data. While alternative geometric spaces, like hyperbolic geometries for hierarchical data and spherical manifolds for circular patterns, offer theoretical advantages, forcing representations into a single manifold type ultimately limits expressiveness, even when curvature parameters are learnable. To address this, we propose Mixture of Space (MoS), a unified framework that leverages multiple geometric spaces simultaneously to learn richer, curvature-aware representations. Building on this scheme, we develop MoSLoRA, which extends Low-Rank Adaptation (LoRA) with heterogeneous geometric experts, enabling models to dynamically select or combine appropriate geometric spaces based on input context. Furthermore, to address the computational overhead of frequent manifold switching, we develop a lightweight routing mechanism. Moreover, we provide empirical insights into how curvature optimization impacts training stability and model performance. Our experiments across diverse benchmarks demonstrate that MoSLoRA consistently outperforms strong baselines, achieving up to 5.6% improvement on MATH500 and 15.9% on MAWPS.
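
The abstract describes LoRA experts that live in different geometric spaces and a router that mixes them per input. The sketch below illustrates that general idea only; it is not the authors' implementation. Everything in it is an assumption made for illustration: the names `SpaceExpert`, `mos_delta`, and `expmap0`, the use of the exponential map at the origin of a constant-curvature (kappa-stereographic) model to curve the low-rank code, the fixed curvatures, the dense softmax router, and the random initialization of `B` (LoRA usually initializes it to zero).

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(zs):
    """Numerically stable softmax over a list of scores."""
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    s = sum(es)
    return [e / s for e in es]


def expmap0(v, kappa):
    """Exponential map at the origin of a constant-curvature model.

    kappa < 0: hyperbolic (Poincare ball), kappa == 0: Euclidean,
    kappa > 0: spherical (stereographic projection). For kappa > 0 the
    input norm must stay below pi / (2 * sqrt(kappa)).
    """
    n = math.sqrt(sum(x * x for x in v))
    if n < 1e-12 or kappa == 0.0:
        return list(v)
    s = math.sqrt(abs(kappa))
    t = math.tanh(s * n) if kappa < 0 else math.tan(s * n)
    scale = t / (s * n)
    return [scale * x for x in v]


class SpaceExpert:
    """Hypothetical low-rank expert: down-project, curve, up-project."""

    def __init__(self, d_in, d_out, rank, kappa, rng):
        self.kappa = kappa
        self.A = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(rank)]
        # Assumption: random B so the demo output is nonzero.
        self.B = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(d_out)]

    def __call__(self, x):
        z = matvec(self.A, x)          # low-rank code (tangent vector)
        h = expmap0(z, self.kappa)     # place the code on the curved space
        return matvec(self.B, h)       # project back to the output width


def mos_delta(x, experts, W_router):
    """Gate-weighted sum of expert updates; returns (delta, gates)."""
    gates = softmax(matvec(W_router, x))
    outs = [e(x) for e in experts]
    d_out = len(outs[0])
    delta = [sum(g * o[i] for g, o in zip(gates, outs)) for i in range(d_out)]
    return delta, gates


rng = random.Random(0)
d_in, d_out, rank = 8, 6, 2
experts = [SpaceExpert(d_in, d_out, rank, k, rng) for k in (-1.0, 0.0, 1.0)]
W_router = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in experts]
x = [rng.gauss(0, 1) for _ in range(d_in)]
delta, gates = mos_delta(x, experts, W_router)
```

In a real adapter, `delta` would be added to the frozen layer's output, and the curvatures would be trainable parameters rather than the fixed values chosen here.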

