LG CL May 25

RotMoLE: Enhancing Mixture of Low-Rank Experts through Rotational Gating Mechanism

Mengyang Sun, Maochuan Dou, Tao Feng, Dan Zhang, Yihao Wang, Junpeng Liu, Yifan Zhu, Jie Tang

arXiv:2605.25565 84.7

AI Analysis

For practitioners fine-tuning LLMs with limited expert capacity, RotMoLE offers a more effective gating strategy to handle diverse knowledge domains.

RotMoLE introduces a rotational gating mechanism for Mixture-of-Low-Rank Experts (MoE-LoRA) that rotates expert outputs instead of merely scaling them, improving expert specialization and representation. Experiments on multi-task and multilingual benchmarks show performance gains over standard MoE-LoRA.

While Large Language Models (LLMs) are commonly fine-tuned on domain-specific tasks before being deployed in vertical applications, adapting them to complex scenarios that require diverse specialized knowledge remains challenging. Meanwhile, the Mixture-of-Experts (MoE) architecture has emerged as a crucial paradigm for training LLMs, and recent works have incorporated MoE into Parameter-Efficient Fine-Tuning (PEFT), yielding the Mixture of Low-rank Experts (MoE-LoRA), which strengthens the ability of low-rank adapters to learn complicated knowledge. However, conventional MoE gating mechanisms typically apply only a scalar reweighting to the selected experts, which limits their capacity for representation and generalization. Motivated and enabled by the low-rank structure of MoE-LoRA, we propose RotMoLE, a specialized MoE framework for low-rank experts featuring an additional rotation gate. Beyond simple scaling, RotMoLE applies a rotation to each selected expert, enabling better expert exploitation and specialization when learning diverse data, especially when the number of expert candidates is limited. Empirical results in complex multi-task and multilingual training scenarios validate the effectiveness of our approach.
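To make the contrast between scalar gating and an added rotation gate concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how the rotation is parameterized or where it is applied, so this sketch assumes an input-dependent block-diagonal rotation (2x2 Givens blocks) acting on the rank-r intermediate of each selected LoRA expert. The names `rotation_matrix`, `moe_lora_forward`, `W_gate`, and `W_rot` are hypothetical.

```python
import numpy as np

def rotation_matrix(angles):
    """Block-diagonal rotation from 2x2 Givens blocks; angles has shape (r // 2,)."""
    r = 2 * len(angles)
    R = np.zeros((r, r))
    for i, theta in enumerate(angles):
        c, s = np.cos(theta), np.sin(theta)
        R[2 * i:2 * i + 2, 2 * i:2 * i + 2] = [[c, -s], [s, c]]
    return R

def moe_lora_forward(x, A, B, W_gate, W_rot=None, top_k=2):
    """Top-k MoE-LoRA update for one token.

    x: (d_in,)  A: (E, r, d_in)  B: (E, d_out, r)  W_gate: (E, d_in)
    W_rot: (E, r // 2, d_in) or None. With None, this is plain scalar gating.
    ASSUMPTION: rotating the rank-r intermediate is a guess at the mechanism.
    """
    logits = W_gate @ x
    top = np.argsort(logits)[-top_k:]             # indices of the selected experts
    w = np.exp(logits[top] - logits[top].max())   # softmax over selected experts
    w /= w.sum()

    out = np.zeros(B.shape[1])
    for weight, e in zip(w, top):
        h = A[e] @ x                              # low-rank projection, shape (r,)
        if W_rot is not None:
            angles = W_rot[e] @ x                 # input-dependent rotation angles
            h = rotation_matrix(angles) @ h       # rotate, norm-preserving
        out += weight * (B[e] @ h)                # scalar gate scales the expert output
    return out

# Example with random weights: 4 experts of rank 4.
rng = np.random.default_rng(0)
E, r, d_in, d_out = 4, 4, 8, 6
x = rng.normal(size=d_in)
A = rng.normal(size=(E, r, d_in))
B = rng.normal(size=(E, d_out, r))
W_gate = rng.normal(size=(E, d_in))
W_rot = rng.normal(size=(E, r // 2, d_in))

scalar_only = moe_lora_forward(x, A, B, W_gate)
with_rotation = moe_lora_forward(x, A, B, W_gate, W_rot)
```

In this sketch the scalar gate can only rescale each expert's contribution, while the rotation changes its direction within the rank-r subspace without changing its norm, which is one way a small pool of experts could cover more varied inputs. Because the rotation acts in the rank-r space rather than the full hidden dimension, it adds little cost, which is consistent with the abstract's remark that the low-rank structure enables the approach.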
