CL AI | Apr 1, 2025

DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism

Dengchun Li, Naizheng Wang, Zihao Zhang, Haoyang Yin, Lei Duan, Meng Xiao, Mingjie Tang

arXiv:2504.00661v110.97 citations | h-index: 4 | Has Code

Originality: Incremental advance

AI Analysis

This work addresses a specific bottleneck in parameter-efficient fine-tuning for NLP tasks, offering incremental improvements to routing mechanisms in MoLE models.

The paper tackles the trade-off between computational efficiency and predictive accuracy in Mixture of LoRA Experts (MoLE) fine-tuning for large language models. It proposes DynMoLE, a hybrid routing mechanism that dynamically adjusts expert selection based on the Tsallis entropy of the router's probability distribution, and reports a 9.6% improvement over LoRA and a 2.3% gain over MoLA, the state-of-the-art MoLE method, on commonsense reasoning benchmarks.

Instruction-based fine-tuning of large language models (LLMs) has achieved remarkable success in various natural language processing (NLP) tasks. Parameter-efficient fine-tuning (PEFT) methods, such as Mixture of LoRA Experts (MoLE), combine the efficiency of Low-Rank Adaptation (LoRA) with the versatility of Mixture of Experts (MoE) models, demonstrating significant potential for handling multiple downstream tasks. However, the existing routing mechanisms for MoLE often involve a trade-off between computational efficiency and predictive accuracy, and they fail to fully address the diverse expert selection demands across different transformer layers. In this work, we propose DynMoLE, a hybrid routing strategy that dynamically adjusts expert selection based on the Tsallis entropy of the router's probability distribution. This approach mitigates router uncertainty, enhances stability, and promotes more equitable expert participation, leading to faster convergence and improved model performance. Additionally, we introduce an auxiliary loss based on Tsallis entropy to further guide the model toward convergence with reduced uncertainty, thereby improving training stability and performance. Our extensive experiments on commonsense reasoning benchmarks demonstrate that DynMoLE achieves substantial performance improvements, outperforming LoRA by 9.6% and surpassing the state-of-the-art MoLE method, MoLA, by 2.3%. We also conduct a comprehensive ablation study to evaluate the contributions of DynMoLE's key components.
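The entropy-gated routing idea in the abstract can be illustrated with a small sketch. The Tsallis entropy of a distribution p with entropic index q is S_q(p) = (1 - sum_i p_i^q) / (q - 1), which recovers Shannon entropy as q approaches 1. Everything else below is an assumption made for illustration and is not taken from the paper: the function names, the values of `q`, `threshold`, and `top_k`, and the specific rule of falling back to all experts when entropy is high and to renormalized top-k otherwise.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def tsallis_entropy(probs, q=2.0):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1).

    For q == 1 the Shannon entropy (the q -> 1 limit) is returned instead.
    """
    if abs(q - 1.0) < 1e-9:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def hybrid_route(logits, q=2.0, threshold=0.5, top_k=2):
    """Hypothetical entropy-gated routing rule (not the paper's exact method).

    Returns (weights, entropy), where weights maps expert index -> weight.
    """
    probs = softmax(logits)
    entropy = tsallis_entropy(probs, q)
    if entropy > threshold:
        # Assumed behavior: the router is uncertain, so keep every expert,
        # weighted by its router probability.
        return {i: p for i, p in enumerate(probs)}, entropy
    # Assumed behavior: the router is confident, so keep only the top-k
    # experts and renormalize their weights to sum to 1.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    mass = sum(probs[i] for i in top)
    return {i: probs[i] / mass for i in top}, entropy
```

With these assumed settings, a uniform router output over four experts has a q = 2 entropy of 0.75, which exceeds the threshold and activates all four experts, while a sharply peaked output falls below it and activates only the top two.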

