CL, AI | Sep 30, 2025

LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts

arXiv:2509.25684v1
5 citations
h-index: 7
Originality: Highly original
AI Analysis

This work addresses the need for more efficient and adaptive fine-tuning methods in NLP. Its routing approach improves performance over existing baselines, though the contribution is incremental in the context of MoE and PEFT techniques.

The paper tackles inefficient expert allocation in mixture-of-experts fine-tuning of large language models. It proposes LD-MoLE, a learnable dynamic routing mechanism that replaces fixed TopK routing with adaptive, token-dependent allocation, and it reports the highest average benchmark scores among the compared baselines on the Qwen3-1.7B and Llama-3.2-3B models.

Recent studies have shown that combining parameter-efficient fine-tuning (PEFT) with mixture-of-experts (MoE) is an effective strategy for adapting large language models (LLMs) to downstream tasks. However, most existing approaches rely on conventional TopK routing, which requires careful hyperparameter tuning and assigns a fixed number of experts to each token. In this work, we propose LD-MoLE, a Learnable Dynamic routing mechanism for Mixture of LoRA Experts that enables adaptive, token-dependent, and layer-wise expert allocation. Our method replaces the non-differentiable TopK selection with a differentiable routing function that has a closed-form solution. Moreover, our design allows the model to adaptively determine the number of experts to activate for each token at different layers. In addition, we introduce an analytical sparsity control objective to regularize the number of activated experts. Extensive experiments on the Qwen3-1.7B and Llama-3.2-3B models show that LD-MoLE achieves the highest average scores across a diverse set of benchmarks compared to state-of-the-art baselines. Our method not only achieves superior performance but also demonstrates the ability to learn token-dependent and layer-wise expert allocation.
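To make the contrast between fixed TopK routing and token-dependent allocation concrete, here is a minimal sketch in plain Python. The abstract does not name the differentiable routing function, so this example uses sparsemax, a well-known closed-form projection onto the probability simplex, purely as a stand-in assumption. It is not the paper's routing function, and the helper names `sparsemax`, `topk_softmax`, and `count_active` are hypothetical.

```python
import math
from typing import List


def sparsemax(logits: List[float]) -> List[float]:
    """Closed-form Euclidean projection of logits onto the probability simplex.

    Used here only as an illustrative stand-in for a differentiable,
    sparse routing function; the source does not specify which one it uses.
    """
    z_sorted = sorted(logits, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for j, z in enumerate(z_sorted, start=1):
        cumsum += z
        # Keep the threshold from the largest j that satisfies the support condition.
        if 1.0 + j * z > cumsum:
            tau = (cumsum - 1.0) / j
    return [max(z - tau, 0.0) for z in logits]


def topk_softmax(logits: List[float], k: int) -> List[float]:
    """Conventional TopK routing: softmax over the k largest logits, zero elsewhere."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(logits))]


def count_active(weights: List[float]) -> int:
    """Number of experts that receive a non-zero routing weight."""
    return sum(1 for w in weights if w > 0.0)


# Two hypothetical tokens with router logits over experts.
confident_token = [2.0, 1.0, 0.1]        # one expert clearly dominates
uncertain_token = [1.0, 0.9, 0.8, -1.0]  # three experts score similarly

for name, logits in [("confident", confident_token), ("uncertain", uncertain_token)]:
    print(name,
          "topk active:", count_active(topk_softmax(logits, k=2)),
          "sparsemax active:", count_active(sparsemax(logits)))
```

In this sketch TopK always activates exactly `k` experts, while the sparse projection activates a different number of experts depending on how peaked the logits are. That is the kind of token-dependent behaviour the abstract describes; the learnable components and the sparsity control objective are not modelled here.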

