CL, AI, LG | Sep 26, 2025

Elastic MoE: Unlocking the Inference-Time Scalability of Mixture-of-Experts

arXiv:2509.21892v1 | 8 citations | h-index: 15
Originality: Highly original
AI Analysis

This addresses a scalability bottleneck for MoE models at inference time, enabling more flexible and efficient deployment in AI applications.

The paper tackles a problem with Mixture-of-Experts (MoE) models: performance degrades when more experts are activated at inference than during training, which the authors attribute to a lack of learned collaboration among experts. It introduces Elastic MoE (EMoE), a training framework that lets the number of activated experts at inference scale to 2-3× the training-time number without additional training overhead, significantly expanding the performance-scaling range and raising peak performance.

Mixture-of-Experts (MoE) models typically fix the number of activated experts $k$ at both training and inference. Intuitively, activating more experts $k'$ at inference (where $k' > k$) engages a larger set of model parameters in the computation and is thus expected to improve performance. Contrary to this intuition, however, we find the scaling range to be so narrow that performance begins to degrade rapidly after only a slight increase in the number of experts. Further investigation reveals that this degradation stems from a lack of learned collaboration among experts. To address this, we introduce Elastic Mixture-of-Experts (EMoE), a novel training framework that enables MoE models to scale the number of activated experts at inference without incurring additional training overhead. By simultaneously training experts to collaborate in diverse combinations and encouraging the router to make high-quality selections, EMoE ensures robust performance across computational budgets at inference. We conduct extensive experiments on various MoE settings. Our results show that EMoE significantly expands the effective performance-scaling range, extending it to as much as 2-3$\times$ the training-time $k$, while also pushing the model's peak performance higher.
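To make the knob being scaled concrete, here is a minimal sketch of generic top-k MoE routing in plain Python, in which the number of activated experts is just an argument, so the same router can be queried with the training-time $k$ or a larger inference-time $k'$. The function names (`top_k_route`, `moe_forward`), the scalar toy experts, and the choice to compute gate weights as a softmax over only the selected logits are illustrative assumptions. This is not the EMoE training procedure, and real MoE layers route token vectors with batched dispatch.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and return (expert_index, gate_weight) pairs.

    Assumption: gate weights are a softmax over the selected logits only,
    so they sum to 1 for any k. Other MoE variants normalize differently.
    """
    if not 1 <= k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Rank experts by router score, highest first.
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(
    x: float,
    experts: Sequence[Callable[[float], float]],
    router_logits: Sequence[float],
    k: int,
) -> float:
    """Weighted sum of the outputs of the k selected experts (toy scalar version)."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


if __name__ == "__main__":
    # Hypothetical router scores for four experts.
    logits = [2.0, 1.0, 0.5, -1.0]
    print(top_k_route(logits, k=2))  # training-time k
    print(top_k_route(logits, k=3))  # inference-time k' > k, same router
```

Only `k` changes between the two calls; the router and experts are untouched. Raising it changes which experts contribute and how the gate mass is split among them, which is exactly the kind of combination the abstract reports a standard MoE tolerating only slightly before performance degrades.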
