LG · Mar 10, 2025

ResMoE: Space-efficient Compression of Mixture of Experts LLMs via Residual Restoration

arXiv:2503.06881v1 · 20 citations · h-index: 21 · Has Code · KDD
Originality: Incremental advance
AI Analysis

The paper addresses the high memory requirements of deploying large MoE-based language models, which makes them more accessible. It appears incremental, however, because it builds on existing MoE compression approaches.

The paper tackles the space inefficiency of Mixture-of-Experts (MoE) Transformers during inference by introducing ResMoE, a framework that compresses experts using a Wasserstein barycenter and residual approximation. It reduces the parameters per expert by up to 75% with minimal accuracy loss.

Mixture-of-Experts (MoE) Transformer, the backbone architecture of multiple phenomenal language models, leverages sparsity by activating only a fraction of model parameters for each input token. The sparse structure, while allowing constant time costs, results in space inefficiency: we still need to load all the model parameters during inference. We introduce ResMoE, an innovative MoE approximation framework that utilizes Wasserstein barycenter to extract a common expert (barycenter expert) and approximate the residuals between this barycenter expert and the original ones. ResMoE enhances the space efficiency for inference of large-scale MoE Transformers in a one-shot and data-agnostic manner without retraining while maintaining minimal accuracy loss, thereby paving the way for broader accessibility to large language models. We demonstrate the effectiveness of ResMoE through extensive experiments on Switch Transformer, Mixtral, and DeepSeekMoE models. The results show that ResMoE can reduce the number of parameters in an expert by up to 75% while maintaining comparable performance. The code is available at https://github.com/iDEA-iSAIL-Lab-UIUC/ResMoE.
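The "shared expert plus approximated residuals" idea from the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a plain element-wise mean as a stand-in for the Wasserstein barycenter (which would also account for neuron permutations) and a truncated SVD as one possible residual approximation. The function names `compress_experts` and `restore_expert`, the matrix shapes, and the rank are all hypothetical choices made for illustration.

```python
import numpy as np


def compress_experts(experts, rank):
    """Split expert weights into one shared center plus low-rank residuals.

    Assumption: the center is a plain mean, standing in for the
    Wasserstein barycenter described in the abstract.
    """
    center = np.mean(experts, axis=0)
    factors = []
    for w in experts:
        # Truncated SVD of the residual (an assumed approximation scheme).
        u, s, vt = np.linalg.svd(w - center, full_matrices=False)
        factors.append((u[:, :rank] * s[:rank], vt[:rank, :]))
    return center, factors


def restore_expert(center, factor):
    """Rebuild an approximate expert from the center and its residual factors."""
    left, right = factor
    return center + left @ right


# Toy data: four experts that differ from a shared matrix by rank-8 terms.
rng = np.random.default_rng(0)
d_out, d_in, rank = 64, 32, 8
shared = rng.standard_normal((d_out, d_in))
left = rng.standard_normal((d_out, rank))
experts = [shared + left @ rng.standard_normal((rank, d_in)) for _ in range(4)]

center, factors = compress_experts(experts, rank)
restored = [restore_expert(center, f) for f in factors]
max_err = max(np.abs(r - w).max() for r, w in zip(restored, experts))

# Per-expert storage: dense weights versus the two residual factors.
dense_params = d_out * d_in
residual_params = rank * (d_out + d_in)
```

In this toy setup the residuals are low rank by construction, so the restoration is exact up to floating-point error. Real expert weights would generally lose some accuracy at a given rank, and the shared center adds a one-time storage cost on top of the per-expert factors.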

Code Implementations: 1 repo
