LG | Mar 10, 2025

ResMoE: Space-efficient Compression of Mixture of Experts LLMs via Residual Restoration

Mengting Ai, Tianxin Wei, Yifan Chen, Zhichen Zeng, Ritchie Zhao, Girish Varatkar, Bita Darvish Rouhani, Xianfeng Tang, Hanghang Tong, Jingrui He

arXiv:2503.06881v122.020 citations | h-index: 21 | Has Code | KDD

Originality: Incremental advance

AI Analysis

This work addresses the high memory requirements of deploying large MoE-based language models, which makes such models more accessible. It appears incremental, however, because it builds on existing MoE compression approaches.

The paper tackles the space inefficiency of Mixture-of-Experts (MoE) Transformers during inference by introducing ResMoE, a framework that compresses experts using a Wasserstein barycenter and residual approximation. It reduces the parameters in an expert by up to 75% with minimal accuracy loss.

Mixture-of-Experts (MoE) Transformer, the backbone architecture of multiple phenomenal language models, leverages sparsity by activating only a fraction of model parameters for each input token. The sparse structure, while allowing constant time costs, results in space inefficiency: we still need to load all the model parameters during inference. We introduce ResMoE, an innovative MoE approximation framework that utilizes Wasserstein barycenter to extract a common expert (barycenter expert) and approximate the residuals between this barycenter expert and the original ones. ResMoE enhances the space efficiency for inference of large-scale MoE Transformers in a one-shot and data-agnostic manner without retraining while maintaining minimal accuracy loss, thereby paving the way for broader accessibility to large language models. We demonstrate the effectiveness of ResMoE through extensive experiments on Switch Transformer, Mixtral, and DeepSeekMoE models. The results show that ResMoE can reduce the number of parameters in an expert by up to 75% while maintaining comparable performance. The code is available at https://github.com/iDEA-iSAIL-Lab-UIUC/ResMoE.
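The barycenter-plus-residual idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that); it is a simplified illustration under several assumptions. Each expert is assumed to be a single matrix whose rows are hidden neurons. The Wasserstein barycenter is approximated by alternating row alignment (an assignment problem, which is what optimal transport between equally weighted point sets reduces to) with averaging. The residuals are compressed by magnitude pruning. The names `align_rows`, `compress_experts`, `restore_expert`, `keep_ratio`, and `n_iters` are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def align_rows(reference, expert):
    """Return a permutation matching rows of `expert` to rows of `reference`."""
    # cost[i, j] = squared L2 distance between reference row i and expert row j
    cost = ((reference[:, None, :] - expert[None, :, :]) ** 2).sum(axis=-1)
    _, cols = linear_sum_assignment(cost)
    return cols  # cols[i] is the expert row assigned to reference row i


def compress_experts(experts, keep_ratio=0.25, n_iters=5):
    """Extract a shared center expert and a sparse residual per expert.

    Assumption: alternating alignment and averaging stands in for the
    Wasserstein barycenter; magnitude pruning stands in for the residual
    approximation step.
    """
    center = experts[0].copy()
    perms = [np.arange(center.shape[0]) for _ in experts]
    for _ in range(n_iters):
        perms = [align_rows(center, w) for w in experts]
        center = np.mean([w[p] for w, p in zip(experts, perms)], axis=0)

    residuals = []
    for w, p in zip(experts, perms):
        r = w[p] - center
        k = max(1, int(keep_ratio * r.size))
        # Keep only the k largest-magnitude residual entries
        threshold = np.partition(np.abs(r).ravel(), -k)[-k]
        residuals.append(np.where(np.abs(r) >= threshold, r, 0.0))
    return center, perms, residuals


def restore_expert(center, perm, residual):
    """Approximate the original expert from the center and its residual."""
    aligned = center + residual
    restored = np.empty_like(aligned)
    restored[perm] = aligned  # undo the row permutation
    return restored


rng = np.random.default_rng(0)
experts = [rng.normal(size=(8, 4)) for _ in range(3)]
center, perms, residuals = compress_experts(experts, keep_ratio=0.25)
approx = restore_expert(center, perms[0], residuals[0])
print("relative error:", np.linalg.norm(approx - experts[0]) / np.linalg.norm(experts[0]))
```

In this sketch only the shared `center`, the permutations, and the sparse residuals would need to be stored, which is where the space saving comes from. A real feed-forward expert has more than one weight matrix, so the same neuron permutation would have to be applied consistently across them.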
