CL, LG | Apr 7, 2024

SEER-MoE: Sparse Expert Efficiency through Regularization for Mixture-of-Experts

arXiv:2404.05089v1 | 33 citations | h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses efficiency challenges that deep learning practitioners face with large-scale Mixture-of-Experts (MoE) models. It is an incremental advance because it builds on existing MoE methods.

The paper tackles the high memory and compute requirements of Mixture-of-Experts (MoE) models by introducing SEER-MoE, a two-stage framework that prunes experts and fine-tunes with regularization, achieving optimized inference efficiency with minimal accuracy loss.

The advancement of deep learning has led to the emergence of Mixture-of-Experts (MoE) models, known for their dynamic allocation of computational resources based on input. Despite their promise, MoEs face challenges, particularly in terms of memory requirements. To address this, our work introduces SEER-MoE, a novel two-stage framework for reducing both the memory footprint and compute requirements of pre-trained MoE models. The first stage prunes the total number of experts using heavy-hitters counting guidance, while the second stage employs a regularization-based fine-tuning strategy to recover lost accuracy and reduce the number of activated experts during inference. Our empirical studies demonstrate the effectiveness of our method, resulting in a sparse MoE model optimized for inference efficiency with minimal accuracy trade-offs.
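
The two stages described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes that "heavy-hitters counting" means counting how often each expert appears in a token's top-k routing choice on a calibration batch, and it uses an entropy penalty on the router distribution as one plausible regularizer, since the abstract does not name the regularizer. All function names, the coefficient `lam`, and the logits are hypothetical.

```python
import math
from collections import Counter

def count_expert_activations(router_logits, top_k):
    """Count how often each expert lands in a token's top-k (assumed counting rule)."""
    counts = Counter()
    for logits in router_logits:  # one list of router logits per token
        ranked = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)
        counts.update(ranked[:top_k])
    return counts

def select_experts_to_keep(counts, num_experts, keep):
    """Stage 1 sketch: keep the `keep` most frequently activated experts."""
    ranked = sorted(range(num_experts), key=lambda e: (-counts[e], e))
    return sorted(ranked[:keep])

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]

def routing_entropy(logits):
    """Shannon entropy of the router distribution for one token."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)

def regularized_loss(task_loss, router_logits, lam):
    """Stage 2 sketch: task loss plus an entropy penalty (an assumed regularizer).

    Lower routing entropy means the router concentrates on fewer experts,
    which is what allows fewer experts to be activated at inference.
    """
    mean_entropy = sum(routing_entropy(l) for l in router_logits) / len(router_logits)
    return task_loss + lam * mean_entropy

# Toy calibration batch: 4 tokens routed over 4 experts with top-2 routing.
router_logits = [
    [2.0, 1.0, 0.1, -1.0],
    [1.5, 2.5, 0.0, -0.5],
    [3.0, 0.2, 0.1, 0.0],
    [0.5, 2.0, -1.0, 1.0],
]
counts = count_expert_activations(router_logits, top_k=2)
kept = select_experts_to_keep(counts, num_experts=4, keep=2)
loss = regularized_loss(task_loss=1.0, router_logits=router_logits, lam=0.1)
```

In this toy batch, experts 0 and 1 are selected far more often than experts 2 and 3, so a budget of two experts keeps those two. In a real model the counts would be gathered per MoE layer over a calibration dataset, and the penalty would be added to the fine-tuning objective through an autodiff framework.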

