LG · Oct 11, 2024

Retraining-Free Merging of Sparse MoE via Hierarchical Clustering

U of Toronto
arXiv:2410.08589v4 · 18 citations · h-index: 7 · ICML
Originality: Highly original
AI Analysis

This work addresses deployment constraints for SMoE models in real-world, resource-limited settings, offering a practical way to reduce parameter count.

The paper tackles the high memory requirements of Sparse Mixture-of-Experts (SMoE) models in resource-limited environments. It introduces HC-SMoE, a retraining-free expert merging framework that reduces the number of expert parameters, and reports superior performance on zero-shot tasks for models such as Qwen and Mixtral.

Sparse Mixture-of-Experts (SMoE) models represent a significant advancement in large language model (LLM) development through their efficient parameter utilization. These models achieve substantial performance improvements at reduced inference costs. However, the deployment of SMoE models faces constraints from extensive memory requirements of expert components in resource-limited environments. To address these limitations, this paper introduces Hierarchical Clustering for Sparsely activated Mixture of Experts (HC-SMoE), a task-agnostic expert merging framework for parameter reduction without retraining. HC-SMoE introduces a novel hierarchical clustering approach based on expert outputs to ensure merging robustness independent of routing decisions. The proposed output-based clustering method enables effective capture of functional relationships between experts for large-scale architectures. We provide theoretical analysis and comprehensive evaluations across multiple zero-shot language tasks to demonstrate HC-SMoE's effectiveness in state-of-the-art models including Qwen and Mixtral. The experimental results validate HC-SMoE's superior performance and practical applicability for real-world deployments.
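
The output-based clustering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy linear experts, the average-linkage criterion, the Euclidean distance, and the uniform weight averaging used for merging are all assumptions chosen for illustration, and the function names (`expert_outputs`, `hierarchical_clusters`, `merge_experts`) are hypothetical.

```python
import numpy as np

def expert_outputs(weights, calib):
    """Mean output of each toy linear expert over the calibration inputs."""
    # weights: (n_experts, d_in, d_out); calib: (n_tokens, d_in)
    return np.stack([(calib @ w).mean(axis=0) for w in weights])

def hierarchical_clusters(features, n_clusters):
    """Bottom-up clustering; average linkage on Euclidean distance (assumed)."""
    clusters = [[i] for i in range(len(features))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between members of the two groups.
                d = np.mean([np.linalg.norm(features[i] - features[j])
                             for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

def merge_experts(weights, clusters):
    """Merge each cluster by uniform weight averaging (an assumed merge rule)."""
    return np.stack([weights[c].mean(axis=0) for c in clusters])

# Toy setup: two underlying experts, each duplicated with small noise.
rng = np.random.default_rng(0)
base = rng.normal(size=(2, 4, 3))
weights = np.repeat(base, 2, axis=0) + 1e-3 * rng.normal(size=(4, 4, 3))
calib = rng.normal(loc=1.0, size=(32, 4))  # stand-in calibration tokens

features = expert_outputs(weights, calib)
clusters = hierarchical_clusters(features, n_clusters=2)
merged = merge_experts(weights, clusters)
print(clusters, merged.shape)
```

Because the grouping depends only on what each expert produces for the calibration inputs, it does not consult router statistics, which is the property the abstract highlights. A real SMoE layer would use nonlinear feed-forward experts and would also need the router to be remapped onto the merged experts; both are omitted here.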

Code Implementations: 1 repo
