CL | AI | Feb 3, 2025

MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs

arXiv:2502.00997v3 | 16 citations | h-index: 35 | NAACL
Originality: Highly original
AI Analysis

This paper addresses the problem of efficiently combining specialized AI models, and is aimed at researchers and practitioners working with MoE architectures.

The paper tackles the challenge of merging specialized Large Language Models into unified Mixture-of-Experts models, introducing techniques that reduce fine-tuning costs, improve performance over state-of-the-art methods, and enable merging of experts with different architectures.

The recent success of specialized Large Language Models (LLMs) in domains such as mathematical reasoning and coding has led to growing interest in methods for merging these expert LLMs into a unified Mixture-of-Experts (MoE) model, with the goal of enhancing performance in each domain while retaining effectiveness on general tasks. However, the effective merging of expert models remains an open challenge, especially for models with highly divergent weight parameters or different architectures. State-of-the-art MoE merging methods only work with homogeneous model architectures and rely on simple unweighted averaging to merge expert layers, which does not address parameter interference and requires extensive fine-tuning of the merged MoE to restore performance. To address these limitations, this paper introduces new MoE merging techniques, including strategies to mitigate parameter interference, routing heuristics to reduce the need for MoE fine-tuning, and a novel method for merging experts with different architectures. Extensive experiments across multiple domains demonstrate the effectiveness of our proposed methods, reducing fine-tuning costs, improving performance over state-of-the-art methods, and expanding the applicability of MoE merging.
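To make the baseline criticized in the abstract concrete, here is a minimal sketch of unweighted parameter averaging across expert models with identical architectures. It is not the paper's method. The `Params` layout, the `average_params` helper, the parameter name `attn.w`, and the toy weights are all hypothetical, chosen only to show how opposite-signed weights cancel under a plain mean, which is one simple form of parameter interference.

```python
from typing import Dict, List

# Hypothetical layout: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def average_params(models: List[Params]) -> Params:
    """Unweighted element-wise mean of same-named parameters across models."""
    if not models:
        raise ValueError("need at least one model")
    merged: Params = {}
    for name, reference in models[0].items():
        # Averaging only makes sense for homogeneous architectures:
        # every model must hold the same parameter with the same size.
        for m in models:
            if name not in m or len(m[name]) != len(reference):
                raise ValueError(f"parameter {name!r} does not match across models")
        columns = zip(*(m[name] for m in models))
        merged[name] = [sum(col) / len(models) for col in columns]
    return merged


# Toy weights (made up): the first two entries disagree in sign.
math_expert = {"attn.w": [0.5, -1.0, 2.0]}
code_expert = {"attn.w": [-0.5, 1.0, 4.0]}

merged = average_params([math_expert, code_expert])
print(merged["attn.w"])  # [0.0, 0.0, 3.0]
```

In this toy case the first two weights cancel to zero, so neither expert's value survives the merge, while the third, where the experts agree in sign, is preserved as a compromise. The shape check also shows why a plain average cannot be applied to experts with different architectures.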

