LG | CL | May 23, 2025

CoMoE: Contrastive Representation for Mixture-of-Experts in Parameter-Efficient Fine-tuning

arXiv:2505.17553v2 | 2 citations | h-index: 7 | EMNLP
Originality: Incremental advance
AI Analysis

This work targets a known bottleneck in MoE-based parameter-efficient fine-tuning: experts that learn overlapping knowledge leave part of the model's capacity unused. It offers an incremental improvement by pushing experts toward specialization.

The paper tackles underutilized capacity in mixture-of-experts (MoE) methods for parameter-efficient fine-tuning on heterogeneous datasets. It proposes CoMoE, which adds a contrastive objective to enhance modularization and specialization among experts, and reports consistent improvements across benchmarks and multi-task settings.

In parameter-efficient fine-tuning, mixture-of-experts (MoE), which specializes functionalities into different experts and sparsely activates them as appropriate, has been widely adopted as a promising way to trade off model capacity against computation overhead. However, current MoE variants fall short on heterogeneous datasets because they ignore the fact that experts may learn similar knowledge, which leaves MoE's capacity underutilized. In this paper, we propose Contrastive Representation for MoE (CoMoE), a novel method to promote modularization and specialization in MoE, where the experts are trained along with a contrastive objective by sampling from activated and inactivated experts in top-k routing. We demonstrate that such a contrastive objective recovers the mutual-information gap between inputs and the two types of experts. Experiments on several benchmarks and in multi-task settings demonstrate that CoMoE can consistently enhance MoE's capacity and promote modularization among the experts.
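
The abstract does not spell out the loss, so the following is only a minimal sketch of what a contrastive objective over top-k routing could look like, not the paper's actual formulation. It assumes an InfoNCE-style loss in which the experts selected by the router act as positives and the unselected experts act as negatives. The function names, the cosine similarity, and the temperature `tau` are all illustrative assumptions.

```python
import math

def top_k_split(router_logits, k):
    """Split expert indices into (activated, inactivated) under top-k routing."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    return order[:k], order[k:]

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def contrastive_loss(h, expert_outputs, router_logits, k, tau=0.1):
    """InfoNCE-style loss (an assumption, not the paper's exact objective).

    h              : input representation
    expert_outputs : one output vector per expert for this input
    router_logits  : one router score per expert
    Activated experts are positives; inactivated experts are negatives.
    """
    activated, _ = top_k_split(router_logits, k)
    sims = [cosine(h, e) / tau for e in expert_outputs]
    m = max(sims)  # subtract the max before exponentiating, for stability
    pos = sum(math.exp(sims[i] - m) for i in activated)
    total = sum(math.exp(s - m) for s in sims)
    return -math.log(pos / total)

# Toy example: four experts, one activated per input (k = 1).
h = [1.0, 0.0]
expert_outputs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]

# The router picks expert 0, whose output aligns with the input.
aligned = contrastive_loss(h, expert_outputs, [2.0, 0.5, -1.0, 0.1], k=1)
# The router picks expert 2, whose output opposes the input.
misaligned = contrastive_loss(h, expert_outputs, [-1.0, 0.5, 2.0, 0.1], k=1)
print(aligned < misaligned)
```

In this toy setup the loss is small when the activated expert's output is closer to the input than the inactivated experts' outputs, and large otherwise. Minimizing it therefore pushes activated and inactivated experts apart in representation space, which is the intuition behind the specialization the abstract describes.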

