GMoE: Empowering LLMs Fine-Tuning via MoE Graph Collaboration
GMoE addresses a specific bottleneck in LLM fine-tuning, namely load imbalance among MoE experts, and offers researchers and practitioners an incremental improvement over existing MoE methods.
The paper tackles the load imbalance issue in sparse Mixture-of-Experts (MoE) architectures for large language models (LLMs) by introducing GMoE, a graph-based framework that enhances expert collaboration through a graph router and two coordination strategies. The authors report improved stability and efficiency in fine-tuning, demonstrated on four benchmark datasets.
The sparse Mixture-of-Experts (MoE) architecture of large language models (LLMs) suffers from an inherent load imbalance that arises from the simplistic linear router strategy, which ultimately causes instability and inefficient learning in LLMs. To address this challenge, we introduce a novel graph-based MoE framework, $\textbf{GMoE}$, aimed at enhancing the collaboration among multiple experts. In GMoE, a graph router function is designed to capture the collaboration signals among experts. This enables all experts to dynamically allocate information derived from input data by sharing information with their neighboring experts. Moreover, we put forward two coordination strategies in GMoE, the $\textit{Poisson distribution-based distinction strategy}$ and the $\textit{Normal distribution-based balance strategy}$, to further unlock the capacity of each expert and increase model stability during the fine-tuning of LLMs. Specifically, we leverage a parameter-efficient fine-tuning technique, i.e., Low-Rank Adaptation (LoRA), to implement the graph MoE architecture. Extensive experiments on four real-world benchmark datasets demonstrate the effectiveness of GMoE, showing the benefits of facilitating collaboration among multiple experts in LLM fine-tuning. The code for the experimental implementation is available at https://github.com/BAI-LAB/GMoE
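
To make the idea of a graph router over LoRA experts more concrete, the following NumPy sketch shows one way such a layer could be wired together. It is an illustration under stated assumptions and not the authors' implementation: the function names (graph_router, lora_moe_forward), the ring-shaped expert graph, the row-normalized neighbor averaging, and the top-k gating are all hypothetical choices made here. The two coordination strategies are not sketched, since the abstract gives no formulas for them.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def graph_router(x, w_router, adj, top_k=2):
    """Hypothetical graph router (an assumption, not the paper's formula).

    x: (tokens, d_model), w_router: (d_model, n_experts),
    adj: (n_experts, n_experts) adjacency matrix of the expert graph.
    """
    logits = x @ w_router                              # plain linear router scores
    a_hat = adj + np.eye(adj.shape[0])                 # add self-loops
    a_hat = a_hat / a_hat.sum(axis=1, keepdims=True)   # row-normalize
    mixed = logits @ a_hat.T                           # each expert averages scores with its neighbors
    probs = softmax(mixed, axis=-1)
    top_idx = np.argsort(-probs, axis=-1)[:, :top_k]   # top-k experts per token
    gates = np.take_along_axis(probs, top_idx, axis=-1)
    gates = gates / gates.sum(axis=-1, keepdims=True)  # renormalize the selected gates
    return top_idx, gates

def lora_moe_forward(x, w_frozen, lora_a, lora_b, top_idx, gates):
    """Frozen dense layer plus a gated sum of per-expert low-rank updates.

    lora_a: (n_experts, d_model, rank), lora_b: (n_experts, rank, d_out).
    """
    out = x @ w_frozen
    for slot in range(top_idx.shape[1]):
        e = top_idx[:, slot]                           # expert chosen by each token in this slot
        delta = np.einsum("td,tdr,tro->to", x, lora_a[e], lora_b[e])
        out = out + gates[:, slot:slot + 1] * delta
    return out

# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
tokens, d_model, d_out, n_experts, rank, top_k = 8, 16, 16, 4, 2, 2
x = rng.normal(size=(tokens, d_model))
w_router = rng.normal(size=(d_model, n_experts))
w_frozen = rng.normal(size=(d_model, d_out))

# Assumed ring graph: expert i is connected to experts i-1 and i+1.
adj = np.zeros((n_experts, n_experts))
for i in range(n_experts):
    adj[i, (i - 1) % n_experts] = 1.0
    adj[i, (i + 1) % n_experts] = 1.0

# Standard LoRA initialization: random down-projection, zero up-projection.
lora_a = rng.normal(size=(n_experts, d_model, rank)) * 0.01
lora_b = np.zeros((n_experts, rank, d_out))

top_idx, gates = graph_router(x, w_router, adj, top_k=top_k)
y = lora_moe_forward(x, w_frozen, lora_a, lora_b, top_idx, gates)
load = np.bincount(top_idx.ravel(), minlength=n_experts)  # tokens routed to each expert
print("expert load:", load)
```

The load vector counts how many token slots each expert receives, which is the quantity a load-balancing strategy would try to even out. Because the up-projection starts at zero, the layer initially reproduces the frozen model's output, as in standard LoRA.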