CV · Jun 28, 2024

Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model

Longrong Yang, Dong Shen, Chaoxiang Cai, Fan Yang, Tingting Gao, Di Zhang, Xi Li

arXiv:2406.19905v3 · 11 citations · Has Code

Originality: Incremental advance

AI Analysis

This addresses a specific bottleneck in improving efficiency and performance for large vision-language models, but appears incremental as it builds on existing MoE methods.

The paper tackles token gradient conflict in Mixture-of-Experts for Large Vision-Language Models, where tokens routed to the same expert interfere with one another. It proposes token-level gradient analysis and a regularization loss to reduce this conflict, and reports experiments that demonstrate the method's effectiveness.

Mixture-of-Experts (MoE) has gained increasing attention in the study of Large Vision-Language Models (LVLMs). It replaces the dense model with a sparse one, achieving comparable performance while activating fewer parameters during inference, thus significantly reducing inference cost. Existing MoE methods for LVLMs encourage different experts to specialize in different tokens, and they usually employ a router to predict the routing of each token. However, the router is not optimized with respect to the distinct parameter optimization directions generated by the tokens within an expert. This may lead to severe interference between tokens within an expert. To address this problem, we propose Solving Token Gradient Conflict (STGC), which is based on token-level gradient analysis. Specifically, we first use token-level gradients to identify conflicting tokens in experts. We then add a tailored regularization loss that encourages conflicting tokens to be routed from their current experts to other experts, reducing interference between tokens within an expert. Our method can serve as a plug-in for diverse LVLM methods, and extensive experimental results demonstrate its effectiveness. The code will be publicly available at https://github.com/longrongyang/STGC.
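
The two steps in the abstract (identify conflicting tokens from token-level gradients, then penalize their current routing) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The conflict criterion (negative cosine similarity between a token's gradient and the expert's mean gradient), the penalty form (negative log of one minus the router probability of the current expert), and the names `find_conflicting_tokens` and `conflict_routing_penalty` are all assumptions made for illustration.

```python
import numpy as np


def find_conflicting_tokens(token_grads, threshold=0.0):
    """Flag tokens whose gradient opposes the expert's mean gradient.

    token_grads: (num_tokens, dim) array, one flattened gradient per token
                 with respect to a single expert's parameters.
    Assumption: a token "conflicts" when the cosine similarity between its
    gradient and the mean gradient is below `threshold`.
    """
    mean_grad = token_grads.mean(axis=0)
    eps = 1e-12  # guards against division by zero for all-zero gradients
    denom = np.linalg.norm(token_grads, axis=1) * np.linalg.norm(mean_grad) + eps
    cos = (token_grads @ mean_grad) / denom
    return cos < threshold, cos


def conflict_routing_penalty(router_probs, expert_idx, conflict_mask):
    """Hypothetical regularizer that pushes conflicting tokens away from
    their current expert.

    router_probs: (num_tokens, num_experts) softmax outputs of the router.
    expert_idx:   (num_tokens,) index of the expert each token is routed to.
    Returns the mean of -log(1 - p_current) over conflicting tokens, which
    shrinks as the router assigns them less probability on that expert.
    """
    if not conflict_mask.any():
        return 0.0
    p_current = router_probs[np.arange(len(expert_idx)), expert_idx]
    p_conf = np.clip(p_current[conflict_mask], 0.0, 1.0 - 1e-12)
    return float(-np.log1p(-p_conf).mean())


# Toy example: three tokens routed to expert 0; the third pulls the other way.
token_grads = np.array([[1.0, 0.0], [1.0, 0.1], [-1.0, 0.0]])
router_probs = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]])
expert_idx = np.array([0, 0, 0])

mask, cos = find_conflicting_tokens(token_grads)
penalty = conflict_routing_penalty(router_probs, expert_idx, mask)
```

In a training loop, a term like `penalty` would be added to the task loss with a weighting coefficient, so that the router gradually moves flagged tokens to other experts. Obtaining per-token gradients efficiently is a separate engineering question that this sketch does not address.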
