CV · Jun 28, 2024

Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model

arXiv:2406.19905v3 · 11 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in the efficiency and performance of large vision-language models, but it appears incremental because it builds on existing MoE methods.

The paper tackles token gradient conflict in Mixture-of-Experts for Large Vision-Language Models, where tokens routed to the same expert interfere with one another. It proposes token-level gradient analysis and a regularization loss to reduce this conflict, and reports experiments that demonstrate its effectiveness.

Mixture-of-Experts (MoE) has gained increasing attention in the study of Large Vision-Language Models (LVLMs). It replaces the dense model with a sparse one, achieving comparable performance while activating fewer parameters during inference and thus significantly reducing inference cost. Existing MoE methods for LVLMs encourage different experts to specialize in different tokens, and they usually employ a router to predict the routing of each token. However, the router is not optimized with respect to the distinct parameter optimization directions generated by the tokens within an expert, which may lead to severe interference between those tokens. To address this problem, we propose Solving Token Gradient Conflict (STGC), which relies on token-level gradient analysis. Specifically, we first use token-level gradients to identify conflicting tokens in experts. We then add a tailored regularization loss that encourages conflicting tokens to be routed from their current experts to other experts, reducing interference between tokens within an expert. Our method can serve as a plug-in for diverse LVLM methods, and extensive experimental results demonstrate its effectiveness. The code will be publicly available at https://github.com/longrongyang/STGC.
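The two steps described in the abstract (identify conflicting tokens from token-level gradients, then penalize their routing to the current expert) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that a token counts as "conflicting" when the cosine similarity between its gradient and the expert's mean token gradient falls below a threshold, and that the regularizer is `-log(1 - p)`, where `p` is the router probability of the token's current expert. The function names, the threshold, and the loss form are all hypothetical choices made for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector is all zeros.
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot(u, v) / (nu * nv)

def find_conflicting_tokens(token_grads, threshold=0.0):
    # token_grads: one gradient vector per token routed to a single expert.
    # Assumption: a token conflicts when its gradient points away from the
    # expert's mean token gradient (cosine below `threshold`).
    n, dim = len(token_grads), len(token_grads[0])
    mean_grad = [sum(g[d] for g in token_grads) / n for d in range(dim)]
    return [i for i, g in enumerate(token_grads)
            if cosine(g, mean_grad) < threshold]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def conflict_routing_loss(router_logits, current_expert, conflicting):
    # Assumed regularizer: mean of -log(1 - p_current) over conflicting
    # tokens, which shrinks as the router moves them to other experts.
    if not conflicting:
        return 0.0
    total = 0.0
    for i in conflicting:
        p = softmax(router_logits[i])[current_expert]
        total += -math.log(max(1.0 - p, 1e-12))
    return total / len(conflicting)

# Toy example: three tokens routed to expert 0, with 2-D gradients.
token_grads = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.2]]
conflicting = find_conflicting_tokens(token_grads)  # -> [2]
router_logits = [[2.0, 0.0], [2.0, 0.0], [2.0, 0.0]]
loss = conflict_routing_loss(router_logits, 0, conflicting)
```

In this toy setup, the third token's gradient opposes the mean direction, so it is the only one flagged, and the loss is larger when the router strongly prefers the current expert for that token than when it prefers another expert.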

Code Implementations: 1 repo
