CL · AI · CV · LG · Dec 23, 2024

Diving into Self-Evolving Training for Multimodal Reasoning

arXiv:2412.17451v3 · 33 citations · h-index: 13 · ICML
Originality: Incremental advance
AI Analysis

This paper addresses a key limitation in training multimodal reasoning models, namely performance saturation, though it appears incremental because it builds on existing self-evolving and RL concepts.

The paper tackles performance saturation in self-evolving training for multimodal reasoning by reframing it through reinforcement learning, identifying three critical factors and proposing an automatic balancing mechanism; the resulting M-STAR framework achieves consistent performance gains across models and benchmarks.

Self-evolving training--where models iteratively learn from their own outputs--has emerged as a key approach for complex reasoning tasks, addressing the scarcity of high-quality chain-of-thought data. However, its effectiveness in multimodal reasoning, a domain more intricate than text-only reasoning, remains underexplored, and the understanding of critical factors in this training paradigm remains limited. Furthermore, a central challenge for this training method is performance saturation, which impedes further improvements and scalability. Inspired by reinforcement learning (RL), in this paper, we reframe self-evolving training for multimodal reasoning through the lens of RL, identifying three pivotal factors: Training Method, Reward Model, and Prompt Variation. Through systematic analysis, we establish relatively optimal design principles that significantly enhance multimodal reasoning capabilities. Moreover, delving deeper into training dynamics, we uncover the roots of saturation and propose a new automatic balancing mechanism to mitigate this limitation. Building on these insights, we propose M-STAR (Multimodal Self-evolving Training for Reasoning), a framework that achieves consistent performance gains across models of varying sizes and diverse benchmarks. All resources are made publicly available at https://mstar-lmm.github.io.

