CV · Apr 1

All Roads Lead to Rome: Incentivizing Divergent Thinking in Vision-Language Models

arXiv:2604.00479 · 93.5 · h-index: 5
Predicted impact top 11% in CV · last 90 days · Originality: Incremental advance
AI Analysis

The paper addresses the poor scalability of RL-trained vision-language models by improving their reasoning diversity. The advance is incremental because it builds on existing methods such as GRPO.

The paper tackles the issue of diversity collapse in reinforcement learning for vision-language models, where models converge prematurely to limited reasoning strategies, and proposes Multi-Group Policy Optimization (MUPO) to incentivize divergent thinking, demonstrating its effectiveness on established benchmarks.

Recent studies have demonstrated that Reinforcement Learning (RL), notably Group Relative Policy Optimization (GRPO), can intrinsically elicit and enhance the reasoning capabilities of Vision-Language Models (VLMs). However, despite this promise, the underlying mechanisms that drive the effectiveness of RL models, as well as their limitations, remain underexplored. In this paper, we highlight a fundamental behavioral distinction between RL and base models: the former engage in deeper yet narrower reasoning, while the latter, despite being less refined along any individual path, exhibit broader and more diverse thinking patterns. Through further analysis of training dynamics, we show that GRPO is prone to diversity collapse, causing models to prematurely converge to a limited subset of reasoning strategies while discarding the majority of potential alternatives, leading to local optima and poor scalability. To address this, we propose Multi-Group Policy Optimization (MUPO), a simple yet effective approach designed to incentivize divergent thinking across multiple solutions, and demonstrate its effectiveness on established benchmarks. Project page: https://xytian1008.github.io/MUPO/
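
For background on the GRPO baseline the abstract refers to, the following is a minimal sketch of the group-relative advantage as it is commonly formulated: each sampled response's reward is normalized by the mean and standard deviation of its own group. It does not show MUPO, whose details are not given on this page. The function name, the `eps` constant, and the use of the population standard deviation are assumptions made for illustration.

```python
import math


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of responses sampled for the same prompt.

    Illustrative helper (hypothetical name): advantage_i = (r_i - mean) / (std + eps).
    """
    n = len(rewards)
    mean = sum(rewards) / n
    # Population variance; some implementations use the sample variance instead.
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


# Four responses to one prompt, scored 1.0 if correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

In this formulation the advantage depends only on a response's reward relative to its group, not on which reasoning strategy produced it, so two correct responses that follow different strategies receive the same advantage.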
