CL · Jul 3, 2025

Multimodal Mathematical Reasoning with Diverse Solving Perspective

arXiv:2507.02804v1 · 6 citations · h-index: 9
Originality: Incremental advance
AI Analysis

This work addresses the need for more robust and varied reasoning in multimodal AI for mathematical problem-solving, though it is incremental as it builds on existing models and datasets.

The paper tackles the problem of multimodal mathematical reasoning by addressing the lack of diverse reasoning perspectives in current models, introducing a dataset with multiple solution trajectories and a model that improves accuracy and generative diversity on benchmarks.

Recent progress in large-scale reinforcement learning (RL) has notably enhanced the reasoning capabilities of large language models (LLMs), especially in mathematical domains. However, current multimodal LLMs (MLLMs) for mathematical reasoning often rely on one-to-one image-text pairs and single-solution supervision, overlooking the diversity of valid reasoning perspectives and internal reflections. In this work, we introduce MathV-DP, a novel dataset that captures multiple diverse solution trajectories for each image-question pair, fostering richer reasoning supervision. We further propose Qwen-VL-DP, a model built upon Qwen-VL, fine-tuned with supervised learning and enhanced via group relative policy optimization (GRPO), a rule-based RL approach that integrates correctness discrimination and diversity-aware reward functions. Our method emphasizes learning from varied reasoning perspectives and distinguishing between correct yet distinct solutions. Extensive experiments on MathVista's minitest and Math-V benchmarks demonstrate that Qwen-VL-DP significantly outperforms prior base MLLMs in both accuracy and generative diversity, highlighting the importance of incorporating diverse perspectives and reflective reasoning in multimodal mathematical reasoning.
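To make the GRPO step in the abstract more concrete, the following is a minimal sketch of how a rule-based reward combining correctness and diversity could feed group-relative advantages. It is not the authors' implementation: the function names (`jaccard`, `rule_based_rewards`, `group_relative_advantages`), the token-overlap novelty measure, and the `diversity_weight` parameter are all assumptions made for illustration. Only the normalization of rewards within a sampled group (subtract the group mean, divide by the group standard deviation) reflects how GRPO is generally described.

```python
from statistics import mean, pstdev


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity (a hypothetical stand-in for a diversity metric)."""
    sa, sb = set(a.split()), set(b.split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def rule_based_rewards(solutions, answers, gold, diversity_weight=0.5):
    """Assumed reward: 1 for a correct final answer, plus a bonus for
    correct solutions whose text differs from every other sample in the group."""
    rewards = []
    for i, (sol, ans) in enumerate(zip(solutions, answers)):
        correct = 1.0 if ans == gold else 0.0
        overlaps = [jaccard(sol, s) for j, s in enumerate(solutions) if j != i]
        novelty = 1.0 - max(overlaps) if overlaps else 0.0
        # Incorrect solutions get no diversity bonus, so novelty alone is never rewarded.
        rewards.append(correct + diversity_weight * correct * novelty)
    return rewards


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled responses to one image-question pair (toy strings).
    solutions = [
        "use similar triangles",
        "use similar triangles",
        "apply the law of cosines",
        "guess",
    ]
    answers = ["5", "5", "5", "7"]
    rs = rule_based_rewards(solutions, answers, gold="5")
    print(rs)
    print(group_relative_advantages(rs))
```

Under these assumptions, a correct solution that takes a different route from the rest of the group receives the largest advantage, duplicated correct solutions receive a smaller one, and the incorrect response is pushed down. Gating the diversity bonus on correctness is one possible design choice for keeping the policy from being rewarded for novel but wrong reasoning.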
