AI CL CV LG
Jul 11, 2025

M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning

Inclusion AI, Fudong Wang, Jiajia Liu, Jingdong Chen, Jun Zhou, Kaixiang Ji, Lixiang Ru, Qingpei Guo, Ruobing Zheng, Tianqi Li, Yi Yuan, Yifan Mao

arXiv:2507.08306v1, 18 citations, h-index: 8

Originality: Highly original

AI Analysis

This work addresses a critical gap in MLLMs for real-world applications that require spatial reasoning. It represents a strong, specific gain rather than a foundational breakthrough.

The paper tackles the problem of multimodal large language models struggling with dynamic spatial interactions by introducing M2-Reasoning-7B, which achieves state-of-the-art performance across 8 benchmarks through a novel data pipeline generating 294.2K samples and a dynamic multi-task training strategy.

Recent advancements in Multimodal Large Language Models (MLLMs), particularly through Reinforcement Learning with Verifiable Rewards (RLVR), have significantly enhanced their reasoning abilities. However, a critical gap persists: these models struggle with dynamic spatial interactions, a capability essential for real-world applications. To bridge this gap, we introduce M2-Reasoning-7B, a model designed to excel in both general and spatial reasoning. Our approach integrates two key innovations: (1) a novel data pipeline that generates 294.2K high-quality data samples (168K for cold-start fine-tuning and 126.2K for RLVR), which feature logically coherent reasoning trajectories and have undergone comprehensive assessment; and (2) a dynamic multi-task training strategy with step-wise optimization to mitigate conflicts between data, and task-specific rewards for delivering tailored incentive signals. This combination of curated data and advanced training allows M2-Reasoning-7B to set a new state-of-the-art (SOTA) across 8 benchmarks, showcasing superior performance in both general and spatial reasoning domains.
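The abstract mentions task-specific rewards but does not define them. The sketch below is only an illustration of the general idea of routing each sample to a verifiable reward chosen by task type. The task names (`general`, `spatial_numeric`), the function names, and the reward formulas (exact match, and one minus relative error) are all assumptions made for illustration, not the paper's actual definitions.

```python
# Hypothetical illustration of task-specific verifiable rewards.
# Task names and reward formulas are assumptions, not taken from the paper.

def exact_match_reward(prediction: str, target: str) -> float:
    """Return 1.0 when the answers match after trimming and lowercasing."""
    return 1.0 if prediction.strip().lower() == target.strip().lower() else 0.0


def numeric_reward(prediction: str, target: str) -> float:
    """Return a score in [0, 1] that shrinks as relative error grows."""
    try:
        pred, gold = float(prediction), float(target)
    except ValueError:
        return 0.0  # unparseable answers earn nothing
    if gold == 0:
        return 1.0 if pred == 0 else 0.0
    relative_error = abs(pred - gold) / abs(gold)
    return max(0.0, 1.0 - relative_error)


# Dispatch table: each task type gets its own reward function.
REWARDS = {
    "general": exact_match_reward,
    "spatial_numeric": numeric_reward,
}


def reward(task: str, prediction: str, target: str) -> float:
    """Look up the reward function for this task and score the prediction."""
    return REWARDS[task](prediction, target)
```

A graded numeric reward of this kind gives partial credit to near-miss estimates (for example, a predicted distance of 9 against a reference of 10), whereas exact match suits multiple-choice or short-answer questions.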
