CL · Oct 16, 2025

AutoRubric-R1V: Rubric-Based Generative Rewards for Faithful Multimodal Reasoning

arXiv:2510.14738v1 · 7 citations · h-index: 7
Originality: Highly original
AI Analysis

This paper addresses unreliable reasoning in multimodal AI systems, offering a scalable way to improve faithfulness in complex tasks.

The paper tackles spurious reasoning in multimodal large language models by proposing AutoRubric-R1V, a framework that combines reinforcement learning with verifiable rewards and process-level supervision from automatically collected rubric-based generative rewards. It reports state-of-the-art performance on six multimodal reasoning benchmarks and substantially improved reasoning faithfulness.

Multimodal large language models (MLLMs) have rapidly advanced from perception tasks to complex multi-step reasoning, yet reinforcement learning with verifiable rewards (RLVR) often leads to spurious reasoning since only the final-answer correctness is rewarded. To address this limitation, we propose AutoRubric-R1V, a framework that integrates RLVR with process-level supervision through automatically collected rubric-based generative rewards. Our key innovation lies in a scalable self-aggregation method that distills consistent reasoning checkpoints from successful trajectories, enabling problem-specific rubric construction without human annotation or stronger teacher models. By jointly leveraging rubric-based and outcome rewards, AutoRubric-R1V achieves state-of-the-art performance on six multimodal reasoning benchmarks and substantially improves reasoning faithfulness in dedicated evaluations.
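The abstract's two ideas, distilling consistent checkpoints from successful trajectories and combining a rubric reward with an outcome reward, can be illustrated with a toy sketch. This is not the paper's implementation. The function names, the `min_support` threshold, the `rubric_weight` value, and the additive combination are all assumptions made for illustration. The abstract describes rubric-based generative rewards; here exact string matching stands in for whatever model checks a rubric item, purely to keep the example self-contained.

```python
from collections import Counter


def aggregate_rubric(successful_trajectories, min_support=0.6):
    """Keep checkpoints that recur in at least `min_support` of the
    successful trajectories (hypothetical stand-in for self-aggregation)."""
    if not successful_trajectories:
        return []
    counts = Counter()
    for steps in successful_trajectories:
        counts.update(set(steps))  # count each checkpoint once per trajectory
    threshold = min_support * len(successful_trajectories)
    rubric = []
    for steps in successful_trajectories:
        for step in steps:
            # Preserve first-seen order and avoid duplicates.
            if counts[step] >= threshold and step not in rubric:
                rubric.append(step)
    return rubric


def rubric_score(response_steps, rubric):
    """Fraction of rubric checkpoints found in the response.
    Assumption: exact string match replaces a generative judge."""
    if not rubric:
        return 0.0
    seen = set(response_steps)
    return sum(1 for item in rubric if item in seen) / len(rubric)


def combined_reward(answer_correct, response_steps, rubric, rubric_weight=0.5):
    """Outcome reward plus a weighted rubric reward (assumed additive form)."""
    outcome = 1.0 if answer_correct else 0.0
    return outcome + rubric_weight * rubric_score(response_steps, rubric)


# Three hypothetical successful trajectories for one problem.
trajectories = [
    ["read chart", "sum values", "divide by count"],
    ["read chart", "divide by count"],
    ["read chart", "sum values", "guess"],
]
rubric = aggregate_rubric(trajectories)
faithful = combined_reward(True, ["read chart", "sum values", "divide by count"], rubric)
spurious = combined_reward(True, ["guess"], rubric)
```

In this sketch, a response that reaches the correct answer through the recurring checkpoints earns more than one that reaches it by an unsupported shortcut, which is the behaviour the abstract attributes to adding process-level supervision on top of final-answer correctness.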

