AI | Jun 5

Improving Multimodal Reasoning via Worst Dimension Optimization

Haocheng Lv, Huaping Zhang, Qiuchi Li, Lei Li, Chunxiao Gao

arXiv:2606.07801 | h-index: 9

Originality: Incremental advance

AI Analysis

For researchers working on multimodal reasoning, this work addresses the problem of hidden dimension failures in process reward models, offering a targeted optimization approach.

The paper observes that current Process Reward Models for multimodal reasoning weight all constraints equally, which hides failures in individual dimensions. The authors propose a worst-dimension optimization method that improves reasoning validity, reporting accuracy gains of 5.2% on MathVista and 4.1% on ScienceQA.

Multimodal reasoning requires a reasoning path that stays valid under a wide range of constraints, from visual grounding to logical consistency. However, current Process Reward Models rely on heuristically defined rewards that weight these factors equally. As a result, dominant factors can conceal failures in individual dimensions, and the validity of the overall reasoning process is not guaranteed.
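
The masking effect described above can be illustrated with a toy comparison between an equally weighted reward and a worst-dimension reward. This is a minimal sketch and not the paper's method: the dimension names, the scores, and the helper functions `equal_weight_reward`, `worst_dimension_reward`, and `pick_best` are all hypothetical, and taking the minimum over dimensions is only one simple way to read "worst dimension".

```python
from statistics import mean

# Hypothetical per-dimension scores in [0, 1] for two candidate reasoning steps.
# The dimension names and values are illustrative assumptions, not from the paper.
step_a = {"visual_grounding": 0.95, "logical_consistency": 0.95, "arithmetic": 0.20}
step_b = {"visual_grounding": 0.70, "logical_consistency": 0.70, "arithmetic": 0.65}


def equal_weight_reward(scores):
    """Average all dimensions with equal weight."""
    return mean(scores.values())


def worst_dimension_reward(scores):
    """Score a step by its weakest dimension (assumed min aggregation)."""
    return min(scores.values())


def pick_best(candidates, reward_fn):
    """Return the name of the candidate with the highest reward."""
    return max(candidates, key=lambda name: reward_fn(candidates[name]))


candidates = {"a": step_a, "b": step_b}

# Equal weighting prefers step "a": two strong dimensions mask the failed one.
print(pick_best(candidates, equal_weight_reward))
# Worst-dimension scoring prefers step "b", which has no failing dimension.
print(pick_best(candidates, worst_dimension_reward))
```

Under these assumed scores, the averaged reward ranks the step with a near-failing arithmetic dimension first, while the worst-dimension reward ranks it last.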
