Improving Multimodal Reasoning via Worst Dimension Optimization
For researchers working on multimodal reasoning, this work addresses failures in individual reward dimensions that process reward models can hide, and offers a targeted optimization approach.
The paper observes that current Process Reward Models for multimodal reasoning weight all constraints equally, which hides failures in individual dimensions. It proposes a worst-dimension optimization method that improves reasoning validity, achieving accuracy gains of 5.2% on MathVista and 4.1% on ScienceQA.
Multimodal reasoning requires a reasoning path that remains valid under a wide range of constraints, from visual grounding to logical consistency. Current Process Reward Models, however, rely on heuristically defined rewards that weight these factors equally. As a result, strong scores on the dominant dimensions can conceal a failure in a single dimension, and the aggregate reward does not guarantee that the reasoning process as a whole is valid.
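To make the masking effect concrete, the following minimal sketch compares an equally weighted average with a worst-dimension (minimum) aggregate for a single reasoning step. The dimension names and scores are invented for illustration, and the two functions are assumptions about how such aggregation might look, not the paper's actual reward model or training objective.

```python
from statistics import mean

# Hypothetical per-dimension scores in [0, 1] for one reasoning step.
# These names and values are illustrative only, not taken from the paper.
step_scores = {
    "visual_grounding": 0.15,    # the step misreads the image
    "logical_consistency": 0.95,
    "arithmetic": 0.90,
    "relevance": 0.92,
}


def equal_weight_reward(scores):
    """Average all dimensions with equal weight (the baseline being criticized)."""
    return mean(scores.values())


def worst_dimension_reward(scores):
    """Return the name and score of the lowest-scoring dimension."""
    name = min(scores, key=scores.get)
    return name, scores[name]


avg = equal_weight_reward(step_scores)
worst_name, worst = worst_dimension_reward(step_scores)

print(f"equal-weight reward: {avg:.2f}")
print(f"worst dimension: {worst_name} ({worst:.2f})")
```

With these made-up numbers the average is about 0.73, which looks acceptable, while the minimum exposes the visual grounding score of 0.15. An objective that tracks the minimum cannot be satisfied by improving dimensions that are already strong.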