AI · Oct 21, 2025

Med-VRAgent: A Framework for Medical Visual Reasoning-Enhanced Agents

arXiv:2510.18424v17.81 citations · h-index: 1 · EMNLP

Originality: Incremental advance

AI Analysis

This work addresses challenges in medical visual reasoning for healthcare applications, representing an incremental improvement by combining existing techniques like MCTS and PPO fine-tuning.

The authors address hallucinations, vague descriptions, inconsistent logic, and poor localization in Visual Language Models (VLMs) used for medical reasoning. They propose Med-VRAgent, a framework built on Visual Guidance, Self-Reward paradigms, and Monte Carlo Tree Search (MCTS), and report that it outperforms existing approaches on multiple medical VQA benchmarks.

Visual Language Models (VLMs) achieve promising results in medical reasoning but struggle with hallucinations, vague descriptions, inconsistent logic, and poor localization. To address this, we propose an agent framework named Medical Visual Reasoning Agent (Med-VRAgent). The approach is based on the Visual Guidance and Self-Reward paradigms and on Monte Carlo Tree Search (MCTS). By combining Visual Guidance with tree search, Med-VRAgent improves the medical visual reasoning capabilities of VLMs. We use the trajectories collected by Med-VRAgent as feedback to further improve performance by fine-tuning the VLMs with the proximal policy optimization (PPO) objective. Experiments on multiple medical VQA benchmarks demonstrate that our method outperforms existing approaches.
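The abstract does not spell out how the tree search and the self-reward signal interact, so the following is only a minimal, generic sketch of MCTS with UCT selection, not the authors' implementation. The names `propose_steps` and `self_reward` are hypothetical stand-ins: in a real system the first would be a VLM proposing candidate reasoning steps (possibly conditioned on a visually guided region) and the second would be the model scoring its own trajectory. Here both are toy functions so the block runs on its own.

```python
import math
import random


class Node:
    """One node in the search tree; `state` is a tuple of reasoning steps."""

    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_reward = 0.0


def uct_score(child, parent_visits, c=1.4):
    """Standard UCT: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf
    exploit = child.total_reward / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def mcts(propose_steps, self_reward, max_depth=3, iterations=300, seed=0):
    """Generic MCTS over step sequences; returns the most-visited trajectory."""
    rng = random.Random(seed)
    root = Node(())
    for _ in range(iterations):
        # 1. Selection: descend by UCT until reaching an unexpanded node.
        node = root
        while node.children and len(node.state) < max_depth:
            parent = node
            node = max(parent.children,
                       key=lambda ch: uct_score(ch, parent.visits))
        # 2. Expansion: add one child per proposed step, then pick one.
        if len(node.state) < max_depth:
            for step in propose_steps(node.state):
                node.children.append(Node(node.state + (step,), parent=node))
            node = rng.choice(node.children)
        # 3. Rollout: complete the trajectory with random steps and score it.
        state = node.state
        while len(state) < max_depth:
            state = state + (rng.choice(propose_steps(state)),)
        reward = self_reward(state)
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    # Read out the trajectory by following the most-visited child.
    node = root
    while node.children:
        node = max(node.children, key=lambda ch: ch.visits)
    return node.state


# Toy stand-ins (assumptions, not from the paper).
def propose_steps(state):
    return ["a", "b"]  # hypothetical candidate reasoning steps


def self_reward(state):
    return sum(step == "a" for step in state) / len(state)  # toy score in [0, 1]


best = mcts(propose_steps, self_reward)
print(best)
```

In a PPO fine-tuning stage like the one the abstract mentions, trajectories and rewards collected by such a search could serve as training feedback. How Med-VRAgent actually constructs that feedback is not stated in the abstract.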

