AI · May 25, 2025

Improving Medical Reasoning with Curriculum-Aware Reinforcement Learning

Shaohao Rui, Kaitao Chen, Weijie Ma, Xiaosong Wang

arXiv:2505.19213v1 · 20.815 citations · h-index: 4

Originality: Incremental advance

AI Analysis

This paper addresses clinicians' need for open-ended, reasoning-intensive decision-making in medical imaging. It is an incremental advance that adapts existing RL methods to a specific domain.

The paper tackles the limited reasoning capabilities of medical reinforcement fine-tuning for VQA by introducing MedCCO, a curriculum-aware reinforcement learning framework that achieves an 11.4% accuracy gain on three in-domain tasks and a 5.7% improvement on five out-of-domain benchmarks.

Recent advances in reinforcement learning with verifiable, rule-based rewards have greatly enhanced the reasoning capabilities and out-of-distribution generalization of VLMs/LLMs, obviating the need for manually crafted reasoning chains. Despite these promising developments in the general domain, their translation to medical imaging remains limited. Current medical reinforcement fine-tuning (RFT) methods predominantly focus on close-ended VQA, thereby restricting the model's ability to engage in world knowledge retrieval and flexible task adaptation. More critically, these methods fall short of addressing the clinical demand for open-ended, reasoning-intensive decision-making. To bridge this gap, we introduce MedCCO, the first multimodal reinforcement learning framework tailored for medical VQA that unifies close-ended and open-ended data within a curriculum-driven RFT paradigm. Specifically, MedCCO is initially fine-tuned on a diverse set of close-ended medical VQA tasks to establish domain-grounded reasoning capabilities, and is then progressively adapted to open-ended tasks to foster deeper knowledge enhancement and clinical interpretability. We validate MedCCO across eight challenging medical VQA benchmarks, spanning both close-ended and open-ended settings. Experimental results show that MedCCO consistently enhances performance and generalization, achieving an 11.4% accuracy gain across three in-domain tasks and a 5.7% improvement on five out-of-domain benchmarks. These findings highlight the promise of curriculum-guided RL in advancing robust, clinically relevant reasoning in medical multimodal language models.
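
The two-stage curriculum described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the reward functions or the training loop, so the `VQASample` type, the exact-match reward for close-ended questions, and the token-level F1 reward for open-ended answers are all assumptions chosen for illustration. The sketch only shows how rule-based rewards and a close-ended-then-open-ended ordering might fit together.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class VQASample:
    """Hypothetical container for one VQA item (not from the paper)."""
    question: str
    answer: str
    closed: bool  # True for close-ended items (e.g. yes/no, multiple choice)


def normalize(text: str) -> list[str]:
    """Lowercase, replace punctuation with spaces, and split on whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return cleaned.split()


def closed_reward(prediction: str, reference: str) -> float:
    """Assumed rule-based reward: 1.0 on exact match after normalization, else 0.0."""
    return 1.0 if normalize(prediction) == normalize(reference) else 0.0


def open_reward(prediction: str, reference: str) -> float:
    """Assumed reward for open-ended answers: token-level F1 overlap in [0, 1]."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it
    # appears in both the prediction and the reference.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def curriculum_stages(samples: list[VQASample]) -> list[list[VQASample]]:
    """Order training data as [close-ended stage, open-ended stage]."""
    closed = [s for s in samples if s.closed]
    opened = [s for s in samples if not s.closed]
    return [closed, opened]


def reward(sample: VQASample, prediction: str) -> float:
    """Dispatch to the reward that matches the sample's question type."""
    fn = closed_reward if sample.closed else open_reward
    return fn(prediction, sample.answer)
```

In a setup like this, a policy-gradient trainer would iterate over `curriculum_stages(data)` in order and score each sampled response with `reward`, so that the model sees the easily verifiable close-ended signal before the softer open-ended one.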
