CV AI CL Sep 23, 2025

OraPO: Oracle-educated Reinforcement Learning for Data-efficient and Factual Radiology Report Generation

Zhuoxiao Chen, Hongyang Yu, Ying Xu, Yadan Luo, Long Duong, Yuan-Fang Li

arXiv:2509.18600v1 8.42 citations h-index: 7

Originality: Incremental advance

AI Analysis

This work addresses the challenge of efficient and factual radiology report generation for medical AI applications, representing a strong specific gain rather than a broad paradigm shift.

The paper tackles data- and compute-intensive radiology report generation by proposing OraPO with a FactScore-based reward (FactS). It reports state-of-the-art performance on the CheXpert Plus dataset (0.341 F1) using 2-3 orders of magnitude less training data, a small base VLM, and modest hardware.

Radiology report generation (RRG) aims to automatically produce clinically faithful reports from chest X-ray images. Prevailing work typically follows a scale-driven paradigm, relying on multi-stage training over large paired corpora and oversized backbones, making pipelines highly data- and compute-intensive. In this paper, we propose Oracle-educated GRPO (OraPO) with a FactScore-based reward (FactS) to tackle the RRG task under constrained budgets. OraPO enables single-stage, RL-only training by converting failed GRPO explorations on rare or difficult studies into direct preference supervision via a lightweight oracle step. FactS grounds learning in diagnostic evidence by extracting atomic clinical facts and checking entailment against ground-truth labels, yielding dense, interpretable sentence-level rewards. Together, OraPO and FactS create a compact and powerful framework that significantly improves learning efficiency on clinically challenging cases, setting the new SOTA performance on the CheXpert Plus dataset (0.341 in F1) with 2-3 orders of magnitude less training data using a small base VLM on modest hardware.
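The two ideas in the abstract, a sentence-level fact reward and an oracle fallback for failed explorations, can be illustrated with a toy sketch. This is not the authors' implementation: the keyword-based `extract_facts`, the `FINDINGS` vocabulary, the `needs_oracle` threshold, and the choice of the reference report as the preferred sample are all assumptions made for illustration. The abstract does not specify how facts are extracted, how entailment is checked, or when the oracle step fires.

```python
from __future__ import annotations

# Hypothetical label vocabulary and negation cues (assumptions, not from the paper).
FINDINGS = ("cardiomegaly", "edema", "pleural effusion", "pneumothorax")
NEGATIONS = (" no ", " without ", " negative for ")


def extract_facts(sentence: str) -> set[tuple[str, bool]]:
    """Toy stand-in for atomic fact extraction: (finding, is_present) pairs."""
    text = f" {sentence.lower()} "
    negated = any(cue in text for cue in NEGATIONS)
    return {(finding, not negated) for finding in FINDINGS if finding in text}


def sentence_rewards(report: str, labels: dict[str, bool]) -> list[float]:
    """Score each sentence by the fraction of its facts that agree with the labels."""
    rewards = []
    for sentence in filter(None, (s.strip() for s in report.split("."))):
        facts = extract_facts(sentence)
        if not facts:
            rewards.append(0.0)  # sentence makes no checkable claim
            continue
        supported = sum(labels.get(finding) == present for finding, present in facts)
        rewards.append(supported / len(facts))
    return rewards


def needs_oracle(group_rewards: list[float], threshold: float = 0.5) -> bool:
    """Treat a sampled group as a failed exploration if no rollout clears the threshold."""
    return not group_rewards or max(group_rewards) < threshold


def preference_pair(reference: str, rollouts: list[str], rewards: list[float]) -> dict[str, str]:
    """Assumed oracle step: prefer the reference report over the weakest rollout."""
    worst = min(range(len(rollouts)), key=rewards.__getitem__)
    return {"chosen": reference, "rejected": rollouts[worst]}


labels = {"cardiomegaly": True, "edema": False, "pleural effusion": False, "pneumothorax": False}
report = "The heart shows cardiomegaly. No pleural effusion. There is edema."
per_sentence = sentence_rewards(report, labels)
```

In this sketch the first two sentences agree with the labels and the third contradicts them, so the per-sentence scores pinpoint which claim is wrong. A group whose rollouts all score poorly would be routed to `preference_pair` instead of contributing a near-zero policy-gradient signal.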
