CL | Sep 26, 2025

EditGRPO: Reinforcement Learning with Post-Rollout Edits for Clinically Accurate Chest X-Ray Report Generation

Kai Zhang, Christopher Malon, Lichao Sun, Martin Renqiang Min

arXiv:2509.22812v210.95 citations | h-index: 4 | IJCNLP-AACL

Originality: Incremental advance

AI Analysis

This work addresses the need for clinically aligned radiology report generation. The advance is incremental: it builds on existing multimodal large language models by adding a reinforcement learning component.

The paper tackles the problem of generating clinically accurate chest X-ray reports by introducing EditGRPO, a reinforcement learning algorithm that optimizes generation using clinically motivated rewards. It reports an average improvement of 3.4% in clinical metrics across four datasets and an average gain of 5.9% on unseen datasets.

Radiology report generation requires advanced medical image analysis, effective temporal reasoning, and accurate text generation. Although recent innovations, particularly multimodal large language models, have shown improved performance, their supervised fine-tuning (SFT) objective is not explicitly aligned with clinical efficacy. In this work, we introduce EditGRPO, a mixed-policy reinforcement learning algorithm designed specifically to optimize generation through clinically motivated rewards. EditGRPO integrates on-policy exploration with off-policy guidance by injecting detailed sentence-level corrections during training rollouts. This mixed-policy approach addresses the exploration dilemma and the sampling-efficiency issues typically encountered in RL. Applied to Qwen2.5-VL-3B, EditGRPO outperforms both SFT and vanilla GRPO baselines, achieving an average improvement of 3.4% in clinical metrics across four major datasets. Notably, EditGRPO also demonstrates superior out-of-domain generalization, with an average performance gain of 5.9% on unseen datasets.
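The mixed-policy idea in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the paper's implementation: `toy_reward` is a hypothetical stand-in for a clinical reward, `edit_rollout` is a hypothetical sentence-level correction that copies one missing reference sentence into a sampled report, and pooling edited and unedited rollouts into a single group is an assumed design. Only the group-relative advantage (reward minus group mean, divided by group standard deviation) follows the standard GRPO formulation. Importance weighting for the off-policy samples is deliberately left out.

```python
import statistics


def split_sentences(text):
    """Split a report into stripped, non-empty sentences (naive, period-based)."""
    return [s.strip() for s in text.split(".") if s.strip()]


def toy_reward(report, reference):
    """Hypothetical clinical reward: fraction of reference sentences found in the report."""
    ref = split_sentences(reference)
    got = set(split_sentences(report))
    return sum(s in got for s in ref) / len(ref)


def edit_rollout(report, reference):
    """Hypothetical post-rollout edit: append the first reference sentence the report lacks."""
    got = split_sentences(report)
    for sentence in split_sentences(reference):
        if sentence not in got:
            got.append(sentence)
            break
    return ". ".join(got) + "."


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages as in GRPO: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Made-up reference report and on-policy rollouts, for illustration only.
reference = "No pneumothorax. Small left pleural effusion. Heart size is normal."
rollouts = ["No pneumothorax.", "Heart size is normal.", "Lungs are clear."]

# Off-policy guidance (assumed form): edited copies join the same group.
edited = [edit_rollout(r, reference) for r in rollouts]
group = rollouts + edited

rewards = [toy_reward(r, reference) for r in group]
advantages = group_advantages(rewards)

for text, adv in zip(group, advantages):
    print(f"{adv:+.2f}  {text}")
```

In this toy setup the edited reports never score below their unedited counterparts, so they tend to receive higher advantages. That is one plausible reading of how injected corrections could supply a learning signal when on-policy samples alone rarely reach high reward.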
