CV · Apr 12

Enhancing Fine-Grained Spatial Grounding in 3D CT Report Generation via Discriminative Guidance

Chenyu Wang, Weicheng Dai, Han Liu, Wenchao Li, Kayhan Batmanghelich

arXiv:2604.10437 · 70.0 · h-index: 25

AI Analysis

This work addresses the fine-grained spatial grounding bottleneck in radiology report generation, substantially improving both in-distribution and out-of-distribution performance.

The paper proposes DCP-PD, a plug-and-play framework that uses discriminative cue-prompting with prompt dropout to enhance fine-grained spatial grounding in 3D CT report generation. It achieves state-of-the-art performance on CT-RATE (macro F1 from 0.501 to 0.603, 20% relative improvement) and substantially improves out-of-distribution performance on Rad-ChestCT (F1 from 0.266 to 0.503, 89% relative improvement).
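The relative figures follow from the absolute F1 scores as (new - baseline) / baseline. A quick Python check of that arithmetic, using only the numbers quoted above:

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Relative gain of `new` over `baseline`, expressed as a fraction of the baseline."""
    return (new - baseline) / baseline


# Absolute F1 scores quoted in the summary.
ct_rate_gain = relative_improvement(0.501, 0.603)      # in-distribution, macro F1 on CT-RATE
rad_chestct_gain = relative_improvement(0.266, 0.503)  # out-of-distribution, F1 on Rad-ChestCT

print(f"CT-RATE: {ct_rate_gain:.1%}")          # CT-RATE: 20.4%
print(f"Rad-ChestCT: {rad_chestct_gain:.1%}")  # Rad-ChestCT: 89.1%
```

The quoted 20% and 89% are these values rounded to whole percent. The absolute gains are 0.102 and 0.237 F1, so the out-of-distribution improvement is larger in absolute terms as well, not only because of its lower baseline.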

Vision-language models (VLMs) for radiology report generation (RRG) can produce long-form chest CT reports from volumetric scans and show strong potential to improve radiology workflow efficiency and consistency. However, existing methods face two key limitations: (i) training supervision is often coarse, aligning a whole CT volume with a full free-text report without explicit alignment for fine-grained attributes or pathology locations; and (ii) evaluation is typically holistic (lexical overlap, entity matching, or LLM-as-a-judge scores) and not diagnostic for spatial grounding. We propose Discriminative Cue-Prompting with Prompt Dropout (DCP-PD), a plug-and-play framework that distills fine-grained cues from free-text reports and uses them to guide report generation while mitigating shortcut reliance via prompt dropout. DCP-PD achieves state-of-the-art performance on CT-RATE, improving macro F1 from 0.501 to 0.603 (20% relative), and substantially boosts out-of-distribution performance on Rad-ChestCT, raising F1 from 0.266 to 0.503 (89% relative). Finally, we introduce a hierarchical, location-aware question-set protocol (presence → laterality → lobe) to directly assess pathology-location grounding, showing that fine-grained spatial localization remains challenging even for models that score highly on current benchmarks.
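The abstract does not spell out how prompt dropout is implemented. A minimal sketch of one plausible reading follows, assuming the distilled cues are rendered as a text prefix and the whole prefix is dropped with probability `p_drop` during training, so the generator cannot lean on the cues alone. The `Cue` structure, the prompt wording, the `format_cues` and `build_prompt` helpers, and the per-sample (rather than per-cue) dropout are all hypothetical, not the authors' code; at inference the cues would have to come from some predictor, which this sketch does not cover.

```python
import random
from dataclasses import dataclass


@dataclass
class Cue:
    """One fine-grained finding distilled from a report (hypothetical format)."""
    finding: str     # e.g. "nodule"
    laterality: str  # e.g. "left"
    lobe: str        # e.g. "upper lobe"


def format_cues(cues: list[Cue]) -> str:
    """Render cues as a single text prefix for the report generator (assumed format)."""
    return "; ".join(f"{c.finding} ({c.laterality} {c.lobe})" for c in cues)


def build_prompt(
    cues: list[Cue],
    instruction: str,
    p_drop: float,
    rng: random.Random,
    training: bool = True,
) -> str:
    """Prepend the cue prefix, dropping it for the whole sample with probability p_drop in training."""
    if training and rng.random() < p_drop:
        # Cue prompt dropped: the generator must rely on the CT volume alone for this sample.
        return instruction
    cue_text = format_cues(cues)
    return f"Cues: {cue_text}\n{instruction}" if cue_text else instruction


rng = random.Random(0)
cues = [Cue("nodule", "left", "upper lobe"), Cue("effusion", "right", "lower lobe")]
instruction = "Write the findings section of the CT report."
prompts = [build_prompt(cues, instruction, p_drop=0.3, rng=rng) for _ in range(1000)]
drop_rate = sum(p == instruction for p in prompts) / len(prompts)
print(f"fraction of training prompts without cues: {drop_rate:.2f}")
print(build_prompt(cues, instruction, p_drop=0.3, rng=rng, training=False))
```

Dropping individual cues instead of the whole prefix is an equally plausible variant; which granularity DCP-PD uses is not stated in this summary.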
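The presence → laterality → lobe hierarchy can be scored in several ways. As a hedged illustration, the sketch below assumes strict gating: a laterality answer counts only if presence was also right, a lobe answer only if laterality was right, and location questions are asked only for pathologies that are actually present. The `Location` type, the `hierarchical_scores` helper and the toy answers are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Location:
    """Answer for one pathology; None means absent or unspecified (hypothetical schema)."""
    present: bool
    laterality: Optional[str] = None  # e.g. "left", "right", "bilateral"
    lobe: Optional[str] = None        # e.g. "right lower lobe"


def hierarchical_scores(pairs: list[tuple[Location, Location]]) -> dict[str, float]:
    """Gated accuracy per level for (ground_truth, prediction) pairs.

    A deeper level is credited only if every shallower level is also correct.
    """
    correct = {"presence": 0, "laterality": 0, "lobe": 0}
    asked = {"presence": 0, "laterality": 0, "lobe": 0}
    for truth, pred in pairs:
        asked["presence"] += 1
        presence_ok = truth.present == pred.present
        correct["presence"] += presence_ok
        if not truth.present:
            continue  # location questions only apply to pathologies that are present
        asked["laterality"] += 1
        laterality_ok = presence_ok and pred.laterality == truth.laterality
        correct["laterality"] += laterality_ok
        if truth.lobe is not None:
            asked["lobe"] += 1
            correct["lobe"] += laterality_ok and pred.lobe == truth.lobe
    # NaN marks a level that was never asked (no positive cases).
    return {k: correct[k] / asked[k] if asked[k] else float("nan") for k in asked}


# Toy (ground truth, prediction) pairs, invented for illustration only.
pairs = [
    (Location(True, "left", "left upper lobe"), Location(True, "left", "left lower lobe")),
    (Location(True, "right", "right lower lobe"), Location(True, "left", "left lower lobe")),
    (Location(False), Location(False)),
    (Location(True, "right", "right middle lobe"), Location(False)),
]
print(hierarchical_scores(pairs))
```

Because of the gating, a model that names the right finding but places it on the wrong side or in the wrong lobe loses credit at the deeper levels, which is the failure mode a location-aware protocol is meant to expose.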

