CV · Apr 12

Enhancing Fine-Grained Spatial Grounding in 3D CT Report Generation via Discriminative Guidance

arXiv:2604.10437 · 70.0 · h-index: 25
AI Analysis

For radiology report generation, this work addresses the bottleneck of fine-grained spatial grounding, improving both in-distribution and out-of-distribution performance significantly.

The paper proposes DCP-PD, a plug-and-play framework that uses discriminative cue-prompting with prompt dropout to enhance fine-grained spatial grounding in 3D CT report generation. It achieves state-of-the-art performance on CT-RATE (macro F1 from 0.501 to 0.603, 20% relative improvement) and substantially improves out-of-distribution performance on Rad-ChestCT (F1 from 0.266 to 0.503, 89% relative improvement).
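As a rough illustration of the prompt-dropout idea, the sketch below withholds the cue block from the text prompt with some probability during training, so that a model cannot rely on cues alone. The summary does not state the dropout rate, the granularity (whole block versus individual cues), or the prompt wording; the function name `build_prompt`, the `drop_prob` parameter, and the instruction string are all hypothetical and assume whole-block dropout.

```python
import random
from typing import List


def build_prompt(
    cues: List[str],
    drop_prob: float,
    rng: random.Random,
    base_instruction: str = "Generate a radiology report for this chest CT.",
) -> str:
    """Return a text prompt, omitting the cue block with probability drop_prob.

    Assumption: dropout removes the entire cue block at once. The paper may
    instead drop individual cues or use a different scheme.
    """
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must be in [0, 1]")
    # rng.random() is in [0, 1), so drop_prob=0 never drops and drop_prob=1 always drops.
    if not cues or rng.random() < drop_prob:
        return base_instruction
    return f"{base_instruction} Cues: {'; '.join(cues)}"


if __name__ == "__main__":
    rng = random.Random(0)
    cues = ["consolidation: present", "laterality: left"]
    for _ in range(3):
        print(build_prompt(cues, drop_prob=0.5, rng=rng))
```

At inference time the same function could be called with `drop_prob=0.0` when cues are available, or with an empty cue list when they are not; whether the paper does this is not stated here.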

Vision--language models (VLMs) for radiology report generation (RRG) can produce long-form chest CT reports from volumetric scans and show strong potential to improve radiology workflow efficiency and consistency. However, existing methods face two key limitations: (i) training supervision is often coarse, aligning a whole CT volume with a full free-text report without explicit alignment for fine-grained attributes or pathology locations; and (ii) evaluation is typically holistic (lexical overlap, entity matching, or LLM-as-a-judge scores) and does not diagnose spatial grounding. We propose \emph{Discriminative Cue-Prompting with Prompt Dropout (DCP-PD)}, a plug-and-play framework that distills fine-grained cues from free-text reports and uses them to guide report generation while mitigating shortcut reliance via prompt dropout. DCP-PD achieves state-of-the-art performance on CT-RATE, improving macro F1 from $0.501$ to $0.603$ (20% relative), and substantially boosts out-of-distribution performance on Rad-ChestCT, improving F1 from $0.266$ to $0.503$ (89% relative). Finally, we introduce a hierarchical, location-aware question-set protocol (presence $\rightarrow$ laterality $\rightarrow$ lobe) to directly assess pathology-location grounding, showing that fine-grained spatial localization remains challenging even for models that score highly on current benchmarks.
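One way to picture the presence $\rightarrow$ laterality $\rightarrow$ lobe protocol is as a gated chain of questions per finding. The sketch below is a minimal illustration, not the paper's scoring code: the function name `score_hierarchical`, the answer vocabulary, and the gating rule (a deeper level earns credit only if every earlier asked level was answered correctly) are assumptions. Levels with no reference answer, such as laterality for an absent finding, are marked `None` and skipped.

```python
from typing import Dict, Optional

# Question levels, ordered from coarse to fine as named in the abstract.
LEVELS = ("presence", "laterality", "lobe")


def score_hierarchical(
    pred: Dict[str, Optional[str]],
    gold: Dict[str, Optional[str]],
) -> Dict[str, Optional[bool]]:
    """Score one finding level by level.

    Returns True/False for each level that has a reference answer and None
    for levels that do not apply. Assumption: once a level is answered
    incorrectly, all deeper levels are scored as incorrect.
    """
    result: Dict[str, Optional[bool]] = {}
    chain_ok = True
    for level in LEVELS:
        gold_answer = gold.get(level)
        if gold_answer is None:
            result[level] = None  # question not applicable for this finding
            continue
        correct = chain_ok and pred.get(level) == gold_answer
        result[level] = correct
        chain_ok = correct
    return result


if __name__ == "__main__":
    gold = {"presence": "yes", "laterality": "left", "lobe": "lower"}
    pred = {"presence": "yes", "laterality": "left", "lobe": "upper"}
    print(score_hierarchical(pred, gold))
```

Averaging the non-`None` entries per level across findings would give one accuracy figure per depth, which makes it visible when a model detects a pathology but misplaces it.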

