AnchorDiff: Topology-Aware Masked Diffusion with Confidence-based Rewriting for Radiology Report Generation
This work addresses the problem of sequence bias in radiology report generation for clinical applications, offering a new paradigm that improves grounding in image-specific evidence.
AnchorDiff introduces the first masked-diffusion framework for radiology report generation, using knowledge-graph-derived clinical anchors to overcome sequence bias in autoregressive models. It achieves state-of-the-art performance on MIMIC-CXR and MIMIC-RG4 benchmarks.
Radiology report generation (RRG) aims to automatically produce clinically accurate textual reports from medical images. Existing methods predominantly rely on autoregressive (AR) language models, whose causal dependency structure restricts generation to a unidirectional left-to-right process. This paradigm can induce sequence bias, where models tend to follow stereotypical token orders and high-frequency report templates rather than fully grounding generation in image-specific evidence. In this paper, we propose AnchorDiff, the first masked-diffusion framework for RRG that integrates knowledge-graph-derived clinical anchors into diffusion language modeling. By leveraging bidirectional context and iterative refinement, AnchorDiff mitigates the limitations of fixed-order autoregressive decoding. Specifically, we introduce a topology-aware training strategy that uses RadGraph-derived entity hierarchies to assign clinically important tokens differentiated masking protection and loss weights. We further design an inference-time rewriting strategy that detects unstable committed tokens through perturbation-based testing and selectively revises them during denoising. Extensive experiments on the MIMIC-CXR and MIMIC-RG4 benchmarks demonstrate that AnchorDiff achieves state-of-the-art (SOTA) performance, showing the effectiveness of clinically anchored masked diffusion for radiology report generation.