CV
Nov 4, 2025

Medical Report Generation: A Hierarchical Task Structure-Based Cross-Modal Causal Intervention Framework

arXiv:2511.02271v1
h-index: 3
Originality: Incremental advance
AI Analysis

This work addresses the burden on radiologists by improving automated medical report generation, though it appears incremental as it builds on existing methods to handle multiple challenges.

The paper tackles the generation of medical reports from radiological images by addressing three key challenges: insufficient domain knowledge, poor text-visual alignment, and spurious correlations. The authors report that the resulting framework significantly outperforms state-of-the-art methods.

Medical Report Generation (MRG) is a key part of modern medical diagnostics: it automatically generates reports from radiological images to reduce radiologists' burden. However, reliable MRG models for lesion description face three main challenges: insufficient understanding of domain knowledge, poor alignment between text and visual entity embeddings, and spurious correlations arising from cross-modal biases. Previous work addresses only one of these challenges at a time, while this paper tackles all three through a hierarchical task decomposition approach, the HTSC-CIF framework. HTSC-CIF maps the three challenges to low-, mid-, and high-level tasks: 1) Low-level: align medical entity features with spatial locations to enhance domain knowledge for visual encoders; 2) Mid-level: use Prefix Language Modeling (text) and Masked Image Modeling (images) to boost cross-modal alignment via mutual guidance; 3) High-level: apply a cross-modal causal intervention module (via front-door intervention) to reduce confounders and improve interpretability. According to the authors, extensive experiments confirm HTSC-CIF's effectiveness, with results significantly outperforming state-of-the-art (SOTA) MRG methods. Code will be made public upon paper acceptance.
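
The high-level task relies on front-door intervention. As a rough illustration of the underlying idea only, the sketch below applies the textbook front-door adjustment, P(y | do(x)) = Σ_m P(m | x) Σ_x' P(y | m, x') P(x'), to a small discrete model. Everything in it is an assumption made for illustration: the binary variables, the probability tables, and the function names are hypothetical and do not come from the paper, whose module operates on learned cross-modal features rather than on explicit probability tables.

```python
from itertools import product

# Hypothetical toy model (NOT from the paper): all variables are binary.
# U is an unobserved confounder, X the input, M a mediator, Y the outcome.
# Causal graph: U -> X, U -> Y, X -> M, M -> Y.
P_U = {0: 0.6, 1: 0.4}
P_X_GIVEN_U = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}  # indexed [u][x]
P_M_GIVEN_X = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # indexed [x][m]
P_Y1_GIVEN_MU = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.6, (1, 1): 0.9}  # P(Y=1 | m, u)


def p_y_given_mu(y, m, u):
    """P(Y=y | M=m, U=u) for the toy model."""
    p1 = P_Y1_GIVEN_MU[(m, u)]
    return p1 if y == 1 else 1.0 - p1


def observational_joint():
    """Joint P(x, m, y) with the confounder U marginalised out."""
    joint = {}
    for x, m, y in product((0, 1), repeat=3):
        joint[(x, m, y)] = sum(
            P_U[u] * P_X_GIVEN_U[u][x] * P_M_GIVEN_X[x][m] * p_y_given_mu(y, m, u)
            for u in (0, 1)
        )
    return joint


def naive_conditional(joint, x, y):
    """Plain P(y | x), which is biased by the confounder U."""
    p_x = sum(joint[(x, m, b)] for m in (0, 1) for b in (0, 1))
    return sum(joint[(x, m, y)] for m in (0, 1)) / p_x


def front_door(joint, x, y):
    """Front-door estimate of P(y | do(x)) using only observed X, M, Y."""
    p_x = {a: sum(joint[(a, m, b)] for m in (0, 1) for b in (0, 1)) for a in (0, 1)}
    total = 0.0
    for m in (0, 1):
        p_m_given_x = sum(joint[(x, m, b)] for b in (0, 1)) / p_x[x]
        inner = 0.0
        for x2 in (0, 1):
            p_x2_m = sum(joint[(x2, m, b)] for b in (0, 1))
            inner += (joint[(x2, m, y)] / p_x2_m) * p_x[x2]  # P(y | m, x') P(x')
        total += p_m_given_x * inner
    return total


def true_do(x, y):
    """Ground-truth P(y | do(x)), computable here because the toy model exposes U."""
    return sum(
        P_U[u] * P_M_GIVEN_X[x][m] * p_y_given_mu(y, m, u)
        for u in (0, 1)
        for m in (0, 1)
    )


if __name__ == "__main__":
    joint = observational_joint()
    print("naive P(Y=1 | X=1):      ", naive_conditional(joint, 1, 1))
    print("front-door P(Y=1|do(X=1)):", front_door(joint, 1, 1))
    print("true P(Y=1 | do(X=1)):   ", true_do(1, 1))
```

In this toy setting the front-door estimate matches the true interventional probability even though U is never observed, while the naive conditional does not. That gap is the kind of confounder-induced bias a causal intervention module aims to reduce.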

