CV · CL · Jun 20, 2023

KiUT: Knowledge-injected U-Transformer for Radiology Report Generation

Zhongzhen Huang, Xiaofan Zhang, Shaoting Zhang

arXiv:2306.11345v1

Originality: Incremental advance

AI Analysis

This work aims to reduce radiologists' report-writing burden by improving the accuracy of medical image captioning. It appears incremental, however, because it builds on existing transformer-based methods by adding knowledge components.

The paper tackles the problem of automatically generating clinically accurate radiology reports from X-ray images. It proposes KiUT, a model that integrates multi-level visual representations with contextual and clinical knowledge and achieves state-of-the-art performance on the IU-Xray and MIMIC-CXR datasets.

Radiology report generation aims to automatically generate a clinically accurate and coherent paragraph from an X-ray image, which could relieve radiologists of the heavy burden of report writing. Although various image captioning methods have shown remarkable performance on natural images, generating accurate reports for medical images requires knowledge of multiple modalities, including vision, language, and medical terminology. We propose a Knowledge-injected U-Transformer (KiUT) that learns multi-level visual representations and adaptively distills the information with contextual and clinical knowledge for word prediction. Specifically, a U-connection schema between the encoder and decoder is designed to model interactions between different modalities, and a symptom graph and an injected knowledge distiller are developed to assist report generation. Experimentally, we outperform state-of-the-art methods on two widely used benchmark datasets: IU-Xray and MIMIC-CXR. Further experimental results demonstrate the advantages of our architecture and the complementary benefits of the injected knowledge.
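The abstract describes the architecture only at a high level. As a rough illustration, here is a minimal pure-Python sketch of how U-shaped encoder-decoder routing and a knowledge-weighted fusion step before word prediction could fit together. Everything concrete here is an illustrative assumption, not KiUT's published implementation. That includes the pairing rule (decoder layer i reads encoder layer N-1-i, mirroring U-Net skip connections), the function names `u_connection_pairs`, `distill` and `decode_step`, and the attention-based distiller. The symptom graph is also reduced to a fixed "clinical" vector.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def u_connection_pairs(num_layers: int) -> List[Tuple[int, int]]:
    """ASSUMED U-shaped routing: decoder layer i cross-attends to encoder
    layer (num_layers - 1 - i), so early decoder layers see deep features
    and late decoder layers see shallow ones."""
    return [(i, num_layers - 1 - i) for i in range(num_layers)]


def distill(state: Vector, visual: Vector, contextual: Vector,
            clinical: Vector) -> Vector:
    """HYPOTHETICAL distiller: score each knowledge source against the
    decoder state and mix the sources with the resulting softmax weights."""
    sources = [visual, contextual, clinical]
    return attend(state, sources, sources)


def decode_step(state: Vector, encoder_layers: List[List[Vector]],
                contextual: Vector, clinical: Vector) -> Vector:
    """One word-prediction step: U-connected cross-attention over the
    multi-level visual features, then knowledge distillation."""
    for _, enc_idx in u_connection_pairs(len(encoder_layers)):
        feats = encoder_layers[enc_idx]
        visual = attend(state, feats, feats)
        state = [s + v for s, v in zip(state, visual)]  # residual update
    # The visually grounded state is fused with the two knowledge sources.
    return distill(state, state, contextual, clinical)


# Toy example: 3 encoder levels, each holding 2 patch features of size 2.
encoder_layers = [
    [[0.1, 0.0], [0.0, 0.1]],  # shallow level
    [[0.5, 0.2], [0.2, 0.5]],
    [[1.0, 0.0], [0.0, 1.0]],  # deep level
]
out = decode_step([0.3, 0.7], encoder_layers,
                  contextual=[0.2, 0.1], clinical=[0.0, 0.4])
print(len(out))  # 2
```

In this sketch the fused vector would feed a vocabulary projection to predict the next word. Softmax-weighted mixing is used here only because it is the simplest way to let the model lean on whichever source (visual, contextual, or clinical) is most relevant at each step.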

