CV · AI · Dec 31, 2024

CRRG-CLIP: Automatic Generation of Chest Radiology Reports and Classification of Chest Radiographs

arXiv:2501.01989v1 · 1 citation · h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses inefficiency and accuracy problems that healthcare professionals face in radiology reporting. It is an incremental advance, because it combines existing methods: Faster R-CNN, GPT-2, and CLIP.

The paper automates chest radiology report generation and radiograph classification to improve efficiency and consistency in radiology. In the reported results, the generation module outperforms GPT-4o on BLEU-2, BLEU-3, BLEU-4, and ROUGE-L, and the classification module surpasses the state-of-the-art model in AUC and Accuracy.

The complexity of stacked imaging and the massive number of radiographs make writing radiology reports difficult and inefficient. Even highly experienced radiologists struggle to maintain accuracy and consistency in interpreting radiographs under prolonged high-intensity work. To address these issues, this work proposes the CRRG-CLIP Model (Chest Radiology Report Generation and Radiograph Classification Model), an end-to-end model for automated report generation and radiograph classification. The model consists of two modules: the radiology report generation module and the radiograph classification module. The generation module uses Faster R-CNN to identify anatomical regions in radiographs, a binary classifier to select key regions, and GPT-2 to generate semantically coherent reports. The classification module uses the unsupervised Contrastive Language-Image Pretraining (CLIP) model, addressing the challenges of high-cost labelled datasets and insufficient features. The results show that the generation module performs comparably to high-performance baseline models on the BLEU, METEOR, and ROUGE-L metrics, and outperforms the GPT-4o model on the BLEU-2, BLEU-3, BLEU-4, and ROUGE-L metrics. The classification module significantly surpasses the state-of-the-art model in AUC and Accuracy. This demonstrates that the proposed model achieves high accuracy, readability, and fluency in report generation, while multimodal contrastive training with unlabelled radiograph-report pairs enhances classification performance.
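The "multimodal contrastive training with unlabelled radiograph-report pairs" mentioned in the abstract can be illustrated with the generic CLIP-style symmetric contrastive objective. The following is a minimal sketch of that general technique, not the authors' implementation: the function names, the toy 2-dimensional embeddings, and the temperature of 0.07 are all assumptions made for illustration. In a real system the embeddings would come from trained image and text encoders.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def cross_entropy(logits, target_index):
    """Cross-entropy of one row of logits against one target index (stable log-sum-exp)."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[target_index]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.

    Pair k (image k, report k) is the positive; every other combination in
    the batch acts as a negative. The temperature is an illustrative value.
    """
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    n = len(images)

    # logits[i][j] = scaled cosine similarity between image i and report j
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]

    # Image-to-text direction: each row should peak on its diagonal entry.
    loss_i2t = sum(cross_entropy(logits[k], k) for k in range(n)) / n

    # Text-to-image direction: same idea over the columns.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    loss_t2i = sum(cross_entropy(columns[k], k) for k in range(n)) / n

    return 0.5 * (loss_i2t + loss_t2i)


# Toy data (hypothetical): two radiograph embeddings and two report embeddings.
toy_images = [[1.0, 0.0], [0.0, 1.0]]
matched_reports = [[0.9, 0.1], [0.1, 0.9]]     # report k describes image k
mismatched_reports = [[0.1, 0.9], [0.9, 0.1]]  # reports swapped

loss_matched = clip_contrastive_loss(toy_images, matched_reports)
loss_mismatched = clip_contrastive_loss(toy_images, mismatched_reports)
print(loss_matched < loss_mismatched)
```

The loss is small when each radiograph embedding sits closest to its own report and large when the pairs are swapped, which is why no class labels are needed: the pairing itself supplies the training signal.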

Code Implementations: 1 repo
