CV, LG · Oct 20, 2023

CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training

Kihyun You, Jawook Gu, Jiyeon Ham, Beomhee Park, Jiho Kim, Eun Kyoung Hong, Woonhyuk Baek, Byungseok Roh

arXiv:2310.13292v1 · h-index: 6 · Has Code

Originality: Incremental advance

AI Analysis

This work tackles the challenge of limited data for vision-language pre-training in medical imaging, specifically for chest X-rays, offering incremental improvements over existing methods.

The paper addresses the scarcity of image-text data in chest X-ray analysis by expanding image-label pairs into image-text pairs with prompts and by using multiple images and multiple report sections per study, trained with two additional contrastive losses (ICL and TCL). The model outperforms state-of-the-art models trained under the same conditions, and the enlarged dataset improves classification performance at the cost of a slight drop in retrieval performance.
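The label-to-text expansion can be pictured as filling sentence templates with finding names. The sketch below is a minimal illustration under stated assumptions: the templates `POSITIVE_TEMPLATES` and `NEGATIVE_TEMPLATES`, the helper `labels_to_text`, and the finding names are all hypothetical and are not the paper's actual prompt set.

```python
import random

# Hypothetical prompt templates; the paper's actual prompts may differ.
POSITIVE_TEMPLATES = [
    "There is {label}.",
    "Findings suggest {label}.",
]
NEGATIVE_TEMPLATES = [
    "There is no {label}.",
    "No evidence of {label}.",
]


def labels_to_text(labels, rng):
    """Turn a {finding: is_present} dict into a pseudo-report string.

    Hypothetical helper: each finding becomes one templated sentence,
    using a positive template if present and a negative one otherwise.
    """
    sentences = []
    for finding, present in sorted(labels.items()):
        templates = POSITIVE_TEMPLATES if present else NEGATIVE_TEMPLATES
        sentences.append(rng.choice(templates).format(label=finding))
    return " ".join(sentences)


# Example: a seeded RNG makes the sampled templates reproducible.
text = labels_to_text({"cardiomegaly": True, "pleural effusion": False}, random.Random(0))
print(text)
```

Sampling a template at random each time a pair is drawn gives the text encoder varied phrasings of the same label set rather than a single fixed sentence.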

Large-scale image-text pair datasets have greatly contributed to the development of vision-language pre-training (VLP) models, which enable zero-shot or few-shot classification without costly annotation. However, in the medical domain, the scarcity of data remains a significant challenge for developing a powerful VLP model. In this paper, we tackle the lack of image-text data for chest X-rays by expanding image-label pairs into image-text pairs via general prompts and by utilizing multiple images and multiple sections of a radiology report. We also design two contrastive losses, named ICL and TCL, for learning study-level characteristics of medical images and reports, respectively. Our model outperforms the state-of-the-art models trained under the same conditions. Also, the enlarged dataset improves the discriminative power of our pre-trained model for classification, at the cost of marginally lower retrieval performance. Code is available at https://github.com/kakaobrain/cxr-clip.
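The loss design can be sketched with a CLIP-style symmetric InfoNCE objective. This is a minimal pure-Python illustration, not the repository's implementation. It assumes, without confirmation from the text above, that ICL treats two images of the same study as a positive pair and TCL treats two sections of the same report as a positive pair, and that the three terms are simply summed. The names `info_nce`, `symmetric_contrastive`, and `total_loss`, the temperature of 0.1, and the equal weighting are all hypothetical choices.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(queries, keys, temperature=0.1):
    """Mean cross-entropy where queries[i] must pick keys[i] among all keys."""
    q = [l2_normalize(v) for v in queries]
    k = [l2_normalize(v) for v in keys]
    total = 0.0
    for i, qi in enumerate(q):
        # Cosine similarities scaled by the temperature act as logits.
        logits = [sum(a * b for a, b in zip(qi, kj)) / temperature for kj in k]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(q)


def symmetric_contrastive(a, b, temperature=0.1):
    """CLIP-style loss: average of the a->b and b->a InfoNCE terms."""
    return 0.5 * (info_nce(a, b, temperature) + info_nce(b, a, temperature))


def total_loss(img1, img2, txt1, txt2, temperature=0.1):
    """Hypothetical combination: image-text loss plus assumed ICL and TCL terms."""
    clip = symmetric_contrastive(img1, txt1, temperature)  # image-text alignment
    icl = symmetric_contrastive(img1, img2, temperature)   # assumed ICL: two images per study
    tcl = symmetric_contrastive(txt1, txt2, temperature)   # assumed TCL: two report sections
    return clip + icl + tcl


# Toy batch of two studies with perfectly aligned 2-D embeddings.
eye = [[1.0, 0.0], [0.0, 1.0]]
print(total_loss(eye, eye, eye, eye))
```

In a real model the embeddings would come from image and text encoders over a mini-batch, and each row index identifies one study; the extra ICL and TCL terms push embeddings from the same study together regardless of which view or report section they came from.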

