CV CL LG MM

Mar 13, 2023

PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents

Weixiong Lin, Ziheng Zhao, Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie

Harvard

arXiv:2303.07240v137.4314 citations

h-index: 50

Has Code

Originality: Incremental advance

AI Analysis

This work addresses data scarcity in biomedical AI, enabling better multimodal models for medical applications. It is an incremental advance because it adapts an existing CLIP-style approach to a new domain.

The authors tackled the lack of large-scale biomedical datasets for foundation models by creating PMC-OA, a dataset with 1.6M image-caption pairs, and trained PMC-CLIP, which achieved state-of-the-art results, including +8.1% R@10 on image-text retrieval and +3.9% accuracy on image classification.

Foundation models trained on large-scale datasets have seen a recent surge in CV and NLP. In contrast, development in the biomedical domain lags far behind due to data scarcity. To address this issue, we build and release PMC-OA, a biomedical dataset with 1.6M image-caption pairs collected from PubMedCentral's OpenAccess subset, which is 8 times larger than previous datasets. PMC-OA covers diverse modalities and diseases, with the majority of the image-caption samples aligned at a finer-grained level, i.e., subfigure and subcaption. By pretraining a CLIP-style model on PMC-OA, our model, named PMC-CLIP, achieves state-of-the-art results on various downstream tasks, including image-text retrieval on ROCO, MedMNIST image classification, and Medical VQA, e.g., +8.1% R@10 on image-text retrieval and +3.9% accuracy on image classification.
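
As a rough illustration of the two ideas the abstract relies on, the sketch below shows a generic CLIP-style symmetric contrastive loss and the Recall@K (R@K) retrieval metric, both computed from an image-text similarity matrix. It is not the authors' implementation: the function names `clip_style_loss` and `recall_at_k`, the toy similarity values, and the temperature of 0.07 are assumptions made for illustration, and the actual PMC-CLIP training objective may include terms not shown here.

```python
import math


def clip_style_loss(sim, temperature=0.07):
    """Symmetric cross-entropy over an N x N similarity matrix.

    sim[i][j] is the similarity between image i and caption j, and the
    matched pairs are assumed to lie on the diagonal. The temperature
    is an illustrative default, not a value taken from the paper.
    """
    n = len(sim)

    def cross_entropy(rows):
        total = 0.0
        for i, row in enumerate(rows):
            logits = [s / temperature for s in row]
            m = max(logits)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(l - m) for l in logits))
            total += log_z - logits[i]  # negative log-probability of the match
        return total / n

    # Image-to-text uses the rows, text-to-image uses the columns.
    cols = [list(col) for col in zip(*sim)]
    return 0.5 * (cross_entropy(sim) + cross_entropy(cols))


def recall_at_k(sim, k):
    """Fraction of queries (rows) whose matching item is ranked in the top k."""
    hits = 0
    for i, row in enumerate(sim):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(sim)


# Hypothetical toy similarities for three image-caption pairs.
toy_sim = [
    [0.9, 0.1, 0.0],
    [0.2, 0.8, 0.1],
    [0.7, 0.3, 0.5],  # the third image is most similar to the wrong caption
]
r_at_1 = recall_at_k(toy_sim, 1)
r_at_2 = recall_at_k(toy_sim, 2)
loss = clip_style_loss(toy_sim)
```

In this toy case the third image ranks its own caption second, so it counts as a miss for R@1 but a hit for R@2. A reported gain such as +8.1% R@10 refers to the same quantity computed with k = 10 over a full retrieval benchmark.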
