CV · May 18, 2023

MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts

Qiuhui Chen, Xinyue Hu, Zirui Wang, Yi Hong

arXiv:2305.10799v123.285 citations · Has Code

Originality: Incremental advance

AI Analysis

The work addresses clinicians' need for efficient diagnostic tools, though it is incremental in that it adapts existing vision-language pre-training methods to the medical domain.

The paper develops a vision-language pre-training model for computer-aided diagnosis from medical images and text. On a collection of more than 30,000 image volumes, it reports state-of-the-art zero-shot classification of healthy, mild cognitive impairment, and Alzheimer's disease subjects, and it demonstrates medical visual question answering.

Vision-language pre-training (VLP) models have been demonstrated to be effective in many computer vision applications. In this paper, we consider developing a VLP model in the medical domain for making computer-aided diagnoses (CAD) based on image scans and text descriptions in electronic health records, as done in practice. To achieve our goal, we present a lightweight CAD system, MedBLIP, a new paradigm for bootstrapping VLP from off-the-shelf frozen pre-trained image encoders and frozen large language models. We design a MedQFormer module to bridge the gap between 3D medical images and both 2D pre-trained image encoders and language models. To evaluate the effectiveness of MedBLIP, we collect more than 30,000 image volumes from five public Alzheimer's disease (AD) datasets, i.e., ADNI, NACC, OASIS, AIBL, and MIRIAD. On this dataset, the largest AD dataset we know of, our model achieves SOTA performance on the zero-shot classification of healthy, mild cognitive impairment (MCI), and AD subjects, and shows its capability for medical visual question answering (VQA). The code and pre-trained models are available online: https://github.com/Qybc/MedBLIP.
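The abstract does not spell out how MedQFormer works, so the following is only a rough illustration of the general idea behind query-based bridging modules: a fixed set of query vectors cross-attends over a variable number of image tokens and always returns a fixed-length output that a frozen language model could consume. This is a minimal sketch in plain Python, not the authors' implementation. The names `cross_attend`, `NUM_QUERIES`, and `DIM`, the random stand-in features, and the single-head attention without learned projections are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, tokens):
    """Single-head scaled dot-product cross-attention.

    Each query attends over all tokens and returns a weighted average of
    them. Learned projections are omitted to keep the sketch short.
    """
    dim = len(queries[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ti for qi, ti in zip(q, t)) / math.sqrt(dim)
            for t in tokens
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)]
        )
    return outputs


def random_vectors(count, dim, rng):
    """Stand-in for encoder features or learned queries (hypothetical data)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]


rng = random.Random(0)
DIM = 8          # hypothetical feature width
NUM_QUERIES = 4  # hypothetical number of query vectors

queries = random_vectors(NUM_QUERIES, DIM, rng)

# Two volumes of different size yield different numbers of patch tokens,
# e.g. a 3x3x3 grid (27 tokens) and a 4x4x4 grid (64 tokens).
small_volume_tokens = random_vectors(27, DIM, rng)
large_volume_tokens = random_vectors(64, DIM, rng)

small_out = cross_attend(queries, small_volume_tokens)
large_out = cross_attend(queries, large_volume_tokens)

# Both outputs have the same shape regardless of the input token count.
print(len(small_out), len(small_out[0]))
print(len(large_out), len(large_out[0]))
```

The point of the sketch is the shape behavior: the number of output vectors depends only on the number of queries, which is why such a module can sit between an image encoder and a language model with a fixed input budget. For the actual architecture, see the repository linked in the abstract.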
