CV | Jan 3, 2024

MLIP: Medical Language-Image Pre-training with Masked Local Representation Learning

Jiarun Liu, Hong-Yu Zhou, Cheng Li, Weijian Huang, Hao Yang, Yong Liang, Shanshan Wang

arXiv:2401.01591v19.610 citations | h-index: 7 | ISBI

Originality: Incremental advance

AI Analysis

This work addresses data efficiency for medical AI applications, though it is incremental as it builds on existing contrastive pre-training methods.

The paper tackles the scarcity and complexity of medical image-text pairs by proposing a Medical Language-Image Pre-training (MLIP) framework that combines patch-sentence matching with masked contrastive learning. It reports large-margin improvements over previous work in zero/few-shot classification and few-shot segmentation.

Existing contrastive language-image pre-training aims to learn a joint representation by matching abundant image-text pairs. However, the number of image-text pairs in medical datasets is usually orders of magnitude smaller than that in natural datasets. Besides, medical image-text pairs often involve numerous complex fine-grained correspondences. This paper aims to enhance data efficiency by introducing multiple-to-multiple local relationship modeling to capture denser supervision. More specifically, we propose a Medical Language-Image Pre-training (MLIP) framework, which exploits the limited image-text medical data more efficiently through patch-sentence matching. Furthermore, we introduce a masked contrastive learning strategy with semantic integrity estimation to reduce redundancy in images while preserving the underlying semantics. Our evaluation results show that MLIP outperforms previous work in zero/few-shot classification and few-shot segmentation tasks by a large margin.
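
As a rough illustration of the ideas named in the abstract, the sketch below shows a generic patch-sentence contrastive objective in NumPy. It is not the authors' implementation. The function names, the aggregation rule (best-matching patch per sentence, averaged over sentences), the temperature of 0.07, and the uniform random masking are all assumptions made for this example. In particular, the random mask is only a stand-in for the paper's semantic integrity estimation, which the abstract does not specify.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(logits, axis=-1):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def patch_sentence_similarity(patches, sentences):
    """patches: (B, P, D), sentences: (B, S, D) -> (B, B) image-report scores.

    Assumed aggregation (hypothetical): each sentence takes its
    best-matching patch, and sentence scores are averaged per pair.
    """
    p = l2_normalize(patches)
    s = l2_normalize(sentences)
    # sim[i, j, s, p]: cosine similarity of patch p in image i
    # and sentence s in report j.
    sim = np.einsum("ipd,jsd->ijsp", p, s)
    return sim.max(axis=-1).mean(axis=-1)

def symmetric_info_nce(sim, temperature=0.07):
    """CLIP-style symmetric contrastive loss; matched pairs lie on the diagonal."""
    logits = sim / temperature
    image_to_text = -np.diag(log_softmax(logits, axis=1)).mean()
    text_to_image = -np.diag(log_softmax(logits, axis=0)).mean()
    return 0.5 * (image_to_text + text_to_image)

def random_patch_mask(patches, keep_ratio, rng):
    """Keep a random subset of patches per image (placeholder masking rule)."""
    B, P, _ = patches.shape
    keep = max(1, int(round(P * keep_ratio)))
    idx = np.stack([rng.permutation(P)[:keep] for _ in range(B)])
    return patches[np.arange(B)[:, None], idx]

rng = np.random.default_rng(0)
patches = rng.normal(size=(4, 16, 32))    # 4 images, 16 patches, 32-dim features
sentences = rng.normal(size=(4, 5, 32))   # 4 reports, 5 sentences each
visible = random_patch_mask(patches, keep_ratio=0.5, rng=rng)
loss = symmetric_info_nce(patch_sentence_similarity(visible, sentences))
```

Because every sentence is scored against every patch, each image-report pair contributes many local comparisons rather than a single global one, which is one way to read the "denser supervision" the abstract refers to.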
