CV · Jan 3, 2024

MLIP: Medical Language-Image Pre-training with Masked Local Representation Learning

arXiv:2401.01591v1 · 10 citations · h-index: 7 · ISBI
Originality: Incremental advance
AI Analysis

This work addresses data efficiency for medical AI applications, though it is incremental as it builds on existing contrastive pre-training methods.

The paper tackles the scarcity and fine-grained complexity of medical image-text pairs by proposing a Medical Language-Image Pre-training (MLIP) framework that combines patch-sentence matching with masked contrastive learning. The authors report large-margin improvements over previous work in zero/few-shot classification and few-shot segmentation.

Existing contrastive language-image pre-training aims to learn a joint representation by matching abundant image-text pairs. However, the number of image-text pairs in medical datasets is usually orders of magnitude smaller than that in natural datasets. Besides, medical image-text pairs often involve numerous complex fine-grained correspondences. This paper aims to enhance the data efficiency by introducing multiple-to-multiple local relationship modeling to capture denser supervisions. More specifically, we propose a Medical Language-Image Pre-training (MLIP) framework, which exploits the limited image-text medical data more efficiently through patch-sentence matching. Furthermore, we introduce a masked contrastive learning strategy with semantic integrity estimation to reduce redundancy in images while preserving the underlying semantics. Our evaluation results show that MLIP outperforms previous work in zero/few-shot classification and few-shot segmentation tasks by a large margin.
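The idea of scoring an image-report pair through patch-sentence similarities can be illustrated with a minimal sketch. This is not the paper's loss: the max-over-patches aggregation, the symmetric InfoNCE objective, the temperature value, and every function name below are assumptions chosen for illustration, and the masking and semantic integrity estimation steps are omitted entirely. Embeddings are assumed to be non-zero vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pair_similarity(patches, sentences):
    """Assumed aggregation: for each sentence take its best-matching patch,
    then average over the sentences of the report."""
    best = [max(cosine(p, s) for p in patches) for s in sentences]
    return sum(best) / len(best)

def log_softmax_at(row, idx):
    """Numerically stable log-softmax of `row`, evaluated at position `idx`."""
    m = max(row)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
    return row[idx] - log_sum_exp

def contrastive_loss(images, reports, temperature=0.1):
    """Symmetric InfoNCE over a batch; images[i] is paired with reports[i].

    images:  list of images, each a list of patch embeddings
    reports: list of reports, each a list of sentence embeddings
    """
    n = len(images)
    sim = [[pair_similarity(images[i], reports[j]) / temperature
            for j in range(n)] for i in range(n)]
    # Image-to-text: each row should peak on its diagonal entry.
    i2t = -sum(log_softmax_at(sim[i], i) for i in range(n)) / n
    # Text-to-image: each column should peak on its diagonal entry.
    t2i = -sum(log_softmax_at([sim[i][j] for i in range(n)], j)
               for j in range(n)) / n
    return 0.5 * (i2t + t2i)

# Toy batch: two images with two patches each, two one-sentence reports.
images = [[[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]]]
reports = [[[1.0, 0.0]], [[0.0, 1.0]]]
matched = contrastive_loss(images, reports)
shuffled = contrastive_loss(images, reports[::-1])
```

On this toy batch the correctly paired reports give a lower loss than the swapped ones, which is the behaviour a contrastive objective is meant to reward. Because every sentence is compared against every patch, each pair contributes many local similarity terms rather than a single global one, which is one way to read the abstract's "denser supervisions".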

