CV · Dec 20, 2023

ECAMP: Entity-centered Context-aware Medical Vision Language Pre-training

arXiv:2312.13316v4 · 3 citations · h-index: 7
Originality: Incremental advance
AI Analysis

This work aims to improve medical vision-language pre-training for multi-scale downstream applications such as classification and segmentation, and represents a strong domain-specific advance.

The paper addresses two problems that prior medical vision-language pre-training overlooks: the linguistic complexity and imbalance of medical reports, and the complex cross-modality relationships between reports and images. It proposes the ECAMP framework, which achieves cutting-edge results on classification, segmentation, and detection tasks across 9 public datasets.

Despite significant advancements in medical vision-language pre-training, existing methods have largely overlooked the inherent linguistic complexity and imbalance within medical reports, as well as the complex cross-modality contextual relationships between texts and images. To close this gap, we propose a novel Entity-centered Context-aware Medical Vision-language Pre-training (ECAMP) framework, which establishes a more entity-centered, context-sensitive, and balanced understanding of medical reports to effectively pre-train the vision encoder. We first distill entity-centered context from medical reports using large language models, enabling ECAMP to draw more precise supervision from the text modality. By further incorporating an entity-aware re-balanced factor and descriptor masking strategies into masked language modeling, ECAMP significantly enhances the knowledge of entities within the reports. A context-guided super-resolution task is proposed alongside a multi-scale context fusion design to improve the semantic integration of both coarse- and fine-level image representations, which leads to better performance on multi-scale downstream applications. ECAMP integrates these innovations, leading to significant performance leaps over current state-of-the-art methods and establishing a new standard for cross-modality pre-training in medical imaging. Extensive experiments across domains and organs demonstrate the effectiveness of ECAMP, which achieves cutting-edge results on multiple tasks, including classification, segmentation, and detection, across 5 public chest X-ray datasets and 4 fundoscopy datasets.
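
The abstract does not give formulas for the entity-aware re-balanced factor or for descriptor masking. As a rough illustration only, the sketch below assumes that (a) the re-balanced factor is an inverse-frequency weight on the masked-token loss, normalized to average 1.0, so rare entities count more, and (b) descriptor masking means always masking words from a given descriptor vocabulary (e.g. "small", "left"). The function names, the 0.15 base masking rate, and the example vocabulary are hypothetical and are not ECAMP's actual implementation.

```python
import random
from collections import Counter

MASK = "[MASK]"


def rebalanced_factors(entity_counts):
    """Hypothetical inverse-frequency factor per entity, scaled so factors average 1.0."""
    inv = {e: 1.0 / c for e, c in entity_counts.items()}
    mean_inv = sum(inv.values()) / len(inv)
    return {e: v / mean_inv for e, v in inv.items()}


def entity_aware_mask(tokens, descriptors, factors, base_rate=0.15, rng=None):
    """Mask a tokenized report; return (masked_tokens, targets, loss_weights).

    Assumptions (not from the paper):
    - tokens in `descriptors` are always masked (descriptor masking);
    - every other token is masked with probability `base_rate`;
    - a masked token's loss weight is its entity factor, or 1.0 for non-entities.
    Unmasked positions get target None and weight 0.0.
    """
    rng = rng if rng is not None else random.Random(0)
    masked, targets, weights = [], [], []
    for tok in tokens:
        if tok in descriptors or rng.random() < base_rate:
            masked.append(MASK)
            targets.append(tok)
            weights.append(factors.get(tok, 1.0))
        else:
            masked.append(tok)
            targets.append(None)
            weights.append(0.0)
    return masked, targets, weights


def weighted_mlm_loss(token_nlls, weights):
    """Weighted mean of per-token negative log-likelihoods over masked positions."""
    total_w = sum(weights)
    if total_w == 0:
        return 0.0
    return sum(n * w for n, w in zip(token_nlls, weights)) / total_w


if __name__ == "__main__":
    # Toy entity counts: "effusion" is common, "pneumothorax" is rare.
    counts = Counter({"effusion": 90, "pneumothorax": 10})
    factors = rebalanced_factors(counts)  # effusion -> 0.2, pneumothorax -> 1.8
    report = "small left pneumothorax no effusion".split()
    masked, targets, weights = entity_aware_mask(
        report, {"small", "left"}, factors, base_rate=0.0
    )
    print(masked)  # ['[MASK]', '[MASK]', 'pneumothorax', 'no', 'effusion']
```

Weighting the loss rather than changing the masking rate keeps the masking pattern simple; an equally plausible reading of "re-balanced factor" would instead raise the masking probability of rare entities.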

Code Implementations: 1 repo