IV · CV · May 20, 2024

Mammo-CLIP: A Vision Language Foundation Model to Enhance Data Efficiency and Robustness in Mammography

Amazon, CMU
arXiv:2405.12255v2 · 38 citations · h-index: 6 · Has Code · MICCAI
Originality: Incremental advance
AI Analysis

This work addresses data-efficiency and robustness challenges in computer-aided diagnosis for breast cancer detection. It is incremental, however, because it adapts existing CLIP methods to a specific medical domain.

The paper tackles the scarcity of large, diverse training data for breast cancer detection in mammography by proposing Mammo-CLIP, a vision-language model pre-trained on mammogram-report pairs. The model performs strongly on classification and localization tasks, with data efficiency and robustness comparable to what CLIP achieves in computer vision.

The lack of large and diverse training data for Computer-Aided Diagnosis (CAD) in breast cancer detection has been one of the concerns impeding the adoption of such systems. Recently, pre-training on large-scale image-text datasets with Vision-Language Models (VLMs) (e.g., CLIP) has partially addressed the issues of robustness and data efficiency in computer vision (CV). This paper proposes Mammo-CLIP, the first VLM pre-trained on a substantial number of screening mammogram-report pairs, addressing the challenges of dataset diversity and size. Our experiments on two public datasets demonstrate strong performance in classifying and localizing various mammographic attributes crucial for breast cancer detection, showcasing data efficiency and robustness similar to CLIP in CV. We also propose Mammo-FActOR, a novel feature attribution method, to provide spatial interpretation of representations with sentence-level granularity within mammography reports. Code is publicly available: https://github.com/batmanlab/Mammo-CLIP.
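The abstract describes CLIP-style pre-training on mammogram-report pairs but does not spell out the objective. The following is a minimal sketch, assuming Mammo-CLIP follows the standard CLIP recipe: a symmetric contrastive (InfoNCE) loss over a batch of paired image and text embeddings, plus zero-shot prediction by cosine similarity against text prompts. The toy vectors stand in for encoder outputs. The function names (`clip_contrastive_loss`, `zero_shot_predict`) and the fixed temperature of 0.07 are illustrative assumptions, not the paper's code; the encoders, augmentations, and any mammography-specific details live in the linked repository.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def log_softmax(row: List[float]) -> List[float]:
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def clip_contrastive_loss(image_embs: List[Vector], text_embs: List[Vector],
                          temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss (assumed CLIP-style objective).

    Pair i (image_embs[i], text_embs[i]) is the positive; every other
    image-report pairing in the batch serves as a negative.
    """
    if len(image_embs) != len(text_embs):
        raise ValueError("need one report embedding per image embedding")
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = scaled cosine similarity of image i and report j
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image -> text: each row classifies which report matches image i
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text -> image: each column classifies which image matches report i
    loss_t2i = -sum(log_softmax([logits[j][i] for j in range(n)])[i]
                    for i in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


def zero_shot_predict(image_emb: Vector, prompt_embs: List[Vector]) -> int:
    """Return the index of the text prompt most similar to the image."""
    img = l2_normalize(image_emb)
    sims = [sum(a * b for a, b in zip(img, l2_normalize(p))) for p in prompt_embs]
    return max(range(len(sims)), key=sims.__getitem__)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[1.0, 0.0], [0.0, 1.0]]   # each report aligned with its image
    swapped = [[0.0, 1.0], [1.0, 0.0]]   # reports paired with the wrong images
    print(clip_contrastive_loss(images, matched))  # close to 0
    print(clip_contrastive_loss(images, swapped))  # large
    print(zero_shot_predict([0.9, 0.1], [[0.0, 1.0], [1.0, 0.0]]))
```

Pre-training pushes the loss for correctly paired images and reports toward zero. The same shared embedding space then supports zero-shot classification: each candidate attribute is phrased as a text prompt, and the image is assigned the prompt it lies closest to.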

Code Implementations: 1 repo
