CV | AI | LG | Jul 5, 2024

Multi-modal Masked Siamese Network Improves Chest X-Ray Representation Learning

arXiv:2407.04449v1 | 4 citations | h-index: 12 | Has Code
AI Analysis

This work addresses the challenge of leveraging multimodal data for medical imaging representation learning, offering incremental improvements by integrating EHR information into existing self-supervised methods.

The paper tackles self-supervised representation learning for chest X-rays by incorporating Electronic Health Record (EHR) data into Masked Siamese Network (MSN) pretraining. Under linear evaluation, it reports significant improvements in representation quality over vanilla MSN and state-of-the-art self-supervised baselines on MIMIC-CXR, CheXpert, and NIH-14.

Self-supervised learning methods for medical images primarily rely on the imaging modality during pretraining. While such approaches deliver promising results, they do not leverage associated patient or scan information collected within Electronic Health Records (EHR). Here, we propose to incorporate EHR data during self-supervised pretraining with a Masked Siamese Network (MSN) to enhance the quality of chest X-ray representations. We investigate three types of EHR data, including demographic, scan metadata, and inpatient stay information. We evaluate our approach on three publicly available chest X-ray datasets, MIMIC-CXR, CheXpert, and NIH-14, using two vision transformer (ViT) backbones, specifically ViT-Tiny and ViT-Small. In assessing the quality of the representations via linear evaluation, our proposed method demonstrates significant improvement compared to vanilla MSN and state-of-the-art self-supervised learning baselines. Our work highlights the potential of EHR-enhanced self-supervised pretraining for medical imaging. The code is publicly available at: https://github.com/nyuad-cai/CXR-EHR-MSN
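
To make the idea in the abstract concrete, the following is a minimal sketch of an MSN-style objective in which EHR features are fused into the image representation before prototype matching. It is not the authors' implementation: the additive fusion after a linear projection, the function names (`fuse`, `prototype_probs`, `msn_style_loss`), and the temperature values are all illustrative assumptions, and the mean-entropy regularizer used by MSN is omitted for brevity. See the linked repository for the actual method.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def softmax(scores, temperature):
    """Numerically stable softmax over temperature-scaled scores."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(image_emb, ehr_feats, proj):
    """Hypothetical fusion: linearly project EHR features and add them to the
    image embedding. ASSUMPTION: the paper's fusion may differ.
    `proj` has one row per embedding dimension, each of length len(ehr_feats)."""
    ehr_emb = [sum(w * x for w, x in zip(row, ehr_feats)) for row in proj]
    return [a + b for a, b in zip(image_emb, ehr_emb)]


def prototype_probs(z, prototypes, temperature):
    """Soft assignment of a representation to prototypes via cosine similarity."""
    z = l2_normalize(z)
    sims = [sum(a * b for a, b in zip(z, l2_normalize(p))) for p in prototypes]
    return softmax(sims, temperature)


def msn_style_loss(anchor_z, target_z, prototypes, tau_anchor=0.1, tau_target=0.025):
    """Cross-entropy between the sharpened target assignment (unmasked view)
    and the anchor assignment (masked view). Temperatures are illustrative."""
    p_anchor = prototype_probs(anchor_z, prototypes, tau_anchor)
    p_target = prototype_probs(target_z, prototypes, tau_target)
    return -sum(t * math.log(a + 1e-12) for t, a in zip(p_target, p_anchor))


# Toy example: 2-D embeddings, one EHR feature, two prototypes.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
proj = [[0.5], [0.25]]
anchor = fuse([1.0, 0.0], [2.0], proj)   # masked-view embedding plus EHR
target = fuse([1.0, 0.1], [2.0], proj)   # unmasked-view embedding plus EHR
loss = msn_style_loss(anchor, target, prototypes)
```

In this toy setup the loss is small when the masked and unmasked views land on the same prototype and grows when they disagree, which is the signal that pulls the two views of the same scan together during pretraining.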

Code Implementations: 3 repos
