CV · Jun 2, 2024

MGI: Multimodal Contrastive pre-training of Genomic and Medical Imaging

arXiv:2406.00631v1 · 6 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of integrating multimodal data for medical applications, offering a solution for tasks like tumor segmentation, though it appears incremental as it builds on existing contrastive learning and sequence modeling techniques.

The paper addresses the reliance on unimodal deep learning in medical tasks by proposing a multimodal pre-training framework that jointly incorporates genomics and medical images. The authors report that the resulting model outperformed a wide range of related methods on tumor segmentation tasks.

Medicine is inherently a multimodal discipline. Medical images can reflect the pathological changes of cancer and tumors, while the expression of specific genes can influence their morphological characteristics. However, most deep learning models employed for these medical tasks are unimodal, making predictions using either image data or genomic data exclusively. In this paper, we propose a multimodal pre-training framework that jointly incorporates genomics and medical images for downstream tasks. To address the high computational complexity and the difficulty of capturing long-range dependencies when modeling gene sequences with MLP or Transformer architectures, we use Mamba to model these long genomic sequences. We align medical images and genes using a self-supervised contrastive learning approach that combines Mamba as the genetic encoder and the Vision Transformer (ViT) as the medical image encoder. We pre-trained the model on the TCGA dataset using paired gene expression data and imaging data, and fine-tuned it for downstream tumor segmentation tasks. The results show that our model outperformed a wide range of related methods.
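The abstract does not give the exact contrastive objective, so the following is only a minimal sketch of a symmetric, CLIP-style InfoNCE loss, which is one common way to align two encoders on paired data. The function names, the temperature of 0.07, and the use of plain Python lists in place of real ViT and Mamba outputs are all assumptions made for illustration, not details taken from the paper.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax(row):
    """Numerically stable log-softmax over a list of scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def contrastive_loss(image_emb, gene_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    image_emb[i] and gene_emb[i] are assumed to come from the same patient
    (hypothetical stand-ins for ViT and Mamba encoder outputs).
    """
    img = [l2_normalize(v) for v in image_emb]
    gen = [l2_normalize(v) for v in gene_emb]
    n = len(img)
    # logits[i][j]: scaled cosine similarity between image i and gene profile j
    logits = [[sum(a * b for a, b in zip(img[i], gen[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-gene direction: row i should assign most mass to column i
    i2g = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Gene-to-image direction: column j should assign most mass to row j
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    g2i = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (i2g + g2i)

# Toy batch: two patients with 2-dimensional embeddings
images = [[1.0, 0.0], [0.0, 1.0]]
genes_matched = [[1.0, 0.0], [0.0, 1.0]]
genes_swapped = [[0.0, 1.0], [1.0, 0.0]]
loss_matched = contrastive_loss(images, genes_matched)
loss_swapped = contrastive_loss(images, genes_swapped)
```

In this toy batch, correctly paired embeddings yield a much lower loss than swapped ones, which is the signal that pulls matching image and gene representations together during pre-training.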

