CV · Jan 2, 2024

AliFuse: Aligning and Fusing Multi-modal Medical Data for Computer-Aided Diagnosis

arXiv:2401.01074v3 · 12 citations · h-index: 4 · BIBM
Originality: Incremental advance
AI Analysis

This paper addresses the problem of integrating diverse medical data to improve diagnostic accuracy. It represents an incremental advance in multimodal fusion methods.

The paper proposes AliFuse, a transformer-based framework that aligns and fuses medical images with clinical records for computer-aided diagnosis. It reports state-of-the-art performance in Alzheimer's disease classification on five public datasets, outperforming eight baselines.

Medical data collected for diagnostic decisions are typically multimodal, providing comprehensive information on a subject. While computer-aided diagnosis systems can benefit from multimodal inputs, effectively fusing such data remains a challenging task and a key focus in medical research. In this paper, we propose a transformer-based framework, called Alifuse, for aligning and fusing multimodal medical data. Specifically, we convert medical images and both unstructured and structured clinical records into vision and language tokens, employing intramodal and intermodal attention mechanisms to learn unified representations of all imaging and non-imaging data for classification. Additionally, we integrate restoration modeling with contrastive learning frameworks, jointly learning the high-level semantic alignment between images and texts and the low-level understanding of one modality with the help of another. We apply Alifuse to classify Alzheimer's disease, achieving state-of-the-art performance on five public datasets and outperforming eight baselines.
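The abstract mentions contrastive learning for high-level image-text alignment but does not state the exact loss. As a rough illustration only, the sketch below assumes a CLIP-style symmetric InfoNCE objective over paired image and text embeddings. The function names, the temperature value, and the toy data are all hypothetical and are not taken from the paper or its repository.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length (left unchanged if its norm is zero)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Assumed CLIP-style loss: average of image-to-text and text-to-image terms.

    Row i of image_emb and row i of text_emb are treated as a matched pair;
    every other row in the batch acts as a negative.
    """
    img = [normalize(v) for v in image_emb]
    txt = [normalize(v) for v in text_emb]
    # Cosine similarity between every image and every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(i_vec, t_vec)) / temperature for t_vec in txt]
        for i_vec in img
    ]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy data (hypothetical): four "image" embeddings and near-identical "text" embeddings.
random.seed(0)
image_emb = [[random.gauss(0, 1) for _ in range(16)] for _ in range(4)]
aligned_text = [[x + 0.01 * random.gauss(0, 1) for x in v] for v in image_emb]
shuffled_text = aligned_text[1:] + aligned_text[:1]  # deliberately break the pairing

loss_aligned = symmetric_contrastive_loss(image_emb, aligned_text)
loss_shuffled = symmetric_contrastive_loss(image_emb, shuffled_text)
print(f"aligned pairs:  {loss_aligned:.4f}")
print(f"shuffled pairs: {loss_shuffled:.4f}")
```

In this toy setup the loss for correctly paired embeddings comes out lower than for shuffled pairs, which is the behavior a contrastive alignment objective is meant to reward. The restoration modeling component described in the abstract is not covered by this sketch.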

Code Implementations: 1 repo
