CV · Jan 2, 2024

AliFuse: Aligning and Fusing Multi-modal Medical Data for Computer-Aided Diagnosis

arXiv:2401.01074v3 · 10.5 · 12 citations · h-index: 4 · Has Code · BIBM

Originality: Incremental advance

AI Analysis

This paper addresses the problem of effectively integrating diverse medical data to improve diagnostic accuracy in healthcare. It represents an incremental advance in multimodal fusion methods.

The paper tackles the challenge of fusing multimodal medical data for computer-aided diagnosis. It proposes AliFuse, a transformer-based framework that aligns and fuses medical images with clinical records. On Alzheimer's disease classification, AliFuse achieves state-of-the-art performance on five public datasets and outperforms eight baselines.

Medical data collected for diagnostic decisions are typically multimodal, providing comprehensive information on a subject. While computer-aided diagnosis systems can benefit from multimodal inputs, effectively fusing such data remains a challenging task and a key focus in medical research. In this paper, we propose a transformer-based framework, called AliFuse, for aligning and fusing multimodal medical data. Specifically, we convert medical images and both unstructured and structured clinical records into vision and language tokens, employing intramodal and intermodal attention mechanisms to learn unified representations of all imaging and non-imaging data for classification. Additionally, we integrate restoration modeling with contrastive learning frameworks, jointly learning the high-level semantic alignment between images and texts and the low-level understanding of one modality with the help of the other. We apply AliFuse to classify Alzheimer's disease, achieving state-of-the-art performance on five public datasets and outperforming eight baselines.
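The abstract names three ingredients (intramodal attention, intermodal attention, and contrastive alignment combined with restoration modeling) but does not give layer counts, attention variants, or loss weights. The sketch below is a minimal NumPy illustration of those ingredients under stated assumptions, not AliFuse's actual implementation. Assumptions: `fuse`, `info_nce`, and `masked_restoration_loss` are hypothetical names; random projections stand in for learned weights and are shared across layers for brevity; intermodal attention is modeled as self-attention over the concatenated vision and language tokens (the paper may use a different scheme, such as cross-attention); the contrastive term is a swapped-in CLIP-style symmetric InfoNCE; and the restoration term is a generic masked-token mean squared error.

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))

def softmax(x, axis=-1):
    return np.exp(log_softmax(x, axis=axis))

def attention(q, k, v):
    # Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d), axis=-1) @ v

def fuse(vision_tokens, language_tokens, rng):
    """Hypothetical one-pass fusion: intramodal, then intermodal attention."""
    d = vision_tokens.shape[1]
    # Random projections stand in for learned weights (assumption).
    wq, wk, wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

    def self_attend(x):
        # Residual self-attention over the rows (tokens) of x.
        return x + attention(x @ wq, x @ wk, x @ wv)

    v = self_attend(vision_tokens)    # intramodal: image tokens attend to image tokens
    t = self_attend(language_tokens)  # intramodal: text tokens attend to text tokens
    joint = self_attend(np.concatenate([v, t], axis=0))  # intermodal: all tokens attend to both
    return joint.mean(axis=0)         # pooled vector that a classifier head would consume

def info_nce(img_emb, txt_emb, temperature=0.07):
    """CLIP-style symmetric InfoNCE over a batch of paired embeddings."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (B, B); diagonal holds the matched pairs
    idx = np.arange(len(img))
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> image
    return float((loss_i2t + loss_t2i) / 2)

def masked_restoration_loss(target_tokens, predicted_tokens, mask):
    """Mean squared error computed only on masked token positions."""
    diff = (predicted_tokens - target_tokens)[mask]
    return float((diff ** 2).mean())

rng = np.random.default_rng(0)
vision = rng.standard_normal((16, 32))   # e.g. 16 image-patch tokens of dimension 32
language = rng.standard_normal((8, 32))  # e.g. 8 tokens from clinical records
pooled = fuse(vision, language, rng)
print(pooled.shape)  # (32,)

batch = rng.standard_normal((4, 32))
print(info_nce(batch, batch) < info_nce(batch, batch[::-1]))  # True: matched pairs score lower
```

In a full training loop, the two losses would typically be summed with weighting coefficients and minimized alongside the classification objective; the weights and the reconstruction target are choices the abstract leaves unspecified.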

