CV | Jun 24, 2025

AMF-MedIT: An Efficient Align-Modulation-Fusion Framework for Medical Image-Tabular Data

arXiv:2506.19439v2 | 2 citations | h-index: 4 | Biomedical Signal Processing and Control
Originality: Incremental advance
AI Analysis

This work addresses multimodal fusion challenges in medical AI, offering a practical solution for clinical applications, though it appears incremental as it builds on existing self-supervised and fusion techniques.

The paper tackles the problem of fusing medical image and tabular data by addressing cross-modal discrepancies and tabular noise. The resulting framework reports superior accuracy, robustness, and data efficiency on multimodal classification tasks.

Multimodal medical analysis combining image and tabular data has gained increasing attention. However, effective fusion remains challenging due to cross-modal discrepancies in feature dimensions and modality contributions, as well as the noise from high-dimensional tabular inputs. To address these problems, we present AMF-MedIT, an efficient Align-Modulation-Fusion framework for medical image and tabular data integration, particularly under data-scarce conditions. Built upon a self-supervised learning strategy, we introduce the Adaptive Modulation and Fusion (AMF) module, a novel, streamlined fusion paradigm that harmonizes dimension discrepancies and dynamically balances modality contributions. It integrates prior knowledge to guide the allocation of modality contributions in the fusion and employs feature masks together with magnitude and leakage losses to adjust the dimensionality and magnitude of unimodal features. Additionally, we develop FT-Mamba, a powerful tabular encoder leveraging a selective mechanism to handle noisy medical tabular data efficiently. Extensive experiments, including simulations of clinical noise, demonstrate that AMF-MedIT achieves superior accuracy, robustness, and data efficiency across multimodal classification tasks. Interpretability analyses further reveal how FT-Mamba shapes multimodal pretraining and enhances the image encoder's attention, highlighting the practical value of our framework for reliable and efficient clinical artificial intelligence applications.
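The abstract names the ingredients of the AMF module (feature masks, a magnitude loss, a leakage loss, and a prior-guided split of modality contributions) without giving their definitions. The sketch below is one possible reading, not the authors' formulation. Every function name, the "keep the first k dimensions" mask, the squared-error loss forms, and the `image_share` parameter are assumptions made for illustration.

```python
import math

# Illustrative sketch only: all definitions below are assumptions,
# not the losses or masks defined in the AMF-MedIT paper.

def allocate(total_dim, image_share):
    """Split a fused feature budget between modalities from a prior share (hypothetical)."""
    image_keep = round(total_dim * image_share)
    return image_keep, total_dim - image_keep

def leakage_loss(features, keep):
    """Mean squared activation in the dimensions the mask is meant to suppress."""
    masked_out = features[keep:]
    if not masked_out:
        return 0.0
    return sum(x * x for x in masked_out) / len(masked_out)

def magnitude_loss(features, target_norm):
    """Squared gap between the feature's L2 norm and a target norm."""
    norm = math.sqrt(sum(x * x for x in features))
    return (norm - target_norm) ** 2

def fuse(image_feat, tab_feat, image_keep, tab_keep):
    """Keep each modality's allotted leading dimensions and concatenate them."""
    return image_feat[:image_keep] + tab_feat[:tab_keep]

image_feat = [0.5, -1.0, 2.0, 0.1]
tab_feat = [1.0, 0.0, -0.2, 0.3]
image_keep, tab_keep = allocate(total_dim=4, image_share=0.5)
fused = fuse(image_feat, tab_feat, image_keep, tab_keep)
```

Under this reading, a leakage penalty discourages an encoder from storing information in dimensions the mask will discard, while a magnitude penalty keeps the two modalities on a comparable scale before concatenation. The prior share only fixes how many dimensions each modality receives.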

