CV · May 25

Context-driven Missing-Modality Learning for Robust Medical Diagnosis with Image-Tabular Data

Tianling Liu, Lequan Yu, Tong Han, Liang Wan

arXiv:2605.2596836.3

AI Analysis

This work addresses a practical problem in multimodal medical diagnosis: modalities that are missing for a given patient degrade model performance. It offers a robust solution that outperforms current approaches.

The paper proposes a Context-driven Missing-Modality Learning (CMML) framework for robust medical diagnosis when image or tabular data is missing. It achieves state-of-the-art results on three medical datasets (Derm7pt, ODIR, and MEN), with average AUC improvements over existing methods of 1.26%, 0.97%, and 1.32%, respectively.

While multimodal data integrating diverse imaging and clinical tabular records is crucial for accurate medical diagnosis, the arbitrary absence of specific modalities is prevalent in clinical practice and severely degrades the performance of multimodal models. Existing methods either discard missing modalities, leading to information loss, or attempt to synthesize them without capturing complex inter-modal dependencies. To address these limitations, we propose a novel Context-driven Missing-Modality Learning (CMML) framework, which sequentially performs modality synthesis and semantic alignment to achieve robust diagnosis under arbitrary missing conditions. Specifically, we design a Cascade Residual Transformer-based Autoencoder (CRTA) that leverages learnable context tokens, acting as a dataset-level semantic prior, to capture inter-modal dependencies and synthesize the key representations of missing modalities. These representations are further enriched by modality-specific memory banks. To resolve the discrepancy between the originally available and the synthesized representations, we transform the learned context tokens into instance-adaptive semantic references by infusing them with multimodal representations from the CRTA's outputs. These references guide the alignment of heterogeneous modality representations into a unified space, where class-aware contrastive refinement is finally applied to explore discriminative diagnostic cues. Extensive evaluations on skin lesion (Derm7pt), ocular disease (ODIR), and meningioma (MEN) datasets demonstrate that CMML significantly outperforms state-of-the-art (SOTA) methods, yielding average AUC improvements of 1.26%, 0.97%, and 1.32%, respectively.
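The abstract does not state the exact form of the class-aware contrastive refinement. One common objective of this kind is the supervised contrastive (SupCon) loss, in which samples sharing a diagnostic label are pulled together and every other sample in the batch acts as a negative. The sketch below assumes that choice and operates on already-fused per-sample representations; the function names, the temperature value, and the toy vectors are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def supervised_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,  # hypothetical default, not taken from the paper
) -> float:
    """SupCon-style loss over one batch (an assumed stand-in for CMML's refinement).

    For each anchor, same-label samples are positives and all other
    non-anchor samples form the softmax denominator. Anchors with no
    same-label partner are skipped.
    """
    if len(embeddings) != len(labels):
        raise ValueError("embeddings and labels must have the same length")
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    # Pairwise cosine similarities divided by the temperature.
    sim = [
        [sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    total, anchors = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue
        # Log of the softmax denominator, computed with log-sum-exp for stability.
        m = max(sim[i][j] for j in others)
        log_denom = m + math.log(sum(math.exp(sim[i][j] - m) for j in others))
        # Mean negative log-probability assigned to the same-label samples.
        total += -sum(sim[i][p] - log_denom for p in positives) / len(positives)
        anchors += 1
    if anchors == 0:
        raise ValueError("batch has no anchor with a same-label positive")
    return total / anchors


# Toy usage: two well-separated classes of hypothetical fused representations.
fused = [[1.0, 0.1], [0.9, 0.2], [-1.0, 0.0], [-0.8, -0.3]]
print(supervised_contrastive_loss(fused, [0, 0, 1, 1]))
```

With labels that match the clusters the loss is close to zero; assigning the same vectors labels that cut across the clusters (for example `[0, 1, 0, 1]`) yields a much larger value, which is the signal that pushes same-diagnosis representations together in the unified space.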
