CV, Jan 28

MMSF: Multitask and Multimodal Supervised Framework for WSI Classification and Survival Analysis

Chengying She, Chengwei Chen, Xinran Zhang, Ben Wang, Lizhuang Liu, Chengwei Shao, Yun Bian

arXiv:2601.20347v1, 1.5, h-index: 5

Originality: Highly original

AI Analysis

This work aims to improve cancer diagnosis and prognosis by integrating heterogeneous data in computational pathology. It reports a strong task-specific gain achieved through incremental methodological advances.

The paper tackles the integration of multimodal data (whole slide images and clinical descriptors) in computational pathology by introducing MMSF, a multitask and multimodal supervised framework. On classification tasks, MMSF improves accuracy by 2.1–6.6% and AUC by 2.2–6.9% over baselines; on survival analysis, it improves the C-index by 7.1–9.8% over unimodal methods.

Multimodal evidence is critical in computational pathology: gigapixel whole slide images capture tumor morphology, while patient-level clinical descriptors preserve complementary context for prognosis. Integrating such heterogeneous signals remains challenging because feature spaces exhibit distinct statistics and scales. We introduce MMSF, a multitask and multimodal supervised framework built on a linear-complexity MIL backbone that explicitly decomposes and fuses cross-modal information. MMSF comprises a graph feature extraction module embedding tissue topology at the patch level, a clinical data embedding module standardizing patient attributes, a feature fusion module aligning modality-shared and modality-specific representations, and a Mamba-based MIL encoder with multitask prediction heads. Experiments on CAMELYON16 and TCGA-NSCLC demonstrate 2.1–6.6% accuracy and 2.2–6.9% AUC improvements over competitive baselines, while evaluations on five TCGA survival cohorts yield 7.1–9.8% C-index improvements compared with unimodal methods and 5.6–7.1% over multimodal alternatives.
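
The survival results above are reported as C-index (concordance index) values. As a reading aid, here is a minimal sketch of Harrell's C-index in plain Python. It is not taken from the paper: the function name `concordance_index`, its arguments, and the tie handling (0.5 credit for tied risk scores, pairs with equal observed times skipped) are assumptions chosen for illustration, and the authors' evaluation code may differ.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C-index (illustrative sketch, not the paper's code).

    times  : observed follow-up time per patient
    events : 1/True if the event was observed, 0/False if censored
    risks  : predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the times differ and the
        # shorter-time patient actually experienced the event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # higher risk for the earlier event: correct order
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied risk scores get half credit (assumption)
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and 0.0 to a fully inverted ranking.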
