CV · Nov 22, 2025

Together, Then Apart: Revisiting Multimodal Survival Analysis via a Min-Max Perspective

arXiv:2511.18089v1 · 3 citations
Originality: Highly original
AI Analysis

This work addresses the challenge of preserving modality-specific characteristics in multi-modal survival analysis for medical applications, offering a novel theoretical perspective.

The paper tackles the problem of integrating heterogeneous modalities, such as histopathology and genomics, in survival analysis. It proposes a min-max optimization framework that balances cross-modal alignment against modality distinctiveness, and it reports consistent gains over state-of-the-art methods on five TCGA benchmarks.

Integrating heterogeneous modalities such as histopathology and genomics is central to advancing survival analysis, yet most existing methods prioritize cross-modal alignment through attention-based fusion mechanisms, often at the expense of modality-specific characteristics. This overemphasis on alignment leads to representation collapse and reduced diversity. In this work, we revisit multi-modal survival analysis through the dual lens of alignment and distinctiveness, positing that preserving modality-specific structure is as vital as achieving semantic coherence. To this end, we introduce Together-Then-Apart (TTA), a unified min-max optimization framework that simultaneously models shared and modality-specific representations. The Together stage minimizes semantic discrepancies by aligning embeddings via shared prototypes, guided by an unbalanced optimal transport objective that adaptively highlights informative tokens. The Apart stage maximizes representational diversity through modality anchors and a contrastive regularizer that preserve unique modality information and prevent feature collapse. Extensive experiments on five TCGA benchmarks show that TTA consistently outperforms state-of-the-art methods. Beyond empirical gains, our formulation provides a new theoretical perspective on how alignment and distinctiveness can be jointly achieved for robust, interpretable, and biologically meaningful multi-modal survival analysis.
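
To make the two stages concrete, the following is a minimal sketch of how a Together term and an Apart term might be computed. It is not the authors' implementation. The abstract does not give the exact losses, so everything here is an assumption: the squared-Euclidean cost, the uniform marginals, the KL-relaxed entropic unbalanced optimal transport solved by scaling iterations, the InfoNCE-style contrastive term over modality anchors, and all function names and parameters (`eps`, `rho`, `tau`, `lam`).

```python
import math

def sq_dist(x, y):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(x, y))

def unbalanced_sinkhorn(cost, a, b, eps=0.5, rho=1.0, n_iter=200):
    """Entropic unbalanced OT with KL-relaxed marginals (scaling iterations).

    eps is the entropic regularization and rho the marginal-relaxation
    strength. A large rho approaches balanced OT.
    """
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    power = rho / (rho + eps)
    for _ in range(n_iter):
        u = [(a[i] / sum(K[i][j] * v[j] for j in range(m))) ** power
             for i in range(n)]
        v = [(b[j] / sum(K[i][j] * u[i] for i in range(n))) ** power
             for j in range(m)]
    # Transport plan: diag(u) K diag(v)
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def together_loss(tokens, prototypes, eps=0.5, rho=1.0):
    """Assumed Together term: transport cost from tokens to shared prototypes.

    Because the marginals are relaxed, tokens far from every prototype can
    receive less mass, which is one way to down-weight uninformative tokens.
    """
    cost = [[sq_dist(t, p) for p in prototypes] for t in tokens]
    a = [1.0 / len(tokens)] * len(tokens)
    b = [1.0 / len(prototypes)] * len(prototypes)
    plan = unbalanced_sinkhorn(cost, a, b, eps, rho)
    return sum(plan[i][j] * cost[i][j]
               for i in range(len(a)) for j in range(len(b)))

def cosine(x, y):
    """Cosine similarity with a small constant to avoid division by zero."""
    nx = math.sqrt(sum(p * p for p in x)) + 1e-12
    ny = math.sqrt(sum(q * q for q in y)) + 1e-12
    return sum(p * q for p, q in zip(x, y)) / (nx * ny)

def apart_loss(embeddings, anchors, tau=0.1):
    """Assumed Apart term: InfoNCE-style loss over modality anchors.

    embeddings[m] should be close to anchors[m] and far from the anchors
    of the other modalities. Minimizing this encourages distinctiveness.
    """
    total = 0.0
    for m, z in enumerate(embeddings):
        logits = [cosine(z, anc) / tau for anc in anchors]
        mx = max(logits)  # subtract the max for a stable log-sum-exp
        log_norm = mx + math.log(sum(math.exp(l - mx) for l in logits))
        total += log_norm - logits[m]
    return total / len(embeddings)

def tta_objective(tokens_by_modality, prototypes, embeddings, anchors, lam=1.0):
    """Hypothetical combined objective: alignment cost plus weighted Apart term."""
    align = sum(together_loss(t, prototypes) for t in tokens_by_modality)
    return align + lam * apart_loss(embeddings, anchors)
```

In this sketch the max side of the min-max formulation is expressed as minimizing a contrastive loss, which is a common way to maximize separation between representations. The weight `lam` is a hypothetical knob for trading alignment against distinctiveness.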
