CV · LG · Jan 19

MultiST: A Cross-Attention-Based Multimodal Model for Spatial Transcriptomic

arXiv:2601.13331v1 · Has Code
Originality: Highly original
AI Analysis

MultiST addresses a known bottleneck in spatial transcriptomics analysis, the weak integration of tissue images with molecular data, for researchers studying tissue organization and cell-cell interactions. It is a new method for an established limitation.

The authors tackle the integration of histological morphology with molecular profiles in spatial transcriptomics. They propose MultiST, which yields spatial domains with clearer boundaries and more biologically interpretable patterns across 13 diverse datasets.

Spatial transcriptomics (ST) enables transcriptome-wide profiling while preserving the spatial context of tissues, offering unprecedented opportunities to study tissue organization and cell-cell interactions in situ. Despite recent advances, existing methods often lack effective integration of histological morphology with molecular profiles, relying on shallow fusion strategies or omitting tissue images altogether, which limits their ability to resolve ambiguous spatial domain boundaries. To address this challenge, we propose MultiST, a unified multimodal framework that jointly models spatial topology, gene expression, and tissue morphology through cross-attention-based fusion. MultiST employs graph-based gene encoders with adversarial alignment to learn robust spatial representations, while integrating color-normalized histological features to capture molecular-morphological dependencies and refine domain boundaries. We evaluated the proposed method on 13 diverse ST datasets spanning two organs, including human brain cortex and breast cancer tissue. MultiST yields spatial domains with clearer and more coherent boundaries than existing methods, leading to more stable pseudotime trajectories and more biologically interpretable cell-cell interaction patterns. The MultiST framework and source code are available at https://github.com/LabJunBMI/MultiST.git.
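As an illustration of what "cross-attention-based fusion" can mean in this setting, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, dimensions, random projection matrices, single attention head, and the residual connection are all assumptions made for the example. Each spot's gene embedding acts as a query over embeddings of histology image patches, and the attended image features are added back to the projected gene features.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def cross_attention(gene_emb, image_emb, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention (illustrative only).

    gene_emb:  n_spots x d_gene, used to form the queries
    image_emb: n_patches x d_img, used to form the keys and values
    Returns (fused, weights), where fused is n_spots x d_model and
    weights is n_spots x n_patches.
    """
    q = matmul(gene_emb, w_q)
    k = matmul(image_emb, w_k)
    v = matmul(image_emb, w_v)
    scale = 1.0 / math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]        # transpose of k
    scores = matmul(q, k_t)                     # n_spots x n_patches
    weights = [softmax([s * scale for s in row]) for row in scores]
    attended = matmul(weights, v)               # image features per spot
    # Assumed residual connection: keep the gene signal, add image context.
    fused = [[qi + ai for qi, ai in zip(q_row, a_row)]
             for q_row, a_row in zip(q, attended)]
    return fused, weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned embeddings or projection weights."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_spots, n_patches, d_gene, d_img, d_model = 4, 4, 8, 6, 5
gene_emb = random_matrix(n_spots, d_gene, rng)     # e.g. graph-encoder output
image_emb = random_matrix(n_patches, d_img, rng)   # e.g. patch features
w_q = random_matrix(d_gene, d_model, rng)
w_k = random_matrix(d_img, d_model, rng)
w_v = random_matrix(d_img, d_model, rng)

fused, weights = cross_attention(gene_emb, image_emb, w_q, w_k, w_v)
```

Each row of `weights` is a probability distribution over image patches, so it can be read as how strongly a given spot draws on each patch. In a trained model the projections would be learned rather than random, and the paper's graph-based encoders and adversarial alignment are outside the scope of this sketch.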

