MMAP: A Multi-Magnification and Prototype-Aware Architecture for Predicting Spatial Gene Expression
This work targets a problem of practical importance to spatial transcriptomics researchers and clinicians: improving the accuracy of gene expression prediction from tissue images. The contribution appears incremental, however, because it builds on existing deep learning approaches.
The paper tackles the challenge of predicting spatial gene expression from histological images. It proposes the MMAP framework, which uses multi-magnification patches to sharpen local feature extraction and learned prototype embeddings to summarize slide-level (global) context. The authors report that MMAP consistently outperforms state-of-the-art methods on MAE, MSE, and PCC.
Spatial Transcriptomics (ST) enables the measurement of gene expression while preserving spatial information, offering critical insights into tissue architecture and disease pathology. Recent work has explored using hematoxylin and eosin (H&E)-stained whole-slide images (WSIs) to predict transcriptome-wide gene expression profiles with deep neural networks. This task is commonly framed as a regression problem, where each input corresponds to a localized image patch extracted from the WSI. However, predicting spatial gene expression from histological images remains challenging because of the significant modality gap between visual features and molecular signals. Several studies have attempted to incorporate both local and global information into predictive models. Nevertheless, existing methods still suffer from two key limitations: (1) insufficient granularity in local feature extraction, and (2) inadequate coverage of global spatial context. In this work, we propose a novel framework, MMAP (Multi-MAgnification and Prototype-enhanced architecture), that addresses both limitations simultaneously. To enhance local feature granularity, MMAP leverages multi-magnification patch representations that capture fine-grained histological details. To improve global contextual understanding, it learns a set of latent prototype embeddings that serve as compact representations of slide-level information. Extensive experimental results demonstrate that MMAP consistently outperforms all existing state-of-the-art methods across multiple evaluation metrics, including Mean Absolute Error (MAE), Mean Squared Error (MSE), and Pearson Correlation Coefficient (PCC).
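For readers less familiar with the three evaluation metrics named above, the following is a minimal sketch of how MAE, MSE, and PCC can be computed for a single gene across spots. It is an illustration only, not the authors' evaluation code: the function names, the per-gene framing, and the choice to return NaN when a vector has zero variance are all assumptions made here.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Squared Error: average of (true - predicted)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pcc(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson Correlation Coefficient between true and predicted values.

    Assumption for this sketch: returns NaN if either vector is constant,
    since the correlation is undefined in that case.
    """
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0.0 or var_p == 0.0:
        return float("nan")
    return cov / math.sqrt(var_t * var_p)


# Hypothetical expression values for one gene at four spots.
true_expr = [1.0, 2.0, 3.0, 4.0]
pred_expr = [1.5, 2.5, 2.5, 4.5]

print("MAE:", mae(true_expr, pred_expr))
print("MSE:", mse(true_expr, pred_expr))
print("PCC:", pcc(true_expr, pred_expr))
```

Lower MAE and MSE indicate smaller prediction error, while a PCC closer to 1 indicates that predicted expression tracks the measured expression across spots. In a multi-gene setting these per-gene values would typically be aggregated, for example by averaging over genes; the aggregation actually used in the paper is not stated in this section.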