Cross-modal Diffusion Modelling for Super-resolved Spatial Transcriptomics
This work addresses a domain-specific problem for researchers in spatial transcriptomics by improving resolution, though it appears incremental as it builds on existing diffusion model approaches.
The paper tackles the problem of low resolution in spatial transcriptomics (ST) by proposing a cross-modal conditional diffusion model that integrates histology images with gene expression to super-resolve ST maps, achieving state-of-the-art performance on three public datasets.
The recent advancement of spatial transcriptomics (ST) allows to characterize spatial gene expression within tissue for discovery research. However, current ST platforms suffer from low resolution, hindering in-depth understanding of spatial gene expression. Super-resolution approaches promise to enhance ST maps by integrating histology images with gene expressions of profiled tissue spots. However, current super-resolution methods are limited by restoration uncertainty and mode collapse. Although diffusion models have shown promise in capturing complex interactions between multi-modal conditions, it remains a challenge to integrate histology images and gene expression for super-resolved ST maps. This paper proposes a cross-modal conditional diffusion model for super-resolving ST maps with the guidance of histology images. Specifically, we design a multi-modal disentangling network with cross-modal adaptive modulation to utilize complementary information from histology images and spatial gene expression. Moreover, we propose a dynamic cross-attention modelling strategy to extract hierarchical cell-to-tissue information from histology images. Lastly, we propose a co-expression-based gene-correlation graph network to model the co-expression relationship of multiple genes. Experiments show that our method outperforms other state-of-the-art methods in ST super-resolution on three public datasets.