Cross-Modal Knowledge Distillation from Spatial Transcriptomics to Histology
This work addresses the scarcity of spatial transcriptomics data in biological and clinical applications by enabling histology-only inference of tissue niches. The contribution is incremental, since it builds on existing distillation methods.
The paper tackles the problem of transferring tissue niche structure from costly spatial transcriptomics to abundant H&E histology using cross-modal distillation. The resulting model achieves substantially higher agreement with transcriptomics-derived niches than unsupervised baselines and recovers biologically meaningful neighborhood compositions.
Spatial transcriptomics provides a molecularly rich description of tissue organization, enabling unsupervised discovery of tissue niches, which are spatially coherent regions of distinct cell-type composition and function that are relevant to both biological research and clinical interpretation. However, spatial transcriptomics remains costly and scarce, while H&E histology is abundant but carries a less granular signal. We use paired spatial transcriptomics and H&E data to transfer transcriptomics-derived niche structure to a histology-only model via cross-modal distillation. Across multiple tissue types and disease contexts, the distilled model achieves substantially higher agreement with transcriptomics-derived niche structure than unsupervised morphology-based baselines trained on identical image features, and recovers biologically meaningful neighborhood composition as confirmed by cell-type analysis. Paired data are required only during training: the resulting model can be applied to held-out tissue regions using histology alone, without any transcriptomic input at inference time.
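To make the train-with-pairs, infer-with-histology-only setup concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the teacher signal is a per-spot soft niche assignment derived from transcriptomics, the student is a linear softmax classifier on precomputed image features, and the loss is cross-entropy against the teacher probabilities; the abstract specifies none of these choices. All data are synthetic, and the names `distill`, `predict_niche`, `image_feats`, and `teacher_probs` are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def distill(image_feats, teacher_probs, lr=0.01, epochs=500):
    """Fit a linear student on image features to match teacher niche probabilities.

    Assumption: cross-entropy against soft teacher labels, plain gradient descent.
    """
    n_spots, n_feats = image_feats.shape
    n_niches = teacher_probs.shape[1]
    weights = np.zeros((n_feats, n_niches))
    bias = np.zeros(n_niches)
    for _ in range(epochs):
        student_probs = softmax(image_feats @ weights + bias)
        # Gradient of the mean cross-entropy with respect to the logits.
        grad_logits = (student_probs - teacher_probs) / n_spots
        weights -= lr * image_feats.T @ grad_logits
        bias -= lr * grad_logits.sum(axis=0)
    return weights, bias


def predict_niche(image_feats, weights, bias):
    """Histology-only inference: no transcriptomic input is used here."""
    return softmax(image_feats @ weights + bias).argmax(axis=1)


# Synthetic stand-in for paired data (hypothetical sizes and distributions).
rng = np.random.default_rng(0)
n_spots, n_feats, n_niches = 600, 16, 3
teacher_niche = rng.integers(0, n_niches, size=n_spots)
centers = rng.normal(size=(n_niches, n_feats)) * 2.0
image_feats = centers[teacher_niche] + rng.normal(size=(n_spots, n_feats))

# Soft teacher labels: 0.9 on the transcriptomics-derived niche, 0.05 elsewhere.
teacher_probs = np.full((n_spots, n_niches), 0.05)
teacher_probs[np.arange(n_spots), teacher_niche] = 0.9

# Train on paired spots, then evaluate on held-out spots using image features only.
train, held_out = np.arange(0, 450), np.arange(450, 600)
weights, bias = distill(image_feats[train], teacher_probs[train])
pred = predict_niche(image_feats[held_out], weights, bias)
agreement = (pred == teacher_niche[held_out]).mean()
print(f"held-out agreement with teacher niches: {agreement:.2f}")
```

Here agreement is simply the fraction of held-out spots whose predicted niche matches the teacher's label, a stand-in for whatever agreement metric the paper reports. The key design point is that `teacher_probs` appears only in `distill`; `predict_niche` takes image features alone.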