Latent Gene Diffusion for Spatial Transcriptomics Completion
This work addresses a limitation of spatial transcriptomics data analysis by giving researchers a more accurate and robust method for gene expression completion. It is incremental, however, in that it builds on existing diffusion models and applies them to a specific domain.
The paper tackles data dropout in spatial transcriptomics (ST) by introducing LGDiST, a reference-free latent gene diffusion model. LGDiST reduces average Mean Squared Error (MSE) by 18% across 26 datasets relative to the previous state of the art, and completing ST data with it improves the gene expression prediction performance of six state-of-the-art methods by up to 10% in MSE.
Computer vision has proven to be a powerful tool for analyzing Spatial Transcriptomics (ST) data. However, current models that predict spatially resolved gene expression from histopathology images suffer from significant limitations due to data dropout. Most existing approaches to completing the missing data rely on single-cell RNA sequencing references, which makes them dependent on alignment quality and external datasets and exposes them to batch effects and inherited dropout. In this paper, we address these limitations by introducing LGDiST, the first reference-free latent gene diffusion model for ST data dropout. We show that LGDiST outperforms the previous state of the art in gene expression completion, with an average Mean Squared Error (MSE) that is 18% lower across 26 datasets. Furthermore, we demonstrate that completing ST data with LGDiST improves the gene expression prediction performance of six state-of-the-art methods by up to 10% in MSE. A key innovation of LGDiST is its use of context genes, previously considered uninformative, to build a rich and biologically meaningful genetic latent space. Our experiments show that removing key components of LGDiST, such as the context genes, the ST latent space, or the neighbor conditioning, leads to considerable drops in performance. These findings underscore that the full architecture of LGDiST achieves substantially better performance than any of its isolated components.
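To make the headline metric concrete, the following is a minimal sketch of how a relative reduction in average MSE across datasets could be computed. It is an illustration only: the function names and the toy numbers are hypothetical, and the abstract does not say whether the 18% figure is the reduction of the cross-dataset mean MSE (as assumed here) or the mean of per-dataset reductions.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences of numbers."""
    if len(pred) != len(target) or len(target) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def average_mse(per_dataset_mse):
    """Unweighted mean of per-dataset MSE values."""
    if not per_dataset_mse:
        raise ValueError("need at least one dataset")
    return sum(per_dataset_mse) / len(per_dataset_mse)


def relative_reduction(baseline, new):
    """Fractional reduction of `new` relative to `baseline` (0.18 means 18% lower)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - new) / baseline


# Toy, made-up per-dataset MSE values (NOT results from the paper).
baseline_mse = [0.50, 1.00, 1.50]   # hypothetical previous method
completed_mse = [0.41, 0.82, 1.23]  # hypothetical new method

reduction = relative_reduction(average_mse(baseline_mse), average_mse(completed_mse))
print(f"relative reduction in average MSE: {reduction:.2%}")
```

In a completion setting, `mse` would typically be evaluated only on entries that were held out (masked) before imputation, so that the error reflects recovered values rather than values the model was given; that protocol is an assumption here, not something the abstract specifies.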