LG · Nov 14, 2025

When Genes Speak: A Semantic-Guided Framework for Spatially Resolved Transcriptomics Data Clustering

arXiv:2511.11380v1 · 1 citation · h-index: 5
Originality: Incremental advance
AI Analysis

This work aims to improve the characterization of tissue microenvironments by incorporating semantic gene information into clustering models. It is an incremental improvement over existing methods.

The authors cluster spatial transcriptomics data by integrating the biological semantics of gene symbols with spatial relationships. They report state-of-the-art clustering performance and show that their Fine-grained Semantic Modulation (FSM) module can be plugged into other methods, where it consistently improves results.

Spatial transcriptomics enables gene expression profiling with spatial context, offering unprecedented insights into the tissue microenvironment. However, most computational models treat genes as isolated numerical features, ignoring the rich biological semantics encoded in their symbols. This prevents a truly deep understanding of critical biological characteristics. To overcome this limitation, we present SemST, a semantic-guided deep learning framework for spatial transcriptomics data clustering. SemST leverages Large Language Models (LLMs) to enable genes to "speak" through their symbolic meanings, transforming gene sets within each tissue spot into biologically informed embeddings. These embeddings are then fused with the spatial neighborhood relationships captured by Graph Neural Networks (GNNs), achieving a coherent integration of biological function and spatial structure. We further introduce the Fine-grained Semantic Modulation (FSM) module to optimally exploit these biological priors. The FSM module learns spot-specific affine transformations that empower the semantic embeddings to perform an element-wise calibration of the spatial features, thus dynamically injecting high-order biological knowledge into the spatial context. Extensive experiments on public spatial transcriptomics datasets show that SemST achieves state-of-the-art clustering performance. Crucially, the FSM module exhibits plug-and-play versatility, consistently improving the performance when integrated into other baseline methods.
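The abstract describes FSM as learning spot-specific affine transformations through which semantic embeddings calibrate spatial features element-wise. The sketch below is one plausible reading of that sentence, not the authors' implementation: it assumes a FiLM-style scale-and-shift, a single mean-aggregation layer over a k-nearest-neighbour graph as a stand-in for the GNN, and random vectors in place of LLM-derived gene embeddings. All function names, weight names, and dimensions (`knn_adjacency`, `fsm_modulate`, `w_gamma`, `w_beta`, and so on) are hypothetical.

```python
import numpy as np


def knn_adjacency(coords, k):
    """Row-normalised adjacency linking each spot to itself and its k nearest spots."""
    n = coords.shape[0]
    # Pairwise squared Euclidean distances between spot coordinates.
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1)
    # Force each spot to rank first in its own neighbour list (self-loop).
    np.fill_diagonal(d2, -1.0)
    idx = np.argsort(d2, axis=1)[:, : k + 1]
    adj = np.zeros((n, n))
    adj[np.arange(n)[:, None], idx] = 1.0
    return adj / adj.sum(axis=1, keepdims=True)


def fsm_modulate(spatial, semantic, w_gamma, b_gamma, w_beta, b_beta):
    """Assumed FiLM-style modulation: per-spot scale and shift from semantics."""
    gamma = semantic @ w_gamma + b_gamma  # (n_spots, d) element-wise scale
    beta = semantic @ w_beta + b_beta     # (n_spots, d) element-wise shift
    return gamma * spatial + beta


# Toy data; every value here is synthetic and for illustration only.
rng = np.random.default_rng(0)
n_spots, n_genes, d_sem, d = 6, 10, 4, 3
coords = rng.uniform(size=(n_spots, 2))                      # spot positions
expr = rng.poisson(2.0, size=(n_spots, n_genes)).astype(float)
semantic = rng.normal(size=(n_spots, d_sem))                 # stand-in for LLM embeddings

# One mean-aggregation layer as a stand-in for the GNN encoder.
adj = knn_adjacency(coords, k=2)
w_gnn = rng.normal(size=(n_genes, d))
spatial = np.tanh(adj @ expr @ w_gnn)

# Randomly initialised modulation weights (these would be learned in practice).
w_gamma = rng.normal(size=(d_sem, d))
w_beta = rng.normal(size=(d_sem, d))
modulated = fsm_modulate(spatial, semantic, w_gamma, np.ones(d), w_beta, np.zeros(d))
```

Under this reading, setting the scale weights to zero with a unit bias and the shift to zero leaves the spatial features unchanged. That is one reason such a module could be attached to an existing encoder without disturbing it at initialisation, which fits the plug-and-play behaviour the abstract reports.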

