LG, Jul 11, 2025

scE$^2$TM: Toward Interpretable Single-Cell Embedding via Topic Modeling

Hegang Chen, Yuyin Lu, Zhiming Dai, Fu Lee Wang, Qing Li, Yanghui Rao

arXiv:2507.08355v1 | 4.1 | h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses the limited interpretability and analytical performance of single-cell RNA-seq data analysis methods. It is an incremental advance, since it builds on existing topic models by adding external guidance.

The paper tackles interpretability and performance limitations in single-cell embedded topic models by introducing scE2TM, which incorporates external biological knowledge. The authors report significant clustering performance gains over 7 state-of-the-art methods across 20 datasets, and improved interpretability as measured by new quantitative metrics.

Recent advances in sequencing technologies have enabled researchers to explore cellular heterogeneity at single-cell resolution. Meanwhile, interpretability has gained prominence in parallel with the rapid increase in the complexity and performance of deep learning models. In recent years, topic models have been widely used for interpretable single-cell embedding learning and clustering analysis; we refer to these as single-cell embedded topic models. However, previous studies evaluated the interpretability of these models mainly through qualitative analysis, and the models themselves can suffer from interpretation collapse. Furthermore, their neglect of external biological knowledge constrains analytical performance. Here, we present scE2TM, an external knowledge-guided single-cell embedded topic model that provides high-quality cell embeddings and strong interpretation, contributing to comprehensive scRNA-seq data analysis. Our evaluation across 20 scRNA-seq datasets demonstrates that scE2TM achieves significant clustering performance gains compared to 7 state-of-the-art methods. In addition, we propose a new interpretability evaluation benchmark that introduces 10 metrics to quantitatively assess the interpretability of single-cell embedded topic models. The results show that the interpretation provided by scE2TM is encouraging in terms of diversity and consistency with the underlying biological signals, helping to better reveal the underlying biological mechanisms.
