Analyzing scRNA-seq data by CCP-assisted UMAP and t-SNE
This work addresses dimensionality reduction for scRNA-seq data analysis. The contribution is incremental: it applies an existing preprocessing method to improve standard visualization techniques.
The authors tackled the challenge of analyzing sparse and high-dimensional single-cell RNA sequencing data by using Correlated Clustering and Projection (CCP) as an initialization tool for UMAP and t-SNE, resulting in significant improvements in visualization and accuracy across eight datasets.
Single-cell RNA sequencing (scRNA-seq) is widely used to reveal heterogeneity in cells, which has given us insights into cell-cell communication, cell differentiation, and differential gene expression. However, analyzing scRNA-seq data is challenging because the data are sparse and involve a large number of genes. Dimensionality reduction and feature selection are therefore important for removing spurious signals and enhancing downstream analysis. Correlated clustering and projection (CCP) was recently introduced as an effective method for preprocessing scRNA-seq data. CCP utilizes gene-gene correlations to partition the genes and, based on the partition, employs cell-cell interactions to obtain super-genes. Because CCP is a data-domain approach that does not require matrix diagonalization, it can be used in many downstream machine learning tasks. In this work, we utilize CCP as an initialization tool for uniform manifold approximation and projection (UMAP) and t-distributed stochastic neighbor embedding (t-SNE). Using eight publicly available datasets, we find that CCP significantly improves UMAP and t-SNE visualization and dramatically improves their accuracy.
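The two CCP steps described above (partition genes by gene-gene correlation, then build one super-gene per partition from cell-cell interactions) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the k-means-style partition of correlation profiles, and the Gaussian kernel with a mean-distance scale are all assumptions chosen for illustration, and the code is meant for toy-sized matrices only.

```python
import numpy as np


def partition_genes(X, n_parts, n_iter=20, seed=0):
    """Assumed partition step: k-means on gene-gene correlation profiles.

    X has shape (n_cells, n_genes). Returns one integer label per gene.
    """
    rng = np.random.default_rng(seed)
    # Gene-gene Pearson correlation; constant genes give NaN, mapped to 0.
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.nan_to_num(np.corrcoef(X, rowvar=False))
    # Start from randomly chosen genes as cluster centers.
    centers = corr[rng.choice(corr.shape[0], n_parts, replace=False)]
    labels = np.zeros(corr.shape[0], dtype=int)
    for _ in range(n_iter):
        # Squared distance from every gene profile to every center.
        dist = np.stack([((corr - c) ** 2).sum(axis=1) for c in centers], axis=1)
        labels = dist.argmin(axis=1)
        for k in range(n_parts):
            members = labels == k
            if members.any():
                centers[k] = corr[members].mean(axis=0)
    return labels


def super_genes(X, labels, n_parts):
    """Assumed projection step: one kernel-weighted cell-cell score per partition.

    For each gene group, every cell receives the sum of Gaussian-kernel
    similarities to all other cells, measured on that group's genes only.
    """
    n_cells = X.shape[0]
    Z = np.zeros((n_cells, n_parts))
    for k in range(n_parts):
        sub = X[:, labels == k]
        if sub.shape[1] == 0:
            continue  # empty partition: leave this super-gene at zero
        sq = (sub ** 2).sum(axis=1)
        d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * sub @ sub.T, 0.0)
        d = np.sqrt(d2)
        scale = d.mean()
        if scale == 0.0:
            scale = 1.0  # all cells identical on these genes
        K = np.exp(-((d / scale) ** 2))
        np.fill_diagonal(K, 0.0)  # exclude self-interaction
        Z[:, k] = K.sum(axis=1)
    return Z


def ccp_sketch(X, n_parts, seed=0):
    """Reduce a (n_cells, n_genes) matrix to (n_cells, n_parts) super-genes."""
    labels = partition_genes(X, n_parts, seed=seed)
    return super_genes(X, labels, n_parts)
```

Under one plausible reading of "initialization", the low-dimensional matrix returned by `ccp_sketch` would replace the raw expression matrix as input to a standard embedding call, for example `sklearn.manifold.TSNE(n_components=2).fit_transform(Z)` or `umap.UMAP().fit_transform(Z)`. Neither step requires an eigendecomposition, which matches the abstract's point about avoiding matrix diagonalization.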