LGBIO-PHQM
Sep 18, 2025

Random Matrix Theory-guided sparse PCA for single-cell RNA-seq data

arXiv:2509.15429v1
1 citation
AI Analysis

This work addresses robust dimensionality reduction for single-cell RNA-seq data. It offers a nearly parameter-free approach that keeps the interpretability of PCA while improving performance, which makes it relevant to researchers in genomics and computational biology.

The authors tackle the noise in single-cell RNA-seq data with a Random Matrix Theory-guided sparse PCA method that selects the sparsity level automatically. Across seven sequencing technologies and four sparse PCA algorithms, the method improves reconstruction of the principal subspace and outperforms PCA-, autoencoder-, and diffusion-based methods in cell-type classification.

Single-cell RNA-seq provides detailed molecular snapshots of individual cells but is notoriously noisy. Variability stems from biological differences, PCR amplification bias, limited sequencing depth, and low capture efficiency, making it challenging to adapt computational pipelines to heterogeneous datasets or evolving technologies. As a result, most studies still rely on principal component analysis (PCA) for dimensionality reduction, valued for its interpretability and robustness. Here, we improve upon PCA with a Random Matrix Theory (RMT)-based approach that guides the inference of sparse principal components using existing sparse PCA algorithms. We first introduce a novel biwhitening method, inspired by the Sinkhorn-Knopp algorithm, that simultaneously stabilizes variance across genes and cells. This enables the use of an RMT-based criterion to automatically select the sparsity level, rendering sparse PCA nearly parameter-free. Our mathematically grounded approach retains the interpretability of PCA while enabling robust, hands-off inference of sparse principal components. Across seven single-cell RNA-seq technologies and four sparse PCA algorithms, we show that this method systematically improves the reconstruction of the principal subspace and consistently outperforms PCA-, autoencoder-, and diffusion-based methods in cell-type classification tasks.
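The abstract combines two ideas: a Sinkhorn-Knopp-style rescaling of rows (cells) and columns (genes) so that noise variance is uniform in both directions, and a Random Matrix Theory check against the Marchenko-Pastur edge. The sketch below illustrates both in a generic form. It is not the authors' biwhitening method or their sparsity-selection rule. It assumes Poisson-like noise, so each entry's variance is estimated by the observed count itself, and a cells × genes orientation. The helper names `sinkhorn_biwhiten` and `count_mp_outliers` and the synthetic data are hypothetical.

```python
import numpy as np


def sinkhorn_biwhiten(Y, n_iter=1000, tol=1e-10):
    """Hypothetical helper, not the authors' code: Sinkhorn-Knopp-style
    biwhitening of a cells x genes count matrix.

    Assumes Poisson-like noise, so the variance of entry (i, j) is
    estimated by Y[i, j]. Finds positive scalings x (cells) and y (genes)
    such that the scaled variance x_i * Y_ij * y_j averages to 1 along
    every row and every column.
    """
    Y = np.asarray(Y, dtype=float)
    m, n = Y.shape
    x, y = np.ones(m), np.ones(n)
    for _ in range(n_iter):
        x = n / (Y @ y)          # make each row sum of scaled variance equal n
        y = m / (Y.T @ x)        # make each column sum equal m (exact after this step)
        row_sums = x * (Y @ y)   # columns are exact, so check convergence on rows
        if np.max(np.abs(row_sums / n - 1.0)) < tol:
            break
    # Scaling the data by sqrt(x) and sqrt(y) scales its variance by x and y
    Z = np.sqrt(x)[:, None] * Y * np.sqrt(y)[None, :]
    return Z, x, y


def count_mp_outliers(Z):
    """Count eigenvalues of Z Z^T / n above the Marchenko-Pastur upper edge
    (1 + sqrt(m / n))**2, the expected bulk limit for unit-variance noise."""
    m, n = Z.shape
    s = np.linalg.svd(Z, compute_uv=False)
    eigenvalues = s**2 / n
    edge = (1.0 + np.sqrt(m / n)) ** 2
    return int(np.sum(eigenvalues > edge)), edge


# Synthetic toy data: Poisson counts around a rank-3 nonnegative mean.
rng = np.random.default_rng(0)
m, n, rank = 300, 500, 3
rates = rng.uniform(0.5, 1.5, (m, rank)) @ rng.uniform(0.5, 1.5, (rank, n))
Y = rng.poisson(rates).astype(float)

# Drop all-zero cells or genes, which the scaling cannot handle.
keep_cells = Y.sum(axis=1) > 0
keep_genes = Y.sum(axis=0) > 0
Y = Y[keep_cells][:, keep_genes]

Z, x, y = sinkhorn_biwhiten(Y)
k, edge = count_mp_outliers(Z)
print(f"MP edge = {edge:.3f}; eigenvalues above edge: {k}")
```

After biwhitening, the noise bulk should stay below the Marchenko-Pastur edge, so eigenvalues above it point to signal. The abstract says the paper uses an RMT-based criterion of this kind to choose the sparsity level for existing sparse PCA algorithms. The exact mapping from this criterion to a sparsity level is specific to the paper and is not reproduced here.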
