LG · May 19

What Makes a Representation Good for Single-Cell Perturbation Prediction?

arXiv:2605.19343 · 18.61 citations
Predicted impact: top 28% in LG · last 90 days
Originality: Highly original
AI Analysis

For researchers in single-cell biology and perturbation modeling, this work addresses a fundamental signal imbalance problem that limits existing methods, offering a principled solution with improved predictive performance.

The paper identifies that single-cell perturbation prediction is hindered by the dominance of perturbation-invariant signals over sparse perturbation-specific signals. The proposed PerturbedVAE framework explicitly separates these signals and achieves state-of-the-art performance on a benchmark, with significant gains in out-of-distribution combinatorial predictions.

Single-cell perturbation modeling is fundamental for understanding and predicting cellular responses to genetic perturbations. However, existing approaches, from causal representation learning to foundation models, often struggle with an overlooked challenge: gene expression is dominated by perturbation-invariant information, while perturbation-specific signals are intrinsically sparse. As a result, learned representations either entangle invariant and perturbation-specific information, leading to spurious and non-generalizable predictors, or suppress perturbation-specific signals altogether, rendering them ineffective for prediction. To address this, we propose PerturbedVAE, a general framework designed to resolve this signal imbalance. The framework explicitly separates perturbation-specific information from dominant invariant structure and recovers causal representations to effectively utilize such information for prediction. We further provide an identifiability analysis that characterizes the conditions under which sparse perturbation effects can be reliably recovered, thereby clarifying how the framework can be concretely specified under such conditions. Empirically, PerturbedVAE achieves state-of-the-art performance on a widely used benchmark across multiple evaluation settings, yielding significant gains on out-of-distribution combinatorial predictions and uncovering interpretable perturbation-response programs.
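The signal imbalance described in the abstract can be illustrated with a small synthetic example. The sketch below is not the paper's data or method: the function name `perturbation_variance_share`, the three-factor invariant structure, and every parameter value are assumptions chosen only for illustration. It builds an expression matrix from a dense low-rank invariant component plus a shift on a handful of genes in perturbed cells, then reports what fraction of the total variance the perturbation-specific part contributes.

```python
import numpy as np


def perturbation_variance_share(n_cells=500, n_genes=200, n_affected=5,
                                effect=1.0, seed=0):
    """Fraction of total expression variance due to a sparse perturbation.

    Purely synthetic illustration; all sizes and scales are assumptions.
    """
    rng = np.random.default_rng(seed)

    # Perturbation-invariant signal: dense low-rank structure shared by all cells.
    loadings = rng.normal(size=(3, n_genes))
    factors = rng.normal(scale=3.0, size=(n_cells, 3))
    invariant = factors @ loadings

    # Perturbation-specific signal: every second cell is perturbed, and the
    # perturbation shifts only the first `n_affected` genes.
    perturbed = (np.arange(n_cells) % 2 == 1).astype(float)
    shift = np.zeros(n_genes)
    shift[:n_affected] = effect
    specific = np.outer(perturbed, shift)

    # Small measurement noise on every gene.
    noise = rng.normal(scale=0.1, size=(n_cells, n_genes))

    expression = invariant + specific + noise
    total_variance = expression.var(axis=0).sum()
    specific_variance = specific.var(axis=0).sum()
    return specific_variance / total_variance


if __name__ == "__main__":
    share = perturbation_variance_share()
    print(f"perturbation-specific share of variance: {share:.5f}")
```

Under these assumed settings the perturbation-specific component accounts for well under one percent of the total variance, so a representation trained only to reconstruct expression has little incentive to preserve it. That is the kind of imbalance the abstract argues must be handled explicitly.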
