LG GN May 30

On the Recoverability of Causal Relations from Bulk Gene Expression Data

arXiv:2606.00568 82.0 h-index: 6
AI Analysis

For computational biologists using bulk gene expression data to infer causal networks, the paper provides theoretical and empirical evidence that such recovery is generally unreliable without strong linearity assumptions.

The paper investigates whether causal relations among genes can be recovered from bulk gene expression data, which aggregates expression across the cells in a sample. It proves that recoverability holds only under linear aggregation combined with affine structural equations. Its empirical analysis of four bulk and four single-cell datasets finds that estimated pairwise regulatory functions deviate from linearity, so the paper cautions against causal recovery from bulk data without strong additional assumptions.

Bulk gene expression profiling, which aggregates pooled RNA across cells within a biological sample, remains important in the single-cell era because it is typically less noisy, more sensitive, and more cost-effective than single-cell assays. Accordingly, a growing body of computational methods seeks to recover causal relations among genes from bulk expression data. However, aggregation is a lossy, non-invertible coarsening of the underlying cellular system, and it remains unclear whether and under what conditions causal relations are recoverable from aggregated bulk gene expression data. To answer this, we formalize recoverability under aggregation through two notions of consistency: functional-form consistency and conditional-independence consistency. We then derive necessary and sufficient conditions for recoverability, showing that these properties are preserved only under linear aggregations (e.g., sum/mean) coupled with affine structural equations. To assess the practical plausibility of these conditions, analyses of four bulk and four single-cell gene expression datasets further reveal that the estimated pairwise regulatory functions among genes deviate from linearity in both data types, providing limited empirical support for the linearity assumptions required for recoverability. Together, these results caution against recovering causal relations from aggregated bulk expression data without strong additional assumptions.
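The role of the affine condition can be illustrated with a small simulation. The sketch below is not the paper's method or proof; it is a minimal toy example under stated assumptions: a single hypothetical regulator gene X, a noiseless structural equation Y = f(X) applied in every cell, and mean aggregation across cells. All names, sample sizes, and functional forms are illustrative choices. It checks whether the bulk-level relation keeps the same functional form as the cell-level one, that is, whether mean(f(X)) equals f(mean(X)).

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions: 200 bulk samples, each pooling 500 cells.
n_samples, n_cells = 200, 500

# Hypothetical single-cell expression of a regulator gene X.
# Each sample has its own mean level and its own cell-to-cell spread.
sample_mean = rng.uniform(1.0, 5.0, size=(n_samples, 1))
sample_sd = rng.uniform(0.2, 2.0, size=(n_samples, 1))
x_cells = sample_mean + sample_sd * rng.standard_normal((n_samples, n_cells))


def affine(x):
    # Affine structural equation (illustrative coefficients).
    return 2.0 * x + 1.0


def quadratic(x):
    # Non-affine structural equation (illustrative choice).
    return x ** 2


def aggregation_gap(f, x):
    """Largest |mean over cells of f(x) - f(mean over cells of x)| across samples."""
    bulk_x = x.mean(axis=1)      # bulk regulator level (mean aggregation)
    bulk_y = f(x).mean(axis=1)   # bulk target level (mean aggregation)
    return np.max(np.abs(bulk_y - f(bulk_x)))


gap_affine = aggregation_gap(affine, x_cells)
gap_quadratic = aggregation_gap(quadratic, x_cells)

print(f"affine gap:    {gap_affine:.2e}")
print(f"quadratic gap: {gap_quadratic:.2e}")
```

In this toy setting the affine gap is zero up to floating-point error, because the mean commutes with affine maps. The quadratic gap equals the within-sample variance of X, which differs from sample to sample, so the bulk Y is no longer a fixed function of the bulk X alone.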
