QM | LG | MN | Mar 21, 2024

Gene Regulatory Network Inference in the Presence of Dropouts: a Causal View

arXiv:2403.15500v1 | 11 citations | h-index: 42 | ICLR
Originality: Incremental advance
AI Analysis

This work addresses a specific challenge in computational biology for researchers analyzing single-cell data, offering a principled framework to handle dropouts without introducing spurious relations, though it is incremental as it builds on existing structure learning methods.

The paper tackles gene regulatory network inference (GRNI) from single-cell RNA sequencing data, where technical zeros (dropouts) distort gene expression distributions. It introduces a causal graphical model under which conditional independence tests remain accurate after deleting the samples that have zeros in the conditioned variables, which improves network inference without imputation.

Gene regulatory network inference (GRNI) is a challenging problem, particularly owing to the presence of zeros in single-cell RNA sequencing data: some are biological zeros representing no gene expression, while others are technical zeros arising from the sequencing procedure (aka dropouts), which may bias GRNI by distorting the joint distribution of the measured gene expressions. Existing approaches typically handle dropout error via imputation, which may introduce spurious relations as the true joint distribution is generally unidentifiable. To tackle this issue, we introduce a causal graphical model to characterize the dropout mechanism, namely, the Causal Dropout Model. We provide a simple yet effective theoretical result: interestingly, the conditional independence (CI) relations in the data with dropouts, after deleting the samples with zero values (regardless of whether they are technical or not) for the conditioned variables, are asymptotically identical to the CI relations in the original data without dropouts. This particular test-wise deletion procedure, in which we perform CI tests on the samples without zeros for the conditioned variables, can be seamlessly integrated with existing structure learning approaches including constraint-based and greedy score-based methods, thus giving rise to a principled framework for GRNI in the presence of dropouts. We further show that the causal dropout model can be validated from data, and many existing statistical models to handle dropouts fit into our model as specific parametric instances. Empirical evaluation on synthetic, curated, and real-world experimental transcriptomic data comprehensively demonstrates the efficacy of our method.

Code Implementations: 1 repo
