CO LG ME ML

Mar 15, 2018

Minimal I-MAP MCMC for Scalable Structure Discovery in Causal DAG Models

Raj Agrawal, Tamara Broderick, Caroline Uhler

arXiv:1803.05554v310.320 citations

Originality: Highly original

AI Analysis

This paper addresses scalable structure discovery in causal DAG models, aimed at researchers and practitioners who work with high-dimensional data. It proposes a novel method for a known bottleneck.

The authors address learning Bayesian networks from high-dimensional data in which variables outnumber data points. They propose a posterior approximation that uses empirical conditional independence tests to focus on a single high-probability DAG for each vertex order. The method allows flexible prior specification, removes the dependence of running time on the maximum indegree, yields provably good posterior approximations, and achieves superior accuracy, scalability, and sampler mixing on several datasets.

Learning a Bayesian network (BN) from data can be useful for decision-making or discovering causal relationships. However, traditional methods often fail in modern applications, which exhibit a larger number of observed variables than data points. The resulting uncertainty about the underlying network as well as the desire to incorporate prior information recommend a Bayesian approach to learning the BN, but the highly combinatorial structure of BNs poses a striking challenge for inference. The current state-of-the-art methods such as order MCMC are faster than previous methods but prevent the use of many natural structural priors and still have running time exponential in the maximum indegree of the true directed acyclic graph (DAG) of the BN. We here propose an alternative posterior approximation based on the observation that, if we incorporate empirical conditional independence tests, we can focus on a high-probability DAG associated with each order of the vertices. We show that our method allows the desired flexibility in prior specification, removes timing dependence on the maximum indegree and yields provably good posterior approximations; in addition, we show that it achieves superior accuracy, scalability, and sampler mixing on several datasets.
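The central idea in the abstract, mapping each vertex order to one DAG by way of conditional independence tests, can be illustrated with a small sketch. It is not the authors' implementation. It assumes Gaussian data with more samples than variables, uses a Fisher z-test on partial correlations as the independence test, and adds an edge from an earlier vertex to a later one when independence is rejected given the later vertex's other predecessors in the order. The function names (`partial_corr`, `ci_test_rejects`, `minimal_imap`), the significance level `alpha`, and the toy data are all hypothetical choices made for illustration.

```python
import math
import numpy as np

def partial_corr(corr, i, j, cond):
    """Partial correlation of variables i and j given the index list cond."""
    idx = [i, j] + list(cond)
    prec = np.linalg.inv(corr[np.ix_(idx, idx)])
    return -prec[0, 1] / math.sqrt(prec[0, 0] * prec[1, 1])

def ci_test_rejects(corr, n, i, j, cond, alpha=0.01):
    """Fisher z-test: True if independence of i and j given cond is rejected."""
    r = partial_corr(corr, i, j, cond)
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    z = math.sqrt(n - len(cond) - 3) * abs(math.atanh(r))
    p_value = math.erfc(z / math.sqrt(2))  # two-sided normal tail probability
    return p_value < alpha

def minimal_imap(data, order, alpha=0.01):
    """Estimate a DAG consistent with `order` (assumed rule, see lead-in).

    adj[u, v] is True when the edge u -> v is included. Edges only ever
    point from earlier to later positions in `order`, so the result is acyclic.
    """
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    adj = np.zeros((p, p), dtype=bool)
    for b in range(1, p):
        child = order[b]
        for a in range(b):
            parent = order[a]
            # Condition on every other predecessor of `child` in the order.
            cond = [v for v in order[:b] if v != parent]
            if ci_test_rejects(corr, n, parent, child, cond, alpha):
                adj[parent, child] = True
    return adj

# Toy data from the chain X0 -> X1 -> X2 (illustrative only).
rng = np.random.default_rng(0)
n_samples = 2000
x0 = rng.normal(size=n_samples)
x1 = 0.8 * x0 + rng.normal(size=n_samples)
x2 = 0.8 * x1 + rng.normal(size=n_samples)
data = np.column_stack([x0, x1, x2])

adj_forward = minimal_imap(data, order=[0, 1, 2])
adj_reverse = minimal_imap(data, order=[2, 1, 0])
print(adj_forward.astype(int))
print(adj_reverse.astype(int))
```

A sampler over vertex orders could then score each order through its associated DAG rather than summing over all DAGs consistent with that order. When variables outnumber samples, as in the setting the abstract targets, the full-predecessor conditioning sets used here would make this particular test ill-defined, so the sketch should be read only as an illustration of the order-to-DAG mapping.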
