LGMLJul 15, 2017

Learning linear structural equation models in polynomial time and sample complexity

arXiv:1707.04673v194 citations
Originality Highly original
AI Analysis

This addresses a fundamental problem in causal inference for researchers and practitioners, offering improved efficiency and generality over existing methods, though it is incremental in advancing computational and statistical aspects.

The paper tackles the problem of learning linear structural equation models (SEMs) from observational data by developing a new algorithm that is computationally and statistically efficient, recovering the directed acyclic graph structure under a general identifiability condition without faithfulness assumptions, with results including polynomial time complexities (e.g., O(p(d^2 + log p)) in population) and sample complexities (e.g., O(d^8/ε^2 log(p/√δ)) for sub-Gaussian noise).

The problem of learning structural equation models (SEMs) from data is a fundamental problem in causal inference. We develop a new algorithm --- which is computationally and statistically efficient and works in the high-dimensional regime --- for learning linear SEMs from purely observational data with arbitrary noise distribution. We consider three aspects of the problem: identifiability, computational efficiency, and statistical efficiency. We show that when data is generated from a linear SEM over $p$ nodes and maximum degree $d$, our algorithm recovers the directed acyclic graph (DAG) structure of the SEM under an identifiability condition that is more general than those considered in the literature, and without faithfulness assumptions. In the population setting, our algorithm recovers the DAG structure in $\mathcal{O}(p(d^2 + \log p))$ operations. In the finite sample setting, if the estimated precision matrix is sparse, our algorithm has a smoothed complexity of $\widetilde{\mathcal{O}}(p^3 + pd^7)$, while if the estimated precision matrix is dense, our algorithm has a smoothed complexity of $\widetilde{\mathcal{O}}(p^5)$. For sub-Gaussian noise, we show that our algorithm has a sample complexity of $\mathcal{O}(\frac{d^8}{\varepsilon^2} \log (\frac{p}{\sqrtδ}))$ to achieve $\varepsilon$ element-wise additive error with respect to the true autoregression matrix with probability at most $1 - δ$, while for noise with bounded $(4m)$-th moment, with $m$ being a positive integer, our algorithm has a sample complexity of $\mathcal{O}(\frac{d^8}{\varepsilon^2} (\frac{p^2}δ)^{1/m})$.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes