AIMEJul 4, 2012

On the Number of Experiments Sufficient and in the Worst Case Necessary to Identify All Causal Relations Among N Variables

arXiv:1207.1389v1170 citations
Originality Incremental advance
AI Analysis

This work addresses a foundational problem in causal inference for researchers, offering theoretical bounds that can reduce experimental costs in fields like medicine or social sciences, though it is incremental in refining existing worst-case analysis.

The paper tackles the problem of determining the minimum number of experiments needed to identify all causal relations among N variables, showing that log2(N) + 1 experiments are sufficient and necessary in the worst case when variables can be simultaneously randomized, and providing bounds for cases where each experiment randomizes a fraction of variables, which are lower than previous N - 1 bounds for large N.

We show that if any number of variables are allowed to be simultaneously and independently randomized in any one experiment, log2(N) + 1 experiments are sufficient and in the worst case necessary to determine the causal relations among N >= 2 variables when no latent variables, no sample selection bias and no feedback cycles are present. For all K, 0 < K < 1/(2N) we provide an upper bound on the number experiments required to determine causal structure when each experiment simultaneously randomizes K variables. For large N, these bounds are significantly lower than the N - 1 bound required when each experiment randomizes at most one variable. For kmax < N/2, we show that (N/kmax-1)+N/(2kmax)log2(kmax) experiments aresufficient and in the worst case necessary. We over a conjecture as to the minimal number of experiments that are in the worst case sufficient to identify all causal relations among N observed variables that are a subset of the vertices of a DAG.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes