Interventional Experiment Design for Causal Structure Learning
This work addresses a fundamental challenge in causal inference and offers researchers and practitioners incremental algorithmic improvements to intervention design for causal structure learning.
The paper tackles the problem of learning causal directed acyclic graphs (DAGs) beyond Markov equivalence by designing non-adaptive interventions with a budget constraint, aiming to maximize the number of edges whose directions are identified. It proposes efficient algorithms for tree structures and extends them to general causal structures, with evaluation on synthetic and real data showing improved edge identification.
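The optimization can be stated more concretely as follows. The notation is introduced here for illustration and is not taken from the paper. $\mathcal{I}$ is a set of $k$ single-vertex intervention targets (an assumption), $[G^*]$ is the Markov equivalence class of the true DAG $G^*$, and $\mathcal{R}(\mathcal{I}, G)$ is the set of edges whose directions are identified when $G$ is the true DAG. The worst-case and average gain settings described below then correspond to

$$\max_{\mathcal{I}\subseteq V,\ |\mathcal{I}|=k}\ \min_{G\in[G^*]} \big|\mathcal{R}(\mathcal{I},G)\big| \qquad\text{and}\qquad \max_{\mathcal{I}\subseteq V,\ |\mathcal{I}|=k}\ \frac{1}{|[G^*]|}\sum_{G\in[G^*]} \big|\mathcal{R}(\mathcal{I},G)\big|,$$

where the uniform weighting over $[G^*]$ in the second objective is also an assumption.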
It is known that from purely observational data, a causal DAG is identifiable only up to its Markov equivalence class, and for many ground truth DAGs, the directions of a large portion of the edges remain unidentified. The gold standard for learning the causal DAG beyond Markov equivalence is to perform a sequence of interventions in the system and use the data gathered from the interventional distributions. We consider a setup in which, given a budget $k$, we design $k$ interventions non-adaptively. We cast the problem of finding the best intervention target set as an optimization problem that aims to maximize the number of edges whose directions are identified due to the performed interventions. First, we consider the case where the underlying causal structure is a tree. For this case, we propose an efficient exact algorithm for the worst-case gain setup, as well as an approximate algorithm for the average gain setup. We then show that the proposed approach for the average gain setup can be extended to the case of general causal structures. In this case, besides the design of interventions, calculating the objective function is also challenging. We propose an efficient exact calculator as well as two estimators for this task. We evaluate the proposed methods using synthetic as well as real data.
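To make the tree-case objective tangible, here is a minimal Python sketch under stated assumptions; it is not the paper's algorithm. It assumes each intervention targets a single vertex and that the true DAG has a tree skeleton with no v-structures, so its Markov equivalence class consists of the $n$ orientations obtained by choosing a root and directing every edge away from it. An intervention is assumed to reveal the directions of all edges incident to the intervened vertex. An edge counts as identified when every rooting consistent with those revealed directions orients it the same way. The average gain weights the rootings uniformly (an assumption). All function names are hypothetical. Exhaustive search stands in for the paper's efficient exact worst-case algorithm, and a generic greedy heuristic stands in for its approximate average-gain algorithm.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical helpers: `adj` maps each integer vertex to its neighbours in an undirected tree.

def rooted_parents(adj, root):
    """Parent map of the DAG obtained by orienting every edge away from `root`."""
    parent = {root: None}
    stack = [root]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                stack.append(w)
    return parent

def revealed_arcs(parent, adj, targets):
    """Directions, as (tail, head) pairs, of all edges touching an intervened vertex."""
    arcs = set()
    for v in targets:
        for w in adj[v]:
            # in a rooted tree every neighbour is either the parent or a child
            arcs.add((w, v) if parent[v] == w else (v, w))
    return frozenset(arcs)

def identified_edges(adj, true_root, targets):
    """Count edges oriented identically by every rooting that agrees with
    what the single-vertex interventions revealed under `true_root`."""
    rootings = {r: rooted_parents(adj, r) for r in adj}
    observed = revealed_arcs(rootings[true_root], adj, targets)
    consistent = [r for r in adj
                  if revealed_arcs(rootings[r], adj, targets) == observed]
    count = 0
    for u in adj:
        for w in adj[u]:
            if u < w:  # visit each undirected edge once
                directions = {rootings[r][w] == u for r in consistent}
                if len(directions) == 1:
                    count += 1
    return count

def worst_case_gain(adj, targets):
    """Minimum number of identified edges over all rootings (DAGs in the class)."""
    return min(identified_edges(adj, r, targets) for r in adj)

def average_gain(adj, targets):
    """Identified edges averaged uniformly over all rootings (uniform weighting assumed)."""
    return Fraction(sum(identified_edges(adj, r, targets) for r in adj), len(adj))

def best_targets_bruteforce(adj, k, gain):
    """Exhaustive search over all k-subsets; exponential, so only for tiny trees."""
    return max(combinations(adj, k), key=lambda s: gain(adj, list(s)))

def greedy_targets(adj, k, gain):
    """Generic greedy heuristic: repeatedly add the vertex with the largest gain."""
    chosen = []
    for _ in range(k):
        best = max((v for v in adj if v not in chosen),
                   key=lambda v: gain(adj, chosen + [v]))
        chosen.append(best)
    return chosen

if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(worst_case_gain(path, [2]), average_gain(path, [2]))  # 3 16/5
    print(best_targets_bruteforce(path, 2, average_gain))       # (1, 3)
    print(average_gain(path, greedy_targets(path, 2, average_gain)))  # 18/5
```

On the five-vertex path, intervening on the centre vertex alone identifies at least 3 of the 4 edges (16/5 on average). With $k=2$, the greedy choice starts from the centre and reaches an average of 18/5, while exhaustive search finds $\{1, 3\}$, which identifies all 4 edges for every rooting. This gap shows why the target set has to be optimized jointly rather than by ranking vertices individually.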