AP LG ML | Dec 30, 2016

Counterfactual Prediction with Deep Instrumental Variables Networks

Jason Hartford, Greg Lewis, Kevin Leyton-Brown, Matt Taddy

arXiv:1612.09596v1 | 11.748 citations

Originality: Incremental advance

AI Analysis

This paper provides a method for causal inference in settings with instrumental variables. The advance is incremental: it adapts existing deep learning tools to a known causal estimation challenge.

The paper tackles the problem of estimating causal effects in the presence of instrumental variables. It proposes a Deep IV framework that decomposes the task into two neural network prediction problems, which lets it rely on off-the-shelf ML capabilities without extensive algorithm customization.

We are in the middle of a remarkable rise in the use and capability of artificial intelligence. Much of this growth has been fueled by the success of deep learning architectures: models that map from observables to outputs via multiple layers of latent representations. These deep learning algorithms are effective tools for unstructured prediction, and they can be combined in AI systems to solve complex automated reasoning problems. This paper provides a recipe for combining ML algorithms to solve for causal effects in the presence of instrumental variables -- sources of treatment randomization that are conditionally independent from the response. We show that a flexible IV specification resolves into two prediction tasks that can be solved with deep neural nets: a first-stage network for treatment prediction and a second-stage network whose loss function involves integration over the conditional treatment distribution. This Deep IV framework imposes some specific structure on the stochastic gradient descent routine used for training, but it is general enough that we can take advantage of off-the-shelf ML capabilities and avoid extensive algorithm customization. We outline how to obtain out-of-sample causal validation in order to avoid over-fit. We also introduce schemes for both Bayesian and frequentist inference: the former via a novel adaptation of dropout training, and the latter via a data splitting routine.
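The two-stage structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. To stay short and dependency-light it swaps both neural networks for simple stand-ins: a Gaussian first stage with a linear mean in place of the treatment network, and a quadratic response fitted by least squares in place of the second-stage network. The function names (`simulate`, `fit_two_stage`, `fit_naive`), the toy data-generating process, and all parameter values are assumptions made for illustration. What the sketch keeps is the key idea: the second-stage loss compares the outcome with the response function averaged over draws from the fitted conditional treatment distribution.

```python
import numpy as np


def simulate(n, seed=0):
    """Toy data (assumed for illustration): instrument z, treatment p, outcome y."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    e = rng.standard_normal(n)  # unobserved confounder; drives both p and y
    p = z + e + 0.5 * rng.standard_normal(n)
    # True causal response is 1 + 2p - 0.5p^2; the 3e term confounds it.
    y = 1.0 + 2.0 * p - 0.5 * p**2 + 3.0 * e + rng.standard_normal(n)
    return z, p, y


def fit_two_stage(z, p, y, n_draws=100, seed=0):
    """Two-stage fit with simple stand-ins for the two networks."""
    rng = np.random.default_rng(seed)

    # Stage 1: model the treatment given the instrument as Gaussian
    # with a linear mean and constant variance.
    Z = np.column_stack([np.ones_like(z), z])
    gamma, *_ = np.linalg.lstsq(Z, p, rcond=None)
    mu = Z @ gamma
    sigma = np.std(p - mu)

    # Monte Carlo draws from the fitted conditional treatment distribution.
    draws = mu[:, None] + sigma * rng.standard_normal((len(z), n_draws))

    # Stage 2: h(p) = b0 + b1*p + b2*p^2. Because h is linear in its
    # coefficients, integrating h over the draws amounts to averaging
    # the basis functions over the draws, then solving least squares.
    feats = np.column_stack([
        np.ones_like(z),
        draws.mean(axis=1),
        (draws**2).mean(axis=1),
    ])
    beta, *_ = np.linalg.lstsq(feats, y, rcond=None)
    return beta


def fit_naive(p, y):
    """Direct regression of y on p, ignoring the confounder (for comparison)."""
    P = np.column_stack([np.ones_like(p), p, p**2])
    beta, *_ = np.linalg.lstsq(P, y, rcond=None)
    return beta


if __name__ == "__main__":
    z, p, y = simulate(10_000, seed=1)
    print("two-stage coefficients:", fit_two_stage(z, p, y))
    print("naive coefficients:    ", fit_naive(p, y))
```

In this simulation the true response coefficients are (1, 2, -0.5). The naive fit absorbs the confounder into the linear coefficient, while the two-stage fit should land much closer to the truth, up to sampling and Monte Carlo noise.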

