LG ML | Nov 3, 2021

Causal-BALD: Deep Bayesian Active Learning of Outcomes to Infer Treatment-Effects from Observational Data

Andrew Jesson, Panagiotis Tigas, Joost van Amersfoort, Andreas Kirsch, Uri Shalit, Yarin Gal

arXiv:2111.02275v216.840 citations | Has Code

Originality: Highly original

AI Analysis

This work addresses the challenge of sample-efficient treatment effect estimation in fields like healthcare, where data acquisition is expensive, by providing a novel active learning method to reduce bias and improve efficiency.

The paper tackles the estimation of personalized treatment effects from observational data when measuring individual outcomes is costly. It introduces causal Bayesian acquisition functions that bias data acquisition towards regions with overlapping support to maximize sample efficiency, and demonstrates their performance on synthetic and semi-synthetic datasets such as IHDP and CMNIST.

Estimating personalized treatment effects from high-dimensional observational data is essential in situations where experimental designs are infeasible, unethical, or expensive. Existing approaches rely on fitting deep models on outcomes observed for treated and control populations. However, when measuring individual outcomes is costly, as is the case with a tumor biopsy, a sample-efficient strategy for acquiring each result is required. Deep Bayesian active learning provides a framework for efficient data acquisition by selecting points with high uncertainty. However, existing methods bias training data acquisition towards regions of non-overlapping support between the treated and control populations. These are not sample-efficient because the treatment effect is not identifiable in such regions. We introduce causal, Bayesian acquisition functions grounded in information theory that bias data acquisition towards regions with overlapping support to maximize sample efficiency for learning personalized treatment effects. We demonstrate the performance of the proposed acquisition strategies on synthetic and semi-synthetic datasets IHDP and CMNIST and their extensions, which aim to simulate common dataset biases and pathologies.
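To make the idea of an overlap-aware acquisition score concrete, here is a minimal sketch in Python. It is not the paper's acquisition function: it uses a simple heuristic chosen for illustration, multiplying the posterior variance of the treatment effect by a propensity-based overlap weight. The function names, the weight 4·e(x)·(1 − e(x)), and the assumption that posterior samples of the two outcome functions and a propensity estimate e(x) are already available are all hypothetical.

```python
import numpy as np

def overlap_weighted_scores(mu0_samples, mu1_samples, propensity):
    """Illustrative acquisition score (an assumption, not the paper's formula).

    mu0_samples, mu1_samples: arrays of shape (n_posterior_samples, n_pool)
        holding posterior draws of the control and treated outcome means.
    propensity: array of shape (n_pool,) with estimated P(treated | x).
    """
    mu0_samples = np.asarray(mu0_samples, dtype=float)
    mu1_samples = np.asarray(mu1_samples, dtype=float)
    propensity = np.asarray(propensity, dtype=float)

    # Posterior draws of the treatment effect at each pool point.
    tau_samples = mu1_samples - mu0_samples
    # Variance across posterior draws: model uncertainty about the effect.
    tau_var = tau_samples.var(axis=0)
    # Heuristic overlap weight: 1 when e(x) = 0.5, near 0 when e(x) is near 0 or 1.
    overlap = 4.0 * propensity * (1.0 - propensity)
    return tau_var * overlap

def select_batch(scores, batch_size):
    """Return the indices of the batch_size highest-scoring pool points."""
    return np.argsort(scores)[::-1][:batch_size]

# Toy pool of three points with two posterior draws each.
mu0 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
mu1 = [[1.0, 1.0, 1.0], [3.0, 1.5, 3.0]]
e_hat = [0.5, 0.5, 0.99]
scores = overlap_weighted_scores(mu0, mu1, e_hat)
chosen = select_batch(scores, batch_size=1)
```

In this toy pool the first and third points have the same effect uncertainty, but the third has a propensity near 1, so almost no control units resemble it. The overlap weight pushes its score down and the first point is acquired instead, which mirrors the abstract's argument that uncertainty alone draws acquisition into regions where the effect is not identifiable.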
