AI LG ML

Feb 23, 2018

Learning Optimal Policies from Observational Data

Onur Atan, William R. Zame, M van der Schaar

arXiv:1802.08679v1

12.719 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses policy optimization in domains like medicine and finance, where controlled experiments are costly. Its contribution appears incremental, since it applies existing methods to observational data.

The paper tackles the problem of learning optimal policies from observational data. It addresses missing counterfactuals and selection bias by deriving theoretical bounds on counterfactual estimation error and by training domain adversarial neural networks, and it demonstrates the approach on a semi-synthetic breast cancer dataset and a UCI dataset.

Choosing optimal (or at least better) policies is an important problem in domains from medicine to education to finance and many others. One approach to this problem is through controlled experiments or trials, but controlled experiments are expensive. Hence it is important to choose the best policies on the basis of observational data. This presents two difficult challenges: (i) missing counterfactuals, and (ii) selection bias. This paper presents theoretical bounds on estimation errors of counterfactuals from observational data by making connections to domain adaptation theory. It also presents a principled way of choosing optimal policies using domain adversarial neural networks. We illustrate the effectiveness of domain adversarial training together with various features of our algorithm on a semi-synthetic breast cancer dataset and a supervised UCI dataset (Statlog).
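
The two challenges named in the abstract can be illustrated with a toy simulation. This is a minimal sketch and not the paper's algorithm: the outcome model, the logging policy, and every function name below are assumptions made purely for illustration. Each logged record reveals only the outcome of the action actually taken (the missing counterfactual), and the action depends on the feature (selection bias), so a naive comparison of observed outcomes misestimates the effect of the action.

```python
import random


def simulate_logged_data(n=20000, seed=0):
    """Generate toy observational data (hypothetical model, for illustration only).

    Each record is (x, action, observed_outcome, y0, y1). In real observational
    data only the first three fields exist; y0 and y1 are kept here solely so
    the true effect can be computed, as in a semi-synthetic benchmark.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        x = rng.random()  # a single feature in [0, 1], e.g. severity
        # Potential outcomes under action 0 and action 1 (assumed model).
        y0 = 1.0 - x + rng.gauss(0.0, 0.1)
        y1 = 1.0 - 0.5 * x + rng.gauss(0.0, 0.1)
        # Selection bias: the logging policy prefers action 1 when x is high.
        p_action1 = 0.1 + 0.8 * x
        action = 1 if rng.random() < p_action1 else 0
        # Missing counterfactual: only the chosen action's outcome is observed.
        observed = y1 if action == 1 else y0
        records.append((x, action, observed, y0, y1))
    return records


def naive_effect(records):
    """Difference of observed mean outcomes between the two action groups."""
    treated = [obs for _, a, obs, _, _ in records if a == 1]
    control = [obs for _, a, obs, _, _ in records if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def true_effect(records):
    """Average of y1 - y0 over everyone; computable only in simulation."""
    return sum(y1 - y0 for _, _, _, y0, y1 in records) / len(records)


if __name__ == "__main__":
    data = simulate_logged_data()
    print("naive estimate:", round(naive_effect(data), 3))
    print("true effect:   ", round(true_effect(data), 3))
```

Under this assumed model the true average effect of action 1 is 0.25, while the naive estimate is far smaller, because action 1 was given mostly to cases with high x, whose outcomes are worse under either action.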
