ML LG | Apr 10, 2019

Active Learning for Decision-Making from Imbalanced Observational Data

Iiris Sundin, Peter Schulam, Eero Siivola, Aki Vehtari, Suchi Saria, Samuel Kaski

arXiv:1904.05268v212.232 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of making reliable personalized decisions, for example in medicine, from imbalanced observational data. The advance is incremental: it builds on existing active learning and error estimation techniques.

The paper tackles unreliable decision-making from imbalanced observational data in personalized treatment effect prediction. It proposes to assess reliability via the model's Type S error rate, and to use that estimate as an active learning criterion for collecting new observations instead of making a forced choice. The method is demonstrated on simulated data with binary outcomes and on a medical dataset with synthetic continuous outcomes, where it improves decision-making reliability.

Machine learning can help personalized decision support by learning models to predict individual treatment effects (ITE). This work studies the reliability of prediction-based decision-making in a task of deciding which action $a$ to take for a target unit after observing its covariates $\tilde{x}$ and predicted outcomes $\hat{p}(\tilde{y} \mid \tilde{x}, a)$. An example case is personalized medicine and the decision of which treatment to give to a patient. A common problem when learning these models from observational data is imbalance, that is, difference in treated/control covariate distributions, which is known to increase the upper bound of the expected ITE estimation error. We propose to assess the decision-making reliability by estimating the ITE model's Type S error rate, which is the probability of the model inferring the sign of the treatment effect wrong. Furthermore, we use the estimated reliability as a criterion for active learning, in order to collect new (possibly expensive) observations, instead of making a forced choice based on unreliable predictions. We demonstrate the effectiveness of this decision-making aware active learning in two decision-making tasks: in simulated data with binary outcomes and in a medical dataset with synthetic and continuous treatment outcomes.
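
To make the Type S error rate concrete, the following is a minimal sketch and not the authors' implementation. It assumes that posterior samples of the ITE for a target unit are already available from some Bayesian model, that a positive ITE means the treatment is the better action, and that the decision follows the sign of the posterior mean. The function names `estimated_type_s_error` and `decide_or_query` and the threshold `max_error` are hypothetical, introduced only for illustration.

```python
import numpy as np


def estimated_type_s_error(ite_samples):
    """Estimate the Type S error rate for one target unit.

    Assumption: the decision follows the sign of the posterior mean ITE,
    so the error estimate is the fraction of posterior samples whose sign
    disagrees with that decision.
    """
    ite_samples = np.asarray(ite_samples, dtype=float)
    decision_sign = np.sign(ite_samples.mean())
    return float(np.mean(np.sign(ite_samples) != decision_sign))


def decide_or_query(ite_samples, max_error=0.05):
    """Return a decision, or ask for more data when the sign is uncertain.

    `max_error` is a hypothetical tolerance, not a value from the paper.
    """
    err = estimated_type_s_error(ite_samples)
    if err > max_error:
        # Too unreliable for a forced choice: collect a new observation.
        return "query", err
    # Assumption: positive ITE favours treatment, otherwise control.
    action = "treat" if np.mean(ite_samples) > 0 else "control"
    return action, err


# Hypothetical posterior draws for two target units.
confident = np.full(100, 2.0)                # all samples agree on the sign
uncertain = np.array([1.0, 1.0, 1.0, -1.0])  # one in four samples disagrees

print(decide_or_query(confident))
print(decide_or_query(uncertain))
```

In this sketch the uncertain unit triggers a request for more data rather than a forced choice, which mirrors the idea in the abstract. Choosing which new observation to collect, the actual active learning step, is not shown here.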
