ML LG | Jan 13, 2020

POPCORN: Partially Observed Prediction COnstrained ReiNforcement Learning

Joseph Futoma, Michael C. Hughes, Finale Doshi-Velez

arXiv:2001.04032v220.159 citations | Has Code

Originality: Highly original

AI Analysis

This paper addresses suboptimal planning in medical POMDPs, a problem relevant to healthcare practitioners, and offers a novel method rather than an incremental improvement.

The paper tackles the failure of two-stage POMDP approaches in medical decision-making by introducing a new optimization objective that produces high-performing policies and generative models, even with irrelevant observations, and works in batch off-policy settings typical in healthcare.

Many medical decision-making tasks can be framed as partially observed Markov decision processes (POMDPs). However, prevailing two-stage approaches that first learn a POMDP and then solve it often fail because the model that best fits the data may not be well suited for planning. We introduce a new optimization objective that (a) produces both high-performing policies and high-quality generative models, even when some observations are irrelevant for planning, and (b) does so in batch off-policy settings that are typical in healthcare, when only retrospective data is available. We demonstrate our approach on synthetic examples and a challenging medical decision-making problem.
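To make the batch off-policy idea concrete, here is a minimal sketch of how a policy's value might be estimated from retrospective data alone and then traded off against model fit. Everything in it is an assumption for illustration: the function names `weighted_is_value` and `combined_objective`, the trajectory format, and the weight `lam` are hypothetical, and the weighted importance-sampling estimator shown is a generic textbook choice, not necessarily the estimator or the exact objective the authors use.

```python
def weighted_is_value(trajectories, gamma=0.99):
    """Weighted importance-sampling estimate of a target policy's value.

    Each trajectory is a list of (pi_prob, mu_prob, reward) tuples:
    pi_prob is the target policy's probability of the logged action,
    mu_prob is the behavior policy's probability of that same action,
    reward is the reward observed at that step.
    (Hypothetical data format, chosen only for this sketch.)
    """
    weights, returns = [], []
    for traj in trajectories:
        rho, ret = 1.0, 0.0
        for t, (pi_prob, mu_prob, reward) in enumerate(traj):
            rho *= pi_prob / mu_prob      # cumulative importance ratio
            ret += (gamma ** t) * reward  # discounted return of the trajectory
        weights.append(rho)
        returns.append(ret)
    total = sum(weights)
    if total == 0.0:
        # The target policy never agrees with the logged actions.
        return 0.0
    # Normalizing by the sum of weights gives the "weighted" IS estimator.
    return sum(w * g for w, g in zip(weights, returns)) / total


def combined_objective(log_likelihood, value_estimate, lam=1.0):
    """Scalar to maximize: generative model fit plus lam times policy value.

    lam is an assumed trade-off weight; larger values favor policy quality
    over fit to the observations.
    """
    return log_likelihood + lam * value_estimate
```

In a sketch like this, setting `lam` to zero recovers the first step of a two-stage approach (fit the model to the data, ignoring planning), while larger values push the learned model toward parameters whose induced policy scores well on the logged trajectories.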

