LG · ML · Jun 11, 2020

PAC Bounds for Imitation and Model-based Batch Learning of Contextual Markov Decision Processes

arXiv:2006.06352v2 · 1 citation

AI Analysis

Motivated by personalized medical treatment, this work provides theoretical guarantees for learning algorithms. Its contribution is somewhat incremental, since it builds on existing frameworks for contextual MDPs.

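The paper studies batch multi-task reinforcement learning with contextual Markov decision processes. It derives sample complexity bounds for imitation learning (direct policy learning) and shows that model-based learning from expert actions alone can be impossible, and that even with broader state-action coverage some finite model classes have sample complexity exponential in their statistical complexity. Together, these results justify imitation learning in this setting.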

We consider the problem of batch multi-task reinforcement learning with observed context descriptors, motivated by its application to personalized medical treatment. In particular, we study two general classes of learning algorithms: direct policy learning (DPL), an imitation-learning based approach which learns from expert trajectories, and model-based learning. First, we derive sample complexity bounds for DPL, and then show that model-based learning from expert actions can, even with a finite model class, be impossible. After relaxing the conditions under which the model-based approach is expected to learn by allowing for greater coverage of the state-action space, we provide sample complexity bounds for model-based learning with finite model classes, showing that there exist model classes with sample complexity exponential in their statistical complexity. We then derive a sample complexity upper bound for model-based learning based on a measure of concentration of the data distribution. Our results give formal justification for imitation learning over model-based learning in this setting.
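The DPL bounds themselves are not stated on this page. As a rough illustration only, the sketch below treats direct policy learning as supervised classification of expert actions over a finite policy class and computes the textbook realizable PAC sample size m ≥ (ln|Π| + ln(1/δ)) / ε, then picks the policy that disagrees least with the expert (empirical risk minimization). This is a generic finite-class bound, not the paper's result; the policy class, the expert rule `(c + s) % 2`, and all function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical policy type: maps (context, state) -> action.
Policy = Callable[[int, int], int]


def realizable_sample_size(num_policies: int, epsilon: float, delta: float) -> int:
    """Classic realizable finite-class PAC bound (NOT the paper's DPL bound).

    With this many i.i.d. expert (context, state, action) samples, with
    probability >= 1 - delta every policy in the class that agrees with the
    expert on all samples has disagreement rate <= epsilon.
    """
    if num_policies < 1 or not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("need num_policies >= 1 and epsilon, delta in (0, 1)")
    return math.ceil((math.log(num_policies) + math.log(1 / delta)) / epsilon)


def erm_policy(policies: Sequence[Policy], data: Sequence[Tuple[int, int, int]]) -> int:
    """Return the index of the policy with the fewest disagreements with the expert."""
    def errors(pi: Policy) -> int:
        return sum(pi(c, s) != a for c, s, a in data)
    return min(range(len(policies)), key=lambda i: errors(policies[i]))


# Toy demo with a hypothetical expert whose action depends on context and state.
expert_data: List[Tuple[int, int, int]] = [
    (c, s, (c + s) % 2) for c in range(3) for s in range(3)
]
candidates: List[Policy] = [
    lambda c, s: 0,            # ignores context and state
    lambda c, s: (c + s) % 2,  # matches the hypothetical expert
    lambda c, s: s % 2,        # ignores the context
]

print(realizable_sample_size(1000, epsilon=0.1, delta=0.05))  # 100
print(erm_policy(candidates, expert_data))  # 1
```

The bound controls only the per-step disagreement with the expert; converting it into a guarantee on returns in an MDP typically adds horizon-dependent factors, which this sketch does not model.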
