RO LG Oct 29, 2020

Bayes-Adaptive Deep Model-Based Policy Optimisation

arXiv:2010.15948v34.12 citations Has Code

Originality: Incremental advance

AI Analysis

This paper addresses sample inefficiency in reinforcement learning, a problem relevant to both researchers and practitioners. It is an incremental advance built on a novel hybrid approach.

The paper tackles sample-efficient policy optimization in reinforcement learning by introducing a Bayesian model-based method (RoMBRL) that captures model uncertainty through a Bayes-adaptive Markov decision process and deep Bayesian neural networks. The results show that RoMBRL outperforms existing approaches on challenging control benchmarks in terms of sample complexity and task performance.

We introduce a Bayesian (deep) model-based reinforcement learning method (RoMBRL) that can capture model uncertainty to achieve sample-efficient policy optimisation. We propose to formulate the model-based policy optimisation problem as a Bayes-adaptive Markov decision process (BAMDP). RoMBRL maintains model uncertainty via belief distributions through a deep Bayesian neural network whose samples are generated via stochastic gradient Hamiltonian Monte Carlo. Uncertainty is propagated through simulations controlled by sampled models and history-based policies. As beliefs are encoded in visited histories, we propose a history-based policy network that can be trained end-to-end to generalise across history space, using recurrent Trust-Region Policy Optimisation. We show that RoMBRL outperforms existing approaches on many challenging control benchmark tasks in terms of sample complexity and task performance. The source code of this paper is publicly available at https://github.com/thobotics/RoMBRL.
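
The idea of propagating uncertainty through simulations controlled by sampled models and a history-based policy can be illustrated with a minimal sketch. This is not the RoMBRL implementation: the linear dynamics, the quadratic reward, the tiny recurrent policy, and every name below (`sample_posterior_models`, `HistoryPolicy`, `simulate`) are assumptions made for illustration. In particular, the random perturbations stand in for posterior samples that the paper draws from a Bayesian neural network, and no policy training (recurrent TRPO in the paper) is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_posterior_models(n_models, state_dim, action_dim, spread=0.05):
    """Hypothetical stand-in for posterior samples of a dynamics model.

    Each "model" is a linear map s' = A s + B a whose parameters are
    perturbed around a nominal value to mimic model uncertainty.
    """
    models = []
    for _ in range(n_models):
        A = np.eye(state_dim) + spread * rng.standard_normal((state_dim, state_dim))
        B = 0.1 + spread * rng.standard_normal((state_dim, action_dim))
        models.append((A, B))
    return models

class HistoryPolicy:
    """Toy recurrent policy: the hidden state summarises the visited history."""

    def __init__(self, state_dim, action_dim, hidden_dim=8):
        self.hidden_dim = hidden_dim
        self.action_dim = action_dim
        self.W_h = 0.1 * rng.standard_normal((hidden_dim, hidden_dim))
        self.W_x = 0.1 * rng.standard_normal((hidden_dim, state_dim + action_dim))
        self.W_out = 0.1 * rng.standard_normal((action_dim, hidden_dim))

    def initial_hidden(self):
        return np.zeros(self.hidden_dim)

    def step(self, hidden, state, prev_action):
        # Fold the newest (state, previous action) pair into the history summary.
        x = np.concatenate([state, prev_action])
        hidden = np.tanh(self.W_h @ hidden + self.W_x @ x)
        action = np.tanh(self.W_out @ hidden)
        return hidden, action

def reward(state, action):
    # Assumed quadratic cost: stay near the origin with small actions.
    return -float(state @ state) - 0.01 * float(action @ action)

def simulate(policy, models, start_state, horizon):
    """Roll the same policy out under each sampled model; return one return per model."""
    returns = []
    for A, B in models:
        state = start_state.copy()
        hidden = policy.initial_hidden()
        prev_action = np.zeros(policy.action_dim)
        total = 0.0
        for _ in range(horizon):
            hidden, action = policy.step(hidden, state, prev_action)
            total += reward(state, action)
            state = A @ state + B @ action
            prev_action = action
        returns.append(total)
    return np.array(returns)

models = sample_posterior_models(n_models=5, state_dim=3, action_dim=2)
policy = HistoryPolicy(state_dim=3, action_dim=2)
returns = simulate(policy, models, start_state=np.ones(3), horizon=20)
print("mean return:", returns.mean(), "spread across models:", returns.std())
```

Averaging the returns over sampled models gives a Monte Carlo estimate of expected return under the current belief, and the spread across models indicates how much model uncertainty affects the outcome. Because the policy only sees the history through its hidden state, it can in principle adapt its behaviour as the history reveals which model it is acting in.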
