LG AI RO ML Jun 12, 2020

SAMBA: Safe Model-Based & Active Reinforcement Learning

Alexander I. Cowen-Rivers, Daniel Palenicek, Vincent Moens, Mohammed Abdullah, Aivar Sootla, Jun Wang, Haitham Ammar

arXiv:2006.09436v1 15.047 citations

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of safe and efficient learning in dynamical systems, which is crucial for real-world applications such as robotics. It appears incremental, however, because it builds on existing methods such as PILCO.

The paper tackles safe reinforcement learning by proposing SAMBA, a framework that combines probabilistic modeling, information theory, and statistics to enable active exploration under safety constraints. The authors report orders-of-magnitude reductions in samples and constraint violations compared with state-of-the-art methods.

In this paper, we propose SAMBA, a novel framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics. Our method builds upon PILCO to enable active exploration using novel (semi-)metrics for out-of-sample Gaussian process evaluation optimised through a multi-objective problem that supports conditional-value-at-risk constraints. We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low and high-dimensional state representations. Our results show orders of magnitude reductions in samples and violations compared to state-of-the-art methods. Lastly, we provide intuition as to the effectiveness of the framework by a detailed analysis of our active metrics and safety constraints.
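For readers unfamiliar with the conditional-value-at-risk (CVaR) constraints mentioned in the abstract, the following is a minimal sample-based sketch of the general idea: CVaR at level alpha is the average of the worst (1 - alpha) fraction of observed costs, and a constraint bounds that average by a threshold. This is not the paper's implementation; the function names, the threshold name `xi`, and the plain empirical estimator are assumptions made purely for illustration.

```python
import math


def empirical_cvar(costs, alpha=0.9):
    """Average of the worst (1 - alpha) fraction of the sampled costs.

    Illustrative estimator only (hypothetical helper, not from the paper).
    """
    ordered = sorted(costs)
    if not ordered:
        raise ValueError("costs must contain at least one sample")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    # Number of samples in the upper tail; always keep at least one.
    n_tail = max(1, math.ceil((1.0 - alpha) * len(ordered)))
    tail = ordered[-n_tail:]
    return sum(tail) / len(tail)


def satisfies_cvar_constraint(costs, alpha=0.9, xi=1.0):
    """True when the empirical CVaR does not exceed the threshold xi."""
    return empirical_cvar(costs, alpha) <= xi
```

Because CVaR averages only the tail, it is never smaller than the plain mean cost, so a CVaR constraint is stricter than a constraint on expected cost at the same threshold.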
