RO, AI, LG, NE, ML
Sep 20, 2017

Using Parameterized Black-Box Priors to Scale Up Model-Based Policy Search for Robotics

arXiv:1709.06917v2
43 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of scaling reinforcement learning to high-dimensional state/action spaces in robotics, offering an incremental improvement over prior methods.

The paper tackles the scalability issue of model-based policy search algorithms in high-dimensional robotics by introducing parameterized black-box priors, achieving more data-efficient learning and enabling a physical hexapod robot to learn new gaits in 16 to 30 seconds.

The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. Among the few proposed approaches, the recently introduced Black-DROPS algorithm exploits a black-box optimization algorithm to achieve both high data-efficiency and good computation times when several cores are used; nevertheless, like all model-based policy search approaches, Black-DROPS does not scale to high dimensional state/action spaces. In this paper, we introduce a new model learning procedure in Black-DROPS that leverages parameterized black-box priors to (1) scale up to high-dimensional systems, and (2) be robust to large inaccuracies of the prior information. We demonstrate the effectiveness of our approach with the "pendubot" swing-up task in simulation and with a physical hexapod robot (48D state space, 18D action space) that has to walk forward as fast as possible. The results show that our new algorithm is more data-efficient than previous model-based policy search algorithms (with and without priors) and that it can allow a physical 6-legged robot to learn new gaits in only 16 to 30 seconds of interaction time.
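The alternation the abstract describes (interact with the robot, learn a dynamics model, optimize the policy on that model) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the 1-D toy system `true_step`, the prior `prior_step` with a single tunable parameter `theta`, the constant residual correction, and plain random search as the black-box optimizer. These are simple stand-ins for whatever model class and optimizer Black-DROPS actually uses, and the toy system is noise-free.

```python
import numpy as np

HORIZON, TARGET = 20, 1.0

def true_step(x, u):
    # Hypothetical stand-in for the physical robot; the learner only sees its outputs.
    return 0.9 * x + 0.5 * u + 0.1

def prior_step(x, u, theta):
    # Hypothetical parameterized prior: right structure, unknown theta, missing offset.
    return theta * x + 0.5 * u

def policy(params, x):
    # Two-parameter policy: proportional gain k plus constant offset b.
    k, b = params
    return k * (TARGET - x) + b

def rollout(step, params):
    # Run one episode from x = 0; return the transitions and the episode return.
    x, data, ret = 0.0, [], 0.0
    for _ in range(HORIZON):
        u = policy(params, x)
        nx = step(x, u)
        data.append((x, u, nx))
        ret -= (nx - TARGET) ** 2
        x = nx
    return data, ret

def fit_model(data):
    # Pick the prior parameter from a grid, then absorb the remaining
    # error with a constant residual (the mean prediction error).
    x, u, nx = map(np.array, zip(*data))
    best = None
    for theta in np.linspace(0.5, 1.3, 9):
        resid = nx - prior_step(x, u, theta)
        bias = resid.mean()
        err = np.sum((resid - bias) ** 2)
        if best is None or err < best[0]:
            best = (err, theta, bias)
    _, theta, bias = best
    return lambda x, u: prior_step(x, u, theta) + bias

def optimize_policy(model, start, rng, n_samples=500):
    # Black-box search: candidates are scored on the learned model only.
    best, best_ret = start, rollout(model, start)[1]
    for cand in rng.normal(start, 1.0, size=(n_samples, 2)):
        ret = rollout(model, cand)[1]
        if ret > best_ret:
            best, best_ret = cand, ret
    return best

def learn(episodes=3, seed=0):
    rng = np.random.default_rng(seed)
    params, data, returns = np.zeros(2), [], []
    for _ in range(episodes):
        new_data, ret = rollout(true_step, params)    # interact with the "robot"
        data += new_data
        returns.append(ret)
        model = fit_model(data)                       # learn the dynamics
        params = optimize_policy(model, params, rng)  # improve the policy on the model
    returns.append(rollout(true_step, params)[1])     # score the final policy
    return params, model, returns

params, model, returns = learn()
print("return per episode:", np.round(returns, 3))
```

The point of the design is that every policy evaluation inside `optimize_policy` runs on the learned model, so the only interaction with the real system is one rollout per episode. In this sketch the prior's structure is correct, so the learner only has to identify `theta` and a constant offset.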

Code Implementations: 1 repo
