RO AI LG SY

May 21, 2020

Guided Uncertainty-Aware Policy Optimization: Combining Learning and Model-Based Strategies for Sample-Efficient Policy Learning

Michelle A. Lee, Carlos Florensa, Jonathan Tremblay, Nathan Ratliff, Animesh Garg, Fabio Ramos, Dieter Fox

arXiv:2005.10872v223.569 citations

Originality: Incremental advance

AI Analysis

The paper addresses sample efficiency and robustness issues in robotics for tasks like peg insertion, though it appears incremental because it builds on existing model-based and learning-based strategies.

The paper tackles the problem of sample-inefficient and brittle reinforcement learning in robotics by combining model-based and learning-based methods. The resulting method requires few environment interactions and overcomes inaccuracies in perception and actuation, as demonstrated on a real-world robot performing peg insertion.

Traditional robotic approaches rely on an accurate model of the environment, a detailed description of how to perform the task, and a robust perception system to keep track of the current state. On the other hand, reinforcement learning approaches can operate directly from raw sensory inputs with only a reward signal to describe the task, but are extremely sample-inefficient and brittle. In this work, we combine the strengths of model-based methods with the flexibility of learning-based methods to obtain a general method that is able to overcome inaccuracies in the robotics perception/actuation pipeline, while requiring minimal interactions with the environment. This is achieved by leveraging uncertainty estimates to divide the space into regions where the given model-based policy is reliable, and regions where it may have flaws or not be well defined. In these uncertain regions, we show that a locally learned policy can be used directly with raw sensory inputs. We test our algorithm, Guided Uncertainty-Aware Policy Optimization (GUAPO), on a real-world robot performing peg insertion. Videos are available at https://sites.google.com/view/guapo-rl
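The region-splitting idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes, purely for illustration, that the perception system reports the target pose as a Gaussian estimate (mean and covariance) and that the "uncertain region" is the set of positions within a fixed Mahalanobis distance of that estimate. The function names, the threshold, and the proportional controller standing in for the model-based policy are all hypothetical.

```python
import numpy as np


def in_uncertain_region(position, target_mean, target_cov, threshold=3.0):
    """Return True if `position` is within `threshold` Mahalanobis units
    of the estimated target. (Assumed region definition, not from the paper.)"""
    diff = np.asarray(position, dtype=float) - np.asarray(target_mean, dtype=float)
    # Squared Mahalanobis distance: diff^T * cov^{-1} * diff
    d2 = diff @ np.linalg.solve(np.asarray(target_cov, dtype=float), diff)
    return bool(d2 <= threshold ** 2)


def model_based_action(position, target_mean, gain=0.5):
    """Hypothetical stand-in for a model-based policy:
    a proportional step toward the estimated target."""
    position = np.asarray(position, dtype=float)
    target_mean = np.asarray(target_mean, dtype=float)
    return gain * (target_mean - position)


def hybrid_action(position, observation, target_mean, target_cov,
                  learned_policy, threshold=3.0):
    """Pick the action source based on where the robot currently is.

    Outside the uncertain region the model-based controller is trusted.
    Inside it, control is handed to `learned_policy`, which acts on the
    raw observation. Returns (action, source_name).
    """
    if in_uncertain_region(position, target_mean, target_cov, threshold):
        return learned_policy(observation), "learned"
    return model_based_action(position, target_mean), "model_based"


if __name__ == "__main__":
    mean = np.zeros(3)            # assumed target estimate
    cov = 0.01 * np.eye(3)        # assumed perception covariance (std 0.1)

    def dummy_policy(obs):        # placeholder for a trained policy
        return np.zeros(3)

    for pos in ([1.0, 0.0, 0.0], [0.1, 0.0, 0.0]):
        action, source = hybrid_action(pos, None, mean, cov, dummy_policy)
        print(pos, source, action)
```

In this sketch the learned policy only ever needs to cover the small region around the uncertain target estimate, which is one way to read the abstract's claim of requiring minimal environment interaction.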
