LG · AI · RO · Oct 26, 2018

Deep Intrinsically Motivated Continuous Actor-Critic for Efficient Robotic Visuomotor Skill Learning

arXiv:1810.11388v2 · 22 citations
Originality: Incremental advance
AI Analysis

This paper addresses data-efficient, stable learning of continuous control from pixels for robotics. It is an incremental improvement over existing actor-critic methods.

The paper tackles robotic visuomotor skill learning from raw visual input by introducing an intrinsically motivated actor-critic algorithm that combines intrinsic rewards from predictive world models with extrinsic rewards. The results show it achieves better performance than state-of-the-art methods in both dense- and sparse-reward settings on reaching and grasping tasks in simulation and on a humanoid robot.

In this paper, we present a new intrinsically motivated actor-critic algorithm for learning continuous motor skills directly from raw visual input. Our neural architecture is composed of a critic and an actor network. Both networks receive the hidden representation of a deep convolutional autoencoder which is trained to reconstruct the visual input, while the centre-most hidden representation is also optimized to estimate the state value. Separately, an ensemble of predictive world models generates, based on its learning progress, an intrinsic reward signal which is combined with the extrinsic reward to guide the exploration of the actor-critic learner. Our approach is more data-efficient and inherently more stable than the existing actor-critic methods for continuous control from pixel data. We evaluate our algorithm for the task of learning robotic reaching and grasping skills on a realistic physics simulator and on a humanoid robot. The results show that the control policies learned with our approach can achieve better performance than the compared state-of-the-art and baseline algorithms in both dense-reward and challenging sparse-reward settings.
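The reward-shaping idea in the abstract (an intrinsic signal derived from the learning progress of an ensemble of predictive models, added to the extrinsic reward) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `LearningProgressReward`, the sliding-window definition of progress, the averaging over ensemble members, and the weight `beta` are all assumptions made for illustration.

```python
from collections import deque
from statistics import mean


class LearningProgressReward:
    """Intrinsic reward from the learning progress of an ensemble of predictors.

    Hypothetical sketch: each ensemble member reports one prediction error per
    step, and progress is the drop in mean error between the older and the
    newer half of a sliding window.
    """

    def __init__(self, n_models, window=10):
        if window < 2 or window % 2:
            raise ValueError("window must be an even number >= 2")
        self.half = window // 2
        self.errors = [deque(maxlen=window) for _ in range(n_models)]

    def update(self, prediction_errors):
        """Record one prediction error per ensemble member."""
        for history, err in zip(self.errors, prediction_errors):
            history.append(float(err))

    def progress(self, history):
        """Error reduction of one model; 0.0 until its window is full."""
        if len(history) < history.maxlen:
            return 0.0
        values = list(history)
        older, newer = values[: self.half], values[self.half:]
        # Clip at zero so a model that gets worse yields no reward (assumption).
        return max(0.0, mean(older) - mean(newer))

    def intrinsic_reward(self):
        """Average progress over the ensemble (an assumed aggregation rule)."""
        return mean(self.progress(h) for h in self.errors)


def combined_reward(extrinsic, intrinsic, beta=0.5):
    """Weighted sum of both signals; beta is an assumed trade-off weight."""
    return extrinsic + beta * intrinsic


# Usage: two predictors, one of which is still improving.
lp = LearningProgressReward(n_models=2, window=4)
for errs in [(1.0, 1.0), (1.0, 1.0), (0.5, 1.0), (0.5, 1.0)]:
    lp.update(errs)
sparse_step_reward = combined_reward(0.0, lp.intrinsic_reward())
```

In a sparse-reward setting the extrinsic term is zero on most steps, so under these assumptions the learner is still rewarded for visiting transitions that the predictive models are currently getting better at.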
