LG AI CV RO | Oct 11, 2018

One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL

Tom Le Paine, Sergio Gómez Colmenarejo, Ziyu Wang, Scott Reed, Yusuf Aytar, Tobias Pfaff, Matt W. Hoffman, Gabriel Barth-Maron, Serkan Cabi, David Budden, Nando de Freitas

arXiv:1810.05017v1 | 11.726 citations

Originality: Incremental advance

AI Analysis

The paper addresses the open problem of high-fidelity imitation in autonomous agents. The advance is incremental because it builds on existing RL methods, applied with larger networks.

The paper tackles two problems: enabling autonomous agents to perform high-fidelity one-shot imitation of diverse novel skills, and enabling them to solve tasks more efficiently than the demonstrators. It does so by training large-scale deep neural networks with an off-policy RL algorithm (MetaMimic) on a challenging manipulation task.

Humans are experts at high-fidelity imitation -- closely mimicking a demonstration, often in one attempt. Humans use this ability to quickly solve a task instance, and to bootstrap learning of new tasks. Achieving these abilities in autonomous agents is an open problem. In this paper, we introduce an off-policy RL algorithm (MetaMimic) to narrow this gap. MetaMimic can learn both (i) policies for high-fidelity one-shot imitation of diverse novel skills, and (ii) policies that enable the agent to solve tasks more efficiently than the demonstrators. MetaMimic relies on the principle of storing all experiences in a memory and replaying these to learn massive deep neural network policies by off-policy RL. This paper introduces, to the best of our knowledge, the largest existing neural networks for deep RL and shows that larger networks with normalization are needed to achieve one-shot high-fidelity imitation on a challenging manipulation task. The results also show that both types of policy can be learned from vision, in spite of the task rewards being sparse, and without access to demonstrator actions.
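
The abstract's principle of storing experiences in a memory and replaying them for off-policy learning can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Transition` and `ReplayMemory`, the field layout, the bounded capacity, and uniform sampling are all assumptions made for illustration only.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple


class Transition(NamedTuple):
    """One step of experience (hypothetical field layout)."""
    observation: list
    action: int
    reward: float
    next_observation: list
    done: bool


class ReplayMemory:
    """Stores transitions and replays uniformly sampled batches.

    Assumption: a bounded buffer that evicts the oldest transition
    once full. The abstract only says experiences are stored and
    replayed; it does not specify capacity or sampling strategy.
    """

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self._storage: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        # Appending to a full deque silently drops the oldest item.
        self._storage.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement from stored experience.
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self) -> int:
        return len(self._storage)


memory = ReplayMemory(capacity=3)
for step in range(5):
    memory.add(Transition([step], 0, 0.0, [step + 1], False))
batch = memory.sample(2)
```

An off-policy learner would draw batches like `batch` repeatedly, regardless of which policy generated them, which is what lets old experience be reused to train a large network.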
