RO, AI, CV, LG | Oct 11, 2018

A Data-Efficient Framework for Training and Sim-to-Real Transfer of Navigation Policies

Homanga Bharadhwaj, Zihan Wang, Yoshua Bengio, Liam Paull

arXiv:1810.04871v118.143 citations

Originality: Incremental advance

AI Analysis

This work addresses the cost and danger of training robots in the real world. It offers researchers and practitioners in robotics an incremental improvement that combines simulation-based meta-learning with adversarial domain transfer.

The paper tackles the challenge of training visuomotor navigation policies for robots. It introduces a framework that uses simulation and off-policy data to reduce real-world training costs, and it achieves successful planning performance on several navigation tasks with far fewer real-world expert demonstrations.

Learning effective visuomotor policies for robots purely from data is challenging, but also appealing, since a learning-based system should not require manual tuning or calibration. For a robot operating in a real environment, the training process can be costly, time-consuming, and even dangerous, since failures are common at the start of training. For this reason, it is desirable to leverage simulation and off-policy data as much as possible to train the robot. In this work, we introduce a robust framework that plans in simulation and transfers well to the real environment. Our model incorporates a gradient-descent-based planning module which, given the initial image and the goal image, encodes both images into a lower-dimensional latent state and plans a trajectory to reach the goal. The model, consisting of the encoder and planner modules, is first trained in simulation with a meta-learning strategy. We subsequently perform adversarial domain transfer on the encoder, using a bank of unlabelled, randomly collected images from the simulation and real environments, so that the encoder maps images from both environments to similarly distributed latent representations. By fine-tuning the entire model (encoder + planner) with far fewer real-world expert demonstrations, we show successful planning performance in different navigation tasks.
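The gradient-descent planning module described in the abstract can be illustrated with a toy sketch: encode the start and goal images, then optimize an action sequence by gradient descent so that a latent forward model, rolled out from the start latent, ends near the goal latent. Everything concrete here is an assumption for illustration only: the linear `encode`, the additive linear dynamics z_{t+1} = z_t + B a_t, the squared-distance loss, and all helper names are hypothetical stand-ins, not the paper's networks. In the abstract's description the encoder and planner are themselves trained around this loop with a meta-learning strategy; this sketch fixes them by hand and shows only the inner planning loop.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def encode(image: Vector, w_enc: Matrix) -> Vector:
    # Hypothetical encoder: a fixed linear projection of a flattened image.
    return matvec(w_enc, image)


def rollout(z0: Vector, actions: List[Vector], b: Matrix) -> Vector:
    # Hypothetical latent forward model: z_{t+1} = z_t + B a_t.
    z = list(z0)
    for a in actions:
        z = [zi + di for zi, di in zip(z, matvec(b, a))]
    return z


def plan(z0: Vector, z_goal: Vector, b: Matrix,
         horizon: int = 5, steps: int = 200, lr: float = 0.02) -> List[Vector]:
    """Gradient descent on the action sequence to minimise ||z_T - z_goal||^2."""
    actions = [[0.0] * len(b[0]) for _ in range(horizon)]
    b_t = transpose(b)
    for _ in range(steps):
        err = [zt - zg for zt, zg in zip(rollout(z0, actions, b), z_goal)]
        # Under the additive linear model, dL/da_t = 2 B^T (z_T - z_goal) for every t.
        grad = [2.0 * g for g in matvec(b_t, err)]
        actions = [[ai - lr * gi for ai, gi in zip(a, grad)] for a in actions]
    return actions


if __name__ == "__main__":
    w_enc = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # 3-"pixel" image -> 2-D latent
    b = [[1.0, 0.0], [0.0, 1.0]]
    z_start = encode([0.0, 0.0, 5.0], w_enc)
    z_goal = encode([1.0, 2.0, 5.0], w_enc)
    actions = plan(z_start, z_goal, b)
    print(rollout(z_start, actions, b))  # approximately [1.0, 2.0]
```

Because the planning loss is differentiable in the actions, any differentiable forward model could replace the linear one; with a neural model the gradient would come from automatic differentiation rather than the closed form used here.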
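The adversarial domain-transfer step can be sketched in the same spirit: a discriminator learns to tell simulated latents from real ones, and the shared encoder is updated to fool it, so that both domains map to similarly distributed latents. This toy version rests entirely on assumptions: each hypothetical "image" is a (content, style) pair whose style feature is 0.0 in simulation and 1.0 in the real world, the encoder is `z = content + v * style` with only `v` trainable, and the discriminator is a logistic regression trained with binary cross-entropy, while the encoder uses the common non-saturating "fool the discriminator" loss. The abstract does not specify the paper's discriminator or losses.

```python
import math
from typing import List, Tuple

Image = Tuple[float, float]  # hypothetical (content, style) features

# Toy unlabelled image banks: same content, different domain-specific style.
CONTENT = [-1.0, 1.0]
SIM_IMAGES: List[Image] = [(c, 0.0) for c in CONTENT]
REAL_IMAGES: List[Image] = [(c, 1.0) for c in CONTENT]


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def encode(image: Image, v: float) -> float:
    # Shared toy encoder z = content + v * style; only v is trained.
    content, style = image
    return content + v * style


def adversarial_transfer(v: float = 1.0, steps: int = 3000, lr: float = 0.1):
    """Alternate discriminator and encoder gradient steps; return (v, alpha, beta)."""
    alpha, beta = 0.0, 0.0  # D(z) = sigmoid(alpha * z + beta) = P(z came from sim)
    for _ in range(steps):
        z_sim = [encode(x, v) for x in SIM_IMAGES]
        z_real = [encode(x, v) for x in REAL_IMAGES]
        p_sim = [sigmoid(alpha * z + beta) for z in z_sim]
        p_real = [sigmoid(alpha * z + beta) for z in z_real]
        # Discriminator step: binary cross-entropy with labels sim = 1, real = 0.
        g_alpha = (mean([-(1 - p) * z for p, z in zip(p_sim, z_sim)])
                   + mean([p * z for p, z in zip(p_real, z_real)]))
        g_beta = mean([-(1 - p) for p in p_sim]) + mean(p_real)
        alpha -= lr * g_alpha
        beta -= lr * g_beta
        # Encoder step: minimise -log D(z_real) against the updated discriminator;
        # dz_real/dv is the style feature of each real image.
        g_v = mean([-(1 - sigmoid(alpha * encode(x, v) + beta)) * alpha * x[1]
                    for x in REAL_IMAGES])
        v -= lr * g_v
    return v, alpha, beta


if __name__ == "__main__":
    v, alpha, beta = adversarial_transfer()
    print(f"style weight after transfer: {v:.4f}")
```

Training drives `v` toward 0, so the encoder learns to ignore the domain-specific style feature and the real and simulated latents coincide. The content weight is frozen so that this toy encoder cannot collapse to a constant map, which would also fool the discriminator.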
