RO, CV, LG · Nov 13, 2020

Robust Policies via Mid-Level Visual Representations: An Experimental Study in Manipulation and Navigation

Bryan Chen, Alexander Sax, Gene Lewis, Iro Armeni, Silvio Savarese, Amir Zamir, Jitendra Malik, Lerrel Pinto

arXiv:2011.06698v1

Originality: Incremental advance

AI Analysis

This work addresses robust policy training for robotics tasks such as manipulation and navigation and offers a scalable alternative to existing methods. It is an incremental advance that builds on prior work on visual representations.

The study tackles the high sample complexity and brittleness of end-to-end vision-based robotics by using mid-level visual representations as the perceptual state in RL. Compared with domain randomization and learning from scratch, this yields better generalization, better sample efficiency, and higher final performance. For navigation, the study also reports successful zero-shot sim-to-real experiments on real robots.

Vision-based robotics often separates the control loop into one module for perception and a separate module for control. It is possible to train the whole system end-to-end (e.g. with deep RL), but doing it "from scratch" comes with a high sample complexity cost and the final result is often brittle, failing unexpectedly if the test environment differs from that of training. We study the effects of using mid-level visual representations (features learned asynchronously for traditional computer vision objectives), as a generic and easy-to-decode perceptual state in an end-to-end RL framework. Mid-level representations encode invariances about the world, and we show that they aid generalization, improve sample complexity, and lead to a higher final performance. Compared to other approaches for incorporating invariances, such as domain randomization, asynchronously trained mid-level representations scale better: both to harder problems and to larger domain shifts. In practice, this means that mid-level representations could be used to successfully train policies for tasks where domain randomization and learning-from-scratch failed. We report results on both manipulation and navigation tasks, and for navigation include zero-shot sim-to-real experiments on real robots.
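The abstract describes a split in which a perception module, trained beforehand on a vision objective, supplies features and RL trains only the policy on top of them. The Python sketch below illustrates that split on a toy one-step task. It is a hypothetical illustration under stated assumptions, not the authors' code: `FrozenMidLevelEncoder` is a fixed linear map (average pooling over the left and right halves of an 8-pixel "image") standing in for a pretrained mid-level network, `SoftmaxPolicy` is a linear policy trained with plain REINFORCE and a constant baseline, and the two "scenes" have a brightness that varies randomly between episodes as a stand-in for nuisance variation.

```python
import copy
import math
import random


class FrozenMidLevelEncoder:
    """Hypothetical stand-in for a pretrained mid-level vision network.

    Here it is only a fixed linear map; in the setting the abstract describes,
    it would be a network trained beforehand on a vision objective.
    The RL loop below never modifies these weights.
    """

    def __init__(self, weights):
        self.weights = weights

    def __call__(self, obs):
        return [sum(w * x for w, x in zip(row, obs)) for row in self.weights]


class SoftmaxPolicy:
    """Linear softmax policy on encoder features: the only trainable part."""

    def __init__(self, feat_dim, n_actions):
        # One weight vector per action, with a bias slot at the end.
        self.theta = [[0.0] * (feat_dim + 1) for _ in range(n_actions)]

    def probs(self, feat):
        x = feat + [1.0]  # constant input for the bias term
        logits = [sum(t * v for t, v in zip(row, x)) for row in self.theta]
        top = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def reinforce_update(self, feat, action, advantage, lr):
        # For a linear softmax, grad of log pi(a|x) w.r.t. theta_k is (1[k == a] - p_k) * x.
        x = feat + [1.0]
        p = self.probs(feat)
        for k, row in enumerate(self.theta):
            coeff = lr * advantage * ((1.0 if k == action else 0.0) - p[k])
            for i in range(len(row)):
                row[i] += coeff * x[i]


def train(encoder, policy, scenes, steps=2000, lr=0.5, seed=0):
    """REINFORCE on a toy one-step task: pick the action matching the scene index."""
    rng = random.Random(seed)
    for _ in range(steps):
        label = rng.randrange(len(scenes))
        brightness = rng.uniform(0.5, 1.5)  # nuisance variation across episodes
        obs = [brightness * px + rng.gauss(0.0, 0.05) for px in scenes[label]]
        feat = encoder(obs)  # frozen features: only the policy is updated
        p = policy.probs(feat)
        action = rng.choices(range(len(p)), weights=p)[0]
        reward = 1.0 if action == label else 0.0
        policy.reinforce_update(feat, action, reward - 0.5, lr)  # 0.5 is a constant baseline
    return policy


def greedy_action(encoder, policy, obs):
    p = policy.probs(encoder(obs))
    return max(range(len(p)), key=lambda k: p[k])


# Usage: two toy scenes and a pooling "encoder" (all hypothetical).
SCENES = [[1, 1, 1, 1, 0, 0, 0, 0], [0, 0, 0, 0, 1, 1, 1, 1]]
POOL = [[0.25] * 4 + [0.0] * 4, [0.0] * 4 + [0.25] * 4]
encoder = FrozenMidLevelEncoder(POOL)
frozen_before = copy.deepcopy(encoder.weights)
policy = train(encoder, SoftmaxPolicy(feat_dim=2, n_actions=2), SCENES)
print([greedy_action(encoder, policy, s) for s in SCENES])
```

Because the encoder has no trainable parameters inside the RL loop, the policy-gradient update only searches over the small mapping from features to actions. This mirrors the abstract's setup, in which representations are trained asynchronously and then reused as the perceptual state; the sketch makes no claim about the specific networks, tasks, or RL algorithm used in the paper.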

