LG AI CV RO · Dec 30, 2020

Model-Based Visual Planning with Self-Supervised Functional Distances

Stephen Tian, Suraj Nair, Frederik Ebert, Sudeep Dasari, Benjamin Eysenbach, Chelsea Finn, Sergey Levine

arXiv:2012.15373v1 · 19.569 citations · Has Code

Originality: Highly original

AI Analysis

This work addresses the challenge of learning goal-reaching policies for generalist robots, a significant problem for robotics researchers and practitioners, by enabling learning from unlabeled offline data.

The paper tackles the problem of learning goal-reaching policies for generalist robots without hand-engineered reward functions. Its self-supervised method combines a visual dynamics model with a dynamical distance function learned via model-free reinforcement learning, and it trains entirely on offline, unlabeled data. The method performs tasks such as moving objects amid distractors with a simulated robotic arm and opening and closing a drawer with a real-world robot, substantially outperforming prior model-free and model-based methods.
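Planning to a goal with a dynamics model and a distance function can be sketched as sampling-based model-predictive control: propose action sequences, predict where they lead, and rank them by predicted distance to the goal. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation. It uses the cross-entropy method (CEM), a common sampling-based optimizer, as an assumed stand-in for the paper's optimizer. `predict` and `distance` are hypothetical toy replacements for the learned visual dynamics model and the learned distance: a 2-D point mass and Euclidean distance. Scoring only the final predicted state is also a simplification.

```python
import random

# Hypothetical stand-ins (assumptions, not the paper's models): a 2-D point
# "state" instead of images, known dynamics instead of a learned video model,
# and Euclidean distance instead of a learned dynamical distance.
def predict(state, actions):
    """Roll out a toy point-mass model: each action is added to the position."""
    x, y = state
    for dx, dy in actions:
        x, y = x + dx, y + dy
    return (x, y)


def distance(state, goal):
    """Toy cost; in the real method this would be a learned distance network."""
    return ((state[0] - goal[0]) ** 2 + (state[1] - goal[1]) ** 2) ** 0.5


def cem_plan(state, goal, horizon=5, n_samples=200, n_elite=20, n_iters=5,
             max_step=1.0, seed=0):
    """Cross-entropy method: sample action sequences from a per-step Gaussian,
    keep the sequences whose predicted final state is closest to the goal,
    refit the Gaussian to those elites, and repeat."""
    rng = random.Random(seed)
    mean = [[0.0, 0.0] for _ in range(horizon)]
    std = [[max_step, max_step] for _ in range(horizon)]
    for _ in range(n_iters):
        samples = []
        for _ in range(n_samples):
            # Sample one action per step and clip it to the allowed range.
            seq = [tuple(max(-max_step, min(max_step, rng.gauss(m, s)))
                         for m, s in zip(mean[t], std[t]))
                   for t in range(horizon)]
            cost = distance(predict(state, seq), goal)
            samples.append((cost, seq))
        samples.sort(key=lambda pair: pair[0])
        elites = [seq for _, seq in samples[:n_elite]]
        # Refit mean and standard deviation per time step and action dimension.
        for t in range(horizon):
            for k in range(2):
                vals = [seq[t][k] for seq in elites]
                mu = sum(vals) / n_elite
                var = sum((v - mu) ** 2 for v in vals) / n_elite
                mean[t][k] = mu
                std[t][k] = var ** 0.5 + 1e-3  # small floor keeps sampling alive
    return samples[0][1]  # lowest-cost sequence from the final iteration


start, goal = (0.0, 0.0), (3.0, -2.0)
plan = cem_plan(start, goal)
print(distance(predict(start, plan), goal))
```

In a real system, `predict` would output predicted images and `distance` would be the learned network. The planner only needs a way to roll out candidate action sequences and a scalar cost to rank them, which is why a learned distance can take the place of a hand-engineered reward.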

A generalist robot must be able to complete a variety of tasks in its environment. One appealing way to specify each task is in terms of a goal observation. However, learning goal-reaching policies with reinforcement learning remains a challenging problem, particularly when hand-engineered reward functions are not available. Learned dynamics models are a promising approach for learning about the environment without rewards or task-directed data, but planning to reach goals with such a model requires a notion of functional similarity between observations and goal states. We present a self-supervised method for model-based visual goal reaching, which uses both a visual dynamics model as well as a dynamical distance function learned using model-free reinforcement learning. Our approach learns entirely using offline, unlabeled data, making it practical to scale to large and diverse datasets. In our experiments, we find that our method can successfully learn models that perform a variety of tasks at test-time, moving objects amid distractors with a simulated robotic arm and even learning to open and close a drawer using a real-world robot. In comparisons, we find that this approach substantially outperforms both model-free and model-based prior methods. Videos and visualizations are available here: http://sites.google.com/berkeley.edu/mbold.
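The idea of learning a dynamical distance from offline, reward-free data can be illustrated in a tabular setting. This is a minimal sketch under loud assumptions, not the paper's training procedure, which operates on images with function approximation. The toy 1-D chain, the exhaustive transition dataset, and the names `step`, `learn_distances`, and `state_distance` are all hypothetical. Every state is relabeled as a possible goal. Repeated temporal-difference sweeps then estimate `d(s, a, g)`, the number of steps needed to reach goal `g` after taking action `a` in state `s`, with no hand-engineered reward.

```python
from itertools import product

N_STATES = 6        # hypothetical toy chain: states 0..5 stand in for images
ACTIONS = (-1, +1)  # move left or move right


def step(s, a):
    """Deterministic chain dynamics; moves are clipped at both ends."""
    return min(max(s + a, 0), N_STATES - 1)


# Offline, unlabeled data: (state, action, next_state) tuples with no rewards.
# Here every transition is simply enumerated; real data would be logged experience.
dataset = [(s, a, step(s, a)) for s, a in product(range(N_STATES), ACTIONS)]


def learn_distances(dataset, n_iters=50):
    """Tabular TD sweeps with goal relabeling.

    d[(s, a, g)] estimates the steps to reach g after taking a in s:
    1 if the next state is the goal, else 1 + min over next actions.
    """
    d = {(s, a, g): 0.0 for s, a, _ in dataset for g in range(N_STATES)}
    for _ in range(n_iters):
        new_d = {}
        for s, a, s_next in dataset:
            for g in range(N_STATES):  # relabel: every state serves as a goal
                if s_next == g:
                    target = 1.0
                else:
                    target = 1.0 + min(d[(s_next, b, g)] for b in ACTIONS)
                new_d[(s, a, g)] = target
        d = new_d  # synchronous update over the whole dataset
    return d


def state_distance(d, s, g):
    """Distance from s to g: zero at the goal, else the best action's estimate."""
    if s == g:
        return 0.0
    return min(d[(s, a, g)] for a in ACTIONS)


d = learn_distances(dataset)
print(state_distance(d, 0, 5))  # prints 5.0
```

On this chain the learned distance recovers the true shortest-path step count, `abs(s - g)`. A distance of this kind can then serve as the planning cost in place of a hand-engineered reward.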

