Visual Foresight: Model-Based Deep Reinforcement Learning for Vision-Based Robotic Control
This work aims to make deep RL practical and generalizable for robotics, a prerequisite for applications such as autonomous manipulation, although it builds incrementally on existing model-based and visual RL methods.
The paper tackles deep reinforcement learning for real-world robotic manipulation with a self-supervised model-based approach that predicts the future from raw camera images, which allows generalization to unseen objects and tasks without ground-truth rewards. It demonstrates that visual model predictive control (MPC) can handle both rigid and deformable objects and solve user-defined tasks using a single model.
Deep reinforcement learning (RL) algorithms can learn complex robotic skills from raw sensory inputs, but have yet to achieve the kind of broad generalization and applicability demonstrated by deep learning methods in supervised domains. We present a deep RL method that is practical for real-world robotics tasks, such as robotic manipulation, and generalizes effectively to never-before-seen tasks and objects. In these settings, ground-truth reward signals are typically unavailable, and we therefore propose a self-supervised model-based approach, where a predictive model learns to directly predict the future from raw sensory readings, such as camera images. At test time, we explore three distinct goal specification methods: designated pixels, where a user specifies desired object manipulation tasks by selecting particular pixels in an image and corresponding goal positions; goal images, where the desired goal state is specified with an image; and image classifiers, which define spaces of goal states. Our deep predictive models are trained using data collected autonomously and continuously by a robot interacting with hundreds of objects, without human supervision. We demonstrate that visual model predictive control (MPC) can generalize to never-before-seen objects, both rigid and deformable, and solve a range of user-defined object manipulation tasks using the same model.
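To make the planning idea concrete, the following is a minimal sketch of model predictive control with a designated-pixel goal. It is an illustration under stated assumptions, not the paper's implementation: the learned video prediction model is replaced by a hypothetical toy function (`toy_predict`) in which the designated pixel simply moves by each action, the planner is plain random shooting (the abstract does not say which optimizer is used), and all names and parameter values are invented for this example.

```python
import numpy as np


def toy_predict(pixel, actions):
    """Hypothetical stand-in for the learned predictive model.

    Assumption: the designated pixel is displaced by each 2D action, so the
    predicted positions are the running sum of the actions.
    """
    return pixel + np.cumsum(actions, axis=0)


def pixel_distance_cost(predicted_positions, goal):
    """Sum of distances between predicted pixel positions and the goal."""
    return np.linalg.norm(predicted_positions - goal, axis=1).sum()


def plan_random_shooting(pixel, goal, rng, horizon=5, num_samples=200):
    """Sample action sequences and return the one with the lowest cost."""
    candidates = rng.uniform(-1.0, 1.0, size=(num_samples, horizon, 2))
    costs = np.array(
        [pixel_distance_cost(toy_predict(pixel, a), goal) for a in candidates]
    )
    return candidates[np.argmin(costs)]


def run_mpc(start, goal, steps=20, seed=0):
    """Replan at every step and execute only the first planned action."""
    rng = np.random.default_rng(seed)
    pixel = np.asarray(start, dtype=float)
    goal = np.asarray(goal, dtype=float)
    for _ in range(steps):
        plan = plan_random_shooting(pixel, goal, rng)
        # In this toy setting the "real" world matches the model exactly.
        pixel = pixel + plan[0]
    return pixel


if __name__ == "__main__":
    final = run_mpc(start=(0.0, 0.0), goal=(5.0, 3.0))
    print("final designated-pixel position:", final)
```

In a real system, `toy_predict` would be replaced by the learned image-space predictive model, and the cost could instead come from a goal image or an image classifier, the other two goal specification methods described above.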