Scaling data-driven robotics with reward sketching and batch reinforcement learning
This work addresses the challenge of training robots on diverse real-world tasks where no direct reward signal is available. Its contribution appears to be largely an incremental combination of two existing techniques: reward learning and batch RL.
The authors tackle the problem of scaling data-driven robotics to multiple real-world manipulation tasks. Their framework learns reward functions from human annotations and then applies batch reinforcement learning to large datasets of recorded robot experience. They demonstrate it on three challenging object manipulation tasks, including stacking rigid objects and handling cloth.
We present a framework for data-driven robotics that makes use of a large dataset of recorded robot experience and scales to several tasks using learned reward functions. We show how to apply this framework to accomplish three different object manipulation tasks on a real robot platform. Given demonstrations of a task together with task-agnostic recorded experience, we use a special form of human annotation (reward sketching) as supervision to learn a reward function, which lets us handle real-world tasks where the reward signal cannot be acquired directly. The learned rewards are then combined with a large dataset of experience from different tasks to learn a robot policy offline using batch RL. We show that this approach can train agents to perform a variety of challenging manipulation tasks, including stacking rigid objects and handling cloth.
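The two data-preparation steps described above, learning a reward function from annotations and then relabeling stored experience with it, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the source does not specify the reward model, the loss, or the data format, so ridge regression on synthetic feature vectors is swapped in, and the names `fit_reward_model`, `predict_reward`, and `relabel` are hypothetical. The batch RL learner that would consume the relabeled episodes is not shown.

```python
import numpy as np

def _with_bias(features):
    # Append a constant column so the linear model can learn an offset.
    return np.hstack([features, np.ones((features.shape[0], 1))])

def fit_reward_model(features, sketched_rewards, l2=1e-3):
    """Ridge regression from per-timestep features to annotated reward values.

    Assumption: a linear model stands in for whatever reward model the
    authors actually use.
    """
    X = _with_bias(features)
    A = X.T @ X + l2 * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ sketched_rewards)

def predict_reward(w, features):
    """Predicted reward for each timestep."""
    return _with_bias(features) @ w

def relabel(episodes, w):
    """Return copies of the stored episodes with predicted rewards attached."""
    return [dict(ep, rewards=predict_reward(w, ep["features"])) for ep in episodes]

# Synthetic stand-in data (hypothetical): 200 annotated timesteps with
# 3-dimensional features and a noisy linear "sketched" reward.
rng = np.random.default_rng(0)
true_w = np.array([0.5, -0.2, 0.1])
annotated_features = rng.normal(size=(200, 3))
sketched = annotated_features @ true_w + 0.3 + 0.01 * rng.normal(size=200)

w = fit_reward_model(annotated_features, sketched)

# Unannotated, task-agnostic experience: 4 episodes of 50 timesteps each.
episodes = [
    {"features": rng.normal(size=(50, 3)), "actions": rng.normal(size=(50, 2))}
    for _ in range(4)
]
relabeled = relabel(episodes, w)
```

After this step each episode carries a `rewards` array alongside its features and actions, which is the form an offline (batch) RL algorithm would need in order to train a policy without further interaction with the robot.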