Learning to Push by Grasping: Using multiple tasks for effective learning
This paper addresses the data-scalability problem in robot learning, for researchers and practitioners alike, by demonstrating that multi-task learning can reduce data needs. The contribution is incremental in that it builds on existing end-to-end frameworks.
The paper tackles the high data requirements of end-to-end robot control by proposing multi-task learning, showing that models trained jointly on grasping and pushing tasks outperform task-specific models with the same total data, e.g., achieving better grasping performance with 2.5K grasp and 2.5K push examples than with 5K grasp examples alone.
Recently, end-to-end learning frameworks have been gaining prevalence in the field of robot control. These frameworks take states or images as input and directly predict torques or action parameters. However, these approaches are often criticized for the huge amounts of data they require to learn a task. The argument that they are difficult to scale to multiple tasks is well founded, since training each task often requires hundreds or thousands of examples. But do end-to-end approaches need to learn a unique model for every task? Intuitively, sharing across tasks should help, since all tasks require some common understanding of the environment. In this paper, we attempt to take the next step in data-driven end-to-end learning frameworks: moving from the realm of task-specific models to joint learning of multiple robot tasks. In an astonishing result, we show that models trained with multi-task learning tend to perform better than task-specific models trained with the same amount of data. For example, a deep network trained with 2.5K grasp and 2.5K push examples performs better on grasping than a network trained on 5K grasp examples.
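The idea of sharing across tasks can be illustrated with a minimal sketch: one trunk computes features used by every task, and a small task-specific head maps those features to each task's outputs. This is not the paper's architecture. The class name `MultiTaskModel`, the helper `mixed_batches`, the layer sizes, and the output dimensions per task are all hypothetical, and the code shows only the forward pass and the pooling of a fixed data budget across tasks, not training.

```python
import random


def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def relu(vector):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in vector]


def random_matrix(rows, cols, rng):
    """Small random weights; the initialisation scale is an arbitrary choice."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class MultiTaskModel:
    """Hypothetical shared-trunk model with one linear head per task."""

    def __init__(self, in_dim, feat_dim, head_dims, seed=0):
        rng = random.Random(seed)
        # Shared parameters: updated by examples from every task.
        self.trunk = random_matrix(feat_dim, in_dim, rng)
        # Task-specific parameters: one head per task name.
        self.heads = {
            task: random_matrix(out_dim, feat_dim, rng)
            for task, out_dim in head_dims.items()
        }

    def forward(self, x, task):
        features = relu(matvec(self.trunk, x))    # common representation
        return matvec(self.heads[task], features)  # task-specific output


def mixed_batches(datasets, batch_size, seed=0):
    """Pool (x, y) examples from all tasks, shuffle, and yield batches.

    Each batch item is (task, x, y), so the task name selects the head.
    """
    pooled = [
        (task, x, y)
        for task, examples in datasets.items()
        for x, y in examples
    ]
    random.Random(seed).shuffle(pooled)
    for start in range(0, len(pooled), batch_size):
        yield pooled[start:start + batch_size]
```

In this sketch, a budget of 5K examples split as 2.5K grasp and 2.5K push still passes all 5K examples through the shared trunk, while each head sees only the 2.5K examples of its own task. Whether that sharing helps in practice is the empirical question the paper investigates.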