Transporters with Visual Foresight for Solving Unseen Rearrangement Tasks
This work addresses the challenge of precise robotic manipulation of unseen structures, improving generalization to new rearrangement tasks from minimal data.
The paper tackles the problem of enabling robots to perform unseen rearrangement tasks by proposing a visual foresight model that improves the success rate of a state-of-the-art imitation learning method from 55.4% to 78.5% in simulation and from 30% to 63.3% in real robot experiments.
Rearrangement tasks have been identified as a crucial challenge for intelligent robotic manipulation, but few methods allow for precise construction of unseen structures. We propose a visual foresight model for pick-and-place rearrangement manipulation that learns efficiently. In addition, we develop a multi-modal action proposal module that builds on the Goal-Conditioned Transporter Network, a state-of-the-art imitation learning method. Our image-based task planning method, Transporters with Visual Foresight (TVF), learns from only a handful of demonstrations and generalizes to multiple unseen tasks in a zero-shot manner. TVF improves the performance of a state-of-the-art imitation learning method on unseen tasks in both simulation and real-robot experiments. In particular, when given only tens of expert demonstrations, the average success rate on unseen tasks improves from 55.4% to 78.5% in simulation experiments and from 30% to 63.3% in real-robot experiments. Video and code are available on our project website: https://chirikjianlab.github.io/tvf/