On the Effectiveness of Retrieval, Alignment, and Replay in Manipulation
This paper addresses the inefficiency of learning robot manipulation from visual demonstrations, proposing a paradigm that decomposes reasoning into retrieval, alignment, and replay phases.
The paper tackles the inefficiency of end-to-end behavioural cloning for visual imitation learning by decomposing reasoning into three phases (retrieval, alignment, replay), reporting unprecedented learning efficiency and effective inter- and intra-class generalisation on real-world tasks such as grasping, pouring, and inserting objects.
Imitation learning with visual observations is notoriously inefficient when addressed with end-to-end behavioural cloning methods. In this paper, we explore an alternative paradigm which decomposes reasoning into three phases. First, a retrieval phase, which informs the robot what it can do with an object. Second, an alignment phase, which informs the robot where to interact with the object. And third, a replay phase, which informs the robot how to interact with the object. Through a series of real-world experiments on everyday tasks, such as grasping, pouring, and inserting objects, we show that this decomposition brings unprecedented learning efficiency, and effective inter- and intra-class generalisation. Videos are available at https://www.robot-learning.uk/retrieval-alignment-replay.
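The three-phase decomposition can be illustrated with a minimal sketch. This is not the authors' implementation: the `Demo` structure, the cosine-similarity retrieval, the assumption that an object pose estimate is already available, and all function names are hypothetical, chosen only to show how "what", "where", and "how" can be handled by separate steps.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Demo:
    """Hypothetical stored demonstration (not from the paper)."""
    skill: str               # what the robot did, e.g. "grasp"
    embedding: np.ndarray    # visual embedding of the demonstrated object
    waypoints: list          # 4x4 end-effector poses in the object frame


def translation(x, y, z):
    """Build a 4x4 homogeneous transform that is a pure translation."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T


def retrieve(query_embedding, demos):
    """Retrieval (what): pick the demo most cosine-similar to the query."""
    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(demos, key=lambda d: cosine(query_embedding, d.embedding))


def align(object_pose_world, demo):
    """Alignment (where): world-frame pose at which the interaction starts.

    Assumes object_pose_world comes from some pose estimator (not shown).
    """
    return object_pose_world @ demo.waypoints[0]


def replay(start_pose_world, demo):
    """Replay (how): re-express the demo trajectory from the aligned start."""
    first_inv = np.linalg.inv(demo.waypoints[0])
    return [start_pose_world @ first_inv @ w for w in demo.waypoints]
```

Under these assumptions, a new scene would be handled by calling `retrieve` on the observed object's embedding, `align` with the estimated object pose, and `replay` to obtain world-frame end-effector waypoints. No policy is trained in this sketch; the demonstrated motion is reused directly.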