Coarse-to-Fine 3D Keyframe Transporter
This work improves sample efficiency for robotic manipulation, though its contribution is incremental: it builds on existing Keyframe IL methods and Transporter Networks.
The paper tackles sample inefficiency in Keyframe Imitation Learning by exploiting bi-equivariant symmetries, yielding a method that outperforms baselines by an average of more than 10% in simulation and 55% in physical experiments.
Recent advances in Keyframe Imitation Learning (IL) have enabled learning-based agents to solve a diverse range of manipulation tasks. However, most approaches ignore the rich symmetries in the problem setting and, as a consequence, are sample-inefficient. This work identifies and utilizes the bi-equivariant symmetry within Keyframe IL to design a policy that generalizes to transformations of both the workspace and the objects grasped by the gripper. We make two main contributions: First, we analyze the bi-equivariance properties of the keyframe action scheme and propose a Keyframe Transporter derived from Transporter Networks, which evaluates actions using cross-correlation between the features of the grasped object and the features of the scene. Second, we propose a computationally efficient coarse-to-fine SE(3) action evaluation scheme for reasoning about the intertwined translation and rotation actions. The resulting method outperforms strong Keyframe IL baselines by an average of more than 10% on a wide range of simulation tasks, and by an average of 55% in four physical experiments.
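The two ideas in the abstract, scoring candidate placements by cross-correlating grasped-object features with scene features and then searching rotations and translations coarse-to-fine, can be illustrated with a toy 2D sketch. This is not the paper's 3D SE(3) method. It assumes hypothetical dense feature maps (hand-built arrays here, not network outputs), only four 90-degree in-plane rotations, and a simple strided-grid coarse pass followed by a local full-resolution refinement; all function names are illustrative.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rotated_kernels(kernel: np.ndarray, n_rot: int = 4) -> list:
    # Assumed rotation discretization: 90-degree steps, so np.rot90 needs no interpolation.
    return [np.rot90(kernel, r, axes=(0, 1)) for r in range(n_rot)]


def correlate(windows: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    # windows[i, j, c, a, b] == scene[i + a, j + b, c]; kernel has shape (k, k, C).
    # Each output entry is the dot product of the kernel with one scene patch.
    return np.einsum("ijcab,abc->ij", windows, kernel)


def exhaustive_search(scene: np.ndarray, kernel: np.ndarray) -> tuple:
    # Score every (rotation, translation) pair and return the best one.
    k = kernel.shape[0]
    windows = sliding_window_view(scene, (k, k), axis=(0, 1))
    scores = np.stack([correlate(windows, kr) for kr in rotated_kernels(kernel)])
    r, i, j = np.unravel_index(np.argmax(scores), scores.shape)
    return int(r), int(i), int(j)


def coarse_to_fine_search(scene: np.ndarray, kernel: np.ndarray, stride: int = 4) -> tuple:
    k = kernel.shape[0]
    windows = sliding_window_view(scene, (k, k), axis=(0, 1))
    kernels = rotated_kernels(kernel)
    # Coarse pass: all rotations, but translations only on a strided grid.
    coarse = np.stack([correlate(windows[::stride, ::stride], kr) for kr in kernels])
    _, ci, cj = np.unravel_index(np.argmax(coarse), coarse.shape)
    ci, cj = int(ci) * stride, int(cj) * stride  # back to full-resolution indices
    # Fine pass: all rotations, every translation within one stride of the coarse winner.
    h, w = windows.shape[:2]
    i0, i1 = max(0, ci - stride), min(h, ci + stride + 1)
    j0, j1 = max(0, cj - stride), min(w, cj + stride + 1)
    fine = np.stack([correlate(windows[i0:i1, j0:j1], kr) for kr in kernels])
    r, i, j = np.unravel_index(np.argmax(fine), fine.shape)
    return int(r), int(i0 + i), int(j0 + j)


# Toy example: a slowly varying, rotation-asymmetric "object feature" kernel,
# planted into an empty scene after a 90-degree rotation at row 13, column 9.
obj = 1.0 + 0.02 * (
    np.arange(5)[:, None, None] + 2 * np.arange(5)[None, :, None] + np.arange(3)[None, None, :]
)
scene = np.zeros((32, 32, 3))
scene[13:18, 9:14] = np.rot90(obj, 1, axes=(0, 1))

print(exhaustive_search(scene, obj))      # (1, 13, 9)
print(coarse_to_fine_search(scene, obj))  # (1, 13, 9)
```

In this example the coarse-to-fine search evaluates 196 coarse plus 324 fine (rotation, translation) pairs instead of all 3136, and still recovers the planted pose. The trade-off is that a strided coarse pass can miss the peak when features are not spatially smooth; the toy kernel is built to vary slowly so the coarse winner lands next to the true placement.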