USEEK: Unsupervised SE(3)-Equivariant 3D Keypoints for Generalizable Manipulation
This work addresses generalizable robotic manipulation, in particular handling intra-category shape variance from limited demonstrations. The contribution is incremental in that it builds on existing keypoint-based representations.
The paper tackles the problem of enabling a robot to manipulate unseen objects within a category from a single demonstration. It introduces USEEK, an unsupervised SE(3)-equivariant keypoint method whose keypoints are aligned across instances, which allows manipulation from and to arbitrary poses.
Can a robot manipulate intra-category unseen objects in arbitrary poses with the help of a single demonstration of a grasping pose on one object instance? In this paper, we try to address this intriguing challenge by using USEEK, an unsupervised SE(3)-equivariant keypoint method that enjoys alignment across instances in a category, to perform generalizable manipulation. USEEK follows a teacher-student structure to decouple the unsupervised keypoint discovery and SE(3)-equivariant keypoint detection. With USEEK in hand, the robot can infer the category-level task-relevant object frames in an efficient and explainable manner, enabling manipulation of any intra-category objects from and to any poses. Through extensive experiments, we demonstrate that the keypoints produced by USEEK possess rich semantics, thus successfully transferring the functional knowledge from the demonstration object to the novel ones. Compared with other object representations for manipulation, USEEK is more adaptive in the face of large intra-category shape variance, more robust with limited demonstrations, and more efficient at inference time.
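To make the idea of a keypoint-derived object frame concrete, the following is a minimal sketch of how a demonstrated grasp could be transferred to a new instance once aligned keypoints are available. It is not the authors' implementation: the abstract does not specify how the frame is built, so the three-keypoint construction in frame_from_keypoints, the (R, t) pose representation, and all function names are assumptions made for illustration. The sketch also assumes the keypoints are ordered consistently across instances and that the first three are not collinear.

```python
import math

# Small 3-vector / 3x3-matrix helpers (pure Python, no dependencies).
def sub(a, b): return [a[i] - b[i] for i in range(3)]
def add(a, b): return [a[i] + b[i] for i in range(3)]
def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]
def normalize(a):
    n = math.sqrt(sum(x * x for x in a))
    return [x / n for x in a]
def matvec(R, v):
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]
def transpose(R):
    return [[R[j][i] for j in range(3)] for i in range(3)]

def frame_from_keypoints(kps):
    """Hypothetical frame construction: origin at keypoint 0, x-axis toward
    keypoint 1, z-axis normal to the plane spanned by keypoints 0, 1, 2."""
    p0, p1, p2 = kps[0], kps[1], kps[2]
    x = normalize(sub(p1, p0))
    z = normalize(cross(x, sub(p2, p0)))
    y = cross(z, x)
    return transpose([x, y, z]), list(p0)  # rotation (axes as columns), origin

def compose(A, B):
    """Pose composition: apply B first, then A."""
    (Ra, ta), (Rb, tb) = A, B
    return matmul(Ra, Rb), add(matvec(Ra, tb), ta)

def inverse(T):
    """Inverse of a rigid transform (R, t)."""
    R, t = T
    Rt = transpose(R)
    return Rt, [-c for c in matvec(Rt, t)]

def transfer_grasp(demo_kps, demo_grasp, new_kps):
    """Express the demo grasp in the demo object frame, then re-express it
    in the frame of the new instance."""
    grasp_in_obj = compose(inverse(frame_from_keypoints(demo_kps)), demo_grasp)
    return compose(frame_from_keypoints(new_kps), grasp_in_obj)

# Usage: the "new" keypoints are the demo keypoints rotated 90 degrees
# about z and shifted by (1, 2, 3).
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
demo_kps = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]
demo_grasp = (identity, [0.5, 0.2, 0.1])
new_kps = [[1, 2, 3], [1, 3, 3], [0, 2, 3]]
new_R, new_t = transfer_grasp(demo_kps, demo_grasp, new_kps)
```

Because the frame is computed only from the keypoints, a rigid motion of the object moves the frame, and therefore the transferred grasp, by the same rigid motion. This is the practical payoff of SE(3)-equivariant keypoints that the abstract describes as manipulation "from and to any poses". For an instance with a different shape the keypoints differ by more than a rigid motion, and the quality of the transferred grasp then depends on how well the keypoints capture the task-relevant geometry.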