General-purpose Clothes Manipulation with Semantic Keypoints
This work addresses the challenge of robots handling diverse clothes manipulation tasks. The contribution is incremental, since it builds on existing methods, but it extends them to be more general-purpose.
The paper tackles the problem of general-purpose clothes manipulation for household robots by introducing CLASP, which uses semantic keypoints as a sparse spatial-semantic representation to bridge task planning and action execution; experiments show it outperforms baselines in simulation and works on a real robot for tasks like folding, flattening, hanging, and placing.
Clothes manipulation is a critical capability for household robots; yet, existing methods are often confined to specific tasks, such as folding or flattening, due to the complex high-dimensional geometry of deformable fabric. This paper presents CLothes mAnipulation with Semantic keyPoints (CLASP) for general-purpose clothes manipulation, which enables the robot to perform diverse manipulation tasks over different types of clothes. The key idea of CLASP is semantic keypoints -- e.g., "right shoulder", "left sleeve", etc. -- a sparse spatial-semantic representation that is salient for both perception and action. Semantic keypoints of clothes can be effectively extracted from depth images and are sufficient to represent a broad range of clothes manipulation policies. CLASP leverages semantic keypoints to bridge LLM-powered task planning and low-level action execution in a two-level hierarchy. Extensive simulation experiments show that CLASP outperforms baseline methods across diverse clothes types in both seen and unseen tasks. Further, experiments with a Kinova dual-arm system on four distinct tasks -- folding, flattening, hanging, and placing -- confirm CLASP's performance on a real robot.
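To make the two-level hierarchy more concrete, here is a minimal Python sketch. It is not CLASP's implementation. It assumes the high-level planner (an LLM in CLASP) has already produced a plan: a list of primitives whose arguments are semantic keypoint names. The low-level layer then only has to look those names up among the detected keypoints and turn them into parameterized actions. The `Step` type, the primitive names (`fold`, `grasp`), the keypoint names, and the coordinates are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical 3D keypoint position in meters (x, y, z).
Point = tuple[float, float, float]


@dataclass(frozen=True)
class Step:
    """One high-level plan step: a primitive plus the semantic keypoints it names."""
    primitive: str
    keypoints: tuple[str, ...]


def ground_step(step: Step, detected: dict[str, Point]) -> dict:
    """Resolve semantic keypoint names into coordinates for one low-level action."""
    missing = [name for name in step.keypoints if name not in detected]
    if missing:
        raise KeyError(f"keypoints not detected: {missing}")
    coords = [detected[name] for name in step.keypoints]
    if step.primitive == "fold":
        # Hypothetical fold: pick the first keypoint and place it onto the second.
        pick, place = coords
        return {"action": "pick_and_place", "pick": pick, "place": place}
    if step.primitive == "grasp":
        # Hypothetical grasp: one grasp point per named keypoint (e.g. one per arm).
        return {"action": "grasp", "at": coords}
    raise ValueError(f"unknown primitive: {step.primitive}")


def execute(plan: list[Step], detected: dict[str, Point]) -> list[dict]:
    """Ground every plan step against the same set of detected keypoints."""
    return [ground_step(step, detected) for step in plan]


# Usage: keypoints a perception module might report for a T-shirt (made-up values).
detected = {
    "left_sleeve": (0.10, 0.40, 0.0),
    "right_sleeve": (0.50, 0.40, 0.0),
    "left_shoulder": (0.20, 0.45, 0.0),
    "right_shoulder": (0.40, 0.45, 0.0),
}
plan = [
    Step("fold", ("left_sleeve", "right_sleeve")),
    Step("grasp", ("left_shoulder", "right_shoulder")),
]
actions = execute(plan, detected)
print(actions[0]["action"])  # pick_and_place
```

Because the plan refers to parts of the garment by name rather than by pixel or mesh coordinates, the same plan can in principle be reused on a different garment of the same type, provided the perception step reports the same named keypoints.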