Learning Keypoints for Robotic Cloth Manipulation using Synthetic Data
This work addresses the problem of enabling assistive robots to handle deformable objects such as clothes. Its contribution is incremental, however, because it focuses on improving synthetic data methods for one specific domain.
The paper tackles robotic cloth manipulation by developing a synthetic data pipeline to train keypoint detectors for almost-flattened cloth items. The detectors trained on synthetic data reach a mean average precision (mAP) of 64% and an average keypoint distance of 18 pixels; fine-tuning on real-world data improves this to 74% mAP and 9 pixels.
Assistive robots should be able to wash, fold or iron clothes. However, due to the variety, deformability and self-occlusions of clothes, creating robot systems for cloth manipulation is challenging. Synthetic data is a promising direction to improve generalization, but the sim-to-real gap limits its effectiveness. To advance the use of synthetic data for cloth manipulation tasks such as robotic folding, we present a synthetic data pipeline to train keypoint detectors for almost-flattened cloth items. To evaluate its performance, we have also collected a real-world dataset. We train detectors for T-shirts, towels and shorts and obtain a mean average precision (mAP) of 64% and an average keypoint distance of 18 pixels. Fine-tuning on real-world data improves performance to 74% mAP and an average distance of only 9 pixels. Furthermore, we describe failure modes of the keypoint detectors and compare different approaches to obtain cloth meshes and materials. We also quantify the remaining sim-to-real gap and argue that improvements to the fidelity of cloth assets will be required to reduce this gap further. The code, dataset and trained models are available
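To make the "average keypoint distance" figure concrete, the following is a minimal sketch of how such a metric could be computed. It assumes that each predicted keypoint is already paired with its ground-truth keypoint (for example, by semantic label such as a specific towel corner) and that coordinates are in image pixels. The function name and the sample coordinates are hypothetical, and the exact matching procedure used in the paper may differ.

```python
import math


def average_keypoint_distance(predicted, ground_truth):
    """Mean Euclidean distance (in pixels) between paired keypoints.

    Assumption: predicted[i] corresponds to ground_truth[i]; the paper's
    own pairing of detections to annotations may work differently.
    """
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    if not predicted:
        raise ValueError("at least one keypoint pair is required")
    distances = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(distances) / len(distances)


# Hypothetical towel corners as (x, y) pixel coordinates.
predicted = [(103.0, 204.0), (400.0, 200.0), (400.0, 500.0), (100.0, 512.0)]
ground_truth = [(100.0, 200.0), (400.0, 200.0), (400.0, 500.0), (100.0, 500.0)]

mean_distance = average_keypoint_distance(predicted, ground_truth)
print(f"average keypoint distance: {mean_distance:.2f} px")
```

In this illustrative example two corners are predicted exactly and two are off by 5 and 12 pixels, so the average is pulled up by the worst corner. A per-keypoint breakdown can be more informative than the mean alone when studying failure modes.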