Synthetic Data Generation and Vision-based Wrinkle and Keypoint Detection for Bimanual Cloth Manipulation
For robotic manipulation of textiles, this work provides a practical perception system that works on real fabrics without fine-tuning, addressing the lack of annotated real-world data.
The paper tackles robotic cloth manipulation by developing a synthetic data pipeline and a perception framework combining keypoint detection and wrinkle detection. The keypoint model achieves a Mean Position Error of 1.76 pixels, and the system transfers to real fabrics without fine-tuning, outperforming baselines in high-occlusion scenarios.
Robotic manipulation of textiles remains challenging because continuous deformation and self-occlusions hinder the robust visual perception required to estimate the cloth's state. To address the lack of annotated real-world data, we developed a Blender-based synthetic pipeline that exports automatically annotated keypoints, and we combined manually labeled renders with real-world data to train a wrinkle detector. We present a perception framework that integrates a CNN for permutation-invariant keypoint detection with a YOLOv8-OpenCV pipeline that extracts grasping points from structural wrinkles. The proposed bimanual algorithm uses this system to stretch fully folded garments by grasping wrinkles, then transitions to keypoint-based ironing once corners emerge. The keypoint model achieves a Mean Position Error (MPE) of 1.7615 pixels. The perception system transfers to physical fabrics without fine-tuning, outperforming baselines that fail in high-occlusion states or yield false positives on severe folds.
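As a rough illustration of how a permutation-invariant position error could be computed, the sketch below averages the Euclidean pixel distance between predicted and ground-truth keypoints under the best one-to-one assignment. The abstract does not give the exact MPE definition, so the matching rule, the function name `mean_position_error`, and the sample coordinates are all assumptions made for illustration, not the paper's implementation.

```python
from itertools import permutations
from math import hypot


def mean_position_error(pred, target):
    """Mean Euclidean distance (pixels) under the best one-to-one matching.

    Assumption: keypoints carry no fixed order, so the error is taken
    over the assignment that minimizes the total distance.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of keypoints")
    if not pred:
        raise ValueError("at least one keypoint is required")

    best_total = float("inf")
    # Brute force over all assignments; acceptable for a few cloth corners.
    for perm in permutations(range(len(target))):
        total = sum(
            hypot(px - target[j][0], py - target[j][1])
            for (px, py), j in zip(pred, perm)
        )
        best_total = min(best_total, total)
    return best_total / len(pred)


# Hypothetical corner predictions and labels, listed in different orders.
pred = [(10.0, 10.0), (100.0, 12.0), (98.0, 80.0), (11.0, 79.0)]
target = [(12.0, 80.0), (10.0, 11.0), (100.0, 10.0), (99.0, 80.0)]
mpe = mean_position_error(pred, target)
print(f"MPE: {mpe:.4f} px")
```

The exhaustive search grows factorially with the number of keypoints, so for larger keypoint sets an assignment solver such as the Hungarian algorithm would be the more practical choice.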