RO | CV | Jan 27, 2025

BiFold: Bimanual Cloth Folding with Language Guidance

Oriol Barbany, Adrià Colomé, Carme Torras

arXiv:2501.16458v211.06 citations | h-index: 4 | ICRA

Originality: Incremental advance

AI Analysis

The work addresses the challenge of robotic cloth folding for domestic and industrial applications, but the advance is incremental because it builds on pre-trained models and existing benchmarks.

The paper tackles the problem of bimanual cloth folding by learning folding actions conditioned on text commands, achieving state-of-the-art performance on an existing benchmark and strong generalization to new instructions, garments, and environments.

Cloth folding is a complex task due to the inevitable self-occlusions of clothes, their complicated dynamics, and the disparate materials, geometries, and textures that garments can have. In this work, we learn folding actions conditioned on text commands. Translating high-level, abstract instructions into precise robotic actions requires sophisticated language understanding and manipulation capabilities. To do that, we leverage a pre-trained vision-language model and repurpose it to predict manipulation actions. Our model, BiFold, can take context into account and achieves state-of-the-art performance on an existing language-conditioned folding benchmark. To address the lack of annotated bimanual folding data, we introduce a novel dataset with automatically parsed actions and language-aligned instructions, enabling better learning of text-conditioned manipulation. BiFold attains the best performance on our dataset and demonstrates strong generalization to new instructions, garments, and environments.
