RO, CV · Jan 27, 2025

BiFold: Bimanual Cloth Folding with Language Guidance

arXiv:2501.16458v2 · 6 citations · h-index: 4 · ICRA
Originality: Incremental advance
AI Analysis

This work addresses robotic cloth folding for domestic and industrial applications. It is an incremental advance because it builds on pre-trained models and existing benchmarks.

The paper tackles the problem of bimanual cloth folding by learning folding actions conditioned on text commands, achieving state-of-the-art performance on an existing benchmark and strong generalization to new instructions, garments, and environments.

Cloth folding is a complex task due to the inevitable self-occlusions of clothes, their complicated dynamics, and the disparate materials, geometries, and textures that garments can have. In this work, we learn folding actions conditioned on text commands. Translating high-level, abstract instructions into precise robotic actions requires sophisticated language understanding and manipulation capabilities. To do that, we leverage a pre-trained vision-language model and repurpose it to predict manipulation actions. Our model, BiFold, can take context into account and achieves state-of-the-art performance on an existing language-conditioned folding benchmark. To address the lack of annotated bimanual folding data, we introduce a novel dataset with automatically parsed actions and language-aligned instructions, enabling better learning of text-conditioned manipulation. BiFold attains the best performance on our dataset and demonstrates strong generalization to new instructions, garments, and environments.

