DeLTa: Demonstration and Language-Guided Novel Transparent Object Manipulation
This work addresses the challenge of enabling robots to handle transparent objects in everyday settings, an incremental advance in robotic manipulation.
The paper tackles robotic manipulation of transparent objects, where existing research is limited to short-horizon tasks and basic grasping. It proposes DeLTa, a framework that integrates depth estimation, 6D pose estimation, and vision-language planning for precise long-horizon manipulation guided by natural task instructions. The authors report that it significantly outperforms existing transparent object manipulation approaches, particularly in long-horizon scenarios.
Despite the prevalence of transparent object interactions in everyday human life, transparent robotic manipulation research remains limited to short-horizon tasks and basic grasping capabilities. Although some methods have partially addressed these issues, most of them have limitations in generalizability to novel objects and are insufficient for precise long-horizon robot manipulation. To address this limitation, we propose DeLTa (Demonstration and Language-Guided Novel Transparent Object Manipulation), a novel framework that integrates depth estimation, 6D pose estimation, and vision-language planning for precise long-horizon manipulation of transparent objects guided by natural task instructions. A key advantage of our method is its single-demonstration approach, which generalizes 6D trajectories to novel transparent objects without requiring category-level priors or additional training. Additionally, we present a task planner that refines the VLM-generated plan to account for the constraints of a single-arm, eye-in-hand robot for long-horizon object manipulation tasks. Through comprehensive evaluation, we demonstrate that our method significantly outperforms existing transparent object manipulation approaches, particularly in long-horizon scenarios requiring precise manipulation capabilities. Project page: https://sites.google.com/view/DeLTa25/