Text2Grasp: Grasp synthesis by text prompts of object grasping parts
This work addresses ambiguous control in robotic grasp synthesis, offering researchers and practitioners an incremental improvement through a text-guided approach.
The paper tackles the ambiguity of existing grasp synthesis methods by introducing Text2Grasp, which conditions grasps on text prompts naming object grasping parts; in experiments it achieves accurate part-level grasp control with comparable grasp quality.
The hand plays a pivotal role in the human ability to grasp and manipulate objects, and controllable grasp synthesis is key to successfully performing downstream tasks. Existing methods that use human intention or task-level language as control signals for grasping inherently face ambiguity. To address this challenge, we propose Text2Grasp, a grasp synthesis method guided by text prompts of object grasping parts, which provides more precise control. Specifically, we present a two-stage method: a text-guided diffusion model, TextGraspDiff, first generates a coarse grasp pose, and a hand-object contact optimization process then refines it to ensure both plausibility and diversity. Furthermore, by leveraging a Large Language Model (LLM), our method facilitates grasp synthesis guided by task-level and personalized text descriptions without additional manual annotations. Extensive experiments demonstrate that our method achieves not only accurate part-level grasp control but also comparable performance in grasp quality.
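The abstract does not specify the objective used in the second-stage hand-object contact optimization. As a rough illustration of what such a refinement step can look like, the sketch below assumes a deliberately toy setting: the object is a unit sphere, the "grasping part" is a hypothetical set of points on its top cap, the coarse grasp (standing in for the diffusion model's output) is reduced to three fingertip points, and only a global hand translation is optimized. The objective (squared distance from each fingertip to its nearest part point plus a penetration penalty), the weights, and all function names (`sample_part_points`, `contact_loss`, `refine_translation`) are assumptions for illustration, not Text2Grasp's actual formulation.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def sample_part_points(radius: float, max_polar_deg: float = 60.0) -> List[Vec3]:
    """Hypothetical 'grasping part': points on the top cap of a sphere."""
    pts: List[Vec3] = [(0.0, 0.0, radius)]
    for theta_deg in range(15, int(max_polar_deg) + 1, 15):
        th = math.radians(theta_deg)
        for phi_deg in range(0, 360, 30):
            ph = math.radians(phi_deg)
            pts.append((radius * math.sin(th) * math.cos(ph),
                        radius * math.sin(th) * math.sin(ph),
                        radius * math.cos(th)))
    return pts


def contact_loss(tips: Sequence[Vec3], part_pts: Sequence[Vec3],
                 radius: float, w_pen: float = 10.0) -> float:
    """Assumed toy objective: attraction to the target part plus penetration penalty."""
    loss = 0.0
    for p in tips:
        # Attraction: squared distance from the fingertip to its nearest part point.
        loss += min(sum((a - b) ** 2 for a, b in zip(p, q)) for q in part_pts)
        # Penetration: penalize fingertips that lie inside the sphere.
        depth = radius - math.sqrt(sum(c * c for c in p))
        if depth > 0.0:
            loss += w_pen * depth ** 2
    return loss


def refine_translation(tips, part_pts, radius, steps=300, lr=0.02, eps=1e-4):
    """Gradient descent on a global hand translation via central finite differences."""
    def loss_at(t):
        moved = [tuple(c + d for c, d in zip(p, t)) for p in tips]
        return contact_loss(moved, part_pts, radius)

    t = [0.0, 0.0, 0.0]
    initial = loss_at(t)
    for _ in range(steps):
        grad = []
        for i in range(3):
            tp, tm = list(t), list(t)
            tp[i] += eps
            tm[i] -= eps
            grad.append((loss_at(tp) - loss_at(tm)) / (2.0 * eps))
        t = [c - lr * g for c, g in zip(t, grad)]
    return t, initial, loss_at(t)


if __name__ == "__main__":
    radius = 1.0
    part = sample_part_points(radius)
    # Hypothetical coarse fingertips: above the cap and offset in +x.
    coarse_tips = [(1.4, 0.0, 1.4), (0.6, 0.5, 1.4), (0.6, -0.5, 1.4)]
    t, before, after = refine_translation(coarse_tips, part, radius)
    print(f"translation={[round(c, 3) for c in t]} loss {before:.3f} -> {after:.3f}")
```

Optimizing only a translation keeps the example small; a refinement over a real hand model would presumably also update rotation and joint parameters, and would typically use automatic differentiation rather than finite differences.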