RO CV | Jan 22

DextER: Language-driven Dexterous Grasp Generation with Embodied Reasoning

arXiv:2601.16046v1 | 2 citations | h-index: 2

Originality: Highly original

AI Analysis

This work addresses dexterous grasp generation for robotic manipulation with an embodiment-aware method that improves success rate and controllability, though it is incremental over existing vision-language approaches.

The paper tackles language-driven dexterous grasp generation by introducing contact-based embodied reasoning to bridge task semantics with physical constraints. It achieves a 67.14% success rate on DexGYS, outperforming the state of the art by 3.83 percentage points, with a 96.4% improvement in intention alignment.

Language-driven dexterous grasp generation requires models to understand task semantics, 3D geometry, and complex hand-object interactions. While vision-language models have been applied to this problem, existing approaches directly map observations to grasp parameters without intermediate reasoning about physical interactions. We present DextER, Dexterous Grasp Generation with Embodied Reasoning, which introduces contact-based embodied reasoning for multi-finger manipulation. Our key insight is that predicting which hand links contact where on the object surface provides an embodiment-aware intermediate representation bridging task semantics with physical constraints. DextER autoregressively generates embodied contact tokens specifying which finger links contact where on the object surface, followed by grasp tokens encoding the hand configuration. On DexGYS, DextER achieves a 67.14% success rate, outperforming the state of the art by 3.83%p with a 96.4% improvement in intention alignment. We also demonstrate steerable generation through partial contact specification, providing fine-grained control over grasp synthesis.
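The abstract describes an output sequence in which contact tokens (which link touches where) come before grasp tokens (the hand configuration), and says a partial contact specification can steer generation. The sketch below illustrates one way such a sequence could be laid out. It is not the paper's implementation: the vocabulary sizes, the special tokens `BOS` and `SEP`, and every function name here are hypothetical, and the model itself is omitted.

```python
from typing import List, Tuple

# All sizes and special tokens are hypothetical, chosen only for illustration.
NUM_LINKS = 16          # assumed number of hand links
NUM_SURFACE_BINS = 64   # assumed number of discretized object-surface regions
NUM_GRASP_BINS = 128    # assumed quantization levels for hand-configuration values

BOS = 0                 # start of sequence
SEP = 1                 # end of contact tokens, start of grasp tokens
LINK_OFFSET = 2
SURFACE_OFFSET = LINK_OFFSET + NUM_LINKS
GRASP_OFFSET = SURFACE_OFFSET + NUM_SURFACE_BINS

Contact = Tuple[int, int]  # (link index, surface-region index)


def encode_contacts(contacts: List[Contact]) -> List[int]:
    """Map each (link, region) pair to two token ids: link first, then region."""
    ids: List[int] = []
    for link, region in contacts:
        if not (0 <= link < NUM_LINKS and 0 <= region < NUM_SURFACE_BINS):
            raise ValueError(f"contact out of range: {(link, region)}")
        ids += [LINK_OFFSET + link, SURFACE_OFFSET + region]
    return ids


def build_sequence(contacts: List[Contact], grasp_bins: List[int]) -> List[int]:
    """Full target sequence: contact tokens first, then grasp tokens."""
    for g in grasp_bins:
        if not 0 <= g < NUM_GRASP_BINS:
            raise ValueError(f"grasp bin out of range: {g}")
    grasp_ids = [GRASP_OFFSET + g for g in grasp_bins]
    return [BOS] + encode_contacts(contacts) + [SEP] + grasp_ids


def build_steering_prefix(partial_contacts: List[Contact]) -> List[int]:
    """Prefix for steerable generation: user-chosen contacts with no SEP,
    so an autoregressive model could add further contacts before the grasp."""
    return [BOS] + encode_contacts(partial_contacts)


def decode_sequence(ids: List[int]) -> Tuple[List[Contact], List[int]]:
    """Invert build_sequence: split at SEP and undo the id offsets."""
    sep = ids.index(SEP)
    c = ids[1:sep]
    contacts = [(c[i] - LINK_OFFSET, c[i + 1] - SURFACE_OFFSET)
                for i in range(0, len(c), 2)]
    grasp = [t - GRASP_OFFSET for t in ids[sep + 1:]]
    return contacts, grasp


seq = build_sequence([(3, 10), (7, 42)], [5, 100])
prefix = build_steering_prefix([(3, 10)])
```

Because the contacts come first in the sequence, a user-supplied prefix such as `prefix` fixes part of the contact plan while the remaining contacts and the grasp tokens are left for the model to generate.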

