RO | CV | Oct 30, 2024

Neural Attention Field: Emerging Point Relevance in 3D Scenes for One-Shot Dexterous Grasping

arXiv:2410.23039v1 | 9 citations | h-index: 50 | CoRL
Originality: Incremental advance
AI Analysis

The paper addresses a challenging robotics problem: making grasping more adaptable and efficient across varied environments. It appears incremental, however, because it builds on existing feature-field approaches.

The paper tackles one-shot transfer of dexterous grasps to novel 3D scenes with object and context variations. It proposes a neural attention field that models inter-point relevance, which yields significantly higher success rates on real robots than feature-field-based methods.

One-shot transfer of dexterous grasps to novel scenes with object and context variations has been a challenging problem. While distilled feature fields from large vision models have enabled semantic correspondences across 3D scenes, their features are point-based and restricted to object surfaces, limiting their capability of modeling complex semantic feature distributions for hand-object interactions. In this work, we propose the neural attention field for representing semantic-aware dense feature fields in 3D space by modeling inter-point relevance instead of individual point features. At its core is a transformer decoder that computes the cross-attention between any 3D query point and all the scene points, and provides the query point feature through an attention-based aggregation. We further propose a self-supervised framework for training the transformer decoder from only a few 3D point clouds without hand demonstrations. After training, the attention field can be applied to novel scenes for semantics-aware dexterous grasping from a one-shot demonstration. Experiments show that our method provides better optimization landscapes by encouraging the end-effector to focus on task-relevant scene regions, resulting in significant improvements in success rates on real robots compared with feature-field-based methods.
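
The cross-attention step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single attention head, the random projection matrices (W_q, W_k, W_v), the choice to build keys from concatenated point positions and features, and all dimensions are assumptions made purely for illustration. It shows only the general idea that a query point anywhere in 3D space receives a feature as an attention-weighted aggregation over all scene points.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1D array.
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def query_feature(query_xyz, scene_xyz, scene_feats, W_q, W_k, W_v):
    """Single-head cross-attention from one 3D query point to all scene points.

    All projection matrices are hypothetical stand-ins for learned weights.
    """
    q = query_xyz @ W_q                                  # (D,) query embedding
    keys_in = np.concatenate([scene_xyz, scene_feats], axis=1)
    k = keys_in @ W_k                                    # (N, D) one key per scene point
    v = scene_feats @ W_v                                # (N, D) one value per scene point
    scores = k @ q / np.sqrt(q.shape[0])                 # (N,) scaled dot products
    attn = softmax(scores)                               # relevance of each scene point
    return attn @ v, attn                                # aggregated feature, weights

# Toy scene: N points with F-dimensional features (random, for illustration only).
rng = np.random.default_rng(0)
N, F, D = 128, 16, 32
scene_xyz = rng.uniform(-1.0, 1.0, size=(N, 3))
scene_feats = rng.normal(size=(N, F))
W_q = rng.normal(size=(3, D))
W_k = rng.normal(size=(3 + F, D))
W_v = rng.normal(size=(F, D))

# The query point need not lie on any object surface.
query = np.array([0.1, 0.0, 0.2])
feat, attn = query_feature(query, scene_xyz, scene_feats, W_q, W_k, W_v)
print(feat.shape, attn.shape)
```

Because the attention weights form a distribution over scene points, they can be read as a per-query relevance map, which matches the abstract's framing of inter-point relevance rather than individual point features.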

