CV AI CL LG RO | Jul 27, 2023

Distilled Feature Fields Enable Few-Shot Language-Guided Manipulation

William Shen, Ge Yang, Alan Yu, Jansen Wong, Leslie Pack Kaelbling, Phillip Isola

Stanford

arXiv:2308.07931v2 | 31.9 | 168 citations | h-index: 76 | Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of enabling robots to manipulate novel objects from natural language commands, though the contribution appears incremental because it combines existing techniques.

The paper targets robotic manipulation tasks that require an understanding of 3D geometry. It bridges the 2D-to-3D gap with distilled feature fields and presents a few-shot learning method for 6-DOF grasping and placing that generalizes in the wild to unseen objects.

Self-supervised and language-supervised image models contain rich knowledge of the world that is important for generalization. Many robotic tasks, however, require a detailed understanding of 3D geometry, which is often lacking in 2D image features. This work bridges this 2D-to-3D gap for robotic manipulation by leveraging distilled feature fields to combine accurate 3D geometry with rich semantics from 2D foundation models. We present a few-shot learning method for 6-DOF grasping and placing that harnesses these strong spatial and semantic priors to achieve in-the-wild generalization to unseen objects. Using features distilled from a vision-language model, CLIP, we present a way to designate novel objects for manipulation via free-text natural language, and demonstrate its ability to generalize to unseen expressions and novel categories of objects.
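As a rough illustration of the language-guided designation described in the abstract, the sketch below ranks 3D points by the cosine similarity between their features and a text-query embedding. It assumes that each point already carries a feature vector living in the same embedding space as the text query. The function names (`cosine_similarity`, `select_points`) and the toy vectors are hypothetical and are not taken from the authors' code; a real system would obtain both kinds of embeddings from CLIP and a trained feature field.

```python
import numpy as np


def cosine_similarity(point_features, query_embedding):
    """Cosine similarity between each row of (N, D) features and a (D,) query."""
    feats = point_features / np.linalg.norm(point_features, axis=1, keepdims=True)
    query = query_embedding / np.linalg.norm(query_embedding)
    return feats @ query


def select_points(points_xyz, point_features, query_embedding, top_k=2):
    """Return the top_k points whose features best match the query, with scores."""
    sims = cosine_similarity(point_features, query_embedding)
    order = np.argsort(-sims)[:top_k]  # indices sorted by descending similarity
    return points_xyz[order], sims[order]


# Toy stand-ins (assumptions): four 3D points with 3-dimensional features.
points_xyz = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
point_features = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
# Stand-in for the embedding of a free-text query such as "the mug".
query_embedding = np.array([2.0, 0.0, 0.0])

selected_xyz, selected_sims = select_points(points_xyz, point_features, query_embedding)
print(selected_xyz)
print(selected_sims)
```

Normalizing both sides makes the score independent of feature magnitude, which is why the unnormalized query above still ranks the first point highest.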
